freebsd-skq

Author	SHA1	Message	Date
zec	d78a1b1a82	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
zec	7a56f17240	Make indentation more uniform accross vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
bms	0c2462747f	Limit scope of acquisition of INP_RLOCK for multicast input filter to the scope of its use, even though this may thrash the lock if the INP is referenced for other purposes. Tested by: David Wolfskill	2009-05-01 11:05:24 +00:00
zec	39b6dc8ba2	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
bms	32a71137f0	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
bms	59d63727b6	Add MLDv2 protocol header, but do not connect it to the build.	2009-04-29 11:31:23 +00:00
bms	94f4dd57ca	Import IPv6 SSM module but do not connect it to the build.	2009-04-29 11:26:45 +00:00
bms	027b0363c6	Add IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT, in6addr_linklocal_allv2routers for use by MLDv2. Add IPv6 SSM socket layer membership vector size constants and tree bounds. Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes straight to the RFC 3678 socket options.	2009-04-29 10:22:44 +00:00
zec	8d976eab5c	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
bz	2dc7da25f0	Compare protosw pointer with NULL. MFC after: 1 month	2009-04-23 17:41:54 +00:00
rwatson	6dff27073d	Assert the interface address list lock in IFP_TO_IA6(), as it will iterate the interface address list. Marginally expand IF_ADDR_LOCK() coverage in mld6.c to make sure it's held when IFP_TO_IA6() is called. MFC after: 2 weeks	2009-04-20 22:56:34 +00:00
rwatson	22bdc8dd64	Prefer structure fields (ifa_link) to macro aliases for them (ifa_list). MFC after: 2 weeks	2009-04-20 22:45:21 +00:00
rwatson	6fc60785e7	Acquire interface address list lock around access to if_addrhead, closing several writer-writer races, and some read-write races. MFC after: 2 weeks	2009-04-20 21:37:46 +00:00
rwatson	084ce14c28	Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually accessing queue(9) structure fields for if_addrhead. Prefer FreeBSD field name if_addrhead to compatibility macro if_addrlist. MFC after: 2 weeks	2009-04-20 21:05:37 +00:00
rwatson	eb422ada74	Close some but not all writer-writer races when maintaining IPv6 interface address lists by locking the interface address list lock. MFC after: 2 weeks	2009-04-20 16:05:16 +00:00
rwatson	859e97941f	Lock interface address lists before iterating over them in nd6. MFC after: 2 weeks	2009-04-20 14:41:23 +00:00
kmacy	24b38efdce	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
kmacy	52b9562b83	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry	2009-04-15 20:34:19 +00:00
rwatson	4801e9aee9	Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. In on case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats. MFC after: 3 days	2009-04-12 13:22:33 +00:00
rwatson	6958bbf168	Commit file omitted in r190962: Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 11:53:12 +00:00
zec	b39b54e6de	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
zec	c85551e0bc	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
bms	76f193cd69	Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved. Summary: * Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod. * Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do. * Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels. * bandwidth meters however still require libkvm. * Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256). * Remove unused members from struct vif and struct mfc. * Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c. * Don't share locks or initialization between IPv4 and IPv6. * Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being. * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c. v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and have not been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here. Reviewed by: Pavlin Radoslavov	2009-03-19 01:43:03 +00:00
rwatson	70b6a8119c	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
rwatson	038bfe209e	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz	2009-03-15 09:58:31 +00:00
marius	74f63d4ce1	On architectures with strict alignment requirements compensate the misalignment of the IP header that prepending the EtherIP header might have caused. PR: 131921 MFC after: 1 week	2009-03-07 19:08:58 +00:00
bz	59d53a5bdb	Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095. While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets. Reviewed by: gnn (a bit back, earlier version)	2009-03-03 13:12:12 +00:00
bz	4321e2a8f4	Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif. Single kernel compiles and universe will fail if the size of the struct changes. Th expected values are given in sys/vimage.h. See the comments where how to handle this. Requested by: peter	2009-03-01 11:01:00 +00:00
bz	df2be82cec	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
bz	710220924b	Shuffle the vimage.h includes or add where missing.	2009-02-27 13:22:26 +00:00
rwatson	52c114d8ab	Assert the radix head lock in in6_rtqkill(). MFC after: 3 days	2009-02-23 22:58:59 +00:00
bz	8d30abae87	Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions. The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate. Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted. Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers. Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments. Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject(). Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?	2009-02-08 09:27:07 +00:00
jamie	aac9010144	Don't bother null-checking the thread pointer before the prison checks in udp6_connect (td is already dereferenced elsewhere without such a check). This makes the conversion from a sockaddr to a sockaddr_in6 always happen, so convert once at the beginning of the function rather than twice in the middle. Approved by: bz (mentor)	2009-02-05 15:04:23 +00:00
jamie	bbcda547da	Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind. Approved by: bz (mentor)	2009-02-05 14:25:53 +00:00
jamie	12bbe1869f	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
bz	5af7ae8eac	When iterating through the list trying to find a router in defrouter_select(), NULL the cached llentry after unlocking as we are no longer interested in it and with the second iteration would try to unlock it again resulting in panic: Lock (rw) lle not locked @ ... Reported by: Mark Atkinson <m.atkinson@f5.com> Tested by: Mark Atkinson <m.atkinson@f5.com> PR: kern/128247 (in follow-up, unrelated to original report)	2009-02-04 10:35:27 +00:00
rrs	520c389cb4	- Cleanup checksum code. - Prepare for CRC offloading, add MIB counters (RS/MT). - Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT). - Bugfix: Handle close() with SO_LINGER correctly when notifications are generated during the close() call(MT). - Bugfix: Generate DRY event when sender is dry during subscription. Only for 1-to-1 style sockets (RS/MT) - Bugfix: Put vtags for the correct amount of time into time-wait (MT). - Bugfix: Clear vtag entries correctly on expiration (MT). - Bugfix: shutdown() indicates ENOTCONN when called for unconnected 1-to-1 style sockets (MT). - Bugfix: In sctp Auth code (PL). - Add support for devices that support SCTP csum offload (igb). - Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS) Obtained from: With help from Peter Lei and Michael Tuexen	2009-02-03 11:04:03 +00:00
bz	5d8f0a53a7	Remove the single global unlocked route cache ip6_forward_rt from the inet6 stack along with statistics and make sure we properly free the rt in all cases. While the current situation is not better performance wise it prevents panics seen more often these days. After more inet6 and ipsec cleanup we should be able to improve the situation again passing the rt to ip6_forward directly. Leave the ip6_forward_rt entry in struct vinet6 but mark it for removal. PR: kern/128247, kern/131038 MFC after: 25 days Committed from: Bugathon #6 Tested by: Denis Ahrens <denis@h3q.com> (different initial version)	2009-02-01 21:11:08 +00:00
bz	033060866c	Remove unused local MACROs. Submitted by: Christoph Mallon christoph.mallon@gmx.de MFC after: 2 weeks	2009-01-31 17:35:44 +00:00
bz	b3bbe5cac1	Coalesce two consecutive #ifdef IPSEC blocks. Move the skip_ipsec: label below the goto as we can never have ipsecrt set if we get to that label so there is no need to check. MFC after: 2 weeks	2009-01-31 12:24:53 +00:00
bz	f922834f0f	Remove dead code from #if 0: we do not have an ipsrcchk_rt anywhere else. MFC after: 2 weeks	2009-01-31 11:19:20 +00:00
bz	226b2a700e	Like with r185713 make sure to not leak a lock as rtalloc1(9) returns a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked and rtfree(9)d rather than just rtfree(9)d. Since the PR was filed, new places with the same problem were added with new code. Also check that the rt is valid before freeing it either way there. PR: kern/129793 Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu> MFC after: 2 weeks Committed from: Bugathon #6	2009-01-31 10:48:02 +00:00
bz	ec7e619d54	Remove 4 entirely unsued ip6 variables. Leave then in struct vinet6 to not break the ABI with kernel modules but mark them for removal so we can do it in one batch when the time is right. MFC after: 1 month	2009-01-30 23:40:24 +00:00
bz	6dddd78341	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
sam	b278e68100	remove too noisy DIAGNOSTIC code Reviewed by: qingli	2009-01-18 07:20:02 +00:00
qingli	751dff3610	Revive the RTF_LLINFO flag in route.h. The kernel code is guarded by the new kernel option COMPAT_ROUTE_FLAGS for binary backward compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO. RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS is defined.	2009-01-12 11:24:32 +00:00
bz	ffd2421407	Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well. While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2] PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 21:57:49 +00:00
bz	60c950d4ff	Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 13:06:56 +00:00
rrs	fcaf24fb54	Addresses Roberts comments on comments. Also adds the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"	2009-01-06 13:27:56 +00:00
rrs	8bff422255	Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.	2009-01-06 12:13:40 +00:00
bz	086c4b5b79	Switch the last protosw* structs to C99 initializers. Reviewed by: ed, julian, Christoph Mallon <christoph.mallon@gmx.de> MFC after: 2 weeks	2009-01-05 20:29:01 +00:00
rwatson	e259848db5	Unlike with struct protosw, several instances of struct ip6protosw did not use C99-style sparse structure initialization, so remove NULL assignments for now-removed pr_usrreq function pointers. Reported by: Chris Ruiz <yr.retarded at gmail.com>	2009-01-04 21:53:42 +00:00
rwatson	6db41b8313	struct ip6protosw is a copy of struct protosw, so remove pr_usrreq there to reflect removal from struct protosw. Spotted by: ed	2009-01-04 21:13:51 +00:00
qingli	efe3f87721	Some modules such as SCTP supplies a valid route entry as an input argument to ip_output(). The destionation is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create a L2 entry. Since there exists a L2 table per address family, the L2 lookup function can make address family specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure. Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.	2009-01-03 00:27:28 +00:00
qingli	1d851edfc0	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), an sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" command follows a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
kmacy	7c3c2fbe0c	avoid lock recursion by deferring the link check until after LLE lock is dropped	2008-12-24 01:08:18 +00:00
bz	f9f31751ac	Correct variable name in comment. MFC after: 4 weeks	2008-12-22 12:54:52 +00:00
qingli	8f88fc89cb	Similar to the INET case, do not destroy the nd6 entries for interface addresses until those addresses are removed. I already made the patch in INET but forgot to bring the code over for INET6.	2008-12-22 07:11:15 +00:00
bz	fcc42d6a25	Only unlock the llentry if it is actually valid. Reported by: ed	2008-12-18 19:09:14 +00:00
bz	b1db56aa98	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equialent INP_* counterpart. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 13:00:18 +00:00
bz	ea0d9d2e9a	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 12:52:34 +00:00
qingli	c6b6112234	Remove the rt argument from nd6_storelladdr() because rt is no longer accessed.	2008-12-17 10:27:34 +00:00
qingli	3bfc2293f2	A couple of files were not meant to be committed.	2008-12-17 10:19:53 +00:00
qingli	c6a0a000ca	in6_clsroute() was applied to prefix routes causing some of them to expire. in6_clsroute() was only applied to cloned routes that are no longer applicable after the arp-v2 commit.	2008-12-17 10:03:49 +00:00
kmacy	43e7c1af8b	* Compare pointer with NULL * Remove trailing whitespace (added in r186162) * Reduce indentation by rephrasing test Submitted by: Christopher Mallon (christoph dot mallon at gmx dot de)	2008-12-16 23:56:24 +00:00
kmacy	447f3863c6	- Simplify handling of the deferring of mbuf transmit until after lle lock drop - add a couple of comments to clarify intent	2008-12-16 23:06:36 +00:00
kmacy	222f4e20a8	check pointers against NULL	2008-12-16 06:01:08 +00:00
kmacy	c501489004	convert more pointer validation checks to checking against NULL	2008-12-16 03:12:44 +00:00
kmacy	e73e761720	simplify locking in find_pfxlist_reachable_router	2008-12-16 03:05:18 +00:00
kmacy	bf113303e6	explicitly check return of lla_lookup against NULL	2008-12-16 02:47:22 +00:00
kmacy	ed9ff236d5	advance tail pointer in nd6_output_lle and check lla_output return against NULL	2008-12-16 02:33:53 +00:00
kmacy	54c2e2ce52	check return from lla_lookup against NULL not zero	2008-12-16 02:30:42 +00:00
kmacy	aca7e14bdb	make sure redirect doesn't return without dropping the lock	2008-12-16 02:06:26 +00:00
kmacy	9682e6d337	need to check that lle is not null before unlock if the break condition is not met also fix the break condition to explicitly check against NULL	2008-12-16 02:05:11 +00:00
kmacy	0b5a9dada1	unlock the llentry after use in find_pfxlist_reachable_router	2008-12-16 01:58:30 +00:00
qingli	e1f9a89b0d	Initialize the variable "router", and apply "static_route" flag across the entire nd6_cache_lladdr() function.	2008-12-16 01:21:19 +00:00
kmacy	c9eebde165	unlock and destroy an llentry's lock before freeing Found by: sam	2008-12-16 00:20:49 +00:00
kmacy	505bc29767	unlock looked up llentrys in defrouter_select	2008-12-16 00:18:04 +00:00
kmacy	8cc0e3cda9	fix two use after frees in nd6_cache_lladdr caused by last minute unlock shuffling	2008-12-16 00:16:51 +00:00
bz	03f6bb9dc9	Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change. Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)	2008-12-15 21:50:54 +00:00
qingli	ec826ad5c7	This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion	2008-12-15 06:10:57 +00:00
kmacy	9c68b9dedd	in6_addroute is called through rnh_addadr which is always called with the radix node head lock held exclusively. Pass RTF_RNH_LOCKED to rtalloc so that rtalloc1_fib will not try to re-acquire the lock.	2008-12-13 20:15:42 +00:00
bz	98e7fe0e6a	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
kmacy	408657ab73	RTF_RNH_LOCKED needs to be passed in the flags arg not report, apologies to thompsa	2008-12-12 02:07:45 +00:00
thompsa	7a14f2c921	Pass RTF_RNH_LOCKED to rtalloc1 sunce the node head is locked, this avoids a recursive lock panic on inet6 detach. Reviewed by: kmacy	2008-12-12 01:46:59 +00:00
bz	83a32f8750	Put a global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely. While there garbage collect a few dead externs from ip6_var.h. Sponsored by: The FreeBSD Foundation	2008-12-11 16:26:38 +00:00
zec	7b573d1496	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
imp	689d225f30	Add missing include to sys/lock.h before sys/rwlock.h	2008-12-08 00:28:21 +00:00
kmacy	598b522b42	- convert radix node head lock from mutex to rwlock - make radix node head lock not recursive - fix LOR in rtexpunge - fix LOR in rtredirect Reviewed by: sam	2008-12-07 21:15:43 +00:00
rrs	0f2b9dafa3	Code from the hack-session known as the IETF (and a bit of debugging afterwards): - Fix protection code for notification generation. - Decouple associd from vtag - Allow vtags to have less strigent requirements in non-uniqueness. o don't pre-hash them when you issue one in a cookie. o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup. - Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes. - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon). - Update the support of SCTP-AUTH by Peter Lei. - Use macros for refcounting. - Fix MTU for UDP encapsulation. - Fix reporting back of unsent data. - Update assoc send counter handling to be consistent with endpoint sent counter. - Fix a bug in PR-SCTP. - Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point. - Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet. With the help of: Michael Teuxen and Peter Lei :-) MFC after: 4 weeks	2008-12-06 13:19:54 +00:00
bz	604d89458a	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
bz	d2730d5b27	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
zec	7ecd715d48	Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-28 23:30:51 +00:00
bz	864141180e	Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additonal IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place. Reviewed by: rwatson (as part of a larger changset) MFC after: 6 weeks () () possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-27 12:04:35 +00:00
bz	9ef49d8b6f	Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks () () possibly need to leave a stub wrappers in 7 to keep the symbols.	2008-11-27 10:43:08 +00:00
zec	95a15f5c84	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
bz	b089198d31	Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice. Reviewed by: rwatson MFC after: 6 weeks () () possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-26 20:52:26 +00:00
bz	2df0dfae25	Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable. No functional changes. Reviewed by: rwatson MFC after: 6 weeks	2008-11-26 12:54:31 +00:00
bz	3c7e39c293	Plug a credential leak in case the inpcb is freed by in6_pcbfree() instead of in_pcbfree(); missed in r183606. Reviewed by: rwatson MFC after: 3 days (instantly for 7.1-RC?)	2008-11-26 12:24:18 +00:00
zec	815d52c5df	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instatiation, assign initial values to them in initializer functions. As a rule, initialization at instatiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentialy, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to intialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
rwatson	0db6d4519c	Add a MAC label, MAC Framework, and MAC policy entry points for IPv6 fragment reassembly queues. This allows policies to label reassembly queues, perform access control checks when matching fragments to a queue, update a queue label when fragments are matched, and label the resulting reassembled datagram. Obtained from: TrustedBSD Project	2008-10-26 22:45:18 +00:00
des	a1e1ad22e0	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
des	66f807ed8b	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
bz	0991899a98	Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199. Reviewed by: silby, rpaulo, mlaier MFC after: 2 months	2008-10-20 18:43:59 +00:00
bz	88b6e9b1ce	Check that the mbuf len is positive (like we do in the v4 case). Read the other way round this means that even with the checks the m_len turned negative in some cases which led to panics. The reason to my understanding seems to be that the checks are wrong (also for v4) ignoring possible padding when checking cmsg_len or padding after data when adjusting the mbuf. Doing proper cheks seems to break applications like named so further investigation and regression tests are needed. PR: kern/119123 Tested by: Ashish Shukla wahjava gmail.com MFC after: 3 days	2008-10-15 19:24:18 +00:00
rwatson	fabec516f4	When disconnecting a UDPv6 socket, acquire the socket lock around the changing of the so_state field, as is done in UDPv4. Remove XXX locking comment. MFC after: 3 days	2008-10-12 20:01:32 +00:00
bz	d3e532c1da	Style changes: compare pointer to NULL and move a }. MFC after: 6 weeks	2008-10-04 17:07:58 +00:00
bz	77f80e0672	Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred	2008-10-04 15:06:34 +00:00
zec	8797d4caec	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
cperciva	678568e481	Default to ignoring potentially evil IPv6 Neighbor Solicitation messages. Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-08:10.nd6 Thanks to: jinmei, bz	2008-10-02 00:32:59 +00:00
rwatson	12ddb86062	When invoking the udp_send() from udp6_send() due to use of a v6-mapped IPv4 address, first drop the udbinfo and inpcb locks, which will otherwise be recursed. This leads to a potential minor race, but is preferable to a deadlock when acquiring a read lock after a write lock on the inpcb. MFC after: 3 days Reported by: Norbert Papke <fbsd-ml@scrapper.ca>, lioux	2008-09-22 06:44:03 +00:00
bz	83b4eaaa16	mld_timerresid() returns ms so instead of doing the maths in usec and then dividing down to ms, do the maths in ms. Obtained from: NetBSD mld6.c rev. 1.47 MFC after: 2 months	2008-09-10 19:42:13 +00:00
simon	6bb93e188c	- Fix amd64 local privilege escalation. [08:07] - Fix nmount(2) local privilege escalation. [08:08] - Fix IPv6 remote kernel panics. [08:09] Fix for [08:07] is merge of r181823. Submitted by: kib [08:07], csjp [08:08], bz [08:09] Reviewed by: peter [08:07], jhb [08:07] Reviewed by: jinmei [08:09], rwatson [08:09] Approved by: re (SA blanket) Approved by: so (simon) Security: FreeBSD-SA-08:07.amd64 Security: FreeBSD-SA-08:08.nmount Security: FreeBSD-SA-08:09.icmp6	2008-09-03 19:09:47 +00:00
bz	78dd3921ca	Fix a bug, when a specially crafted ICMPV6 MLD packet could lead to an integer divide by zero panic in the kernel, if the kernel was run with hz<1000. Neither i386, pc98, amd64 or sparc64 are affected in the currently supported branches and default configuration. Submitted by: Miikka Saukko, Ossi Herrala and Jukka Taimisto from the CROSS project at Codenomicon Ltd. via CERT-FI. Reviewed by: bz, rwatson Security: CVE-2008-2464 MFC after: 8 hours	2008-09-03 08:13:58 +00:00
rwatson	2d23f13f7f	In UDPv6, reduce scope of global udbinfo lock during append to last matching socket by dropping it before udp6_append(), and remove duplicate unlocks of udbinfo and inpcb in sysctl return path. MFC after: 3 days	2008-08-31 13:16:45 +00:00
julian	e951e2f915	another missed V_	2008-08-25 06:09:32 +00:00
julian	03a5241ea0	Fix some of the formatting fixes.. It's amazing how some thing stand out in a commit message.	2008-08-20 01:24:55 +00:00
julian	0592958505	A bunch of formatting fixes brough to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
bz	8a795a9f76	As part of step 1.5 of the vimage framework resolve conflicts with file local static globals which would be folded onto the same name with the V_ macros. Reviewed by: kris, brooks, simon	2008-08-18 13:16:19 +00:00
bz	1021d43b56	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
bz	eafee510a9	Fix a regression introduced in r179289 splitting up ip6_savecontrol() into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the mp thus the information will be missing in userland. Istead of going with a * as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument. PR: kern/126349 Reviewed by: rwatson, dwmalone	2008-08-16 06:39:18 +00:00
rwatson	13b6ce6962	Adopt the slightly weaker consistency locking approach used in IPv4 raw sockets for IPv6 raw sockets: separately lock the inpcb for determining the destination address for a connect()'d raw socket at the rip6_send() layer, and then re-acquire the inpcb lock in the rip6_output() layer to query other options on the socket. Previously, the global raw IP socket lock was used, which while correct and marginally more consistent, could add significantly to global raw IP socket lock contention. MFC after: 1 week	2008-07-30 09:26:27 +00:00
rwatson	4082f24815	When copying in and out current ICMPv6 filters on a raw IPv6 socket, lock the inpcb and use a local stack variable to copy to/from userspace so that sooptcopyin()/sooptcopyout() aren't called while holding an rwlock. While here, fix a bug in which a failed sooptcopyin() might lead to partially consistent ICMPv6 filters on the socket by not ignoring the error returned by sooptcopyin(). MFC after: 2 weeks	2008-07-29 19:37:16 +00:00
rwatson	cd641465ef	Since we fail IPv6 raw socket allocation if inp->in6p_icmp6filt can't be allocated, there's no need to conditionize use and freeing of it later. MFC after: 1 week	2008-07-29 18:09:46 +00:00
rwatson	f16d022cdb	Marginally decomplicate set/getsockopt code in ip6_output.c by simply using the passed arguments explicitly and unconditionally rather than testing them and calling panic(). The result is the same but easier to read. MFC after: 3 days	2008-07-29 09:31:03 +00:00
mav	c8ae327077	Move inpcb lock higher to protect some nonbinding fields reading. It fixes nothing at this time, but decided to be more correct.	2008-07-28 19:32:18 +00:00
mav	8a791bfa67	According to in_pcb.h protocol binding information has double locking. It allows access it while list travercing holding only global pcbinfo lock.	2008-07-27 20:30:34 +00:00
bz	362cb79214	Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks. Reviewed by: rwatson	2008-07-10 13:31:11 +00:00
bz	4b9bb0069f	For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in an u_short or u_int16_t. Reviewed by: rwatson	2008-07-10 13:23:22 +00:00
rrs	a51aa927fa	1) Adds the rest of the VIMAGE change macros 2) Adds some __UserSpace__ on some of the common defines that the user space code needs 3) Fixes a bug when we send up data to a user that failed. We need to a) trim off the data chunk headers, if present, and b) make sure the frag bit is communicated properly for the msgs coming off the stream queues... i.e. we see if some of the msg has been taken. Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julain!	2008-07-09 16:45:30 +00:00
bz	c0ef832fd2	Document required locking in in6_sleectsrc() in case an inp is passed in by adding an assert. Requested by: rwatson Reviewed by: rwatson	2008-07-09 16:33:21 +00:00
bz	13896f2e51	Change the parameters to in6_selectsrc(): - pass in the inp instead of both in6p_moptions and laddr. - pass in cred for upcoming prison checks. Reviewed by: rwatson	2008-07-08 18:41:36 +00:00
rwatson	7e3b07cdf5	Use soreceive_dgram() and sosend_dgram() with UDPv6, as we do with UDPv4. Tested by: ps MFC after: 3 months	2008-07-08 10:15:23 +00:00
rwatson	00f0ab40d4	Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp6_append(). MFC after: 1 month	2008-07-07 10:11:17 +00:00
rwatson	6ee57a292b	Improve approximation of style(9) in raw socket code.	2008-07-05 18:03:39 +00:00
rwatson	051819b847	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks	2008-07-05 13:10:10 +00:00
rwatson	482bfeab47	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acqusition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
rwatson	20a754dc21	Remove GIANT_REQUIRED from IPv6 input, forward, and frag6 code. The frag6 code is believed to be MPSAFE, and leaving aside the IPv6 route cache in forwarding, Giant appears not to adequately synchronize the data structures in the input or forwarding paths.	2008-07-03 10:55:13 +00:00
rwatson	a2caa98b95	Set the IPv6 netisr handler as NETISR_MPSAFE on the basis that, despite there still being some well-known races in mld6 and nd6, running with Giant over the netisr handler provides little or not additional synchronization that might cause mld6 and nd6 to behave better.	2008-07-02 23:12:40 +00:00
bz	43f7cc1db4	Try to fix errors introduced in svn180085/cvs rev. 1.10: * Include ip6_var.h for ip6stat. * Use the correct name under ip6stat: `ip6s_cantforward' instead of its IPv4 counterpart. MFC after: 10 days	2008-06-29 07:34:21 +00:00
kan	7d4f905059	Repair botched variable rename. Pointy hat to: julian	2008-06-29 04:33:45 +00:00
julian	6a69bd4db2	Oops, we've been incrementing the wrong cantforward variable. Obtained from: vimage tree	2008-06-29 00:25:16 +00:00
julian	ffd508c001	Rename two vars so that they are different from the same vars in ipv4. They are static so it was not a problem 'per se' but it was confusing to the reader. Obtained from: vimage tree	2008-06-29 00:17:45 +00:00
rrs	7782c49376	- Macro-izes the packed declaration in all headers. - Vimage prep - these are major restructures to move all global variables to be accessed via a macro or two. The variables all go into a single structure. - Asconf address addition tweaks (add_or_del Interfaces) - Fix rwnd calcualtion to be more conservative. - Support SACK_IMMEDIATE flag to skip delayed sack by demand of peer. - Comment updates in the sack mapping calculations - Invarients panic added. - Pre-support for UDP tunneling (we can do this on MAC but will need added support from UDP to get a "pipe" of UDP packets in. - clear trace buffer sysctl added when local tracing on. Note the majority of this huge patch is all the vimage prep stuff :-)	2008-06-14 07:58:05 +00:00
rwatson	68f17e9b68	Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week	2008-05-29 08:27:14 +00:00
bz	f3ab94f7b4	Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need an WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days	2008-05-24 15:20:48 +00:00
rrs	8a66346564	- Adds support for the multi-asconf (From Kozuka-san) - Adds some prepwork (Not all yet) for vimage in particular support the delete the sctppcbinfo.xx structs. There is still a leak in here if it were to be called plus we stil need the regrouping (From Me and Michael Tuexen) - Adds support for UDP tunneling. For BSD there is no socket yet setup so its disabled, but major argument changes are in here to emcompass the passing of the port number (zero when you don't have a udp tunnel, the default for BSD). Will add some hooks in UDP here shortly (discussed with Robert) that will allow easy tunneling. (Mainly from Peter Lei and Michael Tuexen with some BSD work from me :-D) - Some ease for windows, evidently leave is reserved by their compile move label leave: -> out: MFC after: 1 week	2008-05-20 13:47:46 +00:00
julian	1dfc5c98a4	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
rwatson	e51618e321	Acquire a read lock, rather than a write lock, on a UDPv6 inpcb when delivering to the socket or extracting socket details for monitoring purposes. MFC after: 3 months	2008-04-22 12:20:33 +00:00
rwatson	a1fcc01258	In ICMPv6, read lock rather than write lock the inpcb on receive. MFC after: 3 months	2008-04-21 12:08:40 +00:00
rwatson	9ee84cddef	With IPv4 raw sockets, read lock rather than write lock the inpcb when receiving or transmitting. With IPv6 raw sockets, read lock rather than write lock the inpcb when receiving. Unfortunately, IPv6 source address selection appears to require a write lock on the inpcb for the time being. MFC after: 3 months	2008-04-21 12:06:41 +00:00
rwatson	e93ab31cf5	When querying a local or remote address on an IPv6 socket, use only a read lock on the inpcb. MFC after: 3 months	2008-04-19 14:36:19 +00:00
rwatson	ca47fccd6b	Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)	2008-04-17 21:38:18 +00:00
rrs	0eceb328ee	- Have SCTP use the new pru_flush functionality PR: 122710 MFC after: 1 week	2008-04-14 18:12:37 +00:00
qingli	4e8901ea7a	This patch provides the back end support for equal-cost multi-path (ECMP) for both IPv4 and IPv6. Previously, multipath route insertion is disallowed. For example, route add -net 192.103.54.0/24 10.9.44.1 route add -net 192.103.54.0/24 10.9.44.2 The second route insertion will trigger an error message of "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table" Multiple default routes can also be inserted. Here is the netstat output: default 10.2.5.1 UGS 0 3074 bge0 => default 10.2.5.2 UGS 0 0 bge0 When multipath routes exist, the "route delete" command requires a specific gateway to be specified or else an error message would be displayed. For example, route delete default would fail and trigger the following error message: "route: writing to routing socket: No such process" "delete net default: not in table" On the other hand, route delete default 10.2.5.2 would be successful: "delete net default: gateway 10.2.5.2" One does not have to specify a gateway if there is only a single route for a particular destination. I need to perform more testings on address aliases and multiple interfaces that have the same IP prefixes. This patch as it stands today is not yet ready for prime time. Therefore, the ECMP code fragments are fully guarded by the RADIX_MPATH macro. Include the "options RADIX_MPATH" in the kernel configuration to enable this feature. Reviewed by: robert, sam, gnn, julian, kmacy	2008-04-13 05:45:14 +00:00
rwatson	00684a83a1	In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and eliminate unnecessary local variable caching of the list head pointer, making the code a bit easier to read. MFC after: 3 weeks	2008-04-06 21:20:56 +00:00
ru	3b1bf8c2e9	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
bz	33dfb1706b	Correct IPsec behaviour with a 'use' level in SP but no SA available. In that case return an continue processing the packet without IPsec. PR: 121384 MFC after: 5 days Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]	2008-03-14 16:38:11 +00:00
bz	51315b3d89	Correct reference counting on the SP for outgoing IPv6 IPsec connections. PR: 121374 Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) MFC after: 5 days	2008-03-14 11:55:04 +00:00
bz	f507f0e4fa	#if 0 out a currently unsued (and incomplete) function: ip6_ipsec_mtu(). No need to compile 'dead' code. I am leaving it in because we will have to review the concept and should use the common function in various places. MFC after: 5 days	2008-03-14 11:44:30 +00:00
bz	693055a8ae	Replace the function name in two identical printfs by __func__, __LINE__ so we can distinguish them when people report a problem. PR: 121373 MFC after: 5 days	2008-03-14 11:09:11 +00:00
bz	cfb85f0c07	Rather than passing around a cached 'priv', pass in an ucred to ipsec_set_policy and do the privilege check only if needed. Try to assimilate both ip_ctloutput code blocks calling ipsec*_set_policy. Reviewed by: rwatson	2008-02-02 14:11:31 +00:00
bz	1c376286e0	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)	2008-01-24 08:25:59 +00:00
bz	866f483083	Correct the commented out debugging printf()s in REPLACE and NEXT macros. ip6_sprintf() needs a buffer as first argument these days. MFC after: 2 weeks	2008-01-20 10:08:15 +00:00
obrien	7eb385c2d8	un-__P()	2008-01-08 19:08:58 +00:00
rwatson	4a0d85f1d4	Fix leaking MAC labels for IPv6 inpcbs by adding missing MAC label destroy call; this transpired because the inpcb alloc path for IPv4/IPv6 is the same code, but IPv6 has a separate free path. The results was that as new IPv6 TCP connections were created, kernel memory would gradually leak. MFC after: 3 days Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>, zhouzhouyi	2007-12-17 17:20:57 +00:00
obrien	0d684d927b	Clean up VCS Ids.	2007-12-10 16:03:40 +00:00
julian	e38fed7fb7	Remove more dup'd code MFC After: 1 week	2007-12-06 22:48:24 +00:00
julian	87a49d3e6e	remove duped code Reviewed By: gnn MRC after: 1 week	2007-12-06 22:44:24 +00:00
mtm	46c3db4ab1	Instead of manually freeing the packet options structure (and not even doing a good job of it) in the copypktopts() function, just call ip6_clearpktopts() directly. Otherwise, the callers of this function would end up freeing the memory twice. Reviewed by: jinmei PR: kern/116360	2007-11-21 16:01:42 +00:00
rwatson	2bca3d4001	Move towards more explicit support for various network protocol stacks in the TrustedBSD MAC Framework: - Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send() for AARP packet labeling, rather than using a generic link layer entry point. - Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send() for ND6 packet labeling, rather than using a generic link layer entry point. - Add expliict entry point mac_netinet_arp_send() for ARP packet labeling, and mac_netinet_igmp_send() for IGMP packet labeling, rather than using a generic link layer entry point. - Remove previous genering link layer entry point, mac_mbuf_create_linklayer() as it is no longer used. - Add implementations of new entry points to various policies, largely by replicating the existing link layer entry point for them; remove old link layer entry point implementation. - Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global to the MAC Framework rather than static to mac_net.c as it is now needed outside of mac_net.c. Obtained from: TrustedBSD Project	2007-10-28 15:55:23 +00:00
rwatson	a3b8fc4866	Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as we move towards netinet as a pseudo-object for the MAC Framework. Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to reflect general object-first ordering preference. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-26 13:18:38 +00:00
rwatson	60570a92bf	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
jhb	afdad2635d	Close a race when trying to lookup a gateway route in rt_check(). Specifically, if two threads were doing concurrent lookups and the existing gateway was marked down, the the first thread would drop a reference on the gateway route and then unlock the "root" route while it tried to allocate a new route. The second thread could then also drop a reference on the same gateway route resulting in a reference underflow. Fix this by clearing the gateway route pointer after dropping the reference count but before dropping the lock. Secondly, in this same case, the second thread would overwrite the gateway route pointer w/o free'ing a reference to the route installed by the first thread. In practice this would probably just fix a lost reference that would result in a route never being freed. This fixes panics observed in rt_check() and rtexpunge(). MFC after: 1 week PR: kern/112490 Insight from: mehuljv at yahoo.com Reviewed by: ru (found the "not-setting it to NULL" part) Tested by: several	2007-10-22 19:01:26 +00:00
rrs	73fcd49c86	- Incorrect error EAGAIN returned for invalid send on a locked stream (using EEOR mode). Changed to EINVAL (in sctp_output.c) - Static analysis comments added - fix in mobility code to return a value (static analysis found). - sctp6_notify function made visible instead of static (this is needed for Panda). Approved by: re@freebsd.org (B Mah)	2007-09-13 10:36:43 +00:00
rrs	e1de0a1eda	- send call has a reference to uio->uio_resid in the recent send code, but uio may be NULL on sendfile calls. Change to use sndlen variable. - EMSGSIZE is not being returned in non-blocking mode and needs a small tweak to look if the msg would ever fit when returning EWOULDBLOCK. - FWD-TSN has a bug in stream processing which could cause a panic. This is a follow on to the codenomicon fix. - PDAPI level 1 and 2 do not work unless the reader gets his returned buffer full. Fix so we can break out when at level 1 or 2. - Fix fast-handoff features to copy across properly on accepted sockets - Fix sctp_peeloff() system call when no true system call exists to screen arguments for errors. In cases where a real system call exists the system call itself does this. - Fix raddr leak in recent add-ip code change for bundled asconfs (even when non-bundled asconfs are received) - Make sure ipi_addr lock is held when walking global addr list. Need to change this lock type to a rwlock(). - Add don't wake flag on both input and output when the socket is closing. - When deleting an address verify the interface is correct before allowing the delete to process. This protects panda and unnumbered. - Clean up old sysctl stuff and get rid of the old Open/Net BSD structures. - Add a function to watch the ranges in the sysctl sets. - When appending in the reassembly queue, validate that the assoc has not gone to about to be freed. If so (in the middle) abort out. Note this especially effects MAC I think due to the lock/unlock they do (or with LOCK testing in place). - Netstat patch to get rid of warnings. - Make sure that no data gets queued to inactive/unconfirmed destinations. This especially effect CMT but also makes a impact on regular SCTP as well. - During init collision when we detect seq number out of sync we need to treat it like Case C and discard the cookie (no invarient needed here). - Atomic access to the random store. - When we declare a vtag good, we need to shove it into the time wait hash to prevent further use. When the tag is put into the assoc hash, we need to remove it from the twait hash (where it will surely be). This prevents duplicate tag assignments. - Move decr-ref count to better protect sysctl out of data. - ltrace error corrections in sctp6_usrreq.c - Add hook for interface up/down to be sent to us. - Make sysctl() exported structures independent of processor architecture. - Fix route and src addr cache clearing for delete address case. - Make sure address marked SCTP_DEL_IP_ADDRESS is never selected as src addr. - in icmp handling fixed so we actually look at the icmp codes to figure out what to do. - Modified mobility code. Reception of DELETE IP ADDRESS for a primary destination and SET PRIMARY for a new primary destination is used for retransmission trigger to the new primary destination. Also, in this case, destination of chunks in send_queue are changed to the new primary destination. - Fix so that we disallow sending by mbuf to ever have EEOR mode set upon it. Approved by: re@freebsd.org (B Mah)	2007-09-08 17:48:46 +00:00
rrs	4dd82bd675	- Locking compatiability changes. This involves adding additional flags to many function calls. The flags only get used in BSD when we compile with lock testing. These flags allow apple to escape the "giant" lock it holds on the socket and have more fine-grained locking in the NKE. It also allows us to test (with witness) the locking used by apple via a compile switch (manually applied). Approved by: re@freebsd.org(B Mah)	2007-09-08 11:35:11 +00:00
rwatson	a9286385b1	Continue UDP/UDPv6 synchronization project: - Fix copyrights, comments in UDPv6. - Remove macro defines for in6pcb and udp6stat. - Consistently refer to inpcbs as 'inp' and not also 'in6p'. Reviewed by: gnn, jinmei, bz Approved by: re (bmah)	2007-09-08 08:18:24 +00:00
rrs	e335457f91	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
rrs	1d0af67d1a	- Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. - Fix sctp_lower_sosend to send shutdown chunk for mbuf send case when sndlen = 0 and sinfoflag = SCTP_EOF - Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data, So that it does not send the "null" data mbuf out and cause it to get freed twice. - Fix so auto-asconf sysctl actually effect the socket's asconf state. - Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets. - Memset bug in sctp_output.c (arguments were reversed) submitted found and reported by Dave Jones (davej@codemonkey.org.uk). - PD-API point needs to be invoked >= not just > to conform to socket api draft this fixes sctp_indata.c in the two places need to be >=. - move M_NOTIFICATION to use M_PROTO5. - PEER_ADDR_PARAMS did not fail properly if you specify an address that is not in the association with a valid assoc_id. This meant you got or set the stcb level values instead of the destination you thought you were going to get/set. Now validate if the stcb is non-null and the net is NULL that the sa_family is set and the address is unspecified otherwise return an error. - The thread based iterator could crash if associations were freed at the exact time it was running. rework the worker thread to use the increment/decrement to prevent this and no longer use the markers that the timer based iterator uses. - Fix the memleak in sctp_add_addr_to_vrf() for the case when it is detected that ifa is already pointing to a ifn. - Fix it so that if someone is so insane that they drop the send window below the minimal add mark, they still can send. - Changed all state for associations to use mask safe macro. - During front states in association freeing in sctp_inpcbfree, we had a locking problem where locks were not in place where they should have been. - Free association calls were not testing the return value in sctp_inpcb_free() properly... others should be cast void returns where we don't care about the return value. - If a reference count is held on an assoc, even from the "force free" we should not do the actual free.. but instead let the timer free it. - When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED flag is set, we must NOT process the packet but handle it like ootb. This is because while freeing an assoc we release the locks to get all the higher order locks so we can purge all the hash tables. This leaves a hole if a packet comes in just at that point. Now sctp_common_input_processing() will call the ootb code in such a case. - Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes it so we don't have a conflict (I think this is a covertity change). We made this change AFTER some conversation and looking to make sure that M_PROTO5 does not have a problem between SCTP and the 802.11 stuff (which is the only other place its used). - Fixed lock order reversal and missing atomic protection around locked_tcb during association lookup and the 1-2-1 model. - Added debug to source address selection. - V6 output must always do checksum even for loopback. - Remove more locks around inp that are not needed for an atomically added/subtracted ref count. - slight optimization in the way we zero the array in sctp_sack_check() - It was possible to respond to a ABORT() with bad checksum with a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT send a PKT-DROP to any ABORT(). - Add an option for local logging (useful for macintosh or when you need better performing during debugging). Note no commands are here to get the log info, you must just use kgdb. - The timer code needs to be aware of if it needs to call sctp_sack_check() to slide the maps and adjust the cum-ack. This is because it may be out of sync cum-ack wise. - Added threshold managment logging. - If the user picked just the right size, that just filled the send window minus one mtu, we would enter a forever loop not copying and at the same time not blocking. Change from < to <= solves this. - Sysctl added to control the fragment interleave level which defaults to 1. - My rwnd control was not being used to control the rwnd properly (we did not add and subtract to it :-() this is now fixed so we handle small messages (1 byte etc) better to bring our rwnd down more slowly. Approved by: re@freebsd.org (Bruce Mah)	2007-08-24 00:53:53 +00:00
bz	3793d89229	Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL. Also rename the related functions in a similar way. There are no functional changes. For a packet coming in with IPsec tunnel mode, the default is to only call into the firewall with the "outer" IP header and payload. With this option turned on, in addition to the "outer" parts, the "inner" IP header and payload are passed to the firewall too when going through ip_input() the second time. The option was never only related to a gif(4) tunnel within an IPsec tunnel and thus the name was very misleading. Discussed at: BSDCan 2007 Best new name suggested by: rwatson Reviewed by: rwatson Approved by: re (bmah)	2007-08-05 16:16:15 +00:00
rwatson	3b7397a1a3	Continue effort to improve parity between UDPv4 and UDPv6: add a missing scope security check for the UDPv6 socket credential lookup service, allowing security policies to bound access to credential information. While not an immediate issue for Jail, which doesn't allow use of UDPv6, this may be relevant to other security policies that may wish to control ident lookups. While here, eliminate a very unlikely panic case, in which a socket in the process of being freed is inspected by the sysctl. Approved by: re (kensmith) Reviewed by: bz	2007-07-27 08:25:02 +00:00
rrs	1db8ba2474	- take out a needless panic under invariants for sctp_output.c - Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than SCTP_SMALL_IOVEC_SIZE - re-add back inpcb_bind local address check bypass capability - Fix it so sctp_opt_info is independant of assoc_id postion. - Fix cookie life set to use MSEC_TO_TICKS() macro. - asconf changes o More comment changes/clarifications related to the old local address "not" list which is now an explicit restricted list. o Rename some functions for clarity: - sctp_add/del_local_addr_assoc to xxx_local_addr_restricted() - asconf related iterator functions to sctp_asconf_iterator_xxx() o Fix bug when the same address is deleted and added (and removed from the asconf queue) where the ifa is "freed" twice refcount wise, possibly freeing it completely. o Fix bug in output where the first ASCONF would not go out after the last address is changed (e.g. only goes out when retransmitted). o Fix bug where multiple ASCONFs can be bundled in the same packet with the and with the same serial numbers. o Fix asconf stcb iterator to not send ASCONF until after all work queue entries have been processed. o Change behavior so that when the last address is deleted (auto asconf on a bound all endpoint) no action is taken until an address is added; at that time, an ASCONF add+delete is sent (if the assoc is still up). o Fix local address counting so that address scoping is taken into account. o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending of ASCONF (after an RTO). The default now is to send ASCONF immediately (except for the case of changing/deleting the last usable address). Approved by: re(ken smith)@freebsd.org	2007-07-24 20:06:02 +00:00
rwatson	9f42d6219f	Continue effort to align UDPv4 and UDPv6 implementations by merging udp6_output() from udp6_output.c to udp6_usrreq.c, matching the UDPv4 structure, and allowing us to remove udp6_output.c. Reviewed by: bz, gnn Approved by: re (bmah)	2007-07-23 07:58:58 +00:00
rrs	1918b8aea1	- remove duplicate code from sctp_asconf.c - remove duplicate #include <sys/priv.h> that is not under #ifdef FreeBSD version to allow compile on 6.1 - static analysis changes per the cisco SA tool including: o some SA_IGNORE comments o some checks for NULL before unlock. o type corrections int -> size_t - Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this we pass a NULL in to bind on implicit assoc setup and crash :-( Approved by: re@freebsd.org(Ken Smith)	2007-07-21 21:41:32 +00:00
rwatson	5fe56c549d	Attempt to improve feature parity between UDPv4 and UDPv6 by merging UDPv4 features to UDPv6: - Add MAC checks on delivery and MAC labeling on transmit. - Check for (and reject) datagrams with destination port 0. - For multicast delivery, check the source port only if the socket being considered as a destination has been connected. - Implement UDP blackholing based on net.inet.udp.blackhole. - Add a new ICMPv6 unreachable reply rate limiting category for failed delivery attempts and implement rate limiting for UDPv6 (submitted by bz). Approved by: re (kensmith) Reviewed by: bz	2007-07-19 22:34:25 +00:00
bz	e7080d2707	Restore behavior changed with rev. 1.46 and make IPV6_IPSEC_POLICY always visible again. This unbreaks some third party user space applications. PR: 114491 Reported by: sumikawa Reviewed by: sumikawa Approved by: re (hrs)	2007-07-19 09:16:40 +00:00
rrs	baae800484	- added pre-checks to the bindx call. - use proper tick gathering macro instead of ticks directly. - Placed reasonable boundaries on sets that a user can do that are converted to ticks from ms. - Fix CMT_PF to always check to be sure CMT is on. - Fix ticks use of CMT_PF. - put back code to allow asconfs to be queued while INITs are in flight and before the assoc is established. - During window probes, an ack'd packet might be left with the window probe mark on it causing it to be retransmitted. Change so that the flight decrease macro clears the window_probe mark. - Additional logging flight size/reading and ASOC LOG. This is only enabled if you manually insert things into opt_sctp.h since its a set of debug code only. - Found an interesting SMP race in the way data was appended which could cause a reader to lose a part of a message, had to reorder when we marked the message was complete to after the data was appended. - bug in ADD-IP for the subset bound socket case when the peer has only one address - fix ASCONF implicit success/error handling case - proper support of jails in Freebsd 6> - copy out the timeval for the 64 bit sparc world on cookie-echo alignment error crashes without this). Approved by: re(Ken Smith)	2007-07-17 20:58:26 +00:00
rrs	1e9af2c480	- Modular congestion control, with RFC2581 being the default. - CMT_PF states added (w/sysctl to turn the PF version on) - sctp_input.c had a missing incr of cookie case when the auth was bad. This meant a free was called without an increment to refcnt, added increment like rest of code. - There was a case, unlikely, when the scope of the destination changed (this is a TSNH case). In that case, it would not free the alloc'ed asoc (in sctp_input.c). - When listed addresses found a colliding cookie/Init, then the collided upon tcb was not unlocked in sctp_pcb.c - Add error checking on arguments of sctp_sendx(3) to prevent it from referencing a NULL pointer. - Fix an error return of sctp_sendx(3), it was returing ENOMEM not -1. - Get assoc id was changed to use the sanctified socket api method for getting a assoc id (PEER_ADDR_INFO instead of PEER_ADDR_PARAMS). - Fix it so a peeled off socket will get a proper error return if it trys to send to a different address then it is connected to. - Fix so that select_a_stream can avoid an endless loop that could hang a caller. - time_entered (state set time) was not being set in all cases to the time we went established. Approved by: re(ken smith)	2007-07-14 09:36:28 +00:00
rwatson	c9dd6df88f	General style, white space, and comment cleanup; move to ANSI C prototypes, don't use register, etc. Synchronize structure and layout to the IPv4 versions of these functions to a greater extent, making visual comparison easier. Remove now stale or incorrect comments. Enable full lock assertions, and correct one exception handling case where the wrong label was jumped to. Tested by: bz Approved by: re (bmah)	2007-07-09 17:47:04 +00:00
delphij	42fe5e7f83	Space cleanup Approved by: re (rwatson)	2007-07-05 16:29:40 +00:00
delphij	e6f8b0995d	ANSIfy[1] plus some style cleanup nearby. Discussed with: gnn, rwatson Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1] Approved by: re (rwatson)	2007-07-05 16:23:49 +00:00
peter	c5afa8b483	Fix a stray splx() that caused a new warning. Approved by: re (rwatson)	2007-07-05 06:54:03 +00:00
peter	ac23a203e7	Fix 'assignment used as truth value' warning Approved by: re (rwatson)	2007-07-05 06:27:15 +00:00
gnn	fbe5a6e596	Remove a last, dangling, file from the Kame IPsec code. Approved by: re Spotted by: rwatson, bz	2007-07-04 01:03:48 +00:00
mlaier	83807ec50d	Link pf 4.1 to the build: - move ftp-proxy from libexec to usr.sbin - add tftp-proxy - new altq mtag link Approved by: re (kensmith)	2007-07-03 12:46:08 +00:00
gnn	aeca69ded5	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
gnn	1fb03182c6	Removing old, dead, KAME IPsec files as part of the move to the new FAST_IPSEC based IPsec stack. Approved by: re Reviewed by: bz	2007-07-02 04:02:21 +00:00
gnn	3c65b7d246	Follow on cleanup and removal of two unnecessary include files. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 12:31:01 +00:00
gnn	0cd74db89b	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00

... 2 3 4 5 6 ...

1028 Commits