freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	becba438d2	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
Qing Li	c7ea0aa648	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
Qing Li	d577d18a00	Some of the existing ppp and vpn related scripts create and set the IP addresses of the tunnel end points to the same value. In these cases the loopback route is not installed for the local end. Verified by: avg MFC after: 5 days	2010-02-02 20:38:30 +00:00
Qing Li	646c800540	Ensure an address is removed from the interface address list when the installation of that address fails. PR: 139559	2010-01-08 17:49:24 +00:00
Qing Li	ccbb9c359d	Consolidate the route message generation code for when address aliases were added or deleted. The announced route entry for an address alias is no longer empty because this empty route entry was causing some route daemon to fail and exit abnormally. MFC after: 5 days	2009-12-30 22:13:01 +00:00
Qing Li	c7ab66020f	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
Qing Li	6cb2b4e7a8	Use the correct option name in the preprocessor command to enable or disable diagnostic messages. Reviewed by: ru MFC after: 3 days	2009-10-23 18:27:34 +00:00
Qing Li	93704ac5d7	This patch fixes the following issues in the ARP operation: 1. There is a regression issue in the ARP code. The incomplete ARP entry was timing out too quickly (1 second timeout), as such, a new entry is created each time arpresolve() is called. Therefore the maximum attempts made is always 1. Consequently the error code returned to the application is always 0. 2. Set the expiration of each incomplete entry to a 20-second lifetime. 3. Return "incomplete" entries to the application. Reviewed by: kmacy MFC after: 3 days	2009-10-15 06:12:04 +00:00
Qing Li	b4a22c365c	Remove a log message from production code. This log message can be triggered by a misconfigured host that is sending out gratuious ARPs. This log message can also be triggered during a network renumbering event when multiple prefixes co-exist on a single network segment. MFC after: immediately	2009-10-02 01:45:11 +00:00
Qing Li	fa3cfd39ff	Previously, if an address alias is configured on an interface, and this address alias has a prefix matching that of another address configured on the same interface, then the ARP entry for the alias is not deleted from the ARP table when that address alias is removed. This patch fixes the aforementioned issue. PR: kern/139113 MFC after: 3 days	2009-10-02 01:34:55 +00:00
Qing Li	9bb7d0f47a	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
Qing Li	96ed1732bb	The bootp code installs an interface address and the nfs client module tries to install the same address again. This extra code is removed, which was discovered by the removal of a call to in_ifscrub() in r196714. This call to in_ifscrub is put back here because the SIOCAIFADDR command can be used to change the prefix length of an existing alias. Reviewed by: kmacy	2009-09-15 01:01:03 +00:00
Navdeep Parhar	9a31144537	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
Qing Li	1bf38b1292	This patch fixes the following issues: - Routing messages are not generated when adding and removing interface address aliases. - Loopback route installed for an interface address alias is not deleted from the routing table when that address alias is removed from the associated interface. - Function in_ifscrub() is called extraneously. Reviewed by: gnn, kmacy, sam MFC after: 3 days	2009-08-31 21:02:48 +00:00
Robert Watson	dc56e98f0d	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Qing Li	df813b7ea2	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface addreess when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Robert Watson	2d9cfabad4	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Robert Watson	1099f828b3	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Bjoern A. Zeeb	f81a8a320c	If including vnet.h one has to include opt_route.h as well. This is because struct vnet_net holds the rt_tables[][] for MRT and array size is compile time dependent. If you had ROUTETABLES set to >1 after r192011 V_loif was pointing into nonsense leading to strange results or even panics for some people. Reviewed by: mz	2009-05-22 23:03:15 +00:00
Qing Li	c9d763bf41	When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix. Reviewed by: kmacy	2009-05-20 21:07:15 +00:00
Bjoern A. Zeeb	1600c117d8	Unbreak options VIMAGE builds, in a followup to r192011 which did not introduce INIT_VNET_NET() initializers necessary for accessing V_loif. Submitted by: zec Reviewed by: julian	2009-05-17 20:53:10 +00:00
Qing Li	92fac99477	Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route to the interface address.	2009-05-14 05:27:09 +00:00
Qing Li	ebc90701ac	This patch adds a host route to an interface address (that is assigned to a non loopback/ppp link types) through the loopback interface. Prior to the new L2/L3 rewrite, this host route is implicitly added by the L2 code during RTM_RESOLVE of that interface address. This host route is deleted when that interface is removed. Reviewed by: kmacy	2009-05-12 07:41:20 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Robert Watson	588885f2f5	Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial lookup of 'ia' from if_addrhead through most use. Note that we currently have to drop it prematurely in some cases due to calls out to the routing and interface code while using 'ia', but this closes many races. Annotate several potential races that persist after this change. Move to using M_NOWAIT for allocating new interface addresses due to lock(s) being held. MFC after: 3 weeks	2009-04-25 23:02:57 +00:00
Robert Watson	07cde5e92c	In in_purgemaddrs(), remove the inm being freed from the address list before freeing it, rather than vice version, to avoid potential use after free. Reviewed by: bms	2009-04-24 22:11:53 +00:00
Robert Watson	cf7b18f15e	Relocate permissions checking code in in_control() to before the body of the implementation of ioctls. This makes the mapping of ioctls to specific privileges more explicit, and also simplifies the implementation by reducing the use of FALLTHROUGH handling in switch. While this is not intended to be a functional change, it does mean that certain privilege checks are now performed earlier, so EPERM might be returned in preference to EADDRNOTAVAIL for management ioctls that could have failed for both reasons. MFC after: 3 weeks	2009-04-24 09:54:46 +00:00
Robert Watson	bbb3fb6194	Reorganize in_control() so that invariants are more obvious, and so that it is easier to lock: - Handle the unsupported ioctl case at the beginning of in_control(), handing off to ifp->if_ioctl, rather than looking up interfaces and addresses unnecessarily in this case. - Make it an invariant that ifp is always non-NULL when running in_control()-implemented ioctls, simplifying the code structure. MFC after: 3 weeks	2009-04-23 21:41:37 +00:00
Robert Watson	8021456a24	Protect against some writer-writer races in in_control() by acquiring the interface address list lock around interface address list modifications. More to do here. MFC after: 2 weeks	2009-04-19 22:16:19 +00:00
Bruce M Simpson	56663a40eb	Deal with the case where ifma_protospec may be NULL, during any IPv4 multicast operations which reference it. There is a potential race because ifma_protospec is set to NULL when we discover the underlying ifnet has gone away. This write is not covered by the IF_ADDR_LOCK, and it's difficult to widen its scope without making it a recursive lock. It isn't clear why this manifests more quickly with 802.11 interfaces, but does not seem to manifest at all with wired interfaces. With this change, the 802.11 related panics reported by sam@ and cokane@ should go away. It is not the right fix, that requires more thought before 8.0. Idea from: sam Tested by: cokane	2009-03-17 14:41:54 +00:00
Robert Watson	e5adda3d51	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
Bruce M Simpson	c75aa3548f	Fix uninitialized use of ifp for ii. Found by: Peter Holm	2009-03-09 22:54:17 +00:00
Bruce M Simpson	d10910e6ce	Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070.	2009-03-09 17:53:05 +00:00
Jamie Gritton	b89e82dd87	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
Sam Leffler	cbd1844537	remove too noisy DIAGNOSTIC code Reviewed by: qingli	2009-01-18 07:20:02 +00:00
Bjoern A. Zeeb	813dd6ae5e	Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well. While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2] PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 21:57:49 +00:00
Bjoern A. Zeeb	5ce0eb7f08	Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 13:06:56 +00:00
Hartmut Brandt	c0e9a8a154	Set a minimum of information in the routing message (like version and type) so that generic routing message parsing code can parse the messages for L2 info that are retrieved via the sysctl interface.	2009-01-09 10:58:59 +00:00
Qing Li	dc49549713	Some modules such as SCTP supplies a valid route entry as an input argument to ip_output(). The destionation is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create a L2 entry. Since there exists a L2 table per address family, the L2 lookup function can make address family specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure. Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.	2009-01-03 00:27:28 +00:00
Bjoern A. Zeeb	42d866dd69	For consistency use LLE_IS_VALID() in this 4th place that is actually interested in the (void *)-1 return value hack. This way we can easily identify those special parts of the code.	2008-12-28 21:18:01 +00:00
Qing Li	8eca593c5a	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), an sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" command follows a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
Kip Macy	fbc2ca1bef	unlock and destroy an llentry's lock before freeing Found by: sam	2008-12-16 00:20:49 +00:00
Qing Li	6e6b3f7cbc	This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion	2008-12-15 06:10:57 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00

1 2 3 4

159 Commits