freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level. Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Bjoern A. Zeeb	74b44794c6	Make KASSERT message more useful by printing the variables on which we assert. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 22:34:12 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Andrey V. Elsukov	9f8b8e793b	mld_v2_dispatch_general_query() is used by mld_fasttimo_vnet() to send a reply to the MLDv2 General Query. In case when router has a lot of multicast groups, the reply can take several packets due to MTU limitation. Also we have a limit MLD_MAX_RESPONSE_BURST == 4, that limits the number of packets we send in one shot. Then we recalculate the timer value and schedule the remaining packets for sending. The problem is that when we call mld_v2_dispatch_general_query() to send remaining packets, we queue new reply in the same mbuf queue. And when number of packets is bigger than MLD_MAX_RESPONSE_BURST, we get endless reply of MLDv2 reports. To fix this, add the check for remaining packets in the queue. PR: 204831 MFC after: 1 week Sponsored by: Yandex LLC	2015-12-01 11:17:41 +00:00
Gleb Smirnoff	9e62a5a379	- Rename 'struct mld_ifinfo' into 'struct mld_ifsoftc', since it really represents a context. - Preserve name 'struct mld_ifinfo' for a new structure, that will be stable API between userland and kernel. - Make sysctl_mld_ifinfo() return the new 'struct mld_ifinfo', instead of old one, which had a bunch of internal kernel structures in it. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 22:37:01 +00:00
Gleb Smirnoff	a99c84d4e6	Use new struct mbufq instead of struct ifqueue to manage packet queues in IPv6 multicast code. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 01:21:23 +00:00
Robert Watson	ed6a66ca6c	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
Andrey V. Elsukov	e74966f60b	Mechanically replace direct accessing to if_xname to using if_name() macro.	2014-01-10 12:33:28 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Gleb Smirnoff	7b07d1bed0	- Use m_getcl() instead of hand allocating. - Use m_get()/m_gethdr() instead of macros. - Remove superfluous cleaning of mbuf fields after allocation. Sponsored by: Nginx, Inc.	2013-03-15 12:50:29 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Bruce M Simpson	df7e35725b	Kick the current-state report timer when a V1 group report would be triggered. Submitted by: rpaulo@ MFC after: 3 days	2012-06-28 23:48:40 +00:00
Bruce M Simpson	b7d882304b	Fix a typo in MLD query exponent processing. Submitted by: rpaulo@ MFC after: 3 days	2012-06-28 23:45:37 +00:00
Bruce M Simpson	289ca95209	In MLDv2 general query processing, do not enforce the strict check on query origins. Submitted by: Gu Yong MFC after: 3 days	2012-06-28 23:44:47 +00:00
John Baldwin	137f91e80f	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
John Baldwin	19b0c9b246	Use the mli_relinmhead list normally used to defer calls to in6m_release_locked() to defer calls to mld_v1_transmit_report() until after the IF_ADDR_LOCK is dropped. This removes a race where the lock is dropped and reacquired while attempting to walk an interface's address list. Reviewed by: bz MFC after: 1 week	2012-01-04 13:35:20 +00:00
John Baldwin	c6f4ea8062	When cancelling multicast timers on an interface, don't release the reference on a group in the leaving state while iterating over the loop. Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach() of placing the groups to free on pending release list and then releasing the references after dropping the IF_ADDR_LOCK. This closes an ugly race where the code was dropping the lock in the middle of iterating over the list. It also fixes some additional potential use-after-free bugs since the cancellation routine also applied other changes to the group after dropping the reference. Now those changes are performed before the reference is dropped and the group is potentially freed. Prodded to fix by: glebius Reviewed by: bz MFC after: 1 week	2012-01-03 20:34:52 +00:00
John Baldwin	f5b50e25ec	Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that do not modify the queues they iterate over. Submitted by: glebius	2012-01-03 16:22:29 +00:00
Gleb Smirnoff	6d18ea8ff9	Fix double free. PR: kern/163089 Submitted by: Herbie Robinson <Herbie.Robinson stratus.com>	2011-12-07 13:37:42 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Sergey Kandaurov	0ad2addc9d	Fix if_addr_mtx recursion in mld6. mld_set_version() is called only from mld_v1_input_query() and mld_v2_input_query() both holding the if_addr_mtx lock, and then calling into mld_v2_cancel_link_timers() acquires it the second time, which results in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after mld_set_version() is called; while here, further reduce locking scope to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked(). PR: kern/158426 Reported by: Thomas <tps vr-web.de>, Tom Vijlbrief <tom.vijlbrief xs4all.nl> Tested by: Tom Vijlbrief Reviewed by: bz Approved by: re (kib)	2011-08-22 23:39:40 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Bruce M Simpson	f1014c074d	When embedding the scope ID in MLDv1 output, check if the scope of the address being embedded is in fact link-local, before attempting to embed it. Note that this operation is a side-effect of trying to avoid recursion on the IN6 scope lock. PR: 144560 Submitted by: Petr Lampa MFC after: 3 days	2010-04-10 12:24:21 +00:00
Bruce M Simpson	aa16623133	Use ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES to signal a join or leave with SSM MLDv2 by default. This is current practice and complies with RFC 4604, as well as being required by production IPv6 networks in Japan. The behaviour may be disabled by setting the net.inet6.mld.use_allow sysctl/tunable to 0. Requested by: Hideki Yamamoto MFC after: 1 week	2009-12-22 20:40:22 +00:00
Bruce M Simpson	977ff62485	Add missing #include <sys/ktr.h>. Submitted by: Hideki Yamamoto MFC after: 1 week	2009-12-15 10:40:40 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Marko Zec	878a6d7dff	Remove unnecessary #ifdef lines and code. Approved by: julian (mentor)	2009-06-12 09:31:14 +00:00
Bruce M Simpson	29dc7bc636	Merge final round of MLD changes from p4: ip6_input.c, in6.h: * Add netinet6-specific mbuf flag M_RTALERT_MLD, shadowing M_PROTO6. * Always set this flag if HBH Router Alert option is present for MLD, even when not forwarding. icmp6.c: * In icmp6_input(), spell m->m_pkthdr.rcvif as ifp to be consistent. * Use scope ID for verifying input. Do not apply SSM filters here, no inpcb. * Check for M_RTALERT_MLD when validating MLD traffic, as we can't see IPv6 hop options outside of ip6_input(). in6_mcast.c: * Use KAME scope/zone ID in in6_multi. * Update net.inet6.ip6.mcast.filters implementation to use scope IDs for comparisons. * Fix scope ID treatment in multicast socket option processing. Scope IDs passed in from userland will be ignored as other less ambiguous APIs exist for specifying the link. * Tighten userland input checks in IPv6 SSM delta and full-state ops. * Source filter embedded scope IDs need to be revisited, for now just clear them and ignore them on input. * Adapt KAME behaviour of looking up the scope ID in the default zone for multicast leaves, when the interface is ambiguous. mld6.c: * Tighten origin checks on MLD traffic as per RFC3810 Section 6.2: * ip6_src MAY be the unspecified address for MLDv1 reports. * ip6_src MAY have link-local address scope for MLDv1 reports, MLDv1 queries, and MLDv2 queries. * Perform address field validation before accepting queries. * Use KAME scope/zone ID in query/report processing. * Break const correctness for mld_v1_input_report(), mld_v1_input_query() as we temporarily modify the input mbuf chain. * Clear the scope ID before handoff to userland MLD daemon. * Fix MLDv1 old querier present timer processing. With the protocol defaults, hosts should revert to MLDv2 after 260s. * Add net.inet6.mld.v1enable sysctl, default to on. ifmcstat.c: * Use sysctl by default; -K requests kvm(3) if so compiled. mld.4: * Connect man page to build. Tested using PCS.	2009-05-27 18:57:13 +00:00
Bruce M Simpson	05e5bb311b	Pullup from p4 tip: * Fix MLDv2 general query timer (fallout from automated refactoring). * Refactor MLDv1 timer. MLDv2 query processing is now working.	2009-05-21 18:05:17 +00:00
Bruce M Simpson	0ed39d3ec4	Pullup svn source to p4 top of tree: * Fix LOR in MLDv2 query input path. * Strip embedded KAME scope IDs for on-wire IPv6 address comparisons.	2009-05-21 17:01:38 +00:00
Marko Zec	94e9f5a1c2	Remove unnecessary CURVNET_SET() calls where curvnet context is (i.e. seems to be) already set. This should reduce console noise due to curvnet recursion reports. This change has no impact on nooptions VIMAGE builds. Approved by: julian (mentor)	2009-05-06 13:30:46 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	f6dfe47a14	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Bruce M Simpson	33cde13046	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00

1 2

92 Commits