freebsd-skq

Author	SHA1	Message	Date
jtl	e5f23fbf44	Implement a limit on the number of IPv6 reassembly queues per bucket. There is a hashing algorithm which should distribute IPv6 reassembly queues across the available buckets in a relatively even way. However, if there is a flaw in the hashing algorithm which allows a large number of IPv6 fragment reassembly queues to end up in a single bucket, a per-bucket limit could help mitigate the performance impact of this flaw. Implement such a limit, with a default of twice the maximum number of reassembly queues divided by the number of buckets. Recalculate the limit any time the maximum number of reassembly queues changes. However, allow the user to override the value using a sysctl (net.inet6.ip6.maxfragbucketsize). Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:27:41 +00:00
jtl	a7668fa529	Add a limit on the number of fragments per IPv6 packet. The IPv4 fragment reassembly code supports a limit on the number of fragments per packet. The default limit is currently 17 fragments. Among other things, this limit serves to limit the number of fragments the code must parse when trying to reassemble a packet. Add a limit to the IPv6 reassembly code. By default, limit a packet to 65 fragments (64 on the queue, plus one final fragment to complete the packet). This allows an average fragment size of 1,008 bytes, which should be sufficient to hold a fragment. (Recall that the IPv6 minimum MTU is 1280 bytes. Therefore, this configuration allows a full-size IPv6 packet to be fragmented on a link with the minimum MTU and still carry approximately 272 bytes of headers before the fragmented portion of the packet.) Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket sysctl. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:26:07 +00:00
jtl	1f361945df	Make the IPv6 fragment limits be global, rather than per-VNET, limits. The IPv6 reassembly fragment limit is based on the number of mbuf clusters, which are a global resource. However, the limit is currently applied on a per-VNET basis. Given enough VNETs (or given sufficient customization on enough VNETs), it is possible that the sum of all the VNET fragment limits will exceed the number of mbuf clusters available in the system. Given the fact that the fragment limits are intended (at least in part) to regulate access to a global resource, the IPv6 fragment limit should be applied on a global basis. Note that it is still possible to disable fragmentation for a particular VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that VNET. In addition, it is now possible to disable fragmentation globally by setting the net.inet6.ip6.maxfrags sysctl to 0. Reviewed by: jhb Security: FreeBSD-SA-18:10.ip Security: CVE-2018-6923	2018-08-14 17:24:26 +00:00
mmacy	ce91e745ec	ip(6)_freemoptions: defer imo destruction to epoch callback task Avoid the ugly unlock / lock of the inpcbinfo where we need to figure out what kind of lock we hold by simply deferring the operation to another context. (Also a small dependency for converting the pcbinfo read lock to epoch)	2018-05-20 00:22:28 +00:00
ae	8d74fbedd3	Modify ip6_get_prevhdr() so that it can be used safely. Instead of returning a pointer to the previous header, return its offset. In frag6_input() use m_copyback() and the determined offset to store the next header instead of accessing it by pointer and assuming that the memory is contiguous. In rip6_input() use the offset returned by ip6_get_prevhdr() instead of calculating it from pointer arithmetic, because the IP header can belong to another mbuf in the chain. Reported by: Maxime Villard <max at m00nbsd dot net> Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D14158	2018-02-05 09:22:07 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
bz	fac944a70a	The pr_destroy field does not allow us to run the teardown code in a specific order. VNET_SYSUNINITs however are doing exactly that. Thus remove the VIMAGE conditional field from the domain(9) protosw structure and replace it with VNET_SYSUNINITs. This also allows us to change some order and to make the teardown functions file local static. Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use internally. Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g., for pfil consumers (firewalls), partially for this commit and for others to come. Reviewed by: gnn, tuexen (sctp), jhb (kernel.h) Obtained from: projects/vnet MFC after: 2 weeks X-MFC: do not remove pr_destroy Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6652	2016-06-01 10:14:04 +00:00
glebius	163857deb4	New way to manage reference counting of mbuf external storage. The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used. The first mbuf to attach a cluster stores the refcount. The further mbufs to reference the cluster point at refcount in the first mbuf. The first mbuf is freed only when the last reference is freed. The benefit over refcounts stored in separate slabs is that now refcounts of different, unrelated mbufs do not share a cache line. For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd() becomes void, making widely used M_EXTADD macro safe. For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization exactly against the cache aliasing problem with regular refcounting. Discussed with: rrs, rwatson, gnn, hiren, sbruno, np Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D5396 Sponsored by: Netflix	2016-03-01 00:17:14 +00:00
melifaro	21632a9bd9	Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket(). in6_selectsrc() has 2 classes of users: socket-based ones (raw/udp/pcb/etc) and socket-less ones (ND code). The main reason for that change is the inability to specify a non-default FIB for callers w/o socket since (internally) inpcb is used to determine fib. As a result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc() static): 1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check along with returning hop limit when needed. 2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and pass IPv6 address w/ explicitly specified scope as separate argument. Reviewed by: ae (previous version)	2016-01-10 13:40:29 +00:00
melifaro	113d546f8e	Remove 'struct route_in6' argument from in6_selectsrc() and in6_selectif(). The main task of in6_selectsrc() is to return IPv6 SAS (along with output interface used for scope checks). No data-path code uses route argument for caching. The only users are icmp6 (reflect code), ND6 ns/na generation code. All these functions are control-plane, so there is no reason to try to 'optimize' something by passing cached route to ip6_output(). Given that, simplify code by eliminating in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is used only by in6_selectsrc(), eliminate its 'struct route_in6' argument, too. While here, reshape rte-related code inside in6_selectif() to free lookup result immediately after saving all the needed fields.	2016-01-03 10:43:23 +00:00
adrian	7ba24ae636	[netinet6]: Create a new IPv6 netisr which expects the frames to have been verified. This is required for fragments and encapsulated data (eg tunneling) to be redistributed to the RSS bucket based on the eventual IPv6 header and protocol (TCP, UDP, etc) header. * Add an mbuf tag with the state of IPv6 options parsing before the frame is queued into the direct dispatch handler; * Continue processing and complete the frame reception in the correct RSS bucket / netisr context. Testing results are in the phabricator review. Differential Revision: https://reviews.freebsd.org/D3563 Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>	2015-11-06 23:07:43 +00:00
adrian	a3c3341951	Implement RSS hashing/re-hashing for IPv6 ingress packets. This mirrors the basic IPv4 implementation - IPv6 packets under RSS now are checked for a correct RSS hash and if one isn't provided, it's done in software. This only handles the initial receive - it doesn't yet handle reinjecting / rehashing packets after being decapsulated from various tunneling setups. That'll come in some follow-up work. For non-RSS users, this is almost a giant no-op. It does change a couple of ipv6 methods to use const mbuf * instead of mbuf * but it doesn't have any functional changes. So, the following now occurs: * If the NIC doesn't do any RSS hashing, it's all done in software. Single-queue, non-RSS NICs will now have the RX path distributed into multiple receive netisr queues. * If the NIC provides the wrong hash (eg only IPv6 hash when we needed an IPv6 TCP hash, or IPv6 UDP hash when we expected IPv6 hash) then the hash is recalculated. * .. if the hash is recalculated, it'll end up being injected into the correct netisr queue for v6 processing. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 07:14:29 +00:00
melifaro	a915efe931	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
kp	86dedea3cb	Preserve IPv6 fragment IDs across reassembly and refragmentation When forwarding fragmented IPv6 packets and filtering with PF we reassemble and refragment. That means we generate new fragment headers and a new fragment ID. We already save the fragment IDs so we can do the reassembly so it's straightforward to apply the incoming fragment ID on the refragmented packets. Differential Revision: https://reviews.freebsd.org/D2188 Approved by: gnn (mentor)	2015-04-01 12:15:01 +00:00
ae	a312c1bedf	Fix deadlock in IPv6 PCB code. When several threads are trying to send a datagram to the same destination, but fragmentation is disabled and the datagram size exceeds the link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It notifies all sockets that want to know the MTU to this destination. And since all threads hold the PCB lock while sending, taking the lock for each PCB in in6_pcbnotify() leads to deadlock. RFC 3542 p.11.3 suggests notifying all applications that want to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this when we don't receive an ICMPv6 message. Change the ip6_notify_pmtu() function so that it can be used directly from ip6_output() to notify only one socket, and to notify all sockets when an ICMPv6 packet too big message is received. PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2015-03-04 11:20:01 +00:00
glebius	1b68ebd476	Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4). Submitted by: Kristof Provost Differential Revision: D1766	2015-02-16 06:30:27 +00:00
glebius	35ef97e1c7	Factor out ip6_deletefraghdr() function, to be shared between IPv6 stack and pf(4). Submitted by: Kristof Provost Reviewed by: ae Differential Revision: D1764	2015-02-16 01:12:20 +00:00
melifaro	b5d711d3a6	Remove faith(4) and faithd(8) from base. It looks like the industry has chosen a different (and more traditional) stateless/stateful NAT64 as the translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
ae	61303c568a	Remove ip6_getdstifaddr() and all functions to work with auxiliary data. It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to determine ifaddr corresponding to destination address. Since currently we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called with zero zoneid and marked with XXX. Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr() instead. Sponsored by: Yandex LLC	2014-11-08 19:38:34 +00:00
kevlo	7727a3c215	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
vanhu	451f0d7511	Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels. For IPv6-in-IPv4, you may need to do the following command on the tunnel interface if it is configured as IPv4 only: ifconfig <interface> inet6 -ifdisabled Code logic inspired from NetBSD. PR: kern/169438 Submitted by: emeric.poupon@netasq.com Reviewed by: fabient, ae Obtained from: NETASQ	2014-05-28 12:45:27 +00:00
glebius	d734bed796	Since both netinet/ and netinet6/ call into netipsec/ and netpfil/, the protocol specific mbuf flags are shared between them. - Move all M_FOO definitions into a single place: netinet/in6.h, to avoid future clashes. - Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted in a failure of operation of IPSEC and packet filters. Thanks to Nicolas and Georgios for all the hard work on bisecting, testing and finally finding the root of the problem. PR: kern/186755 PR: kern/185876 In collaboration with: Georgios Amanakis <gamanakis gmail.com> In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com> Sponsored by: Nginx, Inc.	2014-03-12 14:29:08 +00:00
andre	fd76db4587	Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both. Discussed with: trociny, glebius	2013-08-19 11:08:36 +00:00
ae	e5b002a3b8	Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.	2013-07-09 09:54:54 +00:00
ae	1a36dfcc87	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
tijl	40254de0a6	Fix build after r249543.	2013-04-16 16:59:29 +00:00
ae	e7b578dd8b	Replace hardcoded numbers.	2013-04-16 11:12:58 +00:00
ae	001d436ac8	Use M_PROTO7 flag for M_IP6_NEXTHOP, because M_PROTO2 was used for M_AUTHIPHDR. Pointy hat to: ae Reported by: Vadim Goncharov MFC after: 3 days	2012-12-17 14:36:56 +00:00
ae	4354018055	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
delphij	3948ce713c	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months	2012-10-22 21:49:56 +00:00
bz	eda9e50c52	MFp4 bz_ipv6_fast: Hide the ip6aux functions. The only one referenced outside ip6_input.c is not compiled in yet (__notyet__) in route6.c (r235954). We do have accessor functions that should be used. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days X-MFC: KPI?	2012-05-25 01:48:15 +00:00
bz	9eb6f57f87	In selectroute() add a missing fibnum argument to an in6_rtalloc() call in an #if 0 section. In in6_selecthlim() optimize a case where in6p cannot be NULL due to an earlier check. More consistently use u_int instead of int for fibnum function arguments. Sponsored by: Cisco Systems, Inc. MFC after: 3 days	2012-02-24 20:06:04 +00:00
bz	dcdb23291f	Merge multi-FIB IPv6 support from projects/multi-fibv6/head/: Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days	2012-02-17 02:39:58 +00:00
hrs	08320280c6	Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE (r225485). When setting an interface name to it, the following configurations will be enabled: 1. "no_radr" is set to all IPv6 interfaces automatically. 2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif. This is done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this means you can manually supersede this configuration if necessary). 3. The node will add RA-sending routers to the default router list even if net.inet6.ip6.forwarding=1. This mode is added to conform to RFC 6204 (a router which connects the end-user network to a service provider network). To enable packet forwarding, you still need to set ipv6_gateway_enable=YES. Note that accepting router entries into the default router list when packet forwarding capability and a routing daemon are enabled can result in messing up the routing table. To minimize such unexpected behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif. Approved by: re (bz)	2011-09-13 00:06:11 +00:00
hrs	4c2206b625	- Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1. - A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR. This controls whether a route in an RA message is accepted as the default route. The default value for each interface can be set by net.inet6.ip6.no_radr. The system wide default value is 0. - A new sysctl: net.inet6.ip6.norbit_raif. This controls whether the R-bit is set in NAs on RA-accepting interfaces. The default is 0 (R-bit is set based on net.inet6.ip6.forwarding). Background: IPv6 host/router model suggests a router sends an RA and a host accepts it for router discovery. Because of that, KAME implementation does not allow accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can make the routing table confused since it can change the default router unintentionally. However, in practice there are cases where we cannot distinguish a host from a router clearly. For example, a customer edge router often works as a host against the ISP, and as a router against the LAN at the same time. Another example is a complex network configuration like an L2TP tunnel for IPv6 connection to Internet over an Ethernet link with another native IPv6 subnet. In this case, the physical interface for the native IPv6 subnet works as a host, and the pseudo-interface for L2TP works as the default IP forwarding route. Problem: Disabling processing RA messages when net.inet6.ip6.forwarding=1 and accepting them when net.inet6.ip6.forwarding=0 cause the following practical issues: - A router cannot perform SLAAC. It becomes a problem if a box has multiple interfaces and you want to use SLAAC on some of them, for example. A customer edge router for IPv6 Internet access service using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the physical interface for administration purpose; updating firmware and so on (link-local addresses can be used there, but GUAs by SLAAC are often used for scalability). 
- When a host has multiple IPv6 interfaces and it receives multiple RAs on them, controlling the default route is difficult. Router preferences defined in RFC 4191 work only when the routers on the links are under your control. Details of Implementation Changes: Router Advertisement messages will be accepted even when net.inet6.ip6.forwarding=1. More precisely, the conditions are as follows: (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding) => Normal RA processing on that interface. (as IPv6 host) (ACCEPT_RTADV && (NO_RADR || ip6.forwarding)) => Accept RA but add the router to the defroute list with rtlifetime=0 unconditionally. This effectively prevents the received router address from being set as the box's default route. (!ACCEPT_RTADV) => No RA processing on that interface. ACCEPT_RTADV and NO_RADR are per-interface knobs. In short, all interfaces are classified as "RA-accepting" or not. An RA-accepting interface always processes RA messages regardless of ip6.forwarding. The difference caused by NO_RADR or ip6.forwarding is whether the RA source address is considered as the default router or not. R-bit in NA on the RA accepting interfaces is set based on net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests a router should disable the R-bit completely even when the box has net.inet6.ip6.forwarding=1, I believe there is no technical reason for doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif (the default is 0). Usage: # ifconfig fxp0 inet6 accept_rtadv => accept RA on fxp0 # ifconfig fxp0 inet6 accept_rtadv no_radr => accept RA on fxp0 but ignore default route information in it. # sysctl net.inet6.ip6.norbit_raif=1 => R-bit in NAs on RA accepting interfaces will always be set to 0.	2011-06-06 02:14:23 +00:00
brucec	cd6001f0b6	Fix more continuous/contiguous typos (cf. r215955)	2010-11-27 21:51:39 +00:00
bz	18b40a43ee	MFp4 CH=183052 183053 183258: In protosw we define pr_protocol as short, while on the wire it is an uint8_t. That way we can have "internal" protocols like DIVERT, SEND or gaps for modules (PROTO_SPACER). Switch ipproto_{un,}register to accept a short protocol number(*) and do an upfront check for valid boundaries. With this we also consistently report EPROTONOSUPPORT for out of bounds protocols, as we did for proto == 0. This allows a caller to not error for this case, which is especially important if we want to automatically call these from domain handling. (*) the functions have been without any in-tree consumer since the initial introduction, so this is considered safe. Implement ip6proto_{un,}register() similarly to their legacy IP counterparts to allow modules to hook up dynamically. Reviewed by: philip, will MFC after: 1 week	2010-09-02 17:43:44 +00:00
bz	0a90ef1728	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitespace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as much as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
julian	79c1f884ef	Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting. Sitting around in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
rwatson	88f8de4d40	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
rwatson	57ca4583e7	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
bz	a8839212d2	Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful. Asked for by: rwatson Reviewed by: rwatson	2009-06-23 22:08:55 +00:00
zec	8b1f38241a	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many such issues are already known and will be fixed incrementally over the next weeks in smaller commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should in general be compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
bms	32a71137f0	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
bz	5d8f0a53a7	Remove the single global unlocked route cache ip6_forward_rt from the inet6 stack along with statistics and make sure we properly free the rt in all cases. While the current situation is not better performance wise it prevents panics seen more often these days. After more inet6 and ipsec cleanup we should be able to improve the situation again passing the rt to ip6_forward directly. Leave the ip6_forward_rt entry in struct vinet6 but mark it for removal. PR: kern/128247, kern/131038 MFC after: 25 days Committed from: Bugathon #6 Tested by: Denis Ahrens <denis@h3q.com> (different initial version)	2009-02-01 21:11:08 +00:00
bz	ec7e619d54	Remove 4 entirely unused ip6 variables. Leave them in struct vinet6 to not break the ABI with kernel modules but mark them for removal so we can do it in one batch when the time is right. MFC after: 1 month	2009-01-30 23:40:24 +00:00
qingli	ec826ad5c7	The main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as many locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplifying the logic in the routing code. The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle. Kip has also been conducting active functional testing - Sam Leffler has helped me improve/refactor the code, and provided valuable reviews - Julian Elischer set up the perforce tree for me and has helped me maintain that branch before the svn conversion	2008-12-15 06:10:57 +00:00
bz	98e7fe0e6a	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
bz	83a32f8750	Put global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. While there garbage collect a few dead externs from ip6_var.h. Sponsored by: The FreeBSD Foundation	2008-12-11 16:26:38 +00:00
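
The numbers quoted in the two IPv6 fragment-limit commits of 2018-08-14 (a7668fa529 and e5f23fbf44) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not kernel code: the constant and function names are hypothetical, the 65535-byte maximum payload is assumed to be the basis of the "1,008 bytes" figure, and integer division is assumed for the per-bucket default. The queue and bucket counts passed to `default_bucket_limit()` are made-up inputs.

```python
# Illustrative arithmetic only; names below are hypothetical, not kernel identifiers.

IPV6_MAX_PAYLOAD = 65535    # assumption: largest value of the 16-bit payload length field
IPV6_MIN_MTU = 1280         # minimum IPv6 MTU, as stated in the commit message
MAX_FRAGS_PER_PACKET = 65   # default from the commit: 64 queued fragments plus 1 final


def average_fragment_size(max_payload=IPV6_MAX_PAYLOAD,
                          max_frags=MAX_FRAGS_PER_PACKET):
    """Average bytes each fragment must carry to cover a full-size packet."""
    return max_payload // max_frags


def header_headroom(min_mtu=IPV6_MIN_MTU):
    """Bytes left for headers on a minimum-MTU link at the average fragment size."""
    return min_mtu - average_fragment_size()


def default_bucket_limit(max_queues, n_buckets):
    """Twice the maximum number of reassembly queues divided by the bucket count.

    Integer division is an assumption; the commit message does not say how
    the kernel rounds.
    """
    return (2 * max_queues) // n_buckets


if __name__ == "__main__":
    print(average_fragment_size())         # bytes per fragment
    print(header_headroom())               # bytes of header headroom
    print(default_bucket_limit(1000, 64))  # hypothetical queue and bucket counts
```

With these assumptions the first two results match the 1,008-byte and roughly 272-byte figures given in the commit message.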


95 Commits
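
The Router Advertisement acceptance rules listed in commit 4c2206b625 (2011-06-06) can be restated as a small decision function. This is a minimal sketch of the three conditions exactly as the commit message words them, not the kernel implementation; the function name and the returned labels are hypothetical.

```python
# Restates the three conditions from the commit message; names are hypothetical.

def classify_ra_handling(accept_rtadv, no_radr, forwarding):
    """Return how an interface treats a received Router Advertisement.

    accept_rtadv -- per-interface ACCEPT_RTADV knob
    no_radr      -- per-interface NO_RADR knob
    forwarding   -- value of net.inet6.ip6.forwarding as a boolean
    """
    if not accept_rtadv:
        # (!ACCEPT_RTADV) => no RA processing on that interface
        return "ignore"
    if no_radr or forwarding:
        # (ACCEPT_RTADV && (NO_RADR || ip6.forwarding)) => RA is accepted,
        # but the router is added to the defroute list with rtlifetime=0
        return "accept-without-default-route"
    # (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding) => normal host behaviour
    return "normal"


if __name__ == "__main__":
    for flags in [(True, False, False), (True, True, False),
                  (True, False, True), (False, False, True)]:
        print(flags, classify_ra_handling(*flags))
```

Writing the rules this way makes the key point of the commit visible: once ACCEPT_RTADV is set, the interface always processes RAs, and NO_RADR or forwarding only decide whether the RA source becomes a default router.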