From 743eee666f210e57243adc9ff8eb72e1082bff48 Mon Sep 17 00:00:00 2001 From: SUZUKI Shinsuke Date: Fri, 21 Oct 2005 16:23:01 +0000 Subject: [PATCH] sync with KAME regarding NDP - introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners - supports Router-Preference - better prefix lifetime management - more spec-comformant DAD advertisement - updated RFC/internet-draft revisions Obtained from: KAME Reviewed by: ume, gnn MFC after: 2 month --- share/doc/IPv6/IMPLEMENTATION | 240 +++++++---- sys/netinet/icmp6.h | 6 +- sys/netinet6/icmp6.c | 4 +- sys/netinet6/in6.c | 404 ++++++++++++++---- sys/netinet6/in6.h | 10 +- sys/netinet6/in6_ifattach.c | 28 +- sys/netinet6/in6_ifattach.h | 2 +- sys/netinet6/in6_var.h | 19 +- sys/netinet6/ip6_output.c | 27 +- sys/netinet6/mld6.c | 162 +++++--- sys/netinet6/mld6_var.h | 1 + sys/netinet6/nd6.c | 548 ++++++++++++++---------- sys/netinet6/nd6.h | 53 ++- sys/netinet6/nd6_nbr.c | 67 +-- sys/netinet6/nd6_rtr.c | 757 +++++++++++++++++++++------------- 15 files changed, 1531 insertions(+), 797 deletions(-) diff --git a/share/doc/IPv6/IMPLEMENTATION b/share/doc/IPv6/IMPLEMENTATION index 9715112a4bfd..bd5f9a14f089 100644 --- a/share/doc/IPv6/IMPLEMENTATION +++ b/share/doc/IPv6/IMPLEMENTATION @@ -204,19 +204,16 @@ RFC3542: Advanced Sockets API for IPv6 (revised) * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI. * Some of the updates in the draft are not implemented yet. See TODO.2292bis for more details. -draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP -draft-ietf-ngtrans-tcpudp-relay-04.txt: - An IPv6-to-IPv4 transport relay translator - * FAITH tcp relay translator (faithd) implements this. See 3.1 for more - details. -draft-ietf-ipngwg-router-selection-01.txt: - Default Router Preferences and More-Specific Routes - * router-side only. -draft-ietf-ipngwg-scoping-arch-02.txt: - The architecture, text representation, and usage of IPv6 - scoped addresses. 
+RFC4007: IPv6 Scoped Address Architecture * some part of the documentation (especially about the routing model) is not supported yet. + * zone indices that contain scope types have not been supported yet. + +draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP +draft-ietf-ipv6-router-selection-07.txt: + Default Router Preferences and More-Specific Routes + * router-side: both router preference and specific routes are supported. + * host-side: only router preference is supported. draft-ietf-pim-sm-v2-new-02.txt A revised version of RFC2362, which includes the IPv6 specific packet format and protocol descriptions. @@ -224,8 +221,12 @@ draft-ietf-dnsext-mdns-00.txt: Multicast DNS * kame/mdnsd has test implementation, which will not be built in default compilation. The draft will experience a major change in the near future, so don't rely upon it. +draft-ietf-ipngwg-icmp-v3-02.txt: ICMPv6 for IPv6 specification (revised) + * See 1.9 in this document for details. draft-itojun-ipv6-tcp-to-anycast-01.txt: Disconnecting TCP connection toward IPv6 anycast address +draft-ietf-ipv6-rfc2462bis-06.txt: IPv6 Stateless Address + Autoconfiguration (revised) draft-itojun-ipv6-transition-abuse-01.txt: Possible abuse against IPv6 transition technologies (expired) * KAME does not implement RFC1933/2893 automatic tunnel. @@ -240,10 +241,11 @@ draft-itojun-ipv6-flowlabel-api-01.txt: Socket API for IPv6 flow label field 1.2 Neighbor Discovery -Neighbor Discovery is fairly stable. Currently Address Resolution, -Duplicated Address Detection, and Neighbor Unreachability Detection -are supported. In the near future we will be adding Unsolicited Neighbor -Advertisement transmission command as admin tool. +Our implementation of Neighbor Discovery is fairly stable. Currently +Address Resolution, Duplicated Address Detection, and Neighbor +Unreachability Detection are supported. 
In the near future we will be +adding an Unsolicited Neighbor Advertisement transmission command as +an administration tool. Duplicated Address Detection (DAD) will be performed when an IPv6 address is assigned to a network interface, or the network interface is enabled @@ -253,6 +255,21 @@ generated to syslog (and usually to console). The "duplicated" mark can be checked with ifconfig. It is administrators' responsibility to check for and recover from DAD failures. We may try to improve failure recovery in future KAME code. + +A successor version of RFC2462 (called rfc2462bis) clarifies the +behavior when DAD fails (i.e., duplicate is detected): if the +duplicate address is a link-local address formed from an interface +identifier based on the hardware address which is supposed to be +uniquely assigned (e.g., EUI-64 for an Ethernet interface), IPv6 +operation on the interface should be disabled. The KAME +implementation supports this as follows: if this type of duplicate is +detected, the kernel marks "disabled" in the ND specific data +structure for the interface. Every IPv6 I/O operation in the kernel +checks this mark, and the kernel will drop packets received on or +being sent to the "disabled" interface. Whether the IPv6 operation is +disabled or not can be confirmed by the ndp(8) command. See the man +page for more details. + DAD procedure may not be effective on certain network interfaces/drivers. If a network driver needs long initialization time (with wireless network interfaces this situation is popular), and the driver mistakingly raises @@ -261,15 +278,13 @@ DAD probes to not-really-ready network driver and the packet will not go out from the interface. In such cases, network drivers should be corrected. Some of network drivers loop multicast packets back to themselves, -even if instructed not to do so (especially in promiscuous mode). 
-In such cases DAD may fail, because DAD engine sees inbound NS packet -(actually from the node itself) and considers it as a sign of duplicate. -In this case, drivers should be corrected to honor IFF_SIMPLEX behavior. -For example, you may need to check source MAC address on an inbound packet, -and reject it if it is from the node itself. -You may also want to look at #if condition marked "heuristics" in -sys/netinet6/nd6_nbr.c:nd6_dad_timer() as workaround (note that the code -fragment in "heuristics" section is not spec conformant). +even if instructed not to do so (especially in promiscuous mode). In +such cases DAD may fail, because the DAD engine sees inbound NS packet +(actually from the node itself) and considers it as a sign of +duplicate. In this case, drivers should be corrected to honor +IFF_SIMPLEX behavior. For example, you may need to check source MAC +address on an inbound packet, and reject it if it is from the node +itself. Neighbor Discovery specification (RFC2461) does not talk about neighbor cache handling in the following cases: @@ -281,12 +296,35 @@ For (1), we implemented workaround based on discussions on IETF ipngwg mailing list. For more details, see the comments in the source code and email thread started from (IPng 7155), dated Feb 6 1999. -IPv6 on-link determination rule (RFC2461) is quite different from assumptions -in BSD IPv4 network code. To implement behavior in RFC2461 section 5.2 -(when default router list is empty), the kernel needs to know the default +IPv6 on-link determination rule (RFC2461) is quite different from +assumptions in BSD IPv4 network code. To implement the behavior in +RFC2461 section 6.3.6 (3), the kernel needs to know the default outgoing interface. To configure the default outgoing interface, use -commands like "ndp -I de0" as root. Note that the spec misuse the word -"host" and "node" in several places in the section. +commands like "ndp -I de0" as root. 
Then the kernel will have a +"default" route to the interface with the cloning "C" bit being on. +This default route will cause to make a neighbor cache entry for every +destination that does not match an explicit route entry. + +Note that we intentionally disable configuring the default interface +by default. This is because we found it sometimes caused inconvenient +situation while it was rarely useful in practical usage. For example, +consider a destination that has both IPv4 and IPv6 addresses but is +only reachable via IPv4. Since our getaddrinfo(3) prefers IPv6 by +default, an (TCP) application using the library with PF_UNSPEC first +tries to connect to the IPv6 address. If we turn on RFC 2461 6.3.6 +(3), we have to wait for quite a long period before the first attempt +to make a connection fails. If we turn it off, the first attempt will +immediately fail with EHOSTUNREACH, and then the application can try +the next, reachable address. + +The notion of the default interface is also disabled when the node is +acting as a router. The reason is that routers tend to control all +routes stored in the kernel and the default route automatically +installed would rather confuse the routers. Note that the spec misuse +the word "host" and "node" in several places in Section 5.2 of RFC +2461. We basically read the word "node" in this section as "host," +and thus believe the implementation policy does not break the +specification. To avoid possible DoS attacks and infinite loops, KAME stack will accept only 10 options on ND packet. Therefore, if you have 20 prefix options @@ -312,32 +350,37 @@ There are certain limitations, though: We do not prohibit hosts from doing proxy ND, but there will be very limited use in it. -Starting mid March 2000, we support Neighbor Unreachability Detection (NUD) -on p2p interfaces, including tunnel interfaces (gif). NUD is turned on by -default. Before March 2000 KAME stack did not perform NUD on p2p interfaces. 
-If the change raises any interoperability issues, you can turn off/on NUD -by per-interface basis. Use "ndp -i interface -nud" to turn it off. -Consult ndp(8) for details. +Starting mid March 2000, we support Neighbor Unreachability Detection +(NUD) on p2p interfaces, including tunnel interfaces (gif). NUD is +turned on by default. Before March 2000 the KAME stack did not +perform NUD on p2p interfaces. If the change raises any +interoperability issues, you can turn off/on NUD by per-interface +basis. Use "ndp -i interface -nud" to turn it off. Consult ndp(8) +for details. RFC2461 specifies upper-layer reachability confirmation hint. Whenever upper-layer reachability confirmation hint comes, ND process can use it to optimize neighbor discovery process - ND process can omit real ND exchange and keep the neighbor cache state in REACHABLE. We currently have two sources for hints: (1) setsockopt(IPV6_REACHCONF) -defined by 2292bis API, and (2) hints from tcp_input. -It is questionable if they are really trustworthy. For example, a rogue -userland program can use IPV6_REACHCONF to confuse ND process. Neighbor -cache is a system-wide information pool, and it is bad to allow single process -to affect others. Also, tcp_input can be hosed by hijack attempts. It is -wrong to allow hijack attempts to affect ND process. -Starting June 2000, ND code has a protection mechanism against incorrect -upper-layer reachability confirmation. ND code counts subsequent upper-layer -hints. If the number of hints reaches maximum, ND code will ignore further -upper-layer hints and run real ND process to confirm reachability to the peer. -sysctl net.inet6.icmp6.nd6_maxnudhint defines maximum # of subsequent +defined by the RFC3542 API, and (2) hints from tcp(6)_input. + +It is questionable if they are really trustworthy. For example, a +rogue userland program can use IPV6_REACHCONF to confuse the ND +process. 
Neighbor cache is a system-wide information pool, and it is +bad to allow a single process to affect others. Also, tcp(6)_input +can be hosed by hijack attempts. It is wrong to allow hijack attempts +to affect the ND process. + +Starting June 2000, the ND code has a protection mechanism against +incorrect upper-layer reachability confirmation. The ND code counts +subsequent upper-layer hints. If the number of hints reaches the +maximum, the ND code will ignore further upper-layer hints and run +real ND process to confirm reachability to the peer. sysctl +net.inet6.icmp6.nd6_maxnudhint defines the maximum # of subsequent upper-layer hints to be accepted. (from April 2000 to June 2000, we rejected setsockopt(IPV6_REACHCONF) from -non-root process - after local discussion, it looks that hints are not +non-root process - after a local discussion, it looks that hints are not that trustworthy even if they are from privileged processes) If inbound ND packets carry invalid values, the KAME kernel will @@ -681,29 +724,34 @@ The first step in stateless address configuration is Duplicated Address Detection (DAD). See 1.2 for more detail on DAD. When a host hears Router Advertisement from the router, a host may -autoconfigure itself by stateless address autoconfiguration. -This behavior can be controlled by net.inet6.ip6.accept_rtadv -(host autoconfigures itself if it is set to 1). -By autoconfiguration, network address prefix for the receiving interface -(usually global address prefix) is added. The default route is also -configured. +autoconfigure itself by stateless address autoconfiguration. This +behavior can be controlled by the net.inet6.ip6.accept_rtadv sysctl +variable and a per-interface flag managed in the kernel. The latter, +which we call "if_accept_rtadv" here, can be changed by the ndp(8) +command (see the manpage for more details). When the sysctl variable +is set to 1, and the flag is set, the host autoconfigures itself. 
By +autoconfiguration, network address prefixes for the receiving +interface (usually global address prefix) are added. The default +route is also configured. Routers periodically generate Router Advertisement packets. To request an adjacent router to generate RA packet, a host can transmit Router Solicitation. To generate an RS packet at any time, use the -"rtsol" command. The "rtsold" daemon is also available. "rtsold" -generates Router Solicitation whenever necessary, and it works great +"rtsol" command. The "rtsold" daemon is also available. "rtsold" +generates Router Solicitation whenever necessary, and it works greatly for nomadic usage (notebooks/laptops). If one wishes to ignore Router Advertisements, use sysctl to set net.inet6.ip6.accept_rtadv to 0. +Additionally, ndp(8) command can be used to control the behavior +per-interface basis. To generate Router Advertisement from a router, use the "rtadvd" daemon. Note that the IPv6 specification assumes the following items and that nonconforming cases are left unspecified: - Only hosts will listen to router advertisements -- Hosts have single network interface (except loopback) +- Hosts have a single network interface (except loopback) This is therefore unwise to enable net.inet6.ip6.accept_rtadv on routers, -or multi-interface host. A misconfigured node can behave strange +or multi-interface hosts. A misconfigured node can behave strange (KAME code allows nonconforming configuration, for those who would like to do some experiments). 
@@ -713,12 +761,17 @@ To summarize the sysctl knob: 0 0 host (to be manually configured) 0 1 router 1 0 autoconfigured host - (spec assumes that host has single - interface only, autoconfigred host with - multiple interface is out-of-scope) + (spec assumes that hosts have a single + interface only, autoconfigred hosts + with multiple interfaces are + out-of-scope) 1 1 invalid, or experimental (out-of-scope of spec) +The if_accept_rtadv flag is referred only when accept_rtadv is 1 (the +latter two cases). The flag does not have any effects when the sysctl +variable is 0. + See 1.2 in the document for relationship between DAD and autoconfiguration. 1.4.3 DHCPv6 @@ -792,6 +845,14 @@ sent from a user application as follows: routers, since some routing daemons stop advertising prefixes (addresses) on interfaces that have become down. + - prefer addresses on "preferred" interfaces. "Preferred" + interfaces can be specified by the ndp(8) command. By default, + no interface is preferred, that is, this rule does not apply. + Again, this rule is particularly useful for routers, since there + is a convention, among router administrators, of assigning + "stable" addresses on a particular interface (typically a + loopback interface). + In any case, addresses that break the scope zone of the destination, or addresses whose zone do not contain the outgoing interface are never chosen. @@ -1396,7 +1457,7 @@ Here are couple of comments: The form can be used as a trigger for TCP DoS attack. KAME code already filters them out. - The following examples are seemingly illegal. It seems that there's general - consensus among ipngwg for those. (1) mobile-ip6 home address option, + consensus among ipngwg for those. (1) Mobile IPv6 home address option, (2) offlink packets (so routers should not forward them). KAME implmements (2) already. @@ -1601,9 +1662,12 @@ The following table lists the network drivers we have tried so far. 
bah zbus/amiga NG(*) cnw pcmcia/i386 ok ok yes ep pcmcia/i386 ok ok - + fxp pci/i386 ok(*2) ok - + tlp pci/i386 ok ok - le sbus/sparc ok ok yes ne pci/i386 ok ok yes ne pcmcia/i386 ok ok yes + rtk pci/i386 ok ok - wi pcmcia/i386 ok ok yes (ATM) en pci/i386 ok ok - @@ -1629,7 +1693,7 @@ Here is a list of FreeBSD 3.x-RELEASE drivers and its conditions: (*) These drivers are distributed with PAO as PAO3 (http://www.jp.freebsd.org/PAO/). -(**) there are trouble reports with multicast filter initialization. +(**) there were trouble reports with multicast filter initialization. More drivers will just simply work on KAME FreeBSD 3.x-RELEASE but have not been checked yet. @@ -1677,6 +1741,7 @@ You may want to use "@insert" directive in /etc/pccard.conf to invoke (*) exp driver has serious conflict with KAME initialization sequence. A workaround is committed into sys/i386/pci/if_exp.c, and should be okay by now. + 3. Translator We categorize IPv4/IPv6 translator into 4 types. @@ -1720,13 +1785,13 @@ the connection will be relayed toward IPv4 destination 163.221.202.12. faithd must be invoked on FAITH-relay dual stack node. -For more details, consult kame/kame/faithd/README and -draft-ietf-ngtrans-tcpudp-relay-04.txt. +For more details, consult kame/kame/faithd/README and RFC3142. 3.2 IPv6-to-IPv4 header translator (to be written) + 4. IPsec IPsec is implemented as the following three components. 
@@ -1902,7 +1967,7 @@ Currently supported algorithms are: keyed SHA1 with 96bit crypto checksum (no document) HMAC MD5 with 96bit crypto checksum (rfc2403.txt HMAC SHA1 with 96bit crypto checksum (rfc2404.txt) - HMAC SHA2-256 with 96bit crypto checksum (no document) + HMAC SHA2-256 with 96bit crypto checksum (draft-ietf-ipsec-ciph-sha-256-00.txt) HMAC SHA2-384 with 96bit crypto checksum (no document) HMAC SHA2-512 with 96bit crypto checksum (no document) HMAC RIPEMD160 with 96bit crypto checksum (RFC2857) @@ -1916,11 +1981,10 @@ Currently supported algorithms are: BLOWFISH CBC (rfc2451.txt) CAST128 CBC (rfc2451.txt) RIJNDAEL/AES CBC (rfc3602.txt) - AES counter mode (draft-ietf-ipsec-ciph-aes-ctr-03.txt) + AES counter mode (rfc3686.txt) - each of the above can be combined with: - ESP authentication with HMAC-MD5(96bit) - ESP authentication with HMAC-SHA1(96bit) + each of the above can be combined with new IPsec AH schemes for + ESP authentication. IPComp RFC2394: IP Payload Compression Using DEFLATE @@ -2000,19 +2064,26 @@ Here are (some of) platforms we have tested IPsec/IKE interoperability in the past, no particular order. Note that both ends (KAME and others) may have modified their implementation, so use the following list just for reference purposes. 
- ACC, allied-telesis, Altiga, Ashley-laurent (vpcom.com), BlueSteel, - CISCO IOS, Cryptek, Checkpoint FW-1, Data Fellows (F-Secure), - Ericsson, Fitel, FreeS/WAN, HiFn, HITACHI, IBM AIX, IIJ, Intel Canada, - Intel Packet Protect, MEW NetCocoon, MGCS, Microsoft WinNT/2000, - NAI PGPnet, NetLock, NIST (linux IPsec + plutoplus), NEC IX5000, - Netscreen, NxNetworks, OpenBSD isakmpd, Pivotal, Radguard, RapidStream, - RedCreek, Routerware, RSA, SSH (both IPv4/IPv6), Secure Computing, - Soliton, Sun Solaris8, TIS/NAI Gauntret, Toshiba, VPNet, - Yamaha RT series + 6WIND, ACC, Allied-telesis, Altiga, Ashley-laurent (vpcom.com), + BlueSteel, CISCO IOS, Checkpoint FW-1, Compaq Tru54 UNIX + X5.1B-BL4, Cryptek, Data Fellows (F-Secure), Ericsson, + F-Secure VPN+ 5.40, Fitec, Fitel, FreeS/WAN, HITACHI, HiFn, + IBM AIX 5.1, III, IIJ (fujie stack), Intel Canada, Intel + Packet Protect, MEW NetCocoon, MGCS, Microsoft WinNT/2000/XP, + NAI PGPnet, NEC IX5000, NIST (linux IPsec + plutoplus), + NetLock, Netoctave, Netopia, Netscreen, Nokia EPOC, Nortel + GatewayController/CallServer 2000 (not released yet), + NxNetworks, OpenBSD isakmpd on OpenBSD, Oullim information + technologies SECUREWORKS VPN gateway 3.0, Pivotal, RSA, + Radguard, RapidStream, RedCreek, Routerware, SSH, SecGo + CryptoIP v3, Secure Computing, Soliton, Sun Solaris 8, + TIS/NAI Gauntret, Toshiba, Trilogy AdmitOne 2.6, Trustworks + TrustedClient v3.2, USAGI linux, VPNet, Yamaha RT series, + ZyXEL Here are (some of) platforms we have tested IPComp/IKE interoperability in the past, in no particular order. - IRE, SSH (both IPv4/IPv6), NetLock + Compaq, IRE, SSH, NetLock, FreeS/WAN, F-Secure VPN+ 5.40 VPNC (vpnc.org) provides IPsec conformance tests, using KAME and OpenBSD IPsec/IKE implementations. Their test results are available at @@ -2147,9 +2218,11 @@ interoperate. 5. ALTQ -KAME kit includes ALTQ 2.1 code, which supports FreeBSD2, FreeBSD3, -NetBSD and OpenBSD. For BSD/OS, ALTQ does not work. 
-ALTQ in KAME supports (or tries to support) IPv6. +KAME kit includes ALTQ, which supports FreeBSD3, FreeBSD4, FreeBSD5 +NetBSD. OpenBSD has ALTQ merged into pf and its ALTQ code is not +compatible with other platforms so that KAME's ALTQ is not used for +OpenBSD. For BSD/OS, ALTQ does not work. +ALTQ in KAME supports IPv6. (actually, ALTQ is developed on KAME repository since ALTQ 2.1 - Jan 2000) ALTQ occupies single character device number. For FreeBSD, it is officially @@ -2167,7 +2240,8 @@ compile ALTQ-ready kernel for other archititectures, take the following steps: - before building userland, change netbsd/{lib,usr.sbin,usr.bin}/Makefile (or openbsd/foobaa) so that it will visit altq-related sub directories. -6. mobile-ip6 + +6. Mobile IPv6 6.1 KAME node as correspondent node diff --git a/sys/netinet/icmp6.h b/sys/netinet/icmp6.h index b35ec09e3afc..3b6189a817d2 100644 --- a/sys/netinet/icmp6.h +++ b/sys/netinet/icmp6.h @@ -619,7 +619,11 @@ struct icmp6stat { #define ICMPV6CTL_ND6_DEBUG 18 #define ICMPV6CTL_ND6_DRLIST 19 #define ICMPV6CTL_ND6_PRLIST 20 -#define ICMPV6CTL_MAXID 21 +#define ICMPV6CTL_MLD_MAXSRCFILTER 21 +#define ICMPV6CTL_MLD_SOMAXSRC 22 +#define ICMPV6CTL_MLD_VERSION 23 +#define ICMPV6CTL_ND6_MAXQLEN 24 +#define ICMPV6CTL_MAXID 25 #define RTF_PROBEMTU RTF_PROTO1 diff --git a/sys/netinet6/icmp6.c b/sys/netinet6/icmp6.c index 573d7c24a3c5..43c2237badc3 100644 --- a/sys/netinet6/icmp6.c +++ b/sys/netinet6/icmp6.c @@ -2178,7 +2178,7 @@ void icmp6_fasttimo() { - mld6_fasttimeo(); + return; } static const char * @@ -2415,7 +2415,7 @@ icmp6_redirect_output(m0, rt) icmp6_errcount(&icmp6stat.icp6s_outerrhist, ND_REDIRECT, 0); /* if we are not router, we don't send icmp6 redirect */ - if (!ip6_forwarding || ip6_accept_rtadv) + if (!ip6_forwarding) goto fail; /* sanity check */ diff --git a/sys/netinet6/in6.c b/sys/netinet6/in6.c index 53d2c2d4dfeb..d5a8e159c96d 100644 --- a/sys/netinet6/in6.c +++ b/sys/netinet6/in6.c @@ -328,6 +328,7 @@ in6_control(so, 
cmd, data, ifp, td) struct in6_ifaddr *ia = NULL; struct in6_aliasreq *ifra = (struct in6_aliasreq *)data; int error, privileged; + struct sockaddr_in6 *sa6; privileged = 0; if (td == NULL || !suser(td)) @@ -408,19 +409,56 @@ in6_control(so, cmd, data, ifp, td) /* * Find address for this interface, if it exists. + * + * In netinet code, we have checked ifra_addr in SIOCSIF*ADDR operation + * only, and used the first interface address as the target of other + * operations (without checking ifra_addr). This was because netinet + * code/API assumed at most 1 interface address per interface. + * Since IPv6 allows a node to assign multiple addresses + * on a single interface, we almost always look and check the + * presence of ifra_addr, and reject invalid ones here. + * It also decreases duplicated code among SIOC*_IN6 operations. */ - if (ifra->ifra_addr.sin6_family == AF_INET6) { /* XXX */ + switch (cmd) { + case SIOCAIFADDR_IN6: + case SIOCSIFPHYADDR_IN6: + sa6 = &ifra->ifra_addr; + break; + case SIOCSIFADDR_IN6: + case SIOCGIFADDR_IN6: + case SIOCSIFDSTADDR_IN6: + case SIOCSIFNETMASK_IN6: + case SIOCGIFDSTADDR_IN6: + case SIOCGIFNETMASK_IN6: + case SIOCDIFADDR_IN6: + case SIOCGIFPSRCADDR_IN6: + case SIOCGIFPDSTADDR_IN6: + case SIOCGIFAFLAG_IN6: + case SIOCSNDFLUSH_IN6: + case SIOCSPFXFLUSH_IN6: + case SIOCSRTRFLUSH_IN6: + case SIOCGIFALIFETIME_IN6: + case SIOCSIFALIFETIME_IN6: + case SIOCGIFSTAT_IN6: + case SIOCGIFSTAT_ICMP6: + sa6 = &ifr->ifr_addr; + break; + default: + sa6 = NULL; + break; + } + if (sa6 && sa6->sin6_family == AF_INET6) { int error = 0; - if (ifra->ifra_addr.sin6_scope_id != 0) - error = sa6_embedscope(&ifra->ifra_addr, 0); + if (sa6->sin6_scope_id != 0) + error = sa6_embedscope(sa6, 0); else - error = in6_setscope(&ifra->ifra_addr.sin6_addr, - ifp, NULL); + error = in6_setscope(&sa6->sin6_addr, ifp, NULL); if (error != 0) return (error); - ia = in6ifa_ifpwithaddr(ifp, &ifra->ifra_addr.sin6_addr); - } + ia = in6ifa_ifpwithaddr(ifp, 
&sa6->sin6_addr); + } else + ia = NULL; switch (cmd) { case SIOCSIFADDR_IN6: @@ -538,6 +576,42 @@ in6_control(so, cmd, data, ifp, td) case SIOCGIFALIFETIME_IN6: ifr->ifr_ifru.ifru_lifetime = ia->ia6_lifetime; + if (ia->ia6_lifetime.ia6t_vltime != ND6_INFINITE_LIFETIME) { + time_t maxexpire; + struct in6_addrlifetime *retlt = + &ifr->ifr_ifru.ifru_lifetime; + + /* + * XXX: adjust expiration time assuming time_t is + * signed. + */ + maxexpire = (-1) & + ~(1 << ((sizeof(maxexpire) * 8) - 1)); + if (ia->ia6_lifetime.ia6t_vltime < + maxexpire - ia->ia6_updatetime) { + retlt->ia6t_expire = ia->ia6_updatetime + + ia->ia6_lifetime.ia6t_vltime; + } else + retlt->ia6t_expire = maxexpire; + } + if (ia->ia6_lifetime.ia6t_pltime != ND6_INFINITE_LIFETIME) { + time_t maxexpire; + struct in6_addrlifetime *retlt = + &ifr->ifr_ifru.ifru_lifetime; + + /* + * XXX: adjust expiration time assuming time_t is + * signed. + */ + maxexpire = (-1) & + ~(1 << ((sizeof(maxexpire) * 8) - 1)); + if (ia->ia6_lifetime.ia6t_pltime < + maxexpire - ia->ia6_updatetime) { + retlt->ia6t_preferred = ia->ia6_updatetime + + ia->ia6_lifetime.ia6t_pltime; + } else + retlt->ia6t_preferred = maxexpire; + } break; case SIOCSIFALIFETIME_IN6: @@ -558,13 +632,14 @@ in6_control(so, cmd, data, ifp, td) case SIOCAIFADDR_IN6: { int i, error = 0; - struct nd_prefix pr0, *pr; + struct nd_prefixctl pr0; + struct nd_prefix *pr; /* * first, make or update the interface address structure, * and link it to the list. */ - if ((error = in6_update_ifa(ifp, ifra, ia)) != 0) + if ((error = in6_update_ifa(ifp, ifra, ia, 0)) != 0) return (error); /* @@ -586,7 +661,6 @@ in6_control(so, cmd, data, ifp, td) break; /* we don't need to install a host route. */ } pr0.ndpr_prefix = ifra->ifra_addr; - pr0.ndpr_mask = ifra->ifra_prefixmask.sin6_addr; /* apply the mask for safety. 
*/ for (i = 0; i < 4; i++) { pr0.ndpr_prefix.sin6_addr.s6_addr32[i] &= @@ -638,7 +712,7 @@ in6_control(so, cmd, data, ifp, td) if (ip6_use_tempaddr && pr->ndpr_refcnt == 1) { int e; - if ((e = in6_tmpifadd(ia, 1)) != 0) { + if ((e = in6_tmpifadd(ia, 1, 0)) != 0) { log(LOG_NOTICE, "in6_control: " "failed to create a " "temporary address, " @@ -662,7 +736,8 @@ in6_control(so, cmd, data, ifp, td) case SIOCDIFADDR_IN6: { int i = 0; - struct nd_prefix pr0, *pr; + struct nd_prefixctl pr0; + struct nd_prefix *pr; /* * If the address being deleted is the only one that owns @@ -680,10 +755,10 @@ in6_control(so, cmd, data, ifp, td) if (pr0.ndpr_plen == 128) goto purgeaddr; pr0.ndpr_prefix = ia->ia_addr; - pr0.ndpr_mask = ia->ia_prefixmask.sin6_addr; + /* apply the mask for safety. */ for (i = 0; i < 4; i++) { pr0.ndpr_prefix.sin6_addr.s6_addr32[i] &= - ia->ia_prefixmask.sin6_addr.s6_addr32[i]; + ifra->ifra_prefixmask.sin6_addr.s6_addr32[i]; } /* * The logic of the following condition is a bit complicated. @@ -723,16 +798,20 @@ in6_control(so, cmd, data, ifp, td) * XXX: should this be performed under splnet()? 
*/ int -in6_update_ifa(ifp, ifra, ia) +in6_update_ifa(ifp, ifra, ia, flags) struct ifnet *ifp; struct in6_aliasreq *ifra; struct in6_ifaddr *ia; + int flags; { int error = 0, hostIsNew = 0, plen = -1; struct in6_ifaddr *oia; struct sockaddr_in6 dst6; struct in6_addrlifetime *lt; + struct in6_multi_mship *imm; + struct in6_multi *in6m_sol; struct rtentry *rt; + int delay; /* Validate parameters */ if (ifp == NULL || ifra == NULL) /* this maybe redundant */ @@ -818,10 +897,8 @@ in6_update_ifa(ifp, ifra, ia) } /* lifetime consistency check */ lt = &ifra->ifra_lifetime; - if (lt->ia6t_vltime != ND6_INFINITE_LIFETIME - && lt->ia6t_vltime + time_second < time_second) { - return EINVAL; - } + if (lt->ia6t_pltime > lt->ia6t_vltime) + return (EINVAL); if (lt->ia6t_vltime == 0) { /* * the following log might be noisy, but this is a typical @@ -830,10 +907,9 @@ in6_update_ifa(ifp, ifra, ia) nd6log((LOG_INFO, "in6_update_ifa: valid lifetime is 0 for %s\n", ip6_sprintf(&ifra->ifra_addr.sin6_addr))); - } - if (lt->ia6t_pltime != ND6_INFINITE_LIFETIME - && lt->ia6t_pltime + time_second < time_second) { - return EINVAL; + + if (ia == NULL) + return (0); /* there's nothing to do */ } /* @@ -852,11 +928,12 @@ in6_update_ifa(ifp, ifra, ia) if (ia == NULL) return (ENOBUFS); bzero((caddr_t)ia, sizeof(*ia)); - /* Initialize the address and masks */ + /* Initialize the address and masks, and put time stamp */ IFA_LOCK_INIT(&ia->ia_ifa); ia->ia_ifa.ifa_addr = (struct sockaddr *)&ia->ia_addr; ia->ia_addr.sin6_family = AF_INET6; ia->ia_addr.sin6_len = sizeof(ia->ia_addr); + ia->ia6_createtime = time_second; if ((ifp->if_flags & (IFF_POINTOPOINT | IFF_LOOPBACK)) != 0) { /* * XXX: some functions expect that ifa_dstaddr is not @@ -881,6 +958,9 @@ in6_update_ifa(ifp, ifra, ia) TAILQ_INSERT_TAIL(&ifp->if_addrlist, &ia->ia_ifa, ifa_list); } + /* update timestamp */ + ia->ia6_updatetime = time_second; + /* set prefix mask */ if (ifra->ifra_prefixmask.sin6_len) { /* @@ -945,8 +1025,6 @@ 
in6_update_ifa(ifp, ifra, ia) * configure address flags. */ ia->ia6_flags = ifra->ifra_flags; - ia->ia6_flags &= ~IN6_IFF_DUPLICATED; /* safety */ - ia->ia6_flags &= ~IN6_IFF_NODAD; /* Mobile IPv6 */ /* * backward compatibility - if IN6_IFF_DEPRECATED is set from the * userland, make it deprecated. @@ -955,17 +1033,14 @@ in6_update_ifa(ifp, ifra, ia) ia->ia6_lifetime.ia6t_pltime = 0; ia->ia6_lifetime.ia6t_preferred = time_second; } - /* - * Perform DAD, if needed. - * XXX It may be of use, if we can administratively - * disable DAD. + * Make the address tentative before joining multicast addresses, + * so that corresponding MLD responses would not have a tentative + * source address. */ - if (in6if_do_dad(ifp) && hostIsNew && - (ifra->ifra_flags & IN6_IFF_NODAD) == 0) { + ia->ia6_flags &= ~IN6_IFF_DUPLICATED; /* safety */ + if (hostIsNew && in6if_do_dad(ifp)) ia->ia6_flags |= IN6_IFF_TENTATIVE; - nd6_dad_start((struct ifaddr *)ia, NULL); - } /* * We are done if we have simply modified an existing address. @@ -979,9 +1054,9 @@ in6_update_ifa(ifp, ifra, ia) */ /* Join necessary multicast groups */ + in6m_sol = NULL; if ((ifp->if_flags & IFF_MULTICAST) != 0) { struct sockaddr_in6 mltaddr, mltmask; - struct in6_multi *in6m; struct in6_addr llsol; /* join solicited multicast addr for new host id */ @@ -997,15 +1072,29 @@ in6_update_ifa(ifp, ifra, ia) "in6_setscope failed\n"); goto cleanup; } - (void)in6_addmulti(&llsol, ifp, &error); + delay = 0; + if ((flags & IN6_IFAUPDATE_DADDELAY)) { + /* + * We need a random delay for DAD on the address + * being configured. It also means delaying + * transmission of the corresponding MLD report to + * avoid report collision. 
+ * [draft-ietf-ipv6-rfc2462bis-02.txt] + */ + delay = arc4random() % + (MAX_RTR_SOLICITATION_DELAY * hz); + } + imm = in6_joingroup(ifp, &llsol, &error, delay); if (error != 0) { nd6log((LOG_WARNING, "in6_update_ifa: addmulti failed for " "%s on %s (errno=%d)\n", ip6_sprintf(&llsol), if_name(ifp), error)); - goto cleanup; + in6_purgeaddr((struct ifaddr *)ia); + return (error); } + in6m_sol = imm->i6mm_maddr; bzero(&mltmask, sizeof(mltmask)); mltmask.sin6_len = sizeof(struct sockaddr_in6); @@ -1050,37 +1139,67 @@ in6_update_ifa(ifp, ifra, ia) } else RTFREE_LOCKED(rt); - IN6_LOOKUP_MULTI(mltaddr.sin6_addr, ifp, in6m); - if (in6m == NULL) { - (void)in6_addmulti(&mltaddr.sin6_addr, ifp, &error); - if (error != 0) { - nd6log((LOG_WARNING, - "in6_update_ifa: addmulti failed for " - "%s on %s (errno=%d)\n", - ip6_sprintf(&mltaddr.sin6_addr), - if_name(ifp), error)); - goto cleanup; + /* + * XXX: do we really need this automatic routes? + * We should probably reconsider this stuff. Most applications + * actually do not need the routes, since they usually specify + * the outgoing interface. + */ + rt = rtalloc1((struct sockaddr *)&mltaddr, 0, 0UL); + if (rt) { + /* XXX: only works in !SCOPEDROUTING case. 
*/ + if (memcmp(&mltaddr.sin6_addr, + &((struct sockaddr_in6 *)rt_key(rt))->sin6_addr, + MLTMASK_LEN)) { + RTFREE_LOCKED(rt); + rt = NULL; } } + if (!rt) { + error = rtrequest(RTM_ADD, (struct sockaddr *)&mltaddr, + (struct sockaddr *)&ia->ia_addr, + (struct sockaddr *)&mltmask, RTF_UP | RTF_CLONING, + (struct rtentry **)0); + if (error) + goto cleanup; + } else { + RTFREE_LOCKED(rt); + } + + imm = in6_joingroup(ifp, &mltaddr.sin6_addr, &error, 0); + if (!imm) { + nd6log((LOG_WARNING, + "in6_update_ifa: addmulti failed for " + "%s on %s (errno=%d)\n", + ip6_sprintf(&mltaddr.sin6_addr), + if_name(ifp), error)); + goto cleanup; + } /* * join node information group address */ #define hostnamelen strlen(hostname) + delay = 0; + if ((flags & IN6_IFAUPDATE_DADDELAY)) { + /* + * The spec doesn't say anything about delay for this + * group, but the same logic should apply. + */ + delay = arc4random() % + (MAX_RTR_SOLICITATION_DELAY * hz); + } if (in6_nigroup(ifp, hostname, hostnamelen, &mltaddr.sin6_addr) == 0) { - IN6_LOOKUP_MULTI(mltaddr.sin6_addr, ifp, in6m); - if (in6m == NULL) { - (void)in6_addmulti(&mltaddr.sin6_addr, - ifp, &error); - if (error != 0) { - nd6log((LOG_WARNING, "in6_update_ifa: " - "addmulti failed for " - "%s on %s (errno=%d)\n", - ip6_sprintf(&mltaddr.sin6_addr), - if_name(ifp), error)); - goto cleanup; - } + imm = in6_joingroup(ifp, &mltaddr.sin6_addr, &error, + delay); /* XXX jinmei */ + if (!imm) { + nd6log((LOG_WARNING, "in6_update_ifa: " + "addmulti failed for %s on %s " + "(errno=%d)\n", + ip6_sprintf(&mltaddr.sin6_addr), + if_name(ifp), error)); + /* XXX not very fatal, go on... 
*/ } } #undef hostnamelen @@ -1113,21 +1232,77 @@ in6_update_ifa(ifp, ifra, ia) } else RTFREE_LOCKED(rt); - IN6_LOOKUP_MULTI(mltaddr.sin6_addr, ifp, in6m); - if (in6m == NULL) { - (void)in6_addmulti(&mltaddr.sin6_addr, ifp, &error); - if (error != 0) { - nd6log((LOG_WARNING, "in6_update_ifa: " - "addmulti failed for %s on %s " - "(errno=%d)\n", - ip6_sprintf(&mltaddr.sin6_addr), - if_name(ifp), error)); - goto cleanup; + /* XXX: again, do we really need the route? */ + rt = rtalloc1((struct sockaddr *)&mltaddr, 0, 0UL); + if (rt) { + if (memcmp(&mltaddr.sin6_addr, + &((struct sockaddr_in6 *)rt_key(rt))->sin6_addr, + MLTMASK_LEN)) { + RTFREE_LOCKED(rt); + rt = NULL; } } + if (!rt) { + error = rtrequest(RTM_ADD, (struct sockaddr *)&mltaddr, + (struct sockaddr *)&ia->ia_addr, + (struct sockaddr *)&mltmask, RTF_UP | RTF_CLONING, + (struct rtentry **)0); + if (error) + goto cleanup; + } else { + RTFREE_LOCKED(rt); + } + + imm = in6_joingroup(ifp, &mltaddr.sin6_addr, &error, 0); + if (!imm) { + nd6log((LOG_WARNING, "in6_update_ifa: " + "addmulti failed for %s on %s " + "(errno=%d)\n", + ip6_sprintf(&mltaddr.sin6_addr), + if_name(ifp), error)); + goto cleanup; + } #undef MLTMASK_LEN } + /* + * Perform DAD, if needed. + * XXX It may be of use, if we can administratively + * disable DAD. + */ + if (hostIsNew && in6if_do_dad(ifp) && + ((ifra->ifra_flags & IN6_IFF_NODAD) == 0) && + (ia->ia6_flags & IN6_IFF_TENTATIVE)) + { + int mindelay, maxdelay; + + delay = 0; + if ((flags & IN6_IFAUPDATE_DADDELAY)) { + /* + * We need to impose a delay before sending an NS + * for DAD. Check if we also needed a delay for the + * corresponding MLD message. If we did, the delay + * should be larger than the MLD delay (this could be + * relaxed a bit, but this simple logic is at least + * safe). 
+ */ + mindelay = 0; + if (in6m_sol != NULL && + in6m_sol->in6m_state == MLD_REPORTPENDING) { + mindelay = in6m_sol->in6m_timer; + } + maxdelay = MAX_RTR_SOLICITATION_DELAY * hz; + if (maxdelay - mindelay == 0) + delay = 0; + else { + delay = + (arc4random() % (maxdelay - mindelay)) + + mindelay; + } + } + nd6_dad_start((struct ifaddr *)ia, delay); + } + return (error); unlink: @@ -1603,10 +1778,11 @@ in6_ifinit(ifp, ia, sin6, newhost) } struct in6_multi_mship * -in6_joingroup(ifp, addr, errorp) +in6_joingroup(ifp, addr, errorp, delay) struct ifnet *ifp; struct in6_addr *addr; int *errorp; + int delay; { struct in6_multi_mship *imm; @@ -1615,7 +1791,7 @@ in6_joingroup(ifp, addr, errorp) *errorp = ENOBUFS; return NULL; } - imm->i6mm_maddr = in6_addmulti(addr, ifp, errorp); + imm->i6mm_maddr = in6_addmulti(addr, ifp, errorp, delay); if (!imm->i6mm_maddr) { /* *errorp is alrady set */ free(imm, M_IP6MADDR); @@ -1943,21 +2119,27 @@ in6_if_up(ifp) { struct ifaddr *ifa; struct in6_ifaddr *ia; - int dad_delay; /* delay ticks before DAD output */ + + TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) { + if (ifa->ifa_addr->sa_family != AF_INET6) + continue; + ia = (struct in6_ifaddr *)ifa; + if (ia->ia6_flags & IN6_IFF_TENTATIVE) { + /* + * The TENTATIVE flag was likely set by hand + * beforehand, implicitly indicating the need for DAD. + * We may be able to skip the random delay in this + * case, but we impose delays just in case. 
+ */ + nd6_dad_start(ifa, + arc4random() % (MAX_RTR_SOLICITATION_DELAY * hz)); + } + } /* * special cases, like 6to4, are handled in in6_ifattach */ in6_ifattach(ifp, NULL); - - dad_delay = 0; - TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) { - if (ifa->ifa_addr->sa_family != AF_INET6) - continue; - ia = (struct in6_ifaddr *)ifa; - if (ia->ia6_flags & IN6_IFF_TENTATIVE) - nd6_dad_start(ifa, &dad_delay); - } } int @@ -2021,6 +2203,66 @@ in6_setmaxmtu() in6_maxmtu = maxmtu; } +/* + * Provide the length of interface identifiers to be used for the link attached + * to the given interface. The length should be defined in "IPv6 over + * xxx-link" document. Note that address architecture might also define + * the length for a particular set of address prefixes, regardless of the + * link type. As clarified in rfc2462bis, those two definitions should be + * consistent, and those really are as of August 2004. + */ +int +in6_if2idlen(ifp) + struct ifnet *ifp; +{ + switch (ifp->if_type) { + case IFT_ETHER: /* RFC2464 */ +#ifdef IFT_PROPVIRTUAL + case IFT_PROPVIRTUAL: /* XXX: no RFC. treat it as ether */ +#endif +#ifdef IFT_L2VLAN + case IFT_L2VLAN: /* ditto */ +#endif +#ifdef IFT_IEEE80211 + case IFT_IEEE80211: /* ditto */ +#endif +#ifdef IFT_MIP + case IFT_MIP: /* ditto */ +#endif + return (64); + case IFT_FDDI: /* RFC2467 */ + return (64); + case IFT_ISO88025: /* RFC2470 (IPv6 over Token Ring) */ + return (64); + case IFT_PPP: /* RFC2472 */ + return (64); + case IFT_ARCNET: /* RFC2497 */ + return (64); + case IFT_FRELAY: /* RFC2590 */ + return (64); + case IFT_IEEE1394: /* RFC3146 */ + return (64); + case IFT_GIF: + return (64); /* draft-ietf-v6ops-mech-v2-07 */ + case IFT_LOOP: + return (64); /* XXX: is this really correct? */ + default: + /* + * Unknown link type: + * It might be controversial to use the today's common constant + * of 64 for these cases unconditionally. For full compliance, + * we should return an error in this case. 
On the other hand, + * if we simply miss the standard for the link type or a new + * standard is defined for a new link type, the IFID length + * is very likely to be the common constant. As a compromise, + * we always use the constant, but make an explicit notice + * indicating the "unknown" case. + */ + printf("in6_if2idlen: unknown link type (%d)\n", ifp->if_type); + return (64); + } +} + void * in6_domifattach(ifp) struct ifnet *ifp; diff --git a/sys/netinet6/in6.h b/sys/netinet6/in6.h index 8777c1c172ab..150f9863b6df 100644 --- a/sys/netinet6/in6.h +++ b/sys/netinet6/in6.h @@ -374,11 +374,13 @@ extern const struct in6_addr in6addr_linklocal_allrouters; (IN6_IS_ADDR_MC_LINKLOCAL(a))) #define IFA6_IS_DEPRECATED(a) \ - ((a)->ia6_lifetime.ia6t_preferred != 0 && \ - (a)->ia6_lifetime.ia6t_preferred < time_second) + ((a)->ia6_lifetime.ia6t_pltime != ND6_INFINITE_LIFETIME && \ + (u_int32_t)((time_second - (a)->ia6_updatetime)) > \ + (a)->ia6_lifetime.ia6t_pltime) #define IFA6_IS_INVALID(a) \ - ((a)->ia6_lifetime.ia6t_expire != 0 && \ - (a)->ia6_lifetime.ia6t_expire < time_second) + ((a)->ia6_lifetime.ia6t_vltime != ND6_INFINITE_LIFETIME && \ + (u_int32_t)((time_second - (a)->ia6_updatetime)) > \ + (a)->ia6_lifetime.ia6t_vltime) #endif /* _KERNEL */ /* diff --git a/sys/netinet6/in6_ifattach.c b/sys/netinet6/in6_ifattach.c index 1ddf5fed6bb2..6b7d022ea6e7 100644 --- a/sys/netinet6/in6_ifattach.c +++ b/sys/netinet6/in6_ifattach.c @@ -419,7 +419,7 @@ in6_ifattach_linklocal(ifp, altifp) { struct in6_ifaddr *ia; struct in6_aliasreq ifra; - struct nd_prefix pr0; + struct nd_prefixctl pr0; int i, error; /* @@ -457,20 +457,14 @@ in6_ifattach_linklocal(ifp, altifp) ifra.ifra_lifetime.ia6t_vltime = ND6_INFINITE_LIFETIME; ifra.ifra_lifetime.ia6t_pltime = ND6_INFINITE_LIFETIME; - /* - * Do not let in6_update_ifa() do DAD, since we need a random delay - * before sending an NS at the first time the interface becomes up. 
- * Instead, in6_if_up() will start DAD with a proper random delay. - */ - ifra.ifra_flags |= IN6_IFF_NODAD; - /* * Now call in6_update_ifa() to do a bunch of procedures to configure * a link-local address. We can set the 3rd argument to NULL, because * we know there's no other link-local address on the interface * and therefore we are adding one (instead of updating one). */ - if ((error = in6_update_ifa(ifp, &ifra, NULL)) != 0) { + if ((error = in6_update_ifa(ifp, &ifra, NULL, + IN6_IFAUPDATE_DADDELAY)) != 0) { /* * XXX: When the interface does not support IPv6, this call * would fail in the SIOCSIFADDR ioctl. I believe the @@ -485,11 +479,6 @@ in6_ifattach_linklocal(ifp, altifp) return (-1); } - /* - * Adjust ia6_flags so that in6_if_up will perform DAD. - * XXX: Some P2P interfaces seem not to send packets just after - * becoming up, so we skip p2p interfaces for safety. - */ ia = in6ifa_ifpforlinklocal(ifp, 0); /* ia must not be NULL */ #ifdef DIAGNOSTIC if (!ia) { @@ -497,10 +486,6 @@ in6_ifattach_linklocal(ifp, altifp) /* NOTREACHED */ } #endif - if (in6if_do_dad(ifp) && (ifp->if_flags & IFF_POINTOPOINT) == 0) { - ia->ia6_flags &= ~IN6_IFF_NODAD; - ia->ia6_flags |= IN6_IFF_TENTATIVE; - } /* * Make the link-local prefix (fe80::%link/64) as on-link. @@ -513,7 +498,6 @@ in6_ifattach_linklocal(ifp, altifp) pr0.ndpr_ifp = ifp; /* this should be 64 at this moment. */ pr0.ndpr_plen = in6_mask2len(&ifra.ifra_prefixmask.sin6_addr, NULL); - pr0.ndpr_mask = ifra.ifra_prefixmask.sin6_addr; pr0.ndpr_prefix = ifra.ifra_addr; /* apply the mask for safety. (nd6_prelist_add will apply it again) */ for (i = 0; i < 4; i++) { @@ -588,7 +572,7 @@ in6_ifattach_loopback(ifp) * We are sure that this is a newly assigned address, so we can set * NULL to the 3rd arg. 
*/ - if ((error = in6_update_ifa(ifp, &ifra, NULL)) != 0) { + if ((error = in6_update_ifa(ifp, &ifra, NULL, 0)) != 0) { nd6log((LOG_ERR, "in6_ifattach_loopback: failed to configure " "the loopback address on %s (errno=%d)\n", if_name(ifp), error)); @@ -854,7 +838,7 @@ in6_ifdetach(ifp) } } -void +int in6_get_tmpifid(ifp, retbuf, baseid, generate) struct ifnet *ifp; u_int8_t *retbuf; @@ -878,6 +862,8 @@ in6_get_tmpifid(ifp, retbuf, baseid, generate) ndi->randomid); } bcopy(ndi->randomid, retbuf, 8); + + return (0); } void diff --git a/sys/netinet6/in6_ifattach.h b/sys/netinet6/in6_ifattach.h index c91f3ffcc865..77cc88f1be4d 100644 --- a/sys/netinet6/in6_ifattach.h +++ b/sys/netinet6/in6_ifattach.h @@ -36,7 +36,7 @@ #ifdef _KERNEL void in6_ifattach __P((struct ifnet *, struct ifnet *)); void in6_ifdetach __P((struct ifnet *)); -void in6_get_tmpifid __P((struct ifnet *, u_int8_t *, const u_int8_t *, int)); +int in6_get_tmpifid __P((struct ifnet *, u_int8_t *, const u_int8_t *, int)); void in6_tmpaddrtimer __P((void *)); int in6_get_hw_ifid __P((struct ifnet *, struct in6_addr *)); int in6_nigroup __P((struct ifnet *, const char *, int, struct in6_addr *)); diff --git a/sys/netinet6/in6_var.h b/sys/netinet6/in6_var.h index 6bf87b205917..881b21f8b914 100644 --- a/sys/netinet6/in6_var.h +++ b/sys/netinet6/in6_var.h @@ -108,7 +108,10 @@ struct in6_ifaddr { int ia6_flags; struct in6_addrlifetime ia6_lifetime; - struct ifprefix *ia6_ifpr; /* back pointer to ifprefix */ + time_t ia6_createtime; /* the creation time of this address, which is + * currently used for temporary addresses only. 
+ */ + time_t ia6_updatetime; /* back pointer to the ND prefix (for autoconfigured addresses only) */ struct nd_prefix *ia6_ndpr; @@ -518,9 +521,16 @@ struct in6_multi { u_int in6m_refcount; /* # membership claims by sockets */ u_int in6m_state; /* state of the membership */ u_int in6m_timer; /* MLD6 listener report timer */ + struct timeval in6m_timer_expire; /* when the timer expires */ + struct callout *in6m_timer_ch; }; +#define IN6M_TIMER_UNDEF -1 + #ifdef _KERNEL +/* flags to in6_update_ifa */ +#define IN6_IFAUPDATE_DADDELAY 0x1 /* first time to configure an address */ + extern LIST_HEAD(in6_multihead, in6_multi) in6_multihead; /* @@ -579,15 +589,15 @@ do { \ } while(0) struct in6_multi *in6_addmulti __P((struct in6_addr *, struct ifnet *, - int *)); + int *, int)); void in6_delmulti __P((struct in6_multi *)); -struct in6_multi_mship *in6_joingroup(struct ifnet *, struct in6_addr *, int *); +struct in6_multi_mship *in6_joingroup(struct ifnet *, struct in6_addr *, int *, int); int in6_leavegroup(struct in6_multi_mship *); int in6_mask2len __P((struct in6_addr *, u_char *)); int in6_control __P((struct socket *, u_long, caddr_t, struct ifnet *, struct thread *)); int in6_update_ifa __P((struct ifnet *, struct in6_aliasreq *, - struct in6_ifaddr *)); + struct in6_ifaddr *, int)); void in6_purgeaddr __P((struct ifaddr *)); int in6if_do_dad __P((struct ifnet *)); void in6_purgeif __P((struct ifnet *)); @@ -595,6 +605,7 @@ void in6_savemkludge __P((struct in6_ifaddr *)); void *in6_domifattach __P((struct ifnet *)); void in6_domifdetach __P((struct ifnet *, void *)); void in6_setmaxmtu __P((void)); +int in6_if2idlen __P((struct ifnet *)); void in6_restoremkludge __P((struct in6_ifaddr *, struct ifnet *)); void in6_purgemkludge __P((struct ifnet *)); struct in6_ifaddr *in6ifa_ifpforlinklocal __P((struct ifnet *, int)); diff --git a/sys/netinet6/ip6_output.c b/sys/netinet6/ip6_output.c index 193f877b3a52..b7f0cdd259e9 100644 --- a/sys/netinet6/ip6_output.c +++ 
b/sys/netinet6/ip6_output.c @@ -2764,16 +2764,9 @@ ip6_setmoptions(optname, im6op, m) * Everything looks good; add a new record to the multicast * address list for the given interface. */ - imm = malloc(sizeof(*imm), M_IP6MADDR, M_WAITOK); - if (imm == NULL) { - error = ENOBUFS; + imm = in6_joingroup(ifp, &mreq->ipv6mr_multiaddr, &error, 0); + if (imm == NULL) break; - } - if ((imm->i6mm_maddr = - in6_addmulti(&mreq->ipv6mr_multiaddr, ifp, &error)) == NULL) { - free(imm, M_IP6MADDR); - break; - } LIST_INSERT_HEAD(&im6o->im6o_memberships, imm, i6mm_chain); break; @@ -3206,7 +3199,9 @@ ip6_setpktopt(optname, buf, len, opt, priv, sticky, cmsg, uproto) /* turn off the previous option, then set the new option. */ ip6_clearpktopts(opt, IPV6_NEXTHOP); - opt->ip6po_nexthop = malloc(*buf, M_IP6OPT, M_WAITOK); + opt->ip6po_nexthop = malloc(*buf, M_IP6OPT, M_NOWAIT); + if (opt->ip6po_nexthop == NULL) + return (ENOBUFS); bcopy(buf, opt->ip6po_nexthop, *buf); break; @@ -3239,7 +3234,9 @@ ip6_setpktopt(optname, buf, len, opt, priv, sticky, cmsg, uproto) /* turn off the previous option, then set the new option. */ ip6_clearpktopts(opt, IPV6_HOPOPTS); - opt->ip6po_hbh = malloc(hbhlen, M_IP6OPT, M_WAITOK); + opt->ip6po_hbh = malloc(hbhlen, M_IP6OPT, M_NOWAIT); + if (opt->ip6po_hbh == NULL) + return (ENOBUFS); bcopy(hbh, opt->ip6po_hbh, hbhlen); break; @@ -3301,7 +3298,9 @@ ip6_setpktopt(optname, buf, len, opt, priv, sticky, cmsg, uproto) /* turn off the previous option, then set the new option. 
*/ ip6_clearpktopts(opt, optname); - *newdest = malloc(destlen, M_IP6OPT, M_WAITOK); + *newdest = malloc(destlen, M_IP6OPT, M_NOWAIT); + if (*newdest == NULL) + return (ENOBUFS); bcopy(dest, *newdest, destlen); break; @@ -3341,7 +3340,9 @@ ip6_setpktopt(optname, buf, len, opt, priv, sticky, cmsg, uproto) /* turn off the previous option */ ip6_clearpktopts(opt, IPV6_RTHDR); - opt->ip6po_rthdr = malloc(rthlen, M_IP6OPT, M_WAITOK); + opt->ip6po_rthdr = malloc(rthlen, M_IP6OPT, M_NOWAIT); + if (opt->ip6po_rthdr == NULL) + return (ENOBUFS); bcopy(rth, opt->ip6po_rthdr, rthlen); break; diff --git a/sys/netinet6/mld6.c b/sys/netinet6/mld6.c index f5194b787549..d6ea7c810bda 100644 --- a/sys/netinet6/mld6.c +++ b/sys/netinet6/mld6.c @@ -74,12 +74,15 @@ #include #include #include +#include +#include #include #include #include #include +#include #include #include #include @@ -101,9 +104,12 @@ #define MLD_UNSOLICITED_REPORT_INTERVAL 10 static struct ip6_pktopts ip6_opts; -static int mld6_timers_are_running; static void mld6_sendpkt(struct in6_multi *, int, const struct in6_addr *); +static void mld_starttimer(struct in6_multi *); +static void mld_stoptimer(struct in6_multi *); +static void mld_timeo(struct in6_multi *); +static u_long mld_timerresid(struct in6_multi *); void mld6_init() @@ -112,8 +118,6 @@ mld6_init() struct ip6_hbh *hbh = (struct ip6_hbh *)hbh_buf; u_int16_t rtalert_code = htons((u_int16_t)IP6OPT_RTALERT_MLD); - mld6_timers_are_running = 0; - /* ip6h_nxt will be fill in later */ hbh->ip6h_len = 0; /* (8 >> 3) - 1 */ @@ -128,6 +132,84 @@ mld6_init() ip6_opts.ip6po_hbh = hbh; } +static void +mld_starttimer(in6m) + struct in6_multi *in6m; +{ + struct timeval now; + + microtime(&now); + in6m->in6m_timer_expire.tv_sec = now.tv_sec + in6m->in6m_timer / hz; + in6m->in6m_timer_expire.tv_usec = now.tv_usec + + (in6m->in6m_timer % hz) * (1000000 / hz); + if (in6m->in6m_timer_expire.tv_usec > 1000000) { + in6m->in6m_timer_expire.tv_sec++; +
in6m->in6m_timer_expire.tv_usec -= 1000000; + } + + /* start or restart the timer */ + callout_reset(in6m->in6m_timer_ch, in6m->in6m_timer, + (void (*) __P((void *)))mld_timeo, in6m); +} + +static void +mld_stoptimer(in6m) + struct in6_multi *in6m; +{ + if (in6m->in6m_timer == IN6M_TIMER_UNDEF) + return; + + callout_stop(in6m->in6m_timer_ch); + in6m->in6m_timer = IN6M_TIMER_UNDEF; +} + +static void +mld_timeo(in6m) + struct in6_multi *in6m; +{ + int s = splnet(); + + in6m->in6m_timer = IN6M_TIMER_UNDEF; + + callout_stop(in6m->in6m_timer_ch); + + switch (in6m->in6m_state) { + case MLD_REPORTPENDING: + mld6_start_listening(in6m); + break; + default: + mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); + break; + } + + splx(s); +} + +static u_long +mld_timerresid(in6m) + struct in6_multi *in6m; +{ + struct timeval now, diff; + + microtime(&now); + + if (now.tv_sec > in6m->in6m_timer_expire.tv_sec || + (now.tv_sec == in6m->in6m_timer_expire.tv_sec && + now.tv_usec > in6m->in6m_timer_expire.tv_usec)) { + return (0); + } + diff = in6m->in6m_timer_expire; + diff.tv_sec -= now.tv_sec; + diff.tv_usec -= now.tv_usec; + if (diff.tv_usec < 0) { + diff.tv_sec--; + diff.tv_usec += 1000000; + } + + /* return the remaining time in milliseconds */ + return (((u_long)(diff.tv_sec * 1000000 + diff.tv_usec)) / 1000); +} + void mld6_start_listening(in6m) struct in6_multi *in6m; @@ -155,11 +237,11 @@ mld6_start_listening(in6m) in6m->in6m_state = MLD_OTHERLISTENER; } else { mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); - in6m->in6m_timer = - MLD_RANDOM_DELAY(MLD_UNSOLICITED_REPORT_INTERVAL * - PR_FASTHZ); + in6m->in6m_timer = arc4random() % + MLD_UNSOLICITED_REPORT_INTERVAL * hz; in6m->in6m_state = MLD_IREPORTEDLAST; - mld6_timers_are_running = 1; + + mld_starttimer(in6m); } splx(s); } @@ -276,6 +358,8 @@ mld6_input(m, off) * - Use the value specified in the query message as * the maximum timeout. 
*/ + timer = ntohs(mldh->mld_maxdelay); + IFP_TO_IA6(ifp, ia); if (ia == NULL) break; @@ -305,16 +389,17 @@ mld6_input(m, off) IN6_ARE_ADDR_EQUAL(&mld_addr, &in6m->in6m_addr)) { if (timer == 0) { /* send a report immediately */ + mld_stoptimer(in6m); mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); in6m->in6m_timer = 0; /* reset timer */ in6m->in6m_state = MLD_IREPORTEDLAST; } else if (in6m->in6m_timer == 0 || /*idle state*/ - in6m->in6m_timer > timer) { - in6m->in6m_timer = - MLD_RANDOM_DELAY(timer); - mld6_timers_are_running = 1; + mld_timerresid(in6m) > (u_long)timer) { + in6m->in6m_timer = arc4random() % + (int)((long)(timer * hz) / 1000); + mld_starttimer(in6m); } } } @@ -355,39 +440,6 @@ mld6_input(m, off) m_freem(m); } -void -mld6_fasttimeo() -{ - struct in6_multi *in6m; - struct in6_multistep step; - int s; - - /* - * Quick check to see if any work needs to be done, in order - * to minimize the overhead of fasttimo processing. - */ - if (!mld6_timers_are_running) - return; - - s = splnet(); - - mld6_timers_are_running = 0; - IN6_FIRST_MULTI(step, in6m); - while (in6m != NULL) { - if (in6m->in6m_timer == 0) { - /* do nothing */ - } else if (--in6m->in6m_timer == 0) { - mld6_sendpkt(in6m, MLD_LISTENER_REPORT, NULL); - in6m->in6m_state = MLD_IREPORTEDLAST; - } else { - mld6_timers_are_running = 1; - } - IN6_NEXT_MULTI(step, in6m); - } - - splx(s); -} - static void mld6_sendpkt(in6m, type, dst) struct in6_multi *in6m; @@ -492,10 +544,10 @@ mld6_sendpkt(in6m, type, dst) * and the number of source is not 0. 
*/ struct in6_multi * -in6_addmulti(maddr6, ifp, errorp) +in6_addmulti(maddr6, ifp, errorp, delay) struct in6_addr *maddr6; struct ifnet *ifp; - int *errorp; + int *errorp, delay; { struct in6_multi *in6m; struct ifmultiaddr *ifma; @@ -542,8 +594,25 @@ in6_addmulti(maddr6, ifp, errorp) in6m->in6m_refcount = 1; in6m->in6m_ifma = ifma; ifma->ifma_protospec = in6m; + in6m->in6m_timer_ch = malloc(sizeof(*in6m->in6m_timer_ch), M_IP6MADDR, + M_NOWAIT); + if (in6m->in6m_timer_ch == NULL) { + free(in6m, M_IP6MADDR); + splx(s); + return (NULL); + } LIST_INSERT_HEAD(&in6_multihead, in6m, in6m_entry); + callout_init(in6m->in6m_timer_ch, 0); + in6m->in6m_timer = delay; + if (in6m->in6m_timer > 0) { + in6m->in6m_state = MLD_REPORTPENDING; + mld_starttimer(in6m); + + splx(s); + return (in6m); + } + /* * Let MLD6 know that we have joined a new IPv6 multicast * group. @@ -571,6 +640,7 @@ in6_delmulti(in6m) mld6_stop_listening(in6m); ifma->ifma_protospec = NULL; LIST_REMOVE(in6m, in6m_entry); + free(in6m->in6m_timer_ch, M_IP6MADDR); free(in6m, M_IP6MADDR); } /* XXX - should be separate API for when we have an ifma? 
*/ diff --git a/sys/netinet6/mld6_var.h b/sys/netinet6/mld6_var.h index af98e97074e3..a4e52f21f28c 100644 --- a/sys/netinet6/mld6_var.h +++ b/sys/netinet6/mld6_var.h @@ -42,6 +42,7 @@ */ #define MLD_OTHERLISTENER 0 #define MLD_IREPORTEDLAST 1 +#define MLD_REPORTPENDING 2 /* implementation specific */ void mld6_init(void); void mld6_input(struct mbuf *, int); diff --git a/sys/netinet6/nd6.c b/sys/netinet6/nd6.c index 8331ea1ca840..da2a6170f38b 100644 --- a/sys/netinet6/nd6.c +++ b/sys/netinet6/nd6.c @@ -67,6 +67,8 @@ #include #include +#include + #include #define ND6_SLOWTIMER_INTERVAL (60 * 60) /* 1 hour */ @@ -87,6 +89,7 @@ int nd6_gctimer = (60 * 60 * 24); /* 1 day: garbage collection timer */ int nd6_maxndopt = 10; /* max # of ND options allowed */ int nd6_maxnudhint = 0; /* max # of subsequent upper layer hints */ +int nd6_maxqueuelen = 1; /* max # of packets cached in unresolved ND entries */ #ifdef ND6_DEBUG int nd6_debug = 1; @@ -109,6 +112,8 @@ static int nd6_is_new_addr_neighbor __P((struct sockaddr_in6 *, static void nd6_setmtu0 __P((struct ifnet *, struct nd_ifinfo *)); static void nd6_slowtimo __P((void *)); static int regen_tmpaddr __P((struct in6_ifaddr *)); +static struct llinfo_nd6 *nd6_free __P((struct rtentry *, int)); +static void nd6_llinfo_timer __P((void *)); struct callout nd6_slowtimo_ch; struct callout nd6_timer_ch; @@ -382,6 +387,133 @@ nd6_options(ndopts) return 0; } +/* + * ND6 timer routine to handle ND6 entries + */ +void +nd6_llinfo_settimer(ln, tick) + struct llinfo_nd6 *ln; + long tick; +{ + if (tick < 0) { + ln->ln_expire = 0; + ln->ln_ntick = 0; + callout_stop(&ln->ln_timer_ch); + } else { + ln->ln_expire = time_second + tick / hz; + if (tick > INT_MAX) { + ln->ln_ntick = tick - INT_MAX; + callout_reset(&ln->ln_timer_ch, INT_MAX, + nd6_llinfo_timer, ln); + } else { + ln->ln_ntick = 0; + callout_reset(&ln->ln_timer_ch, tick, + nd6_llinfo_timer, ln); + } + } +} + +static void +nd6_llinfo_timer(arg) + void *arg; +{ + struct 
llinfo_nd6 *ln; + struct rtentry *rt; + struct in6_addr *dst; + struct ifnet *ifp; + struct nd_ifinfo *ndi = NULL; + + ln = (struct llinfo_nd6 *)arg; + + if (ln->ln_ntick > 0) { + if (ln->ln_ntick > INT_MAX) { + ln->ln_ntick -= INT_MAX; + nd6_llinfo_settimer(ln, INT_MAX); + } else { + ln->ln_ntick = 0; + nd6_llinfo_settimer(ln, ln->ln_ntick); + } + return; + } + + if ((rt = ln->ln_rt) == NULL) + panic("ln->ln_rt == NULL"); + if ((ifp = rt->rt_ifp) == NULL) + panic("ln->ln_rt->rt_ifp == NULL"); + ndi = ND_IFINFO(ifp); + + /* sanity check */ + if (rt->rt_llinfo && (struct llinfo_nd6 *)rt->rt_llinfo != ln) + panic("rt_llinfo(%p) is not equal to ln(%p)", + rt->rt_llinfo, ln); + if (rt_key(rt) == NULL) + panic("rt key is NULL in nd6_timer(ln=%p)", ln); + + dst = &((struct sockaddr_in6 *)rt_key(rt))->sin6_addr; + + switch (ln->ln_state) { + case ND6_LLINFO_INCOMPLETE: + if (ln->ln_asked < nd6_mmaxtries) { + ln->ln_asked++; + nd6_llinfo_settimer(ln, (long)ndi->retrans * hz / 1000); + nd6_ns_output(ifp, NULL, dst, ln, 0); + } else { + struct mbuf *m = ln->ln_hold; + if (m) { + /* + * assuming every packet in ln_hold has the + * same IP header + */ + ln->ln_hold = NULL; + icmp6_error2(m, ICMP6_DST_UNREACH, + ICMP6_DST_UNREACH_ADDR, 0, rt->rt_ifp); + } + if (rt) + (void)nd6_free(rt, 0); + ln = NULL; + } + break; + case ND6_LLINFO_REACHABLE: + if (!ND6_LLINFO_PERMANENT(ln)) { + ln->ln_state = ND6_LLINFO_STALE; + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); + } + break; + + case ND6_LLINFO_STALE: + /* Garbage Collection(RFC 2461 5.3) */ + if (!ND6_LLINFO_PERMANENT(ln)) { + (void)nd6_free(rt, 1); + ln = NULL; + } + break; + + case ND6_LLINFO_DELAY: + if (ndi && (ndi->flags & ND6_IFF_PERFORMNUD) != 0) { + /* We need NUD */ + ln->ln_asked = 1; + ln->ln_state = ND6_LLINFO_PROBE; + nd6_llinfo_settimer(ln, (long)ndi->retrans * hz / 1000); + nd6_ns_output(ifp, dst, dst, ln, 0); + } else { + ln->ln_state = ND6_LLINFO_STALE; /* XXX */ + nd6_llinfo_settimer(ln, (long)nd6_gctimer * 
hz); + } + break; + case ND6_LLINFO_PROBE: + if (ln->ln_asked < nd6_umaxtries) { + ln->ln_asked++; + nd6_llinfo_settimer(ln, (long)ndi->retrans * hz / 1000); + nd6_ns_output(ifp, dst, dst, ln, 0); + } else { + (void)nd6_free(rt, 0); + ln = NULL; + } + break; + } +} + + /* * ND6 timer routine to expire default route list and prefix list */ @@ -390,117 +522,16 @@ nd6_timer(ignored_arg) void *ignored_arg; { int s; - struct llinfo_nd6 *ln; struct nd_defrouter *dr; struct nd_prefix *pr; - struct ifnet *ifp; struct in6_ifaddr *ia6, *nia6; struct in6_addrlifetime *lt6; - s = splnet(); callout_reset(&nd6_timer_ch, nd6_prune * hz, nd6_timer, NULL); - ln = llinfo_nd6.ln_next; - while (ln && ln != &llinfo_nd6) { - struct rtentry *rt; - struct sockaddr_in6 *dst; - struct llinfo_nd6 *next = ln->ln_next; - /* XXX: used for the DELAY case only: */ - struct nd_ifinfo *ndi = NULL; - - if ((rt = ln->ln_rt) == NULL) { - ln = next; - continue; - } - if ((ifp = rt->rt_ifp) == NULL) { - ln = next; - continue; - } - ndi = ND_IFINFO(ifp); - dst = (struct sockaddr_in6 *)rt_key(rt); - - if (ln->ln_expire > time_second) { - ln = next; - continue; - } - - /* sanity check */ - if (!rt) - panic("rt=0 in nd6_timer(ln=%p)", ln); - if (rt->rt_llinfo && (struct llinfo_nd6 *)rt->rt_llinfo != ln) - panic("rt_llinfo(%p) is not equal to ln(%p)", - rt->rt_llinfo, ln); - if (!dst) - panic("dst=0 in nd6_timer(ln=%p)", ln); - - switch (ln->ln_state) { - case ND6_LLINFO_INCOMPLETE: - if (ln->ln_asked < nd6_mmaxtries) { - ln->ln_asked++; - ln->ln_expire = time_second + - ND_IFINFO(ifp)->retrans / 1000; - nd6_ns_output(ifp, NULL, &dst->sin6_addr, - ln, 0); - } else { - struct mbuf *m = ln->ln_hold; - if (m) { - /* - * assuming every packet in ln_hold has - * the same IP header - */ - ln->ln_hold = NULL; - icmp6_error2(m, ICMP6_DST_UNREACH, - ICMP6_DST_UNREACH_ADDR, 0, - rt->rt_ifp); - } - next = nd6_free(rt); - } - break; - case ND6_LLINFO_REACHABLE: - if (ln->ln_expire) { - ln->ln_state = ND6_LLINFO_STALE; - 
ln->ln_expire = time_second + nd6_gctimer; - } - break; - - case ND6_LLINFO_STALE: - /* Garbage Collection(RFC 2461 5.3) */ - if (ln->ln_expire) - next = nd6_free(rt); - break; - - case ND6_LLINFO_DELAY: - if (ndi && (ndi->flags & ND6_IFF_PERFORMNUD) != 0) { - /* We need NUD */ - ln->ln_asked = 1; - ln->ln_state = ND6_LLINFO_PROBE; - ln->ln_expire = time_second + - ndi->retrans / 1000; - nd6_ns_output(ifp, &dst->sin6_addr, - &dst->sin6_addr, - ln, 0); - } else { - ln->ln_state = ND6_LLINFO_STALE; /* XXX */ - ln->ln_expire = time_second + nd6_gctimer; - } - break; - case ND6_LLINFO_PROBE: - if (ln->ln_asked < nd6_umaxtries) { - ln->ln_asked++; - ln->ln_expire = time_second + - ND_IFINFO(ifp)->retrans / 1000; - nd6_ns_output(ifp, &dst->sin6_addr, - &dst->sin6_addr, ln, 0); - } else { - next = nd6_free(rt); - } - break; - } - ln = next; - } - /* expire default router list */ + s = splnet(); dr = TAILQ_FIRST(&nd_defrouter); while (dr) { if (dr->expire && dr->expire < time_second) { @@ -594,7 +625,8 @@ nd6_timer(ignored_arg) * since pltime is just for autoconf, pltime processing for * prefix is not necessary. */ - if (pr->ndpr_expire && pr->ndpr_expire < time_second) { + if (pr->ndpr_vltime != ND6_INFINITE_LIFETIME && + time_second - pr->ndpr_lastupdate > pr->ndpr_vltime) { struct nd_prefix *t; t = pr->ndpr_next; @@ -663,7 +695,7 @@ regen_tmpaddr(ia6) if (public_ifa6 != NULL) { int e; - if ((e = in6_tmpifadd(public_ifa6, 0)) != 0) { + if ((e = in6_tmpifadd(public_ifa6, 0, 0)) != 0) { log(LOG_NOTICE, "regen_tmpaddr: failed to create a new" " tmp addr,errno=%d\n", e); return (-1); @@ -683,21 +715,29 @@ nd6_purge(ifp) struct ifnet *ifp; { struct llinfo_nd6 *ln, *nln; - struct nd_defrouter *dr, *ndr, drany; + struct nd_defrouter *dr, *ndr; struct nd_prefix *pr, *npr; - /* Nuke default router list entries toward ifp */ - if ((dr = TAILQ_FIRST(&nd_defrouter)) != NULL) { - /* - * The first entry of the list may be stored in - * the routing table, so we'll delete it later. 
- */ - for (dr = TAILQ_NEXT(dr, dr_entry); dr; dr = ndr) { - ndr = TAILQ_NEXT(dr, dr_entry); - if (dr->ifp == ifp) - defrtrlist_del(dr); - } - dr = TAILQ_FIRST(&nd_defrouter); + /* + * Nuke default router list entries toward ifp. + * We defer removal of default router list entries that is installed + * in the routing table, in order to keep additional side effects as + * small as possible. + */ + for (dr = TAILQ_FIRST(&nd_defrouter); dr; dr = ndr) { + ndr = TAILQ_NEXT(dr, dr_entry); + if (dr->installed) + continue; + + if (dr->ifp == ifp) + defrtrlist_del(dr); + } + + for (dr = TAILQ_FIRST(&nd_defrouter); dr; dr = ndr) { + ndr = TAILQ_NEXT(dr, dr_entry); + if (!dr->installed) + continue; + if (dr->ifp == ifp) defrtrlist_del(dr); } @@ -706,6 +746,14 @@ nd6_purge(ifp) for (pr = nd_prefix.lh_first; pr; pr = npr) { npr = pr->ndpr_next; if (pr->ndpr_ifp == ifp) { + /* + * Because if_detach() does *not* release prefixes + * while purging addresses the reference count will + * still be above zero. We therefore reset it to + * make sure that the prefix really gets purged. + */ + pr->ndpr_refcnt = 0; + /* * Previously, pr->ndpr_addr is removed as well, * but I strongly believe we don't have to do it. @@ -724,8 +772,6 @@ nd6_purge(ifp) if (!ip6_forwarding && ip6_accept_rtadv) { /* XXX: too restrictive? */ /* refresh default router list */ - bzero(&drany, sizeof(drany)); - defrouter_delreq(&drany, 0); defrouter_select(); } @@ -746,7 +792,7 @@ nd6_purge(ifp) rt->rt_gateway->sa_family == AF_LINK) { sdl = (struct sockaddr_dl *)rt->rt_gateway; if (sdl->sdl_index == ifp->if_index) - nln = nd6_free(rt); + nln = nd6_free(rt, 0); } ln = nln; } @@ -833,6 +879,10 @@ nd6_lookup(addr6, create, ifp) * own address on a non-loopback interface. Instead, we should * use rt->rt_ifa->ifa_ifp, which would specify the REAL * interface. 
+ * Note also that ifa_ifp and ifp may differ when we connect two + * interfaces to a same link, install a link prefix to an interface, + * and try to install a neighbor cache on an interface that does not + * have a route to the prefix. */ if ((rt->rt_flags & RTF_GATEWAY) || (rt->rt_flags & RTF_LLINFO) == 0 || rt->rt_gateway->sa_family != AF_LINK || rt->rt_llinfo == NULL || @@ -861,6 +911,7 @@ nd6_is_new_addr_neighbor(addr, ifp) struct ifnet *ifp; { struct nd_prefix *pr; + struct ifaddr *dstaddr; /* * A link-local address is always a neighbor. @@ -903,6 +954,14 @@ nd6_is_new_addr_neighbor(addr, ifp) return (1); } + /* + * If the address is assigned on the node of the other side of + * a p2p interface, the address should be a neighbor. + */ + dstaddr = ifa_ifwithdstaddr((struct sockaddr *)addr); + if ((dstaddr != NULL) && (dstaddr->ifa_ifp == ifp)) + return (1); + /* * If the default router list is empty, all addresses are regarded * as on-link, and thus, as a neighbor. @@ -943,10 +1002,14 @@ nd6_is_addr_neighbor(addr, ifp) /* * Free an nd6 llinfo entry. + * Since the function would cause significant changes in the kernel, DO NOT + * make it global, unless you have a strong reason for the change, and are sure + * that the change is safe. */ -struct llinfo_nd6 * -nd6_free(rt) +static struct llinfo_nd6 * +nd6_free(rt, gc) struct rtentry *rt; + int gc; { struct llinfo_nd6 *ln = (struct llinfo_nd6 *)rt->rt_llinfo, *next; struct in6_addr in6 = ((struct sockaddr_in6 *)rt_key(rt))->sin6_addr; @@ -957,12 +1020,38 @@ nd6_free(rt) * even though it is not harmful, it was not really necessary. */ - if (!ip6_forwarding && ip6_accept_rtadv) { /* XXX: too restrictive? 
*/ + /* cancel timer */ + nd6_llinfo_settimer(ln, -1); + + if (!ip6_forwarding) { int s; s = splnet(); dr = defrouter_lookup(&((struct sockaddr_in6 *)rt_key(rt))->sin6_addr, rt->rt_ifp); + if (dr != NULL && dr->expire && + ln->ln_state == ND6_LLINFO_STALE && gc) { + /* + * If the reason for the deletion is just garbage + * collection, and the neighbor is an active default + * router, do not delete it. Instead, reset the GC + * timer using the router's lifetime. + * Simply deleting the entry would affect default + * router selection, which is not necessarily a good + * thing, especially when we're using router preference + * values. + * XXX: the check for ln_state would be redundant, + * but we intentionally keep it just in case. + */ + if (dr->expire > time_second) + nd6_llinfo_settimer(ln, + (dr->expire - time_second) * hz); + else + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); + splx(s); + return (ln->ln_next); + } + if (ln->ln_router || dr) { /* * rt6_flush must be called whether or not the neighbor @@ -996,19 +1085,10 @@ nd6_free(rt) */ pfxlist_onlink_check(); - if (dr == TAILQ_FIRST(&nd_defrouter)) { - /* - * It is used as the current default router, - * so we have to move it to the end of the - * list and choose a new one. - * XXX: it is not very efficient if this is - * the only router. 
- */ - TAILQ_REMOVE(&nd_defrouter, dr, dr_entry); - TAILQ_INSERT_TAIL(&nd_defrouter, dr, dr_entry); - - defrouter_select(); - } + /* + * refresh default router list + */ + defrouter_select(); } splx(s); } @@ -1079,9 +1159,10 @@ nd6_nud_hint(rt, dst6, force) } ln->ln_state = ND6_LLINFO_REACHABLE; - if (ln->ln_expire) - ln->ln_expire = time_second + - ND_IFINFO(rt->rt_ifp)->reachable; + if (!ND6_LLINFO_PERMANENT(ln)) { + nd6_llinfo_settimer(ln, + (long)ND_IFINFO(rt->rt_ifp)->reachable * hz); + } } void @@ -1143,12 +1224,13 @@ nd6_rtrequest(req, rt, info) * SIN(rt_mask(rt))->sin_addr.s_addr != 0xffffffff) * rt->rt_flags |= RTF_CLONING; */ - if (rt->rt_flags & (RTF_CLONING | RTF_LLINFO)) { + if ((rt->rt_flags & RTF_CLONING) || + ((rt->rt_flags & RTF_LLINFO) && ln == NULL)) { /* - * Case 1: This route should come from - * a route to interface. RTF_LLINFO flag is set - * for a host route whose destination should be - * treated as on-link. + * Case 1: This route should come from a route to + * interface (RTF_CLONING case) or the route should be + * treated as on-link but is currently not + * (RTF_LLINFO && ln == NULL case). */ rt_setgate(rt, rt_key(rt), (struct sockaddr *)&null_sdl); @@ -1156,11 +1238,7 @@ nd6_rtrequest(req, rt, info) SDL(gate)->sdl_type = ifp->if_type; SDL(gate)->sdl_index = ifp->if_index; if (ln) - ln->ln_expire = time_second; - if (ln && ln->ln_expire == 0) { - /* kludge for desktops */ - ln->ln_expire = 1; - } + nd6_llinfo_settimer(ln, 0); if ((rt->rt_flags & RTF_CLONING) != 0) break; } @@ -1215,6 +1293,8 @@ nd6_rtrequest(req, rt, info) nd6_allocated++; bzero(ln, sizeof(*ln)); ln->ln_rt = rt; + callout_init(&ln->ln_timer_ch, 0); + /* this is required for "ndp" command. - shin */ if (req == RTM_ADD) { /* @@ -1230,7 +1310,7 @@ nd6_rtrequest(req, rt, info) * initialized in rtrequest(), so rt_expire is 0. 
*/ ln->ln_state = ND6_LLINFO_NOSTATE; - ln->ln_expire = time_second; + nd6_llinfo_settimer(ln, 0); } rt->rt_flags |= RTF_LLINFO; ln->ln_next = llinfo_nd6.ln_next; @@ -1246,7 +1326,7 @@ nd6_rtrequest(req, rt, info) &SIN6(rt_key(rt))->sin6_addr); if (ifa) { caddr_t macp = nd6_ifptomac(ifp); - ln->ln_expire = 0; + nd6_llinfo_settimer(ln, -1); ln->ln_state = ND6_LLINFO_REACHABLE; ln->ln_byhint = 0; if (macp) { @@ -1270,7 +1350,7 @@ nd6_rtrequest(req, rt, info) } } } else if (rt->rt_flags & RTF_ANNOUNCE) { - ln->ln_expire = 0; + nd6_llinfo_settimer(ln, -1); ln->ln_state = ND6_LLINFO_REACHABLE; ln->ln_byhint = 0; @@ -1286,7 +1366,8 @@ nd6_rtrequest(req, rt, info) llsol.s6_addr8[12] = 0xff; if (in6_setscope(&llsol, ifp, NULL)) break; - if (!in6_addmulti(&llsol, ifp, &error)) { + if (in6_addmulti(&llsol, ifp, + &error, 0) == NULL) { nd6log((LOG_ERR, "%s: failed to join " "%s (errno=%d)\n", if_name(ifp), ip6_sprintf(&llsol), error)); @@ -1320,6 +1401,7 @@ nd6_rtrequest(req, rt, info) ln->ln_next->ln_prev = ln->ln_prev; ln->ln_prev->ln_next = ln->ln_next; ln->ln_prev = NULL; + nd6_llinfo_settimer(ln, -1); rt->rt_llinfo = 0; rt->rt_flags &= ~RTF_LLINFO; if (ln->ln_hold) @@ -1339,7 +1421,7 @@ nd6_ioctl(cmd, data, ifp) struct in6_ndireq *ndi = (struct in6_ndireq *)data; struct in6_nbrinfo *nbi = (struct in6_nbrinfo *)data; struct in6_ndifreq *ndif = (struct in6_ndifreq *)data; - struct nd_defrouter *dr, any; + struct nd_defrouter *dr; struct nd_prefix *pr; struct rtentry *rt; int i = 0, error = 0; @@ -1392,7 +1474,22 @@ nd6_ioctl(cmd, data, ifp) oprl->prefix[i].vltime = pr->ndpr_vltime; oprl->prefix[i].pltime = pr->ndpr_pltime; oprl->prefix[i].if_index = pr->ndpr_ifp->if_index; - oprl->prefix[i].expire = pr->ndpr_expire; + if (pr->ndpr_vltime == ND6_INFINITE_LIFETIME) + oprl->prefix[i].expire = 0; + else { + time_t maxexpire; + + /* XXX: we assume time_t is signed. 
*/ + maxexpire = (-1) & + ~(1 << ((sizeof(maxexpire) * 8) - 1)); + if (pr->ndpr_vltime < + maxexpire - pr->ndpr_lastupdate) { + oprl->prefix[i].expire = + pr->ndpr_lastupdate + + pr->ndpr_vltime; + } else + oprl->prefix[i].expire = maxexpire; + } pfr = pr->ndpr_advrtrs.lh_first; j = 0; @@ -1430,7 +1527,6 @@ nd6_ioctl(cmd, data, ifp) break; case SIOCGIFINFO_IN6: ND = *ND_IFINFO(ifp); - ND.linkmtu = IN6_LINKMTU(ifp); break; case SIOCSIFINFO_IN6: /* @@ -1465,15 +1561,9 @@ nd6_ioctl(cmd, data, ifp) break; #undef ND case SIOCSNDFLUSH_IN6: /* XXX: the ioctl name is confusing... */ - /* flush default router list */ - /* - * xxx sumikawa: should not delete route if default - * route equals to the top of default router list - */ - bzero(&any, sizeof(any)); - defrouter_delreq(&any, 0); + /* sync kernel routing table with the default router list */ + defrouter_reset(); defrouter_select(); - /* xxx sumikawa: flush prefix list */ break; case SIOCSPFXFLUSH_IN6: { @@ -1511,17 +1601,12 @@ nd6_ioctl(cmd, data, ifp) struct nd_defrouter *dr, *next; s = splnet(); - if ((dr = TAILQ_FIRST(&nd_defrouter)) != NULL) { - /* - * The first entry of the list may be stored in - * the routing table, so we'll delete it later. - */ - for (dr = TAILQ_NEXT(dr, dr_entry); dr; dr = next) { - next = TAILQ_NEXT(dr, dr_entry); - defrtrlist_del(dr); - } - defrtrlist_del(TAILQ_FIRST(&nd_defrouter)); + defrouter_reset(); + for (dr = TAILQ_FIRST(&nd_defrouter); dr; dr = next) { + next = TAILQ_NEXT(dr, dr_entry); + defrtrlist_del(dr); } + defrouter_select(); splx(s); break; } @@ -1613,7 +1698,7 @@ nd6_cache_lladdr(ifp, from, lladdr, lladdrlen, type, code) return NULL; if ((rt->rt_flags & (RTF_GATEWAY | RTF_LLINFO)) != RTF_LLINFO) { fail: - (void)nd6_free(rt); + (void)nd6_free(rt, 0); return NULL; } ln = (struct llinfo_nd6 *)rt->rt_llinfo; @@ -1682,20 +1767,36 @@ nd6_cache_lladdr(ifp, from, lladdr, lladdrlen, type, code) * we must set the timer now, although it is actually * meaningless. 
*/ - ln->ln_expire = time_second + nd6_gctimer; + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); if (ln->ln_hold) { - /* - * we assume ifp is not a p2p here, so just - * set the 2nd argument as the 1st one. - */ - nd6_output(ifp, ifp, ln->ln_hold, - (struct sockaddr_in6 *)rt_key(rt), rt); + struct mbuf *m_hold, *m_hold_next; + for (m_hold = ln->ln_hold; m_hold; + m_hold = m_hold_next) { + struct mbuf *mpkt = NULL; + + m_hold_next = m_hold->m_nextpkt; + mpkt = m_copym(m_hold, 0, M_COPYALL, M_DONTWAIT); + if (mpkt == NULL) { + m_freem(m_hold); + break; + } + mpkt->m_nextpkt = NULL; + + /* + * we assume ifp is not a p2p here, so + * just set the 2nd argument as the + * 1st one. + */ + nd6_output(ifp, ifp, mpkt, + (struct sockaddr_in6 *)rt_key(rt), + rt); + } ln->ln_hold = NULL; } } else if (ln->ln_state == ND6_LLINFO_INCOMPLETE) { /* probe right away */ - ln->ln_expire = time_second; + nd6_llinfo_settimer((void *)ln, 0); } } @@ -1789,7 +1890,6 @@ static void nd6_slowtimo(ignored_arg) void *ignored_arg; { - int s = splnet(); struct nd_ifinfo *nd6if; struct ifnet *ifp; @@ -1811,7 +1911,6 @@ nd6_slowtimo(ignored_arg) } } IFNET_RUNLOCK(); - splx(s); } #define senderr(e) { error = (e); goto bad;} @@ -1931,7 +2030,7 @@ nd6_output(ifp, origifp, m0, dst, rt0) if ((ifp->if_flags & IFF_POINTOPOINT) != 0 && ln->ln_state < ND6_LLINFO_REACHABLE) { ln->ln_state = ND6_LLINFO_STALE; - ln->ln_expire = time_second + nd6_gctimer; + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); } /* @@ -1944,7 +2043,7 @@ nd6_output(ifp, origifp, m0, dst, rt0) if (ln->ln_state == ND6_LLINFO_STALE) { ln->ln_asked = 0; ln->ln_state = ND6_LLINFO_DELAY; - ln->ln_expire = time_second + nd6_delay; + nd6_llinfo_settimer(ln, (long)nd6_delay * hz); } /* @@ -1957,26 +2056,44 @@ nd6_output(ifp, origifp, m0, dst, rt0) /* * There is a neighbor cache entry, but no ethernet address - * response yet. Replace the held mbuf (if any) with this - * latest one. 
- * - * This code conforms to the rate-limiting rule described in Section - * 7.2.2 of RFC 2461, because the timer is set correctly after sending - * an NS below. + * response yet. Append this latest packet to the end of the + * packet queue in the mbuf, unless the number of the packet + * does not exceed nd6_maxqueuelen. When it exceeds nd6_maxqueuelen, + * the oldest packet in the queue will be removed. */ if (ln->ln_state == ND6_LLINFO_NOSTATE) ln->ln_state = ND6_LLINFO_INCOMPLETE; - if (ln->ln_hold) - m_freem(ln->ln_hold); - ln->ln_hold = m; - if (ln->ln_expire) { - if (ln->ln_asked < nd6_mmaxtries && - ln->ln_expire < time_second) { - ln->ln_asked++; - ln->ln_expire = time_second + - ND_IFINFO(ifp)->retrans / 1000; - nd6_ns_output(ifp, NULL, &dst->sin6_addr, ln, 0); + if (ln->ln_hold) { + struct mbuf *m_hold; + int i; + + i = 0; + for (m_hold = ln->ln_hold; m_hold; m_hold = m_hold->m_nextpkt) { + i++; + if (m_hold->m_nextpkt == NULL) { + m_hold->m_nextpkt = m; + break; + } } + while (i >= nd6_maxqueuelen) { + m_hold = ln->ln_hold; + ln->ln_hold = ln->ln_hold->m_nextpkt; + m_free(m_hold); + i--; + } + } else { + ln->ln_hold = m; + } + + /* + * If there has been no NS for the neighbor after entering the + * INCOMPLETE state, send the first solicitation. 
+ */ + if (!ND6_LLINFO_PERMANENT(ln) && ln->ln_asked == 0) { + ln->ln_asked++; + nd6_llinfo_settimer(ln, + (long)ND_IFINFO(ifp)->retrans * hz / 1000); + nd6_ns_output(ifp, NULL, &dst->sin6_addr, ln, 0); } return (0); @@ -2128,6 +2245,8 @@ SYSCTL_NODE(_net_inet6_icmp6, ICMPV6CTL_ND6_DRLIST, nd6_drlist, CTLFLAG_RD, nd6_sysctl_drlist, ""); SYSCTL_NODE(_net_inet6_icmp6, ICMPV6CTL_ND6_PRLIST, nd6_prlist, CTLFLAG_RD, nd6_sysctl_prlist, ""); +SYSCTL_INT(_net_inet6_icmp6, ICMPV6CTL_ND6_MAXQLEN, nd6_maxqueuelen, + CTLFLAG_RW, &nd6_maxqueuelen, 1, ""); static int nd6_sysctl_drlist(SYSCTL_HANDLER_ARGS) @@ -2151,12 +2270,7 @@ nd6_sysctl_drlist(SYSCTL_HANDLER_ARGS) d->rtaddr.sin6_family = AF_INET6; d->rtaddr.sin6_len = sizeof(d->rtaddr); d->rtaddr.sin6_addr = dr->rtaddr; - if (sa6_recoverscope(&d->rtaddr)) { - log(LOG_ERR, - "scope error in router list (%s)\n", - ip6_sprintf(&d->rtaddr.sin6_addr)); - /* XXX: press on... */ - } + sa6_recoverscope(&d->rtaddr); d->flags = dr->flags; d->rtlifetime = dr->rtlifetime; d->expire = dr->expire; @@ -2209,7 +2323,21 @@ nd6_sysctl_prlist(SYSCTL_HANDLER_ARGS) p->vltime = pr->ndpr_vltime; p->pltime = pr->ndpr_pltime; p->if_index = pr->ndpr_ifp->if_index; - p->expire = pr->ndpr_expire; + if (pr->ndpr_vltime == ND6_INFINITE_LIFETIME) + p->expire = 0; + else { + time_t maxexpire; + + /* XXX: we assume time_t is signed. 
*/ + maxexpire = (-1) & + ~(1 << ((sizeof(maxexpire) * 8) - 1)); + if (pr->ndpr_vltime < + maxexpire - pr->ndpr_lastupdate) { + p->expire = pr->ndpr_lastupdate + + pr->ndpr_vltime; + } else + p->expire = maxexpire; + } p->refcnt = pr->ndpr_refcnt; p->flags = pr->ndpr_stateflags; p->origin = PR_ORIG_RA; diff --git a/sys/netinet6/nd6.h b/sys/netinet6/nd6.h index 760671c604c3..1d7262352228 100644 --- a/sys/netinet6/nd6.h +++ b/sys/netinet6/nd6.h @@ -51,6 +51,9 @@ struct llinfo_nd6 { short ln_state; /* reachability state */ short ln_router; /* 2^0: ND6 router bit */ int ln_byhint; /* # of times we made it reachable by UL hint */ + + long ln_ntick; + struct callout ln_timer_ch; }; #define ND6_LLINFO_NOSTATE -2 @@ -69,6 +72,7 @@ struct llinfo_nd6 { #define ND6_LLINFO_PROBE 4 #define ND6_IS_LLINFO_PROBREACH(n) ((n)->ln_state > ND6_LLINFO_INCOMPLETE) +#define ND6_LLINFO_PERMANENT(n) (((n)->ln_expire == 0) && ((n)->ln_state > ND6_LLINFO_INCOMPLETE)) struct nd_ifinfo { u_int32_t linkmtu; /* LinkMTU */ @@ -92,6 +96,7 @@ struct nd_ifinfo { #define ND6_IFF_IFDISABLED 0x8 /* IPv6 operation is disabled due to * DAD failure. 
(XXX: not ND-specific) */ +#define ND6_IFF_DONT_SET_IFROUTE 0x10 #ifdef _KERNEL #define ND_IFINFO(ifp) \ @@ -243,18 +248,36 @@ struct nd_defrouter { u_short rtlifetime; u_long expire; struct ifnet *ifp; + int installed; /* is installed into kernel routing table */ }; +struct nd_prefixctl { + struct ifnet *ndpr_ifp; + + /* prefix */ + struct sockaddr_in6 ndpr_prefix; + u_char ndpr_plen; + + u_int32_t ndpr_vltime; /* advertised valid lifetime */ + u_int32_t ndpr_pltime; /* advertised preferred lifetime */ + + struct prf_ra ndpr_flags; +}; + + struct nd_prefix { struct ifnet *ndpr_ifp; LIST_ENTRY(nd_prefix) ndpr_entry; struct sockaddr_in6 ndpr_prefix; /* prefix */ struct in6_addr ndpr_mask; /* netmask derived from the prefix */ - struct in6_addr ndpr_addr; /* address that is derived from the prefix */ + u_int32_t ndpr_vltime; /* advertised valid lifetime */ u_int32_t ndpr_pltime; /* advertised preferred lifetime */ + time_t ndpr_expire; /* expiration time of the prefix */ time_t ndpr_preferred; /* preferred time of the prefix */ + time_t ndpr_lastupdate; /* reception time of last advertisement */ + struct prf_ra ndpr_flags; u_int32_t ndpr_stateflags; /* actual state flags */ /* list of routers that advertise the prefix: */ @@ -268,12 +291,7 @@ struct nd_prefix { #define ndpr_raf ndpr_flags #define ndpr_raf_onlink ndpr_flags.onlink #define ndpr_raf_auto ndpr_flags.autonomous - -/* - * We keep expired prefix for certain amount of time, for validation purposes. 
- * 1800s = MaxRtrAdvInterval - */ -#define NDPR_KEEP_EXPIRED (1800 * 2) +#define ndpr_raf_router ndpr_flags.router /* * Message format for use in obtaining information about prefixes @@ -301,9 +319,6 @@ struct inet6_ndpr_msghdr { #define prm_rrf_decrvalid prm_flags.prf_rr.decrvalid #define prm_rrf_decrprefd prm_flags.prf_rr.decrprefd -#define ifpr2ndpr(ifpr) ((struct nd_prefix *)(ifpr)) -#define ndpr2ifpr(ndpr) ((struct ifprefix *)(ndpr)) - struct nd_pfxrouter { LIST_ENTRY(nd_pfxrouter) pfr_entry; #define pfr_next pfr_entry.le_next @@ -321,7 +336,6 @@ extern int nd6_useloopback; extern int nd6_maxnudhint; extern int nd6_gctimer; extern struct llinfo_nd6 llinfo_nd6; -extern struct nd_ifinfo *nd_ifinfo; extern struct nd_drhead nd_defrouter; extern struct nd_prhead nd_prefix; extern int nd6_debug; @@ -373,9 +387,9 @@ struct nd_opt_hdr *nd6_option __P((union nd_opts *)); int nd6_options __P((union nd_opts *)); struct rtentry *nd6_lookup __P((struct in6_addr *, int, struct ifnet *)); void nd6_setmtu __P((struct ifnet *)); +void nd6_llinfo_settimer __P((struct llinfo_nd6 *, long)); void nd6_timer __P((void *)); void nd6_purge __P((struct ifnet *)); -struct llinfo_nd6 *nd6_free __P((struct rtentry *)); void nd6_nud_hint __P((struct rtentry *, struct in6_addr *, int)); int nd6_resolve __P((struct ifnet *, struct rtentry *, struct mbuf *, struct sockaddr *, u_char *)); @@ -385,9 +399,9 @@ struct rtentry *nd6_cache_lladdr __P((struct ifnet *, struct in6_addr *, char *, int, int, int)); int nd6_output __P((struct ifnet *, struct ifnet *, struct mbuf *, struct sockaddr_in6 *, struct rtentry *)); +int nd6_need_cache __P((struct ifnet *)); int nd6_storelladdr __P((struct ifnet *, struct rtentry *, struct mbuf *, struct sockaddr *, u_char *)); -int nd6_need_cache __P((struct ifnet *)); /* nd6_nbr.c */ void nd6_na_input __P((struct mbuf *, int, int)); @@ -397,7 +411,7 @@ void nd6_ns_input __P((struct mbuf *, int, int)); void nd6_ns_output __P((struct ifnet *, const struct 
in6_addr *, const struct in6_addr *, struct llinfo_nd6 *, int)); caddr_t nd6_ifptomac __P((struct ifnet *)); -void nd6_dad_start __P((struct ifaddr *, int *)); +void nd6_dad_start __P((struct ifaddr *, int)); void nd6_dad_stop __P((struct ifaddr *)); void nd6_dad_duplicated __P((struct ifaddr *)); @@ -406,23 +420,20 @@ void nd6_rs_input __P((struct mbuf *, int, int)); void nd6_ra_input __P((struct mbuf *, int, int)); void prelist_del __P((struct nd_prefix *)); void defrouter_addreq __P((struct nd_defrouter *)); -void defrouter_delreq __P((struct nd_defrouter *, int)); +void defrouter_reset __P((void)); void defrouter_select __P((void)); void defrtrlist_del __P((struct nd_defrouter *)); void prelist_remove __P((struct nd_prefix *)); -int prelist_update __P((struct nd_prefix *, struct nd_defrouter *, - struct mbuf *)); -int nd6_prelist_add __P((struct nd_prefix *, struct nd_defrouter *, +int nd6_prelist_add __P((struct nd_prefixctl *, struct nd_defrouter *, struct nd_prefix **)); int nd6_prefix_onlink __P((struct nd_prefix *)); int nd6_prefix_offlink __P((struct nd_prefix *)); void pfxlist_onlink_check __P((void)); struct nd_defrouter *defrouter_lookup __P((struct in6_addr *, struct ifnet *)); -struct nd_prefix *nd6_prefix_lookup __P((struct nd_prefix *)); -int in6_init_prefix_ltimes __P((struct nd_prefix *)); +struct nd_prefix *nd6_prefix_lookup __P((struct nd_prefixctl *)); void rt6_flush __P((struct in6_addr *, struct ifnet *)); int nd6_setdefaultiface __P((int)); -int in6_tmpifadd __P((const struct in6_ifaddr *, int)); +int in6_tmpifadd __P((const struct in6_ifaddr *, int, int)); #endif /* _KERNEL */ diff --git a/sys/netinet6/nd6_nbr.c b/sys/netinet6/nd6_nbr.c index f596c6465e99..94c00ee8660f 100644 --- a/sys/netinet6/nd6_nbr.c +++ b/sys/netinet6/nd6_nbr.c @@ -677,13 +677,13 @@ nd6_na_input(m, off, icmp6len) if (is_solicited) { ln->ln_state = ND6_LLINFO_REACHABLE; ln->ln_byhint = 0; - if (ln->ln_expire) { - ln->ln_expire = time_second + - 
ND_IFINFO(rt->rt_ifp)->reachable; + if (!ND6_LLINFO_PERMANENT(ln)) { + nd6_llinfo_settimer(ln, + (long)ND_IFINFO(rt->rt_ifp)->reachable * hz); } } else { ln->ln_state = ND6_LLINFO_STALE; - ln->ln_expire = time_second + nd6_gctimer; + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); } if ((ln->ln_router = is_router) != 0) { /* @@ -730,14 +730,14 @@ nd6_na_input(m, off, icmp6len) * 1 1 y n (2a) L *->REACHABLE * 1 1 y y (2a) L *->REACHABLE */ - if (!is_override && (lladdr != NULL && llchange)) { /* (1) */ + if (!is_override && (lladdr != NULL && llchange)) { /* (1) */ /* * If state is REACHABLE, make it STALE. * no other updates should be done. */ if (ln->ln_state == ND6_LLINFO_REACHABLE) { ln->ln_state = ND6_LLINFO_STALE; - ln->ln_expire = time_second + nd6_gctimer; + nd6_llinfo_settimer(ln, (long)nd6_gctimer * hz); } goto freeit; } else if (is_override /* (2a) */ @@ -759,14 +759,15 @@ nd6_na_input(m, off, icmp6len) if (is_solicited) { ln->ln_state = ND6_LLINFO_REACHABLE; ln->ln_byhint = 0; - if (ln->ln_expire) { - ln->ln_expire = time_second + - ND_IFINFO(ifp)->reachable; + if (!ND6_LLINFO_PERMANENT(ln)) { + nd6_llinfo_settimer(ln, + (long)ND_IFINFO(ifp)->reachable * hz); } } else { if (lladdr != NULL && llchange) { ln->ln_state = ND6_LLINFO_STALE; - ln->ln_expire = time_second + nd6_gctimer; + nd6_llinfo_settimer(ln, + (long)nd6_gctimer * hz); } } } @@ -793,7 +794,7 @@ nd6_na_input(m, off, icmp6len) dr = defrouter_lookup(in6, ifp); if (dr) defrtrlist_del(dr); - else if (!ip6_forwarding && ip6_accept_rtadv) { + else if (!ip6_forwarding) { /* * Even if the neighbor is not in the default * router list, the neighbor may be used @@ -810,12 +811,25 @@ nd6_na_input(m, off, icmp6len) rt->rt_flags &= ~RTF_REJECT; ln->ln_asked = 0; if (ln->ln_hold) { - /* - * we assume ifp is not a loopback here, so just set the 2nd - * argument as the 1st one. 
- */ - nd6_output(ifp, ifp, ln->ln_hold, - (struct sockaddr_in6 *)rt_key(rt), rt); + struct mbuf *m_hold, *m_hold_next; + + for (m_hold = ln->ln_hold; m_hold; m_hold = m_hold_next) { + struct mbuf *mpkt = NULL; + + m_hold_next = m_hold->m_nextpkt; + mpkt = m_copym(m_hold, 0, M_COPYALL, M_DONTWAIT); + if (mpkt == NULL) { + m_freem(m_hold); + break; + } + mpkt->m_nextpkt = NULL; + /* + * we assume ifp is not a loopback here, so just set + * the 2nd argument as the 1st one. + */ + nd6_output(ifp, ifp, mpkt, + (struct sockaddr_in6 *)rt_key(rt), rt); + } ln->ln_hold = NULL; } @@ -1081,9 +1095,9 @@ nd6_dad_stoptimer(dp) * Start Duplicate Address Detection (DAD) for specified interface address. */ void -nd6_dad_start(ifa, tick) +nd6_dad_start(ifa, delay) struct ifaddr *ifa; - int *tick; /* minimum delay ticks for IFF_UP event */ + int delay; { struct in6_ifaddr *ia = (struct in6_ifaddr *)ifa; struct dadq *dp; @@ -1151,19 +1165,12 @@ nd6_dad_start(ifa, tick) dp->dad_count = ip6_dad_count; dp->dad_ns_icount = dp->dad_na_icount = 0; dp->dad_ns_ocount = dp->dad_ns_tcount = 0; - if (tick == NULL) { + if (delay == 0) { nd6_dad_ns_output(dp, ifa); nd6_dad_starttimer(dp, - ND_IFINFO(ifa->ifa_ifp)->retrans * hz / 1000); + (long)ND_IFINFO(ifa->ifa_ifp)->retrans * hz / 1000); } else { - int ntick; - - if (*tick == 0) - ntick = arc4random() % (MAX_RTR_SOLICITATION_DELAY * hz); - else - ntick = *tick + arc4random() % (hz / 2); - *tick = ntick; - nd6_dad_starttimer(dp, ntick); + nd6_dad_starttimer(dp, delay); } } @@ -1246,7 +1253,7 @@ nd6_dad_timer(ifa) */ nd6_dad_ns_output(dp, ifa); nd6_dad_starttimer(dp, - ND_IFINFO(ifa->ifa_ifp)->retrans * hz / 1000); + (long)ND_IFINFO(ifa->ifa_ifp)->retrans * hz / 1000); } else { /* * We have transmitted sufficient number of DAD packets. 
diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c index d9efa71b722e..3738ada1b23e 100644 --- a/sys/netinet6/nd6_rtr.c +++ b/sys/netinet6/nd6_rtr.c @@ -64,18 +64,21 @@ #define SDL(s) ((struct sockaddr_dl *)s) +static int rtpref __P((struct nd_defrouter *)); static struct nd_defrouter *defrtrlist_update __P((struct nd_defrouter *)); -static struct in6_ifaddr *in6_ifadd __P((struct nd_prefix *, - struct in6_addr *)); +static int prelist_update __P((struct nd_prefixctl *, struct nd_defrouter *, + struct mbuf *, int)); +static struct in6_ifaddr *in6_ifadd __P((struct nd_prefixctl *, int)); static struct nd_pfxrouter *pfxrtr_lookup __P((struct nd_prefix *, struct nd_defrouter *)); static void pfxrtr_add __P((struct nd_prefix *, struct nd_defrouter *)); static void pfxrtr_del __P((struct nd_pfxrouter *)); static struct nd_pfxrouter *find_pfxlist_reachable_router __P((struct nd_prefix *)); -static void defrouter_addifreq __P((struct ifnet *)); +static void defrouter_delreq __P((struct nd_defrouter *)); static void nd6_rtmsg __P((int, struct rtentry *)); +static int in6_init_prefix_ltimes __P((struct nd_prefix *)); static void in6_init_address_ltimes __P((struct nd_prefix *, struct in6_addrlifetime *)); @@ -98,6 +101,13 @@ static int ip6_temp_valid_lifetime = 1800; */ int ip6_temp_regen_advance = TEMPADDR_REGEN_ADVANCE; +/* RTPREF_MEDIUM has to be 0! */ +#define RTPREF_HIGH 1 +#define RTPREF_MEDIUM 0 +#define RTPREF_LOW (-1) +#define RTPREF_RESERVED (-2) +#define RTPREF_INVALID (-3) /* internal */ + /* * Receive Router Solicitation Message - just for routers. 
* Router solicitation/advertisement is mostly managed by userland program @@ -200,6 +210,7 @@ nd6_ra_input(m, off, icmp6len) struct ip6_hdr *ip6 = mtod(m, struct ip6_hdr *); struct nd_router_advert *nd_ra; struct in6_addr saddr6 = ip6->ip6_src; + int mcast = 0; union nd_opts ndopts; struct nd_defrouter *dr; @@ -252,9 +263,23 @@ nd6_ra_input(m, off, icmp6len) struct nd_defrouter dr0; u_int32_t advreachable = nd_ra->nd_ra_reachable; + /* remember if this is a multicasted advertisement */ + if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) + mcast = 1; + + bzero(&dr0, sizeof(dr0)); dr0.rtaddr = saddr6; dr0.flags = nd_ra->nd_ra_flags_reserved; - dr0.rtlifetime = ntohs(nd_ra->nd_ra_router_lifetime); + if (rtpref(&dr0) == RTPREF_RESERVED) { + /* + * "reserved" router preference should be treated as + * 0-lifetime. Note that rtpref() covers the case that the + * kernel is not configured to support the preference + * extension. + */ + dr0.rtlifetime = 0; + } else + dr0.rtlifetime = ntohs(nd_ra->nd_ra_router_lifetime); dr0.expire = time_second + dr0.rtlifetime; dr0.ifp = ifp; /* unspecified or not? 
(RFC 2461 6.3.4) */ @@ -280,7 +305,7 @@ nd6_ra_input(m, off, icmp6len) if (ndopts.nd_opts_pi) { struct nd_opt_hdr *pt; struct nd_opt_prefix_info *pi = NULL; - struct nd_prefix pr; + struct nd_prefixctl pr; for (pt = (struct nd_opt_hdr *)ndopts.nd_opts_pi; pt <= (struct nd_opt_hdr *)ndopts.nd_opts_pi_end; @@ -315,17 +340,6 @@ nd6_ra_input(m, off, icmp6len) continue; } - /* aggregatable unicast address, rfc2374 */ - if ((pi->nd_opt_pi_prefix.s6_addr8[0] & 0xe0) == 0x20 - && pi->nd_opt_pi_prefix_len != 64) { - nd6log((LOG_INFO, - "nd6_ra_input: invalid prefixlen " - "%d for rfc2374 prefix %s, ignored\n", - pi->nd_opt_pi_prefix_len, - ip6_sprintf(&pi->nd_opt_pi_prefix))); - continue; - } - bzero(&pr, sizeof(pr)); pr.ndpr_prefix.sin6_family = AF_INET6; pr.ndpr_prefix.sin6_len = sizeof(pr.ndpr_prefix); @@ -339,9 +353,7 @@ nd6_ra_input(m, off, icmp6len) pr.ndpr_plen = pi->nd_opt_pi_prefix_len; pr.ndpr_vltime = ntohl(pi->nd_opt_pi_valid_time); pr.ndpr_pltime = ntohl(pi->nd_opt_pi_preferred_time); - if (in6_init_prefix_ltimes(&pr)) - continue; /* prefix lifetime init failed */ - (void)prelist_update(&pr, dr, m); + (void)prelist_update(&pr, dr, m, mcast); } } @@ -437,9 +449,11 @@ nd6_rtmsg(cmd, rt) info.rti_info[RTAX_DST] = rt_key(rt); info.rti_info[RTAX_GATEWAY] = rt->rt_gateway; info.rti_info[RTAX_NETMASK] = rt_mask(rt); - info.rti_info[RTAX_IFP] = - (struct sockaddr *)TAILQ_FIRST(&rt->rt_ifp->if_addrlist); - info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr; + if (rt->rt_ifp) { + info.rti_info[RTAX_IFP] = + TAILQ_FIRST(&rt->rt_ifp->if_addrlist)->ifa_addr; + info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr; + } rt_missmsg(cmd, &info, rt->rt_flags, 0); } @@ -450,6 +464,8 @@ defrouter_addreq(new) { struct sockaddr_in6 def, mask, gate; struct rtentry *newrt = NULL; + int s; + int error; bzero(&def, sizeof(def)); bzero(&mask, sizeof(mask)); @@ -457,10 +473,11 @@ defrouter_addreq(new) def.sin6_len = mask.sin6_len = gate.sin6_len = sizeof(struct sockaddr_in6); - def.sin6_family = 
mask.sin6_family = gate.sin6_family = AF_INET6; + def.sin6_family = gate.sin6_family = AF_INET6; gate.sin6_addr = new->rtaddr; - (void)rtrequest(RTM_ADD, (struct sockaddr *)&def, + s = splnet(); + error = rtrequest(RTM_ADD, (struct sockaddr *)&def, (struct sockaddr *)&gate, (struct sockaddr *)&mask, RTF_GATEWAY, &newrt); if (newrt) { @@ -469,55 +486,12 @@ defrouter_addreq(new) RT_REMREF(newrt); RT_UNLOCK(newrt); } + if (error == 0) + new->installed = 1; + splx(s); return; } -/* Add a route to a given interface as default */ -void -defrouter_addifreq(ifp) - struct ifnet *ifp; -{ - struct sockaddr_in6 def, mask; - struct ifaddr *ifa; - struct rtentry *newrt = NULL; - int error, flags; - - bzero(&def, sizeof(def)); - bzero(&mask, sizeof(mask)); - - def.sin6_len = mask.sin6_len = sizeof(struct sockaddr_in6); - def.sin6_family = mask.sin6_family = AF_INET6; - - /* - * Search for an ifaddr beloging to the specified interface. - * XXX: An IPv6 address are required to be assigned on the interface. - */ - if ((ifa = ifaof_ifpforaddr((struct sockaddr *)&def, ifp)) == NULL) { - nd6log((LOG_ERR, /* better error? */ - "defrouter_addifreq: failed to find an ifaddr " - "to install a route to interface %s\n", - if_name(ifp))); - return; - } - - flags = ifa->ifa_flags; - error = rtrequest(RTM_ADD, (struct sockaddr *)&def, ifa->ifa_addr, - (struct sockaddr *)&mask, flags, &newrt); - if (error != 0) { - nd6log((LOG_ERR, - "defrouter_addifreq: failed to install a route to " - "interface %s (errno = %d)\n", - if_name(ifp), error)); - } else { - if (newrt) { - RT_LOCK(newrt); - nd6_rtmsg(RTM_ADD, newrt); - RT_REMREF(newrt); - RT_UNLOCK(newrt); - } - } -} - struct nd_defrouter * defrouter_lookup(addr, ifp) struct in6_addr *addr; @@ -534,10 +508,14 @@ defrouter_lookup(addr, ifp) return (NULL); /* search failed */ } -void -defrouter_delreq(dr, dofree) +/* + * Remove the default route for a given router. 
+ * This is just a subroutine function for defrouter_select(), and should + * not be called from anywhere else. + */ +static void +defrouter_delreq(dr) struct nd_defrouter *dr; - int dofree; { struct sockaddr_in6 def, mask, gate; struct rtentry *oldrt = NULL; @@ -548,7 +526,7 @@ defrouter_delreq(dr, dofree) def.sin6_len = mask.sin6_len = gate.sin6_len = sizeof(struct sockaddr_in6); - def.sin6_family = mask.sin6_family = gate.sin6_family = AF_INET6; + def.sin6_family = gate.sin6_family = AF_INET6; gate.sin6_addr = dr->rtaddr; rtrequest(RTM_DELETE, (struct sockaddr *)&def, @@ -559,8 +537,25 @@ defrouter_delreq(dr, dofree) RTFREE(oldrt); } - if (dofree) /* XXX: necessary? */ - free(dr, M_IP6NDP); + dr->installed = 0; +} + +/* + * remove all default routes from default router list + */ +void +defrouter_reset() +{ + struct nd_defrouter *dr; + + for (dr = TAILQ_FIRST(&nd_defrouter); dr; + dr = TAILQ_NEXT(dr, dr_entry)) + defrouter_delreq(dr); + + /* + * XXX should we also nuke any default routers in the kernel, by + * going through them by rtalloc1()? + */ } void @@ -577,9 +572,10 @@ defrtrlist_del(dr) if (!ip6_forwarding && ip6_accept_rtadv) /* XXX: better condition? */ rt6_flush(&dr->rtaddr, dr->ifp); - if (dr == TAILQ_FIRST(&nd_defrouter)) - deldr = dr; /* The router is primary. */ - + if (dr->installed) { + deldr = dr; + defrouter_delreq(dr); + } TAILQ_REMOVE(&nd_defrouter, dr, dr_entry); /* @@ -604,87 +600,143 @@ defrtrlist_del(dr) } /* - * Default Router Selection according to Section 6.3.6 of RFC 2461: - * 1) Routers that are reachable or probably reachable should be - * preferred. + * Default Router Selection according to Section 6.3.6 of RFC 2461 and + * draft-ietf-ipngwg-router-selection: + * 1) Routers that are reachable or probably reachable should be preferred. + * If we have more than one (probably) reachable router, prefer ones + * with the highest router preference. 
* 2) When no routers on the list are known to be reachable or * probably reachable, routers SHOULD be selected in a round-robin - * fashion. + * fashion, regardless of router preference values. * 3) If the Default Router List is empty, assume that all * destinations are on-link. + * + * We assume nd_defrouter is sorted by router preference value. + * Since the code below covers both with and without router preference cases, + * we do not need to classify the cases by ifdef. + * + * At this moment, we do not try to install more than one default router, + * even when the multipath routing is available, because we're not sure about + * the benefits for stub hosts comparing to the risk of making the code + * complicated and the possibility of introducing bugs. */ void defrouter_select() { int s = splnet(); - struct nd_defrouter *dr, anydr; + struct nd_defrouter *dr, *selected_dr = NULL, *installed_dr = NULL; struct rtentry *rt = NULL; struct llinfo_nd6 *ln = NULL; + /* + * This function should be called only when acting as an autoconfigured + * host. Although the remaining part of this function is not effective + * if the node is not an autoconfigured host, we explicitly exclude + * such cases here for safety. + */ + if (ip6_forwarding || !ip6_accept_rtadv) { + nd6log((LOG_WARNING, + "defrouter_select: called unexpectedly (forwarding=%d, " + "accept_rtadv=%d)\n", ip6_forwarding, ip6_accept_rtadv)); + splx(s); + return; + } + + /* + * Let's handle easy case (3) first: + * If default router list is empty, there's nothing to be done. + */ + if (!TAILQ_FIRST(&nd_defrouter)) { + splx(s); + return; + } + /* * Search for a (probably) reachable router from the list. + * We just pick up the first reachable one (if any), assuming that + * the ordering rule of the list described in defrtrlist_update(). 
*/ for (dr = TAILQ_FIRST(&nd_defrouter); dr; dr = TAILQ_NEXT(dr, dr_entry)) { - if ((rt = nd6_lookup(&dr->rtaddr, 0, dr->ifp)) && + if (selected_dr == NULL && + (rt = nd6_lookup(&dr->rtaddr, 0, dr->ifp)) && (ln = (struct llinfo_nd6 *)rt->rt_llinfo) && ND6_IS_LLINFO_PROBREACH(ln)) { - /* Got it, and move it to the head */ - TAILQ_REMOVE(&nd_defrouter, dr, dr_entry); - TAILQ_INSERT_HEAD(&nd_defrouter, dr, dr_entry); - break; + selected_dr = dr; } + + if (dr->installed && installed_dr == NULL) + installed_dr = dr; + else if (dr->installed && installed_dr) { + /* this should not happen. warn for diagnosis. */ + log(LOG_ERR, "defrouter_select: more than one router" + " is installed\n"); + } + } + /* + * If none of the default routers was found to be reachable, + * round-robin the list regardless of preference. + * Otherwise, if we have an installed router, check if the selected + * (reachable) router should really be preferred to the installed one. + * We only prefer the new router when the old one is not reachable + * or when the new one has a really higher preference value. + */ + if (selected_dr == NULL) { + if (installed_dr == NULL || !TAILQ_NEXT(installed_dr, dr_entry)) + selected_dr = TAILQ_FIRST(&nd_defrouter); + else + selected_dr = TAILQ_NEXT(installed_dr, dr_entry); + } else if (installed_dr && + (rt = nd6_lookup(&installed_dr->rtaddr, 0, installed_dr->ifp)) && + (ln = (struct llinfo_nd6 *)rt->rt_llinfo) && + ND6_IS_LLINFO_PROBREACH(ln) && + rtpref(selected_dr) <= rtpref(installed_dr)) { + selected_dr = installed_dr; } - if ((dr = TAILQ_FIRST(&nd_defrouter))) { - /* - * De-install the previous default gateway and install - * a new one. - * Note that if there is no reachable router in the list, - * the head entry will be used anyway. - * XXX: do we have to check the current routing table entry? 
- */ - bzero(&anydr, sizeof(anydr)); - defrouter_delreq(&anydr, 0); - defrouter_addreq(dr); - } - else { - /* - * The Default Router List is empty, so install the default - * route to an inteface. - * XXX: The specification does not say this mechanism should - * be restricted to hosts, but this would be not useful - * (even harmful) for routers. - */ - if (!ip6_forwarding) { - /* - * De-install the current default route - * in advance. - */ - bzero(&anydr, sizeof(anydr)); - defrouter_delreq(&anydr, 0); - if (nd6_defifp) { - /* - * Install a route to the default interface - * as default route. - * XXX: we enable this for host only, because - * this may override a default route installed - * a user process (e.g. routing daemon) in a - * router case. - */ - defrouter_addifreq(nd6_defifp); - } else { - nd6log((LOG_INFO, "defrouter_select: " - "there's no default router and no default" - " interface\n")); - } - } + /* + * If the selected router is different than the installed one, + * remove the installed router and install the selected one. + * Note that the selected router is never NULL here. + */ + if (installed_dr != selected_dr) { + if (installed_dr) + defrouter_delreq(installed_dr); + defrouter_addreq(selected_dr); } splx(s); return; } +/* + * for default router selection + * regards router-preference field as a 2-bit signed integer + */ +static int +rtpref(struct nd_defrouter *dr) +{ + switch (dr->flags & ND_RA_FLAG_RTPREF_MASK) { + case ND_RA_FLAG_RTPREF_HIGH: + return (RTPREF_HIGH); + case ND_RA_FLAG_RTPREF_MEDIUM: + return (RTPREF_MEDIUM); + case ND_RA_FLAG_RTPREF_RSV: + return (RTPREF_RESERVED); + case ND_RA_FLAG_RTPREF_LOW: + return (RTPREF_LOW); + default: + /* + * This case should never happen. If it did, it would mean a + * serious bug of kernel internal. We thus always bark here. + * Or, can we even panic? 
+ */ + log(LOG_ERR, "rtpref: impossible RA flag %x\n", dr->flags); + return (RTPREF_INVALID); + } + /* NOTREACHED */ +} + static struct nd_defrouter * defrtrlist_update(new) struct nd_defrouter *new; @@ -698,10 +750,34 @@ defrtrlist_update(new) defrtrlist_del(dr); dr = NULL; } else { + int oldpref = rtpref(dr); + /* override */ dr->flags = new->flags; /* xxx flag check */ dr->rtlifetime = new->rtlifetime; dr->expire = new->expire; + + /* + * If the preference does not change, there's no need + * to sort the entries. + */ + if (rtpref(new) == oldpref) { + splx(s); + return (dr); + } + + /* + * preferred router may be changed, so relocate + * this router. + * XXX: calling TAILQ_REMOVE directly is a bad manner. + * However, since defrtrlist_del() has many side + * effects, we intentionally do so here. + * defrouter_select() below will handle routing + * changes later. + */ + TAILQ_REMOVE(&nd_defrouter, dr, dr_entry); + n = dr; + goto insert; } splx(s); return (dr); @@ -721,14 +797,27 @@ defrtrlist_update(new) bzero(n, sizeof(*n)); *n = *new; +insert: /* - * Insert the new router at the end of the Default Router List. - * If there is no other router, install it anyway. Otherwise, - * just continue to use the current default router. + * Insert the new router in the Default Router List; + * The Default Router List should be in the descending order + * of router-preferece. Routers with the same preference are + * sorted in the arriving time order. 
*/ - TAILQ_INSERT_TAIL(&nd_defrouter, n, dr_entry); - if (TAILQ_FIRST(&nd_defrouter) == n) - defrouter_select(); + + /* insert at the end of the group */ + for (dr = TAILQ_FIRST(&nd_defrouter); dr; + dr = TAILQ_NEXT(dr, dr_entry)) { + if (rtpref(n) > rtpref(dr)) + break; + } + if (dr) + TAILQ_INSERT_BEFORE(dr, n, dr_entry); + else + TAILQ_INSERT_TAIL(&nd_defrouter, n, dr_entry); + + defrouter_select(); + splx(s); return (n); @@ -776,16 +865,16 @@ pfxrtr_del(pfr) } struct nd_prefix * -nd6_prefix_lookup(pr) - struct nd_prefix *pr; +nd6_prefix_lookup(key) + struct nd_prefixctl *key; { struct nd_prefix *search; for (search = nd_prefix.lh_first; search; search = search->ndpr_next) { - if (pr->ndpr_ifp == search->ndpr_ifp && - pr->ndpr_plen == search->ndpr_plen && - in6_are_prefix_equal(&pr->ndpr_prefix.sin6_addr, - &search->ndpr_prefix.sin6_addr, pr->ndpr_plen)) { + if (key->ndpr_ifp == search->ndpr_ifp && + key->ndpr_plen == search->ndpr_plen && + in6_are_prefix_equal(&key->ndpr_prefix.sin6_addr, + &search->ndpr_prefix.sin6_addr, key->ndpr_plen)) { break; } } @@ -795,17 +884,29 @@ nd6_prefix_lookup(pr) int nd6_prelist_add(pr, dr, newp) - struct nd_prefix *pr, **newp; + struct nd_prefixctl *pr; + struct nd_prefix **newp; struct nd_defrouter *dr; { struct nd_prefix *new = NULL; + int error = 0; int i, s; new = (struct nd_prefix *)malloc(sizeof(*new), M_IP6NDP, M_NOWAIT); if (new == NULL) return(ENOMEM); bzero(new, sizeof(*new)); - *new = *pr; + new->ndpr_ifp = pr->ndpr_ifp; + new->ndpr_prefix = pr->ndpr_prefix; + new->ndpr_plen = pr->ndpr_plen; + new->ndpr_vltime = pr->ndpr_vltime; + new->ndpr_pltime = pr->ndpr_pltime; + new->ndpr_flags = pr->ndpr_flags; + if ((error = in6_init_prefix_ltimes(new)) != 0) { + free(new, M_IP6NDP); + return(error); + } + new->ndpr_lastupdate = time_second; if (newp != NULL) *newp = new; @@ -851,6 +952,7 @@ prelist_remove(pr) /* make sure to invalidate the prefix until it is really freed. 
*/ pr->ndpr_vltime = 0; pr->ndpr_pltime = 0; + /* * Though these flags are now meaningless, we'd rather keep the value * of pr->ndpr_raf_onlink and pr->ndpr_raf_auto not to confuse users @@ -887,11 +989,12 @@ prelist_remove(pr) pfxlist_onlink_check(); } -int -prelist_update(new, dr, m) - struct nd_prefix *new; +static int +prelist_update(new, dr, m, mcast) + struct nd_prefixctl *new; struct nd_defrouter *dr; /* may be NULL */ struct mbuf *m; + int mcast; { struct in6_ifaddr *ia6 = NULL, *ia6_match = NULL; struct ifaddr *ifa; @@ -933,8 +1036,8 @@ prelist_update(new, dr, m) if (new->ndpr_raf_onlink) { pr->ndpr_vltime = new->ndpr_vltime; pr->ndpr_pltime = new->ndpr_pltime; - pr->ndpr_preferred = new->ndpr_preferred; - pr->ndpr_expire = new->ndpr_expire; + (void)in6_init_prefix_ltimes(pr); /* XXX error case? */ + pr->ndpr_lastupdate = time_second; } if (new->ndpr_raf_onlink && @@ -964,8 +1067,6 @@ prelist_update(new, dr, m) if (new->ndpr_raf_onlink == 0 && new->ndpr_raf_auto == 0) goto end; - bzero(&new->ndpr_addr, sizeof(struct in6_addr)); - error = nd6_prelist_add(new, dr, &newpr); if (error != 0 || newpr == NULL) { nd6log((LOG_NOTICE, "prelist_update: " @@ -1000,34 +1101,44 @@ prelist_update(new, dr, m) /* 5.5.3 (a). Ignore the prefix without the A bit set. */ if (!new->ndpr_raf_auto) - goto afteraddrconf; + goto end; /* * 5.5.3 (b). the link-local prefix should have been ignored in * nd6_ra_input. */ - /* - * 5.5.3 (c). Consistency check on lifetimes: pltime <= vltime. - * This should have been done in nd6_ra_input. - */ + /* 5.5.3 (c). Consistency check on lifetimes: pltime <= vltime. */ + if (new->ndpr_pltime > new->ndpr_vltime) { + error = EINVAL; /* XXX: won't be used */ + goto end; + } /* - * 5.5.3 (d). If the prefix advertised does not match the prefix of an - * address already in the list, and the Valid Lifetime is not 0, - * form an address. Note that even a manually configured address - * should reject autoconfiguration of a new address. + * 5.5.3 (d). 
If the prefix advertised is not equal to the prefix of + * an address configured by stateless autoconfiguration already in the + * list of addresses associated with the interface, and the Valid + * Lifetime is not 0, form an address. We first check if we have + * a matching prefix. + * Note: we apply a clarification in rfc2462bis-02 here. We only + * consider autoconfigured addresses while RFC2462 simply said + * "address". */ TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) { struct in6_ifaddr *ifa6; - int ifa_plen; - u_int32_t storedlifetime; + u_int32_t remaininglifetime; if (ifa->ifa_addr->sa_family != AF_INET6) continue; ifa6 = (struct in6_ifaddr *)ifa; + /* + * We only consider autoconfigured addresses as per rfc2462bis. + */ + if (!(ifa6->ia6_flags & IN6_IFF_AUTOCONF)) + continue; + /* * Spec is not clear here, but I believe we should concentrate * on unicast (i.e. not anycast) addresses. @@ -1036,48 +1147,57 @@ prelist_update(new, dr, m) if ((ifa6->ia6_flags & IN6_IFF_ANYCAST) != 0) continue; - ifa_plen = in6_mask2len(&ifa6->ia_prefixmask.sin6_addr, NULL); - if (ifa_plen != new->ndpr_plen || - !in6_are_prefix_equal(&ifa6->ia_addr.sin6_addr, - &new->ndpr_prefix.sin6_addr, ifa_plen)) + /* + * Ignore the address if it is not associated with a prefix + * or is associated with a prefix that is different from this + * one. (pr is never NULL here) + */ + if (ifa6->ia6_ndpr != pr) continue; if (ia6_match == NULL) /* remember the first one */ ia6_match = ifa6; - if ((ifa6->ia6_flags & IN6_IFF_AUTOCONF) == 0) - continue; - /* * An already autoconfigured address matched. Now that we * are sure there is at least one matched address, we can * proceed to 5.5.3. (e): update the lifetimes according to the * "two hours" rule and the privacy extension. 
+ * We apply some clarifications in rfc2462bis: + * - use remaininglifetime instead of storedlifetime as a + * variable name + * - remove the dead code in the "two-hour" rule */ #define TWOHOUR (120*60) lt6_tmp = ifa6->ia6_lifetime; if (lt6_tmp.ia6t_vltime == ND6_INFINITE_LIFETIME) - storedlifetime = ND6_INFINITE_LIFETIME; - else if (IFA6_IS_INVALID(ifa6)) - storedlifetime = 0; - else - storedlifetime = lt6_tmp.ia6t_expire - time_second; + remaininglifetime = ND6_INFINITE_LIFETIME; + else if (time_second - ifa6->ia6_updatetime > + lt6_tmp.ia6t_vltime) { + /* + * The case of "invalid" address. We should usually + * not see this case. + */ + remaininglifetime = 0; + } else + remaininglifetime = lt6_tmp.ia6t_vltime - + (time_second - ifa6->ia6_updatetime); /* when not updating, keep the current stored lifetime. */ - lt6_tmp.ia6t_vltime = storedlifetime; + lt6_tmp.ia6t_vltime = remaininglifetime; if (TWOHOUR < new->ndpr_vltime || - storedlifetime < new->ndpr_vltime) { + remaininglifetime < new->ndpr_vltime) { lt6_tmp.ia6t_vltime = new->ndpr_vltime; - } else if (storedlifetime <= TWOHOUR) { + } else if (remaininglifetime <= TWOHOUR) { if (auth) { lt6_tmp.ia6t_vltime = new->ndpr_vltime; } } else { /* * new->ndpr_vltime <= TWOHOUR && - * TWOHOUR < storedlifetime + * TWOHOUR < remaininglifetime */ lt6_tmp.ia6t_vltime = TWOHOUR; } @@ -1087,35 +1207,78 @@ prelist_update(new, dr, m) in6_init_address_ltimes(pr, &lt6_tmp); - /* - * When adjusting the lifetimes of an existing temporary - * address, only lower the lifetimes. - * RFC 3041 3.3. (1). - * XXX: how should we modify ia6t_[pv]ltime? 
- */ - if ((ifa6->ia6_flags & IN6_IFF_TEMPORARY) != 0) { - if (lt6_tmp.ia6t_expire == 0 || /* no expire */ - lt6_tmp.ia6t_expire > - ifa6->ia6_lifetime.ia6t_expire) { - lt6_tmp.ia6t_expire = - ifa6->ia6_lifetime.ia6t_expire; + /* + * We need to treat lifetimes for temporary addresses + * differently, according to + * draft-ietf-ipv6-privacy-addrs-v2-01.txt 3.3 (1); + * we only update the lifetimes when they are in the maximum + * intervals. + */ + if ((ifa6->ia6_flags & IN6_IFF_TEMPORARY) != 0) { + u_int32_t maxvltime, maxpltime; + + if (ip6_temp_valid_lifetime > + (u_int32_t)((time_second - ifa6->ia6_createtime) + + ip6_desync_factor)) { + maxvltime = ip6_temp_valid_lifetime - + (time_second - ifa6->ia6_createtime) - + ip6_desync_factor; + } else + maxvltime = 0; + if (ip6_temp_preferred_lifetime > + (u_int32_t)((time_second - ifa6->ia6_createtime) + + ip6_desync_factor)) { + maxpltime = ip6_temp_preferred_lifetime - + (time_second - ifa6->ia6_createtime) - + ip6_desync_factor; + } else + maxpltime = 0; + + if (lt6_tmp.ia6t_vltime == ND6_INFINITE_LIFETIME || + lt6_tmp.ia6t_vltime > maxvltime) { + lt6_tmp.ia6t_vltime = maxvltime; } - if (lt6_tmp.ia6t_preferred == 0 || /* no expire */ - lt6_tmp.ia6t_preferred > - ifa6->ia6_lifetime.ia6t_preferred) { - lt6_tmp.ia6t_preferred = - ifa6->ia6_lifetime.ia6t_preferred; + if (lt6_tmp.ia6t_pltime == ND6_INFINITE_LIFETIME || + lt6_tmp.ia6t_pltime > maxpltime) { + lt6_tmp.ia6t_pltime = maxpltime; } } - ifa6->ia6_lifetime = lt6_tmp; + ifa6->ia6_updatetime = time_second; } if (ia6_match == NULL && new->ndpr_vltime) { + int ifidlen; + /* + * 5.5.3 (d) (continued) * No address matched and the valid lifetime is non-zero. * Create a new address. */ - if ((ia6 = in6_ifadd(new, NULL)) != NULL) { + + /* + * Prefix Length check: + * If the sum of the prefix length and interface identifier + * length does not equal 128 bits, the Prefix Information + * option MUST be ignored. 
The length of the interface + * identifier is defined in a separate link-type specific + * document. + */ + ifidlen = in6_if2idlen(ifp); + if (ifidlen < 0) { + /* this should not happen, so we always log it. */ + log(LOG_ERR, "prelist_update: IFID undefined (%s)\n", + if_name(ifp)); + goto end; + } + if (ifidlen + pr->ndpr_plen != 128) { + nd6log((LOG_INFO, + "prelist_update: invalid prefixlen " + "%d for %s, ignored\n", + pr->ndpr_plen, if_name(ifp))); + goto end; + } + + if ((ia6 = in6_ifadd(new, mcast)) != NULL) { /* * note that we should use pr (not new) for reference. */ @@ -1136,7 +1299,7 @@ prelist_update(new, dr, m) */ if (ip6_use_tempaddr) { int e; - if ((e = in6_tmpifadd(ia6, 1)) != 0) { + if ((e = in6_tmpifadd(ia6, 1, 1)) != 0) { nd6log((LOG_NOTICE, "prelist_update: " "failed to create a temporary " "address, errno=%d\n", @@ -1156,8 +1319,6 @@ prelist_update(new, dr, m) } } - afteraddrconf: - end: splx(s); return error; @@ -1206,6 +1367,8 @@ pfxlist_onlink_check() { struct nd_prefix *pr; struct in6_ifaddr *ifa; + struct nd_defrouter *dr; + struct nd_pfxrouter *pfxrtr = NULL; /* * Check if there is a prefix that has a reachable advertising @@ -1216,12 +1379,34 @@ pfxlist_onlink_check() break; } + /* + * If we have no such prefix, check whether we still have a router + * that does not advertise any prefixes. + */ if (pr == NULL) { - /* - * There is at least one prefix that has a reachable router. - * Detach prefixes which have no reachable advertising - * router, and attach other prefixes. 
- */ + for (dr = TAILQ_FIRST(&nd_defrouter); dr; + dr = TAILQ_NEXT(dr, dr_entry)) { + struct nd_prefix *pr0; + + for (pr0 = nd_prefix.lh_first; pr0; + pr0 = pr0->ndpr_next) { + if ((pfxrtr = pfxrtr_lookup(pr0, dr)) != NULL) + break; + } + if (pfxrtr != NULL) + break; + } + } + if (pr != NULL || (TAILQ_FIRST(&nd_defrouter) && pfxrtr == NULL)) { + /* + * There is at least one prefix that has a reachable router, + * or at least a router which probably does not advertise + * any prefixes. The latter would be the case when we move + * to a new link where we have a router that does not provide + * prefixes and we configure an address by hand. + * Detach prefixes which have no reachable advertising + * router, and attach other prefixes. + */ for (pr = nd_prefix.lh_first; pr; pr = pr->ndpr_next) { /* XXX: a link-local prefix should never be detached */ if (IN6_IS_ADDR_LINKLOCAL(&pr->ndpr_prefix.sin6_addr)) @@ -1327,10 +1512,15 @@ pfxlist_onlink_check() if (ifa->ia6_ndpr == NULL) /* XXX: see above. */ continue; - if (find_pfxlist_reachable_router(ifa->ia6_ndpr)) - ifa->ia6_flags &= ~IN6_IFF_DETACHED; - else + if (find_pfxlist_reachable_router(ifa->ia6_ndpr)) { + if (ifa->ia6_flags & IN6_IFF_DETACHED) { + ifa->ia6_flags &= ~IN6_IFF_DETACHED; + ifa->ia6_flags |= IN6_IFF_TENTATIVE; + nd6_dad_start((struct ifaddr *)ifa, 0); + } + } else { ifa->ia6_flags |= IN6_IFF_DETACHED; + } } } else { @@ -1338,7 +1528,12 @@ pfxlist_onlink_check() if ((ifa->ia6_flags & IN6_IFF_AUTOCONF) == 0) continue; - ifa->ia6_flags &= ~IN6_IFF_DETACHED; + if (ifa->ia6_flags & IN6_IFF_DETACHED) { + ifa->ia6_flags &= ~IN6_IFF_DETACHED; + ifa->ia6_flags |= IN6_IFF_TENTATIVE; + /* Do we need a delay in this case? 
*/ + nd6_dad_start((struct ifaddr *)ifa, 0); + } } } } @@ -1544,9 +1739,9 @@ nd6_prefix_offlink(pr) } static struct in6_ifaddr * -in6_ifadd(pr, ifid) - struct nd_prefix *pr; - struct in6_addr *ifid; /* Mobile IPv6 addition */ +in6_ifadd(pr, mcast) + struct nd_prefixctl *pr; + int mcast; { struct ifnet *ifp = pr->ndpr_ifp; struct ifaddr *ifa; @@ -1555,6 +1750,7 @@ in6_ifadd(pr, ifid) int error, plen0; struct in6_addr mask; int prefixlen = pr->ndpr_plen; + int updateflags; in6_prefixlen2mask(&mask, prefixlen); @@ -1604,24 +1800,21 @@ in6_ifadd(pr, ifid) ifra.ifra_addr.sin6_family = AF_INET6; ifra.ifra_addr.sin6_len = sizeof(struct sockaddr_in6); /* prefix */ - bcopy(&pr->ndpr_prefix.sin6_addr, &ifra.ifra_addr.sin6_addr, - sizeof(ifra.ifra_addr.sin6_addr)); + ifra.ifra_addr.sin6_addr = pr->ndpr_prefix.sin6_addr; ifra.ifra_addr.sin6_addr.s6_addr32[0] &= mask.s6_addr32[0]; ifra.ifra_addr.sin6_addr.s6_addr32[1] &= mask.s6_addr32[1]; ifra.ifra_addr.sin6_addr.s6_addr32[2] &= mask.s6_addr32[2]; ifra.ifra_addr.sin6_addr.s6_addr32[3] &= mask.s6_addr32[3]; /* interface ID */ - if (ifid == NULL || IN6_IS_ADDR_UNSPECIFIED(ifid)) - ifid = &ib->ia_addr.sin6_addr; ifra.ifra_addr.sin6_addr.s6_addr32[0] |= - (ifid->s6_addr32[0] & ~mask.s6_addr32[0]); + (ib->ia_addr.sin6_addr.s6_addr32[0] & ~mask.s6_addr32[0]); ifra.ifra_addr.sin6_addr.s6_addr32[1] |= - (ifid->s6_addr32[1] & ~mask.s6_addr32[1]); + (ib->ia_addr.sin6_addr.s6_addr32[1] & ~mask.s6_addr32[1]); ifra.ifra_addr.sin6_addr.s6_addr32[2] |= - (ifid->s6_addr32[2] & ~mask.s6_addr32[2]); + (ib->ia_addr.sin6_addr.s6_addr32[2] & ~mask.s6_addr32[2]); ifra.ifra_addr.sin6_addr.s6_addr32[3] |= - (ifid->s6_addr32[3] & ~mask.s6_addr32[3]); + (ib->ia_addr.sin6_addr.s6_addr32[3] & ~mask.s6_addr32[3]); /* new prefix mask. */ ifra.ifra_prefixmask.sin6_len = sizeof(struct sockaddr_in6); @@ -1629,33 +1822,36 @@ in6_ifadd(pr, ifid) bcopy(&mask, &ifra.ifra_prefixmask.sin6_addr, sizeof(ifra.ifra_prefixmask.sin6_addr)); - /* - * lifetime. 
- * XXX: in6_init_address_ltimes would override these values later. - * We should reconsider this logic. - */ + /* lifetimes. */ ifra.ifra_lifetime.ia6t_vltime = pr->ndpr_vltime; ifra.ifra_lifetime.ia6t_pltime = pr->ndpr_pltime; /* XXX: scope zone ID? */ ifra.ifra_flags |= IN6_IFF_AUTOCONF; /* obey autoconf */ - /* - * temporarily set the nopfx flag to avoid conflict. - * XXX: we should reconsider the entire mechanism about prefix - * manipulation. + + /* + * Make sure that we do not have this address already. This should + * usually not happen, but we can still see this case, e.g., if we + * have manually configured the exact address to be configured. */ - ifra.ifra_flags |= IN6_IFF_NOPFX; + if (in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr) != NULL) { + /* this should be rare enough to make an explicit log */ + log(LOG_INFO, "in6_ifadd: %s is already configured\n", + ip6_sprintf(&ifra.ifra_addr.sin6_addr)); + return (NULL); + } /* - * keep the new address, regardless of the result of in6_update_ifa. - * XXX: this address is now meaningless. - * We should reconsider its role. + * Allocate ifaddr structure, link into chain, etc. + * If we are going to create a new address upon receiving a multicasted + * RA, we need to impose a random delay before starting DAD. + * [draft-ietf-ipv6-rfc2462bis-02.txt, Section 5.4.2] */ - pr->ndpr_addr = ifra.ifra_addr.sin6_addr; - - /* allocate ifaddr structure, link into chain, etc. 
*/ - if ((error = in6_update_ifa(ifp, &ifra, NULL)) != 0) { + updateflags = 0; + if (mcast) + updateflags |= IN6_IFAUPDATE_DADDELAY; + if ((error = in6_update_ifa(ifp, &ifra, NULL, updateflags)) != 0) { nd6log((LOG_ERR, "in6_ifadd: failed to make ifaddr %s on %s (errno=%d)\n", ip6_sprintf(&ifra.ifra_addr.sin6_addr), if_name(ifp), @@ -1669,15 +1865,16 @@ in6_ifadd(pr, ifid) } int -in6_tmpifadd(ia0, forcegen) +in6_tmpifadd(ia0, forcegen, delay) const struct in6_ifaddr *ia0; /* corresponding public address */ - int forcegen; + int forcegen, delay; { struct ifnet *ifp = ia0->ia_ifa.ifa_ifp; - struct in6_ifaddr *newia; + struct in6_ifaddr *newia, *ia; struct in6_aliasreq ifra; int i, error; int trylimit = 3; /* XXX: adhoc value */ + int updateflags; u_int32_t randid[2]; time_t vltime0, pltime0; @@ -1693,27 +1890,38 @@ in6_tmpifadd(ia0, forcegen) } again: - in6_get_tmpifid(ifp, (u_int8_t *)randid, - (const u_int8_t *)&ia0->ia_addr.sin6_addr.s6_addr[8], forcegen); + if (in6_get_tmpifid(ifp, (u_int8_t *)randid, + (const u_int8_t *)&ia0->ia_addr.sin6_addr.s6_addr[8], forcegen)) { + nd6log((LOG_NOTICE, "in6_tmpifadd: failed to find a good " + "random IFID\n")); + return (EINVAL); + } ifra.ifra_addr.sin6_addr.s6_addr32[2] |= (randid[0] & ~(ifra.ifra_prefixmask.sin6_addr.s6_addr32[2])); ifra.ifra_addr.sin6_addr.s6_addr32[3] |= (randid[1] & ~(ifra.ifra_prefixmask.sin6_addr.s6_addr32[3])); - /* - * If by chance the new temporary address is the same as an address - * already assigned to the interface, generate a new randomized - * interface identifier and repeat this step. - * RFC 3041 3.3 (4). + /* + * in6_get_tmpifid() quite likely provided a unique interface ID. + * However, we may still have a chance to see collision, because + * there may be a time lag between generation of the ID and generation + * of the address. So, we'll do one more sanity check. 
*/ - if (in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr) != NULL) { - if (trylimit-- == 0) { - nd6log((LOG_NOTICE, "in6_tmpifadd: failed to find " - "a unique random IFID\n")); - return (EEXIST); + for (ia = in6_ifaddr; ia; ia = ia->ia_next) { + if (IN6_ARE_ADDR_EQUAL(&ia->ia_addr.sin6_addr, + &ifra.ifra_addr.sin6_addr)) { + if (trylimit-- == 0) { + /* + * Give up. Something strange should have + * happened. + */ + nd6log((LOG_NOTICE, "in6_tmpifadd: failed to " + "find a unique random IFID\n")); + return (EEXIST); + } + forcegen = 1; + goto again; } - forcegen = 1; - goto again; } /* @@ -1723,16 +1931,18 @@ in6_tmpifadd(ia0, forcegen) * of the public address or TEMP_PREFERRED_LIFETIME - * DESYNC_FACTOR. */ - if (ia0->ia6_lifetime.ia6t_expire != 0) { + if (ia0->ia6_lifetime.ia6t_vltime != ND6_INFINITE_LIFETIME) { vltime0 = IFA6_IS_INVALID(ia0) ? 0 : - (ia0->ia6_lifetime.ia6t_expire - time_second); + (ia0->ia6_lifetime.ia6t_vltime - + (time_second - ia0->ia6_updatetime)); if (vltime0 > ip6_temp_valid_lifetime) vltime0 = ip6_temp_valid_lifetime; } else vltime0 = ip6_temp_valid_lifetime; - if (ia0->ia6_lifetime.ia6t_preferred != 0) { + if (ia0->ia6_lifetime.ia6t_pltime != ND6_INFINITE_LIFETIME) { pltime0 = IFA6_IS_DEPRECATED(ia0) ? 0 : - (ia0->ia6_lifetime.ia6t_preferred - time_second); + (ia0->ia6_lifetime.ia6t_pltime - + (time_second - ia0->ia6_updatetime)); if (pltime0 > ip6_temp_preferred_lifetime - ip6_desync_factor){ pltime0 = ip6_temp_preferred_lifetime - ip6_desync_factor; @@ -1754,7 +1964,10 @@ in6_tmpifadd(ia0, forcegen) ifra.ifra_flags |= (IN6_IFF_AUTOCONF|IN6_IFF_TEMPORARY); /* allocate ifaddr structure, link into chain, etc. 
*/ - if ((error = in6_update_ifa(ifp, &ifra, NULL)) != 0) + updateflags = 0; + if (delay) + updateflags |= IN6_IFAUPDATE_DADDELAY; + if ((error = in6_update_ifa(ifp, &ifra, NULL, updateflags)) != 0) return (error); newia = in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr); @@ -1780,16 +1993,9 @@ in6_tmpifadd(ia0, forcegen) return (0); } -int +static int in6_init_prefix_ltimes(struct nd_prefix *ndpr) { - /* check if preferred lifetime > valid lifetime. RFC2462 5.5.3 (c) */ - if (ndpr->ndpr_pltime > ndpr->ndpr_vltime) { - nd6log((LOG_INFO, "in6_init_prefix_ltimes: preferred lifetime" - "(%d) is greater than valid lifetime(%d)\n", - (u_int)ndpr->ndpr_pltime, (u_int)ndpr->ndpr_vltime)); - return (EINVAL); - } if (ndpr->ndpr_pltime == ND6_INFINITE_LIFETIME) ndpr->ndpr_preferred = 0; else @@ -1891,6 +2097,8 @@ nd6_setdefaultiface(ifindex) if (ifindex < 0 || if_index < ifindex) return (EINVAL); + if (ifindex != 0 && !ifnet_byindex(ifindex)) + return (EINVAL); if (nd6_defifindex != ifindex) { nd6_defifindex = ifindex; @@ -1899,17 +2107,6 @@ nd6_setdefaultiface(ifindex) else nd6_defifp = NULL; - /* - * If the Default Router List is empty, install a route - * to the specified interface as default or remove the default - * route when the default interface becomes canceled. - * The check for the queue is actually redundant, but - * we do this here to avoid re-install the default route - * if the list is NOT empty. - */ - if (TAILQ_FIRST(&nd_defrouter) == NULL) - defrouter_select(); - /* * Our current implementation assumes one-to-one maping between * interfaces and links, so it would be natural to use the