From addeef8284676f5176cd777bace5e4f5bc3e32e2 Mon Sep 17 00:00:00 2001 From: "Bruce A. Mah" Date: Fri, 17 Oct 2003 15:12:01 +0000 Subject: [PATCH] Add multicast(4) and pim(4) manual pages and hook up to the build. Submitted by: Pavlin Radoslavov Reviewed by: hsu, bmah MFC after: 2 weeks --- share/man/man4/Makefile | 2 + share/man/man4/multicast.4 | 917 +++++++++++++++++++++++++++++++++++++ share/man/man4/pim.4 | 192 ++++++++ 3 files changed, 1111 insertions(+) create mode 100644 share/man/man4/multicast.4 create mode 100644 share/man/man4/pim.4 diff --git a/share/man/man4/Makefile b/share/man/man4/Makefile index 5262fbb5eea6..3dfbc5a4200c 100644 --- a/share/man/man4/Makefile +++ b/share/man/man4/Makefile @@ -130,6 +130,7 @@ MAN= aac.4 \ mac_test.4 \ mouse.4 \ mtio.4 \ + multicast.4 \ my.4 \ natm.4 \ natmip.4 \ @@ -186,6 +187,7 @@ MAN= aac.4 \ pcm.4 \ pcn.4 \ pcvt.4 \ + pim.4 \ polling.4 \ ppbus.4 \ ppc.4 \ diff --git a/share/man/man4/multicast.4 b/share/man/man4/multicast.4 new file mode 100644 index 000000000000..e71c16d6c820 --- /dev/null +++ b/share/man/man4/multicast.4 @@ -0,0 +1,917 @@ +.\" Copyright (c) 2001-2003 International Computer Science Institute +.\" +.\" Permission is hereby granted, free of charge, to any person obtaining a +.\" copy of this software and associated documentation files (the "Software"), +.\" to deal in the Software without restriction, including without limitation +.\" the rights to use, copy, modify, merge, publish, distribute, sublicense, +.\" and/or sell copies of the Software, and to permit persons to whom the +.\" Software is furnished to do so, subject to the following conditions: +.\" +.\" The above copyright notice and this permission notice shall be included in +.\" all copies or substantial portions of the Software. +.\" +.\" The names and trademarks of copyright holders may not be used in +.\" advertising or publicity pertaining to the software without specific +.\" prior permission. 
Title to copyright in this software and any associated +.\" documentation will at all times remain with the copyright holders. +.\" +.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +.\" DEALINGS IN THE SOFTWARE. +.\" +.\" $FreeBSD$ +.\" +.Dd September 4, 2003 +.Dt MULTICAST 4 +.Os +.\" +.Sh NAME +.Nm multicast +.Nd Multicast Routing +.\" +.Sh SYNOPSIS +.Cd "options MROUTING" +.Pp +.In sys/types.h +.In sys/socket.h +.In netinet/in.h +.In netinet/ip_mroute.h +.In netinet6/ip6_mroute.h +.Ft int +.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen" +.Ft int +.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen" +.Ft int +.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen" +.Ft int +.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen" +.Sh DESCRIPTION +.Tn "Multicast routing" +is used to efficiently propagate data +packets to a set of multicast listeners in multipoint networks. +If unicast is used to replicate the data to all listeners, +then some of the network links may carry multiple copies of the same +data packets. +With multicast routing, the overhead is reduced to one copy +(at most) per network link. +.Pp +All multicast-capable routers must run a common multicast routing +protocol. +The Distance Vector Multicast Routing Protocol (DVMRP) +was the first developed multicast routing protocol. 
+Later, other protocols such as Multicast Extensions to OSPF (MOSPF), +Core Based Trees (CBT), +Protocol Independent Multicast - Sparse Mode (PIM-SM), +and Protocol Independent Multicast - Dense Mode (PIM-DM) +were developed as well. +.Pp +To start multicast routing, +the user must enable multicast forwarding in the kernel +(see +.Sx SYNOPSIS +about the kernel configuration options), +and must run a multicast routing capable user-level process. +From a developer's point of view, +the programming guide described in the +.Sx "Programming Guide" +section should be used to control the multicast forwarding in the kernel. +.\" +.Ss Programming Guide +This section provides information about the basic multicast routing API. +The so-called +.Dq advanced multicast API +is described in the +.Sx "Advanced Multicast API Programming Guide" +section. +.Pp +First, a multicast routing socket must be opened. +That socket is used +to control the multicast forwarding in the kernel. +Note that most operations below require certain privilege +(i.e., root privilege): +.Pp +.Bd -literal +/* IPv4 */ +int mrouter_s4; +mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +int mrouter_s6; +mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); +.Ed +.Pp +Note that if the router needs to open an IGMP or ICMPv6 socket +(in case of IPv4 and IPv6 respectively) +for sending or receiving of IGMP or MLD multicast group membership messages, +then the same mrouter_s4 or mrouter_s6 sockets should be used +for sending and receiving IGMP or MLD messages, respectively. +In the case of a BSD-derived kernel, it may be possible to open separate sockets +for IGMP or MLD messages only. +However, some other kernels (e.g., Linux) require that the multicast +routing socket be used for sending and receiving of IGMP or MLD +messages. +Therefore, for portability reasons the multicast +routing socket should be reused for IGMP and MLD messages as well. 
+.Pp +After the multicast routing socket is opened, it can be used to enable +or disable multicast forwarding in the kernel: +.Bd -literal +/* IPv4 */ +int v = 1; /* 1 to enable, or 0 to disable */ +setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v)); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +int v = 1; /* 1 to enable, or 0 to disable */ +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v)); +\&... +/* If necessary, filter all ICMPv6 messages */ +struct icmp6_filter filter; +ICMP6_FILTER_SETBLOCKALL(&filter); +setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter, + sizeof(filter)); +.Ed +.Pp +After multicast forwarding is enabled, the multicast routing socket +can be used to enable PIM processing in the kernel if we are running PIM-SM or +PIM-DM +(see +.Xr pim 4 ) . +.Pp +For each network interface (e.g., physical or a virtual tunnel) +that would be used for multicast forwarding, a corresponding +multicast interface must be added to the kernel: +.Bd -literal +/* IPv4 */ +struct vifctl vc; +memset(&vc, 0, sizeof(vc)); +/* Assign all vifctl fields as appropriate */ +vc.vifc_vifi = vif_index; +vc.vifc_flags = vif_flags; +vc.vifc_threshold = min_ttl_threshold; +vc.vifc_rate_limit = max_rate_limit; +memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr)); +if (vc.vifc_flags & VIFF_TUNNEL) + memcpy(&vc.vifc_rmt_addr, &vif_remote_address, + sizeof(vc.vifc_rmt_addr)); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc, + sizeof(vc)); +.Ed +.Pp +The +.Dq vif_index +must be unique per vif. +The +.Dq vif_flags +contains the +.Dq VIFF_* +flags as defined in <netinet/ip_mroute.h>. +The +.Dq min_ttl_threshold +contains the minimum TTL a multicast data packet must have to be +forwarded on that vif. +Typically, it has a value of 1. +The +.Dq max_rate_limit +contains the maximum rate (in bits/s) of the multicast data packets forwarded +on that vif. +A value of 0 means no limit. 
+The +.Dq vif_local_address +contains the local IP address of the corresponding local interface. +The +.Dq vif_remote_address +contains the remote IP address in case of DVMRP multicast tunnels. +.Pp +.Bd -literal +/* IPv6 */ +struct mif6ctl mc; +memset(&mc, 0, sizeof(mc)); +/* Assign all mif6ctl fields as appropriate */ +mc.mif6c_mifi = mif_index; +mc.mif6c_flags = mif_flags; +mc.mif6c_pifi = pif_index; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc, + sizeof(mc)); +.Ed +.Pp +The +.Dq mif_index +must be unique per mif. +The +.Dq mif_flags +contains the +.Dq MIFF_* +flags as defined in <netinet6/ip6_mroute.h>. +The +.Dq pif_index +is the physical interface index of the corresponding local interface. +.Pp +A multicast interface is deleted by: +.Bd -literal +/* IPv4 */ +vifi_t vifi = vif_index; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi, + sizeof(vifi)); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +mifi_t mifi = mif_index; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi, + sizeof(mifi)); +.Ed +.Pp +After the multicast forwarding is enabled, and the multicast virtual +interfaces are +added, the kernel may deliver upcall messages (also called signals +later in this text) on the multicast routing socket that was opened +earlier with +.Dq MRT_INIT +or +.Dq MRT6_INIT . +The IPv4 upcalls have +.Dq struct igmpmsg +header (see <netinet/ip_mroute.h>) with field +.Dq im_mbz +set to zero. +Note that this header follows the structure of +.Dq struct ip +with the protocol field +.Dq ip_p +set to zero. +The IPv6 upcalls have +.Dq struct mrt6msg +header (see <netinet6/ip6_mroute.h>) with field +.Dq im6_mbz +set to zero. +Note that this header follows the structure of +.Dq struct ip6_hdr +with the next header field +.Dq ip6_nxt +set to zero. +.Pp +The upcall header contains the field +.Dq im_msgtype +and +.Dq im6_msgtype +with the type of the upcall +.Dq IGMPMSG_* +and +.Dq MRT6MSG_* +for IPv4 and IPv6 respectively. 
+The values of the rest of the upcall header fields +and the body of the upcall message depend on the particular upcall type. +.Pp +If the upcall message type is +.Dq IGMPMSG_NOCACHE +or +.Dq MRT6MSG_NOCACHE , +this is an indication that a multicast packet has reached the multicast +router, but the router has no forwarding state for that packet. +Typically, the upcall would be a signal for the multicast routing +user-level process to install the appropriate Multicast Forwarding +Cache (MFC) entry in the kernel. +.Pp +An MFC entry is added by: +.Bd -literal +/* IPv4 */ +struct mfcctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin)); +memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp)); +mc.mfcc_parent = iif_index; +for (i = 0; i < maxvifs; i++) + mc.mfcc_ttls[i] = oifs_ttl[i]; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC, + (void *)&mc, sizeof(mc)); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +struct mf6cctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin)); +memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp)); +mc.mf6cc_parent = iif_index; +for (i = 0; i < maxvifs; i++) + if (oifs_ttl[i] > 0) + IF_SET(i, &mc.mf6cc_ifset); +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC, + (void *)&mc, sizeof(mc)); +.Ed +.Pp +The +.Dq source_addr +and +.Dq group_addr +are the source and group address of the multicast packet (as set +in the upcall message). +The +.Dq iif_index +is the virtual interface index of the multicast interface the multicast +packets for this specific source and group address should be received on. +The +.Dq oifs_ttl[] +array contains the minimum TTL (per interface) a multicast packet +should have to be forwarded on an outgoing interface. +If the TTL value is zero, the corresponding interface is not included +in the set of outgoing interfaces. +Note that in the case of IPv6 only the set of outgoing interfaces can +be specified. 
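The rule for the oifs_ttl[] array described above (a zero TTL excludes the interface from the outgoing set) can be illustrated with a small userland sketch. This is not part of the kernel API: SKETCH_MAXVIFS, oifs_from_ttls() and the plain uint32_t bitmask are hypothetical names invented for this example. A real routing daemon would instead fill mfcc_ttls[] or call IF_SET() on mf6cc_ifset.

```c
#include <stdint.h>

/* Hypothetical limit for this sketch, chosen so the set fits in 32 bits. */
#define SKETCH_MAXVIFS 32

/*
 * Build a bitmask of outgoing interfaces from a per-interface TTL array.
 * Bit i is set when oifs_ttl[i] is non-zero; interfaces with a zero TTL
 * are left out of the set.
 */
static uint32_t
oifs_from_ttls(const unsigned char *oifs_ttl, int nvifs)
{
	uint32_t set = 0;
	int i;

	for (i = 0; i < nvifs && i < SKETCH_MAXVIFS; i++)
		if (oifs_ttl[i] > 0)
			set |= (uint32_t)1 << i;
	return (set);
}
```

With the TTL array { 0, 1, 0, 16 }, only interfaces 1 and 3 end up in the returned set.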
+.Pp +An MFC entry is deleted by: +.Bd -literal +/* IPv4 */ +struct mfcctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin)); +memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp)); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC, + (void *)&mc, sizeof(mc)); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +struct mf6cctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin)); +memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp)); +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC, + (void *)&mc, sizeof(mc)); +.Ed +.Pp +The following method can be used to get various statistics per +installed MFC entry in the kernel (e.g., the number of forwarded +packets per source and group address): +.Bd -literal +/* IPv4 */ +struct sioc_sg_req sgreq; +memset(&sgreq, 0, sizeof(sgreq)); +memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src)); +memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp)); +ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +struct sioc_sg_req6 sgreq; +memset(&sgreq, 0, sizeof(sgreq)); +memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src)); +memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp)); +ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq); +.Ed +.Pp +The following method can be used to get various statistics per +multicast virtual interface in the kernel (e.g., the number of forwarded +packets per interface): +.Bd -literal +/* IPv4 */ +struct sioc_vif_req vreq; +memset(&vreq, 0, sizeof(vreq)); +vreq.vifi = vif_index; +ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +struct sioc_mif_req6 mreq; +memset(&mreq, 0, sizeof(mreq)); +mreq.mifi = mif_index; +ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq); +.Ed +.Pp +.Ss Advanced Multicast API Programming Guide +If we want to add new features in the kernel, it becomes difficult +to preserve backward compatibility (binary and API), +and at the same time to allow user-level 
processes to take advantage of +the new features (if the kernel supports them). +.Pp +One of the mechanisms that allows us to preserve the backward +compatibility is a sort of negotiation +between the user-level process and the kernel: +.Bl -enum +.It +The user-level process tries to enable in the kernel the set of new +features (and the corresponding API) it would like to use. +.It +The kernel returns the (sub)set of features it knows about +and is willing to enable. +.It +The user-level process uses only that set of features +the kernel has agreed on. +.El +.\" +.Pp +To support backward compatibility, if the user-level process doesn't +ask for any new features, the kernel defaults to the basic +multicast API (see the +.Sx "Programming Guide" +section). +.\" XXX: edit as appropriate after the advanced multicast API is +.\" supported under IPv6 +Currently, the advanced multicast API exists only for IPv4; +in the future there will be IPv6 support as well. +.Pp +Below is a summary of the expandable API solution. +Note that all new options and structures are defined +in <netinet/ip_mroute.h> and <netinet6/ip6_mroute.h>, +unless stated otherwise. +.Pp +The user-level process uses new get/setsockopt() options to +perform the API features negotiation with the kernel. +This negotiation must be performed right after the multicast routing +socket is opened. +The set of desired/allowed features is stored in a bitset +(currently, in uint32_t; i.e., maximum of 32 new features). +The new get/setsockopt() options are +.Dq MRT_API_SUPPORT +and +.Dq MRT_API_CONFIG . +Example: +.Bd -literal +uint32_t v; +socklen_t len = sizeof(v); +getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len); +.Ed +.Pp +would set in +.Dq v +the pre-defined bits that the kernel API supports. +The eight least significant bits in uint32_t are the same as the +eight possible flags +.Dq MRT_MFC_FLAGS_* +that can be used in +.Dq mfcc_flags +as part of the new definition of +.Dq struct mfcctl +(see below about those flags), which leaves 24 flags for other new features. 
+The value returned by getsockopt(MRT_API_SUPPORT) is read-only; in other +words, setsockopt(MRT_API_SUPPORT) would fail. +.Pp +To modify the API and enable a specific feature in the kernel: +.Bd -literal +uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF; +if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v)) + != 0) { + return (ERROR); +} +if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF) + return (OK); /* Success */ +else + return (ERROR); +.Ed +.Pp +In other words, when setsockopt(MRT_API_CONFIG) is called, the +argument to it specifies the desired set of features to +be enabled in the API and the kernel. +The return value in +.Dq v +is the actual (sub)set of features that were enabled in the kernel. +To obtain the set of enabled features later: +.Bd -literal +socklen_t len = sizeof(v); +getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len); +.Ed +.Pp +The set of enabled features is global. +In other words, setsockopt(MRT_API_CONFIG) +should be called right after setsockopt(MRT_INIT). +.Pp +Currently, the following set of new features is defined: +.Bd -literal +#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */ +#define MRT_MFC_FLAGS_BORDER_VIF (1 << 1) /* border vif */ +#define MRT_MFC_RP (1 << 8) /* enable RP address */ +#define MRT_MFC_BW_UPCALL (1 << 9) /* enable bw upcalls */ +.Ed +.\" .Pp +.\" In the future there might be: +.\" .Bd -literal +.\" #define MRT_MFC_GROUP_SPECIFIC (1 << 10) /* allow (*,G) MFC entries */ +.\" .Ed +.\" .Pp +.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel. +.\" For now this is left-out until it is clear whether +.\" (*,G) MFC support is the preferred solution instead of something more generic +.\" solution for example. +.\" +.\" 2. The newly defined struct mfcctl2. +.\" +.Pp +The advanced multicast API uses a newly defined +.Dq struct mfcctl2 +instead of the traditional +.Dq struct mfcctl . +The original +.Dq struct mfcctl +is kept as is. 
+The new +.Dq struct mfcctl2 +is: +.Bd -literal +/* + * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays + * and extends the old struct mfcctl. + */ +struct mfcctl2 { + /* the mfcctl fields */ + struct in_addr mfcc_origin; /* ip origin of mcasts */ + struct in_addr mfcc_mcastgrp; /* multicast group associated*/ + vifi_t mfcc_parent; /* incoming vif */ + u_char mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs */ + + /* extension fields */ + uint8_t mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/ + struct in_addr mfcc_rp; /* the RP address */ +}; +.Ed +.Pp +The new fields are +.Dq mfcc_flags[MAXVIFS] +and +.Dq mfcc_rp . +Note that for compatibility reasons they are added at the end. +.Pp +The +.Dq mfcc_flags[MAXVIFS] +field is used to set various flags per +interface per (S,G) entry. +Currently, the defined flags are: +.Bd -literal +#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */ +#define MRT_MFC_FLAGS_BORDER_VIF (1 << 1) /* border vif */ +.Ed +.Pp +The +.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF +flag is used to explicitly disable the +.Dq IGMPMSG_WRONGVIF +kernel signal at the (S,G) granularity if a multicast data packet +arrives on the wrong interface. +Usually, this signal is used to +complete the shortest-path switch in case of PIM-SM multicast routing, +or to trigger a PIM assert message. +However, it should not be delivered for interfaces that are not in +the outgoing interface set, and that are not expecting to +become an incoming interface. +Hence, if the +.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF +flag is set for some of the +interfaces, then a data packet that arrives on that interface for +that MFC entry will NOT trigger a WRONGVIF signal. +If that flag is not set, then a signal is triggered (the default action). +.Pp +The +.Dq MRT_MFC_FLAGS_BORDER_VIF +flag is used to specify whether the Border-bit in PIM +Register messages should be set (in case when the Register encapsulation +is performed inside the kernel). 
+If it is set for the special PIM Register kernel virtual interface +(see +.Xr pim 4 ) , +the Border-bit in the Register messages sent to the RP will be set. +.Pp +The remaining six bits are reserved for future usage. +.Pp +The +.Dq mfcc_rp +field is used to specify the RP address (in case of PIM-SM multicast routing) +for a multicast +group G if we want to perform kernel-level PIM Register encapsulation. +The +.Dq mfcc_rp +field is used only if the +.Dq MRT_MFC_RP +advanced API flag/capability has been successfully set by +setsockopt(MRT_API_CONFIG). +.Pp +.\" +.\" 3. Kernel-level PIM Register encapsulation +.\" +If the +.Dq MRT_MFC_RP +flag was successfully set by +setsockopt(MRT_API_CONFIG), then the kernel will attempt to perform +the PIM Register encapsulation itself instead of sending the +multicast data packets to user level (inside IGMPMSG_WHOLEPKT +upcalls) for user-level encapsulation. +The RP address would be taken from the +.Dq mfcc_rp +field +inside the new +.Dq struct mfcctl2 . +However, even if the +.Dq MRT_MFC_RP +flag was successfully set, if the +.Dq mfcc_rp +field was set to +.Dq INADDR_ANY , +then the +kernel will still deliver an IGMPMSG_WHOLEPKT upcall with the +multicast data packet to the user-level process. +.Pp +In addition, if the multicast data packet is too large to fit within +a single IP packet after the PIM Register encapsulation (e.g., if +its size was on the order of 65500 bytes), the data packet will be +fragmented, and then each of the fragments will be encapsulated +separately. +Note that typically a multicast data packet can be that +large only if it was originated locally from the same host that +performs the encapsulation; otherwise the transmission of the +multicast data packet over Ethernet for example would have +fragmented it into much smaller pieces. 
+.\" +.\" Note that if this code is ported to IPv6, we may need the kernel to +.\" perform MTU discovery to the RP, and keep those discoveries inside +.\" the kernel so the encapsulating router may send back ICMP +.\" Fragmentation Required if the size of the multicast data packet is +.\" too large (see "Encapsulating data packets in the Register Tunnel" +.\" in Section 4.4.1 in the PIM-SM spec +.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}). +.\" For IPv4 we may be able to get away without it, but for IPv6 we need +.\" that. +.\" +.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls". +.\" +.Pp +Typically, a multicast routing user-level process would need to know the +forwarding bandwidth for some data flow. +For example, the multicast routing process may want to timeout idle MFC +entries, or in case of PIM-SM it can initiate (S,G) shortest-path switch if +the bandwidth rate is above a threshold for example. +.Pp +The original solution for measuring the bandwidth of a dataflow was +that a user-level process would periodically +query the kernel about the number of forwarded packets/bytes per +(S,G), and then based on those numbers it would estimate whether a source +has been idle, or whether the source's transmission bandwidth is above a +threshold. +That solution is far from being scalable, hence the need for a new +mechanism for bandwidth monitoring. +.Pp +Below is a description of the bandwidth monitoring mechanism. +.Bl -bullet +.It +If the bandwidth of a data flow satisfies some pre-defined filter, +the kernel delivers an upcall on the multicast routing socket +to the multicast routing process that has installed that filter. +.It +The bandwidth-upcall filters are installed per (S,G). There can be +more than one filter per (S,G). 
+.It +Instead of supporting all possible comparison operations +(i.e., < <= == != > >= ), there is support only for the +<= and >= operations, +because this makes the kernel-level implementation simpler, +and because practically we need only those two. +Further, the missing operations can be simulated by secondary +user-level filtering of those <= and >= filters. +For example, to simulate !=, then we need to install filter +.Dq bw <= 0xffffffff , +and after an +upcall is received, we need to check whether +.Dq measured_bw != expected_bw . +.It +The bandwidth-upcall mechanism is enabled by +setsockopt(MRT_API_CONFIG) for the MRT_MFC_BW_UPCALL flag. +.It +The bandwidth-upcall filters are added/deleted by the new +setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL) +respectively (with the appropriate +.Dq struct bw_upcall +argument of course). +.El +.Pp +From application point of view, a developer needs to know about +the following: +.Bd -literal +/* + * Structure for installing or delivering an upcall if the + * measured bandwidth is above or below a threshold. + * + * User programs (e.g. daemons) may have a need to know when the + * bandwidth used by some data flow is above or below some threshold. + * This interface allows the userland to specify the threshold (in + * bytes and/or packets) and the measurement interval. Flows are + * all packet with the same source and destination IP address. + * At the moment the code is only used for multicast destinations + * but there is nothing that prevents its use for unicast. + * + * The measurement interval cannot be shorter than some Tmin (currently, 3s). + * The threshold is set in packets and/or bytes per_interval. + * + * Measurement works as follows: + * + * For >= measurements: + * The first packet marks the start of a measurement interval. + * During an interval we count packets and bytes, and when we + * pass the threshold we deliver an upcall and we are done. 
+ * The first packet after the end of the interval resets the + * count and restarts the measurement. + * + * For <= measurement: + * We start a timer to fire at the end of the interval, and + * then for each incoming packet we count packets and bytes. + * When the timer fires, we compare the value with the threshold, + * schedule an upcall if we are below, and restart the measurement + * (reschedule timer and zero counters). + */ + +struct bw_data { + struct timeval b_time; + uint64_t b_packets; + uint64_t b_bytes; +}; + +struct bw_upcall { + struct in_addr bu_src; /* source address */ + struct in_addr bu_dst; /* destination address */ + uint32_t bu_flags; /* misc flags (see below) */ +#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets) */ +#define BW_UPCALL_UNIT_BYTES (1 << 1) /* threshold (in bytes) */ +#define BW_UPCALL_GEQ (1 << 2) /* upcall if bw >= threshold */ +#define BW_UPCALL_LEQ (1 << 3) /* upcall if bw <= threshold */ +#define BW_UPCALL_DELETE_ALL (1 << 4) /* delete all upcalls for s,d*/ + struct bw_data bu_threshold; /* the bw threshold */ + struct bw_data bu_measured; /* the measured bw */ +}; + +/* max. number of upcalls to deliver together */ +#define BW_UPCALLS_MAX 128 +/* min. threshold time interval for bandwidth measurement */ +#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC 3 +#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC 0 +.Ed +.Pp +The +.Dq bw_upcall +structure is used as an argument to +setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL). 
+Each setsockopt(MRT_ADD_BW_UPCALL) installs a filter in the kernel +for the source and destination address in the +.Dq bw_upcall +argument, +and that filter will trigger an upcall according to the following +pseudo-algorithm: +.Bd -literal + if (bw_upcall_oper IS ">=") { + if (((bw_upcall_unit & PACKETS == PACKETS) && + (measured_packets >= threshold_packets)) || + ((bw_upcall_unit & BYTES == BYTES) && + (measured_bytes >= threshold_bytes))) + SEND_UPCALL("measured bandwidth is >= threshold"); + } + if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) { + if (((bw_upcall_unit & PACKETS == PACKETS) && + (measured_packets <= threshold_packets)) || + ((bw_upcall_unit & BYTES == BYTES) && + (measured_bytes <= threshold_bytes))) + SEND_UPCALL("measured bandwidth is <= threshold"); + } +.Ed +.Pp +In the same +.Dq bw_upcall +the unit can be specified in both BYTES and PACKETS. +However, the GEQ and LEQ flags are mutually exclusive. +.Pp +Basically, an upcall is delivered if the measured bandwidth is >= or +<= the threshold bandwidth (within the specified measurement +interval). +For practical reasons, the smallest value for the measurement +interval is 3 seconds. +If smaller values are allowed, then the bandwidth +estimation may be less accurate, or the potentially very high frequency +of the generated upcalls may introduce too much overhead. +For the >= operation, the answer may be known before the end of +.Dq threshold_interval , +therefore the upcall may be delivered earlier. +For the <= operation however, we must wait +until the threshold interval has expired to know the answer. 
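The pseudo-algorithm above can be modelled in plain C as a rough userland sketch. It is not the kernel implementation: struct bw_sketch, its fields and bw_sketch_should_upcall() are hypothetical names made up for this illustration, and the interval check is reduced to a single boolean. Only the BW_UPCALL_* flag values are copied from the definitions quoted earlier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in the bw_upcall definition quoted earlier. */
#define BW_UPCALL_UNIT_PACKETS	(1 << 0)	/* threshold (in packets) */
#define BW_UPCALL_UNIT_BYTES	(1 << 1)	/* threshold (in bytes) */
#define BW_UPCALL_GEQ		(1 << 2)	/* upcall if bw >= threshold */
#define BW_UPCALL_LEQ		(1 << 3)	/* upcall if bw <= threshold */

/* Hypothetical model of one filter together with its running counters. */
struct bw_sketch {
	uint32_t flags;
	uint64_t threshold_packets;
	uint64_t threshold_bytes;
	uint64_t measured_packets;
	uint64_t measured_bytes;
	bool	 interval_expired;	/* measured_interval >= threshold_interval */
};

/* Return true when the modelled filter would trigger an upcall. */
static bool
bw_sketch_should_upcall(const struct bw_sketch *f)
{
	bool by_packets = (f->flags & BW_UPCALL_UNIT_PACKETS) != 0;
	bool by_bytes = (f->flags & BW_UPCALL_UNIT_BYTES) != 0;

	/* >= filters may fire before the interval has ended. */
	if (f->flags & BW_UPCALL_GEQ)
		return ((by_packets &&
		    f->measured_packets >= f->threshold_packets) ||
		    (by_bytes && f->measured_bytes >= f->threshold_bytes));

	/* <= filters are only evaluated once the interval has expired. */
	if ((f->flags & BW_UPCALL_LEQ) && f->interval_expired)
		return ((by_packets &&
		    f->measured_packets <= f->threshold_packets) ||
		    (by_bytes && f->measured_bytes <= f->threshold_bytes));

	return (false);
}
```

The sketch makes the asymmetry explicit: a >= filter can report as soon as a counter reaches its threshold, while a <= filter reports nothing until the interval flag is set.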
+.Pp +Example of usage: +.Bd -literal +struct bw_upcall bw_upcall; +/* Assign all bw_upcall fields as appropriate */ +memset(&bw_upcall, 0, sizeof(bw_upcall)); +memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src)); +memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst)); +bw_upcall.bu_threshold.b_time = threshold_interval; +bw_upcall.bu_threshold.b_packets = threshold_packets; +bw_upcall.bu_threshold.b_bytes = threshold_bytes; +if (is_threshold_in_packets) + bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS; +if (is_threshold_in_bytes) + bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES; +do { + if (is_geq_upcall) { + bw_upcall.bu_flags |= BW_UPCALL_GEQ; + break; + } + if (is_leq_upcall) { + bw_upcall.bu_flags |= BW_UPCALL_LEQ; + break; + } + return (ERROR); +} while (0); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL, + (void *)&bw_upcall, sizeof(bw_upcall)); +.Ed +.Pp +To delete a single filter, use MRT_DEL_BW_UPCALL, +and the fields of bw_upcall must be set +exactly the same as when MRT_ADD_BW_UPCALL was called. +.Pp +To delete all bandwidth filters for a given (S,G), +only the +.Dq bu_src +and +.Dq bu_dst +fields in +.Dq struct bw_upcall +need to be set, together with only the +.Dq BW_UPCALL_DELETE_ALL +flag inside field +.Dq bw_upcall.bu_flags . +.Pp +The bandwidth upcalls are received by aggregating them in the new upcall +message: +.Bd -literal +#define IGMPMSG_BW_UPCALL 4 /* BW monitoring upcall */ +.Ed +.Pp +This message is an array of +.Dq struct bw_upcall +elements (up to BW_UPCALLS_MAX = 128). +The upcalls are +delivered when there are 128 pending upcalls, or when 1 second has +expired since the previous upcall (whichever comes first). +In a +.Dq struct bw_upcall +element, the +.Dq bu_measured +field is filled in to +indicate the particular measured values. +However, because of the way +the particular intervals are measured, the user should be careful how +bu_measured.b_time is used. 
+For example, if the +filter is installed to trigger an upcall if the number of packets +is >= 1, then +.Dq bu_measured +may have a value of zero in the upcalls after the +first one, because the measured interval for >= filters is +.Dq clocked +by the forwarded packets. +Hence, this upcall mechanism should not be used for measuring +the exact value of the bandwidth of the forwarded data. +To measure the exact bandwidth, the user would need to +get the forwarded packets statistics with the ioctl(SIOCGETSGCNT) +mechanism +(see the +.Sx Programming Guide +section) . +.Pp +Note that the upcalls for a filter are delivered until the specific +filter is deleted, but no more frequently than once per +.Dq bu_threshold.b_time . +For example, if the filter is specified to +deliver a signal if bw >= 1 packet, the first packet will trigger a +signal, but the next upcall will be triggered no earlier than +.Dq bu_threshold.b_time +after the previous upcall. +.Pp +.\" +.Sh SEE ALSO +.Xr getsockopt 2 , +.Xr recvfrom 2 , +.Xr recvmsg 2 , +.Xr setsockopt 2 , +.Xr socket 2 , +.Xr icmp6 4 , +.Xr inet 4 , +.Xr inet6 4 , +.Xr intro 4 , +.Xr ip 4 , +.Xr ip6 4 , +.Xr pim 4 +.\" +.Pp +.Sh AUTHORS +The original multicast code was written by David Waitzman (BBN Labs), +and later modified by the following individuals: +Steve Deering (Stanford), Mark J. Steiglitz (Stanford), +Van Jacobson (LBL), Ajit Thyagarajan (PARC), +Bill Fenner (PARC). +The IPv6 multicast support was implemented by the KAME project +(http://www.kame.net), and was based on the IPv4 multicast code. +The advanced multicast API and the multicast bandwidth +monitoring were implemented by Pavlin Radoslavov (ICSI) +in collaboration with Chris Brown (NextHop). +.Pp +This manual page was written by Pavlin Radoslavov (ICSI). 
diff --git a/share/man/man4/pim.4 b/share/man/man4/pim.4 new file mode 100644 index 000000000000..bf1e88d009ac --- /dev/null +++ b/share/man/man4/pim.4 @@ -0,0 +1,192 @@ +.\" Copyright (c) 2001-2003 International Computer Science Institute +.\" +.\" Permission is hereby granted, free of charge, to any person obtaining a +.\" copy of this software and associated documentation files (the "Software"), +.\" to deal in the Software without restriction, including without limitation +.\" the rights to use, copy, modify, merge, publish, distribute, sublicense, +.\" and/or sell copies of the Software, and to permit persons to whom the +.\" Software is furnished to do so, subject to the following conditions: +.\" +.\" The above copyright notice and this permission notice shall be included in +.\" all copies or substantial portions of the Software. +.\" +.\" The names and trademarks of copyright holders may not be used in +.\" advertising or publicity pertaining to the software without specific +.\" prior permission. Title to copyright in this software and any associated +.\" documentation will at all times remain with the copyright holders. +.\" +.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +.\" DEALINGS IN THE SOFTWARE. 
+.\" +.\" $FreeBSD$ +.\" +.Dd September 4, 2003 +.Dt PIM 4 +.Os +.\" +.Sh NAME +.Nm pim +.Nd Protocol Independent Multicast +.\" +.Sh SYNOPSIS +.Cd "options MROUTING" +.Cd "options PIM" +.Pp +.In sys/types.h +.In sys/socket.h +.In netinet/in.h +.In netinet/ip_mroute.h +.In netinet/pim.h +.Ft int +.Fn getsockopt "int s" IPPROTO_IP MRT_PIM "void *optval" "socklen_t *optlen" +.Ft int +.Fn setsockopt "int s" IPPROTO_IP MRT_PIM "const void *optval" "socklen_t optlen" +.Ft int +.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_PIM "void *optval" "socklen_t *optlen" +.Ft int +.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_PIM "const void *optval" "socklen_t optlen" +.Sh DESCRIPTION +.Tn PIM +is the common name for two multicast routing protocols: +Protocol Independent Multicast - Sparse Mode (PIM-SM) and +Protocol Independent Multicast - Dense Mode (PIM-DM). +.Pp +PIM-SM is a multicast routing protocol that can use the underlying +unicast routing information base or a separate multicast-capable +routing information base. +It builds unidirectional shared trees rooted at a Rendezvous +Point (RP) per group, +and optionally creates shortest-path trees per source. +.Pp +PIM-DM is a multicast routing protocol that uses the underlying +unicast routing information base to flood multicast datagrams +to all multicast routers. +Prune messages are used to prevent future datagrams from propagating +to routers with no group membership information. +.Pp +Both PIM-SM and PIM-DM are fairly complex protocols, +though PIM-SM is much more complex. +To enable PIM-SM or PIM-DM multicast routing in a router, +the user must enable multicast routing and PIM processing in the kernel +(see +.Sx SYNOPSIS +for the kernel configuration options), +and must run a PIM-SM or PIM-DM capable user-level process. +From a developer's point of view, +the programming guide described in the +.Sx "Programming Guide" +section should be used to control the PIM processing in the kernel. 
+.\" +.Ss Programming Guide +After a multicast routing socket is open and multicast forwarding +is enabled in the kernel +(see +.Xr multicast 4 ) , +one of the following socket options should be used to enable or disable +PIM processing in the kernel. +Note that these options require sufficient privilege +(i.e., root privilege): +.Pp +.Bd -literal +/* IPv4 */ +int v = 1; /* 1 to enable, or 0 to disable */ +setsockopt(mrouter_s4, IPPROTO_IP, MRT_PIM, (void *)&v, sizeof(v)); +.Ed +.Pp +.Bd -literal +/* IPv6 */ +int v = 1; /* 1 to enable, or 0 to disable */ +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_PIM, (void *)&v, sizeof(v)); +.Ed +.Pp +After PIM processing is enabled, the multicast-capable interfaces +should be added +(see +.Xr multicast 4 ) . +In the case of PIM-SM, the PIM-Register virtual interface must be added +as well. +This can be accomplished by using the following options: +.Bd -literal +/* IPv4 */ +struct vifctl vc; +memset(&vc, 0, sizeof(vc)); +/* Assign all vifctl fields as appropriate */ +\&... +if (is_pim_register_vif) + vc.vifc_flags |= VIFF_REGISTER; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc, + sizeof(vc)); +.Ed +.Bd -literal +/* IPv6 */ +struct mif6ctl mc; +memset(&mc, 0, sizeof(mc)); +/* Assign all mif6ctl fields as appropriate */ +\&... +if (is_pim_register_vif) + mc.mif6c_flags |= MIFF_REGISTER; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc, + sizeof(mc)); +.Ed +.Pp +Sending or receiving of PIM packets can be accomplished by +first opening a +.Dq raw socket +(see +.Xr socket 2 ) , +with a protocol value of +.Dq IPPROTO_PIM : +.Bd -literal +/* IPv4 */ +int pim_s4; +pim_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_PIM); +.Ed +.Bd -literal +/* IPv6 */ +int pim_s6; +pim_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_PIM); +.Ed +.Pp +Then, the following system calls can be used to send or receive PIM +packets: +.Xr sendto 2 , +.Xr sendmsg 2 , +.Xr recvfrom 2 , +.Xr recvmsg 2 . 
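The fragments in the Programming Guide omit error handling. The following is a minimal sketch, not part of the manual page, of how the enable/disable option and the raw PIM socket might be wrapped so that failures are reported to the caller. The helper names pim_set_enabled() and pim_open_raw() are hypothetical, and the fallback definition of IPPROTO_PIM (IANA protocol number 103) is only an assumption for systems whose netinet/in.h does not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumption: fall back to the IANA protocol number if the header lacks it. */
#ifndef IPPROTO_PIM
#define IPPROTO_PIM 103
#endif

/*
 * Hypothetical helper: enable (nonzero) or disable (0) PIM processing on
 * an already-initialized multicast routing socket.  The caller passes
 * IPPROTO_IP with MRT_PIM, or IPPROTO_IPV6 with MRT6_PIM.
 * Returns 0 on success, or -1 with errno set by setsockopt(2).
 */
int
pim_set_enabled(int mrouter_s, int level, int optname, int enable)
{
	int v = enable ? 1 : 0;

	return (setsockopt(mrouter_s, level, optname, &v, sizeof(v)));
}

/*
 * Hypothetical helper: open a raw socket for sending and receiving PIM
 * packets.  Only AF_INET and AF_INET6 are accepted; any other family is
 * rejected with EAFNOSUPPORT before socket(2) is called.
 * Returns the descriptor, or -1 with errno set (for example, when the
 * caller lacks the privilege needed for raw sockets).
 */
int
pim_open_raw(int family)
{
	if (family != AF_INET && family != AF_INET6) {
		errno = EAFNOSUPPORT;
		return (-1);
	}
	return (socket(family, SOCK_RAW, IPPROTO_PIM));
}
```

With netinet/ip_mroute.h included, a routing daemon could call pim_set_enabled(mrouter_s4, IPPROTO_IP, MRT_PIM, 1) and check the return value before adding the PIM-Register virtual interface. Passing the level and option name as arguments keeps the sketch independent of the mroute headers.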
+.\" +.Sh SEE ALSO +.Xr getsockopt 2 , +.Xr recvfrom 2 , +.Xr recvmsg 2 , +.Xr sendmsg 2 , +.Xr sendto 2 , +.Xr setsockopt 2 , +.Xr socket 2 , +.Xr inet 4 , +.Xr intro 4 , +.Xr ip 4 , +.Xr multicast 4 +.\" +.Sh STANDARDS +.\" XXX the PIM-SM number must be updated after RFC 2362 is +.\" replaced by a new RFC by the end of year 2003 or so. +The PIM-SM protocol is specified in RFC 2362 (to be replaced by +.Xr draft-ietf-pim-sm-v2-new-* ) . +The PIM-DM protocol is specified in +.Xr draft-ietf-pim-dm-new-v2-* . +.\" +.Sh AUTHORS +The original IPv4 PIM kernel support for IRIX and SunOS-4.x was +implemented by Ahmed Helmy (USC and SGI). +Later the code was ported to various BSD flavors and modified by +George Edmond Eddy (Rusty) (ISI), +Hitoshi Asaeda (WIDE Project), and Pavlin Radoslavov (USC/ISI and ICSI). +The IPv6 PIM kernel support was implemented by the KAME project +(http://www.kame.net), and was based on the IPv4 PIM kernel support. +.Pp +This manual page was written by Pavlin Radoslavov (ICSI).