/*-
 * SPDX-License-Identifier: BSD-3-Clause
 *
 * Copyright (c) 1982, 1986, 1991, 1993
 *	The Regents of the University of California. All rights reserved.
 * Copyright (C) 2001 WIDE Project. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)in.c	8.4 (Berkeley) 1/9/95
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include <sys/eventhandler.h>
#include <sys/systm.h>
#include <sys/sockio.h>
#include <sys/malloc.h>
#include <sys/priv.h>
#include <sys/socket.h>
#include <sys/jail.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/proc.h>
#include <sys/rmlock.h>
#include <sys/sysctl.h>
#include <sys/syslog.h>
#include <sys/sx.h>

#include <net/if.h>
#include <net/if_var.h>
#include <net/if_arp.h>
#include <net/if_dl.h>
#include <net/if_llatbl.h>
#include <net/if_types.h>
#include <net/route.h>
#include <net/route/nhop.h>
#include <net/route/route_ctl.h>
#include <net/vnet.h>

#include <netinet/if_ether.h>
#include <netinet/in.h>
#include <netinet/in_fib.h>
#include <netinet/in_var.h>
#include <netinet/in_pcb.h>
#include <netinet/ip_var.h>
#include <netinet/ip_carp.h>
#include <netinet/igmp_var.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>

static int in_aifaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *);
static int in_difaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *);
static int in_gifaddr_ioctl(u_long, caddr_t, struct ifnet *, struct thread *);

static void	in_socktrim(struct sockaddr_in *);
static void	in_purgemaddrs(struct ifnet *);

static bool	ia_need_loopback_route(const struct in_ifaddr *);

VNET_DEFINE_STATIC(int, nosameprefix);
#define	V_nosameprefix			VNET(nosameprefix)
SYSCTL_INT(_net_inet_ip, OID_AUTO, no_same_prefix, CTLFLAG_VNET | CTLFLAG_RW,
	&VNET_NAME(nosameprefix), 0,
	"Refuse to create same prefixes on different interfaces");
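The `no_same_prefix` knob is only declared here, so the comparison it enables does not appear in this excerpt. As a rough illustration of what "same prefix on different interfaces" means, the sketch below compares two IPv4 prefixes held in host byte order. `struct prefix_cfg` and `same_prefix_conflict()` are hypothetical names, and the kernel's actual check may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one configured IPv4 prefix (host byte order). */
struct prefix_cfg {
	uint32_t addr;		/* interface address */
	uint32_t mask;		/* subnet mask, e.g. 0xffffff00 for /24 */
	unsigned ifindex;	/* interface the prefix is configured on */
};

/*
 * Returns true when 'b' would duplicate the prefix of 'a' on a
 * different interface, i.e. the case no_same_prefix=1 is meant to refuse.
 */
bool
same_prefix_conflict(const struct prefix_cfg *a, const struct prefix_cfg *b)
{
	if (a->ifindex == b->ifindex)
		return (false);
	return (a->mask == b->mask &&
	    (a->addr & a->mask) == (b->addr & b->mask));
}
```

With 192.168.1.1/24 on one interface, adding 192.168.1.2/24 on a second interface counts as a conflict. Adding 192.168.2.2/24 there does not.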

VNET_DEFINE_STATIC(bool, broadcast_lowest);
#define	V_broadcast_lowest		VNET(broadcast_lowest)
SYSCTL_BOOL(_net_inet_ip, OID_AUTO, broadcast_lowest, CTLFLAG_VNET | CTLFLAG_RW,
	&VNET_NAME(broadcast_lowest), 0,
	"Treat lowest address on a subnet (host 0) as broadcast");
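`broadcast_lowest` is likewise only declared in this section. The hypothetical helper below sketches the distinction its description draws. The all-ones host address is always a directed broadcast. The all-zeros (host 0) address counts as broadcast only when the flag is set. The sketch also assumes that /31 (RFC 3021) and /32 subnets have no broadcast address. The kernel's own broadcast test is not part of this excerpt and may apply further conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is 'dst' a broadcast address for the subnet
 * addr/mask?  All values are in host byte order.
 */
bool
is_subnet_broadcast(uint32_t dst, uint32_t addr, uint32_t mask,
    bool broadcast_lowest)
{
	uint32_t subnet = addr & mask;

	/* /31 (RFC 3021) and /32 subnets have no broadcast address. */
	if (mask >= 0xfffffffeu)
		return (false);
	/* Directed broadcast: all host bits set. */
	if (dst == (subnet | ~mask))
		return (true);
	/* Old-style broadcast: host part zero, only when enabled. */
	return (broadcast_lowest && dst == subnet);
}
```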

VNET_DECLARE(struct inpcbinfo, ripcbinfo);
#define	V_ripcbinfo			VNET(ripcbinfo)

static struct sx in_control_sx;
SX_SYSINIT(in_control_sx, &in_control_sx, "in_control");

/*
 * Return 1 if an internet address is for a ``local'' host
 * (one to which we have a connection).
 */
int
in_localaddr(struct in_addr in)
{
	struct rm_priotracker in_ifa_tracker;
	u_long i = ntohl(in.s_addr);
	struct in_ifaddr *ia;

	IN_IFADDR_RLOCK(&in_ifa_tracker);
	CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) {
		if ((i & ia->ia_subnetmask) == ia->ia_subnet) {
			IN_IFADDR_RUNLOCK(&in_ifa_tracker);
			return (1);
		}
	}
	IN_IFADDR_RUNLOCK(&in_ifa_tracker);
	return (0);
}
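in_localaddr() is a linear scan. The address is converted to host byte order once, because ia_subnet and ia_subnetmask are kept in host order. It is then masked with each interface's subnet mask. The userland model below reproduces that loop over a plain array. `struct ifaddr_model` and `localaddr_model()` are hypothetical stand-ins, and the IN_IFADDR_RLOCK locking is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl() */

/* Hypothetical stand-in for the subnet fields of struct in_ifaddr. */
struct ifaddr_model {
	uint32_t subnet;	/* like ia_subnet, host byte order */
	uint32_t subnetmask;	/* like ia_subnetmask, host byte order */
};

/*
 * Mirrors the loop in in_localaddr(): convert the network-order
 * address once, then test it against each configured subnet.
 */
int
localaddr_model(const struct ifaddr_model *tbl, size_t n, uint32_t addr_net)
{
	uint32_t i = ntohl(addr_net);

	for (size_t k = 0; k < n; k++)
		if ((i & tbl[k].subnetmask) == tbl[k].subnet)
			return (1);
	return (0);
}
```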

/*
 * Return 1 if an internet address is for the local host and configured
 * on one of its interfaces.
 */
int
in_localip(struct in_addr in)
{
	struct rm_priotracker in_ifa_tracker;
	struct in_ifaddr *ia;

	IN_IFADDR_RLOCK(&in_ifa_tracker);
	LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) {
		if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr) {
			IN_IFADDR_RUNLOCK(&in_ifa_tracker);
			return (1);
		}
	}
	IN_IFADDR_RUNLOCK(&in_ifa_tracker);
	return (0);
}
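Unlike in_localaddr(), in_localip() does not walk every address. INADDR_HASH() selects a single hash bucket, and only the entries chained in that bucket are compared. The sketch below models this with a small chained hash table. The FNV-1a hash, the 16-bucket table size and all names are assumptions made for illustration. The kernel's INADDR_HASH() is defined in netinet/in_var.h and may use a different hash function and table size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 16	/* hypothetical table size (power of two) */

struct addr_node {
	uint32_t addr;			/* network byte order, like s_addr */
	struct addr_node *next;		/* chain within one bucket */
};

/* 32-bit FNV-1a over the address bytes (illustrative choice). */
static uint32_t
addr_hash(uint32_t addr)
{
	unsigned char b[sizeof(addr)];
	uint32_t h = 2166136261u;

	memcpy(b, &addr, sizeof(addr));
	for (size_t i = 0; i < sizeof(b); i++) {
		h ^= b[i];
		h *= 16777619u;
	}
	return (h);
}

/* Push a node onto the head of its bucket's chain. */
void
addr_insert(struct addr_node *tbl[NBUCKETS], struct addr_node *n)
{
	struct addr_node **bucket = &tbl[addr_hash(n->addr) & (NBUCKETS - 1)];

	n->next = *bucket;
	*bucket = n;
}

/* Like in_localip(): only the matching bucket is scanned. */
int
localip_model(struct addr_node *tbl[NBUCKETS], uint32_t addr)
{
	for (struct addr_node *n = tbl[addr_hash(addr) & (NBUCKETS - 1)];
	    n != NULL; n = n->next)
		if (n->addr == addr)
			return (1);
	return (0);
}
```

Chaining keeps lookups correct even when two addresses hash to the same bucket. The hash only bounds how many entries are compared.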

/*
 * Return 1 if an internet address is configured on an interface.
 */
int
in_ifhasaddr(struct ifnet *ifp, struct in_addr in)
{
	struct ifaddr *ifa;
	struct in_ifaddr *ia;

	NET_EPOCH_ASSERT();

	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
		if (ifa->ifa_addr->sa_family != AF_INET)
			continue;
		ia = (struct in_ifaddr *)ifa;
		if (ia->ia_addr.sin_addr.s_addr == in.s_addr)
			return (1);
	}

	return (0);
}
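in_ifhasaddr() scans the address list of a single interface. That list can mix several address families, so entries that are not AF_INET are skipped before any address is compared. Below is a minimal model. It uses a hypothetical `struct if_addr_model` in place of struct ifaddr and has no epoch protection.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical model of one entry on an interface's address list. */
struct if_addr_model {
	int family;		/* AF_INET, AF_INET6, ... */
	uint32_t v4addr;	/* meaningful only when family == AF_INET */
};

/*
 * Like in_ifhasaddr(): walk one interface's list, skip entries of
 * other families, and compare IPv4 addresses exactly.
 */
int
ifhasaddr_model(const struct if_addr_model *list, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (list[i].family != AF_INET)
			continue;
		if (list[i].v4addr == addr)
			return (1);
	}
	return (0);
}
```

The family check matters. In the test, the AF_INET6 entry has a zeroed `v4addr` field, and without the check it would falsely match a lookup for address 0.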

/*
 * Return a reference to an interface address that differs from the
 * supplied one but has the same IP address value.
 */
static struct in_ifaddr *
in_localip_more(struct in_ifaddr *original_ia)
{
	struct rm_priotracker in_ifa_tracker;
	in_addr_t original_addr = IA_SIN(original_ia)->sin_addr.s_addr;
	uint32_t original_fib = original_ia->ia_ifa.ifa_ifp->if_fib;
	struct in_ifaddr *ia;

	IN_IFADDR_RLOCK(&in_ifa_tracker);
	LIST_FOREACH(ia, INADDR_HASH(original_addr), ia_hash) {
		in_addr_t addr = IA_SIN(ia)->sin_addr.s_addr;
		uint32_t fib = ia->ia_ifa.ifa_ifp->if_fib;
		if (!V_rt_add_addr_allfibs && (original_fib != fib))
			continue;
		if ((original_ia != ia) && (original_addr == addr)) {
			ifa_ref(&ia->ia_ifa);
			IN_IFADDR_RUNLOCK(&in_ifa_tracker);
			return (ia);
		}
	}
	IN_IFADDR_RUNLOCK(&in_ifa_tracker);

	return (NULL);
}
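The FIB filter in in_localip_more() means a duplicate address counts only when it is in the same FIB as the original. The exception is when net.add_addr_allfibs (V_rt_add_addr_allfibs) is set. The array-based model below shows just that decision. The names are hypothetical. The model returns an index instead of taking a reference with ifa_ref() as the kernel function does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an address entry and its interface's FIB. */
struct ia_model {
	uint32_t addr;	/* IPv4 address */
	uint32_t fib;	/* FIB of the owning interface */
};

/*
 * Returns the index of another entry with the same address as
 * tbl[orig], or -1. Entries in other FIBs are skipped unless
 * add_addr_allfibs is set, as in in_localip_more().
 */
int
localip_more_model(const struct ia_model *tbl, size_t n, size_t orig,
    bool add_addr_allfibs)
{
	for (size_t i = 0; i < n; i++) {
		if (!add_addr_allfibs && tbl[i].fib != tbl[orig].fib)
			continue;
		if (i != orig && tbl[i].addr == tbl[orig].addr)
			return ((int)i);
	}
	return (-1);
}
```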

/*
 * Tries to find the first IPv4 address in the provided fib.
 * Prefers non-loopback addresses and returns a loopback address
 * only if @loopback_ok is set.
 *
 * Returns ifa or NULL.
 */
struct in_ifaddr *
in_findlocal(uint32_t fibnum, bool loopback_ok)
{
	struct rm_priotracker in_ifa_tracker;
	struct in_ifaddr *ia = NULL, *ia_lo = NULL;

	NET_EPOCH_ASSERT();

	IN_IFADDR_RLOCK(&in_ifa_tracker);
	CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) {
		uint32_t ia_fib = ia->ia_ifa.ifa_ifp->if_fib;
		if (!V_rt_add_addr_allfibs && (fibnum != ia_fib))
			continue;

		if (!IN_LOOPBACK(ntohl(IA_SIN(ia)->sin_addr.s_addr)))
			break;
		if (loopback_ok)
			ia_lo = ia;
	}
	IN_IFADDR_RUNLOCK(&in_ifa_tracker);

	if (ia == NULL)
		ia = ia_lo;

	return (ia);
}
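in_findlocal() returns the first non-loopback address in the requested FIB. It falls back to a loopback address, the last one seen, only when loopback_ok is true. Below is a hypothetical userland model of that preference order, with addresses in host byte order and no locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate address and the FIB of its interface. */
struct local_cand {
	uint32_t addr;	/* host byte order */
	uint32_t fib;
};

/* 127.0.0.0/8 in host byte order, the same test as IN_LOOPBACK(). */
static bool
is_loopback(uint32_t a)
{
	return ((a & 0xff000000u) == 0x7f000000u);
}

/*
 * Returns the index of the first usable non-loopback address, else
 * the last loopback one seen if loopback_ok, else -1.
 */
int
findlocal_model(const struct local_cand *tbl, size_t n, uint32_t fibnum,
    bool add_addr_allfibs, bool loopback_ok)
{
	int lo = -1;

	for (size_t i = 0; i < n; i++) {
		if (!add_addr_allfibs && tbl[i].fib != fibnum)
			continue;
		if (!is_loopback(tbl[i].addr))
			return ((int)i);
		if (loopback_ok)
			lo = (int)i;
	}
	return (lo);
}
```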

/*
 * Determine whether an IP address is in a reserved set of addresses
 * that may not be forwarded, or whether datagrams to that destination
 * may be forwarded.
 */
int
in_canforward(struct in_addr in)
{
	u_long i = ntohl(in.s_addr);

	if (IN_EXPERIMENTAL(i) || IN_MULTICAST(i) || IN_LINKLOCAL(i) ||
	    IN_ZERONET(i) || IN_LOOPBACK(i))
		return (0);
	return (1);
}
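The reserved ranges tested by in_canforward() can be checked in userland. The sketch below re-derives the five tests from the standard IPv4 reserved ranges under new `*_MODEL` names, so it does not depend on which `IN_*` macros the local libc exposes. FreeBSD's netinet/in.h remains the authoritative source for the kernel's definitions. Note that 255.255.255.255 falls inside 240.0.0.0/4, so it is rejected as well.

```c
#include <stdint.h>
#include <netinet/in.h>	/* struct in_addr */
#include <arpa/inet.h>	/* ntohl() */

/* Host-byte-order range tests for the reserved IPv4 blocks. */
#define	EXPERIMENTAL_MODEL(i)	(((i) & 0xf0000000u) == 0xf0000000u) /* 240/4 */
#define	MULTICAST_MODEL(i)	(((i) & 0xf0000000u) == 0xe0000000u) /* 224/4 */
#define	LINKLOCAL_MODEL(i)	(((i) & 0xffff0000u) == 0xa9fe0000u) /* 169.254/16 */
#define	ZERONET_MODEL(i)	(((i) & 0xff000000u) == 0x00000000u) /* 0/8 */
#define	LOOPBACK_MODEL(i)	(((i) & 0xff000000u) == 0x7f000000u) /* 127/8 */

/* Same decision as in_canforward(); 'in' is in network byte order. */
int
canforward_model(struct in_addr in)
{
	uint32_t i = ntohl(in.s_addr);

	if (EXPERIMENTAL_MODEL(i) || MULTICAST_MODEL(i) || LINKLOCAL_MODEL(i) ||
	    ZERONET_MODEL(i) || LOOPBACK_MODEL(i))
		return (0);
	return (1);
}
```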

/*
 * Trim a mask in a sockaddr
 */
static void
in_socktrim(struct sockaddr_in *ap)
{
	char *cplim = (char *) &ap->sin_addr;
	char *cp = (char *) (&ap->sin_addr + 1);

	ap->sin_len = 0;
	while (--cp >= cplim)
		if (*cp) {
			(ap)->sin_len = cp - (char *) (ap) + 1;
			break;
		}
}
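BSD sockaddrs carry their own length in sin_len. in_socktrim() shortens that length so that a mask covers only its significant bytes: it scans the four address bytes from the end and stops at the last non-zero one. The model below uses a hypothetical struct laid out like BSD's struct sockaddr_in, because the Linux version has no sin_len field. It also uses an index loop instead of decrementing a pointer. For 255.255.255.0 the result is 7: the 4-byte header plus 3 mask bytes.

```c
#include <stddef.h>	/* offsetof() */
#include <stdint.h>

/* Hypothetical mirror of BSD struct sockaddr_in (which has sin_len). */
struct bsd_sin_model {
	uint8_t  sin_len;
	uint8_t  sin_family;
	uint16_t sin_port;
	uint8_t  sin_addr[4];	/* the mask bytes, network order */
	char     sin_zero[8];
};

/*
 * Same result as in_socktrim(): set sin_len to just past the last
 * non-zero address byte, or to 0 when the whole mask is zero.
 */
void
socktrim_model(struct bsd_sin_model *ap)
{
	size_t off = offsetof(struct bsd_sin_model, sin_addr);

	ap->sin_len = 0;
	for (size_t k = sizeof(ap->sin_addr); k > 0; k--)
		if (ap->sin_addr[k - 1] != 0) {
			ap->sin_len = (uint8_t)(off + k);
			break;
		}
}
```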

/*
 * Generic internet control operations (ioctl's).
 */
int
in_control(struct socket *so, u_long cmd, caddr_t data, struct ifnet *ifp,
    struct thread *td)
{
	struct ifreq *ifr = (struct ifreq *)data;
	struct sockaddr_in *addr = (struct sockaddr_in *)&ifr->ifr_addr;
	struct epoch_tracker et;
	struct ifaddr *ifa;
	struct in_ifaddr *ia;
	int error;

	if (ifp == NULL)
		return (EADDRNOTAVAIL);

	/*
	 * Filter out 4 ioctls we implement directly. Forward the rest
	 * to specific functions and ifp->if_ioctl().
	 */
	switch (cmd) {
	case SIOCGIFADDR:
	case SIOCGIFBRDADDR:
	case SIOCGIFDSTADDR:
	case SIOCGIFNETMASK:
		break;
	case SIOCGIFALIAS:
		sx_xlock(&in_control_sx);
		error = in_gifaddr_ioctl(cmd, data, ifp, td);
		sx_xunlock(&in_control_sx);
		return (error);
	case SIOCDIFADDR:
		sx_xlock(&in_control_sx);
		error = in_difaddr_ioctl(cmd, data, ifp, td);
		sx_xunlock(&in_control_sx);
		return (error);
	case OSIOCAIFADDR:	/* 9.x compat */
	case SIOCAIFADDR:
		sx_xlock(&in_control_sx);
		error = in_aifaddr_ioctl(cmd, data, ifp, td);
		sx_xunlock(&in_control_sx);
		return (error);
	case SIOCSIFADDR:
	case SIOCSIFBRDADDR:
	case SIOCSIFDSTADDR:
	case SIOCSIFNETMASK:
		/* We no longer support these old commands. */
		return (EINVAL);
	default:
		if (ifp->if_ioctl == NULL)
			return (EOPNOTSUPP);
		return ((*ifp->if_ioctl)(ifp, cmd, data));
	}

2013-11-06 08:36:08 +00:00
|
|
|
if (addr->sin_addr.s_addr != INADDR_ANY &&
|
|
|
|
prison_check_ip4(td->td_ucred, &addr->sin_addr) != 0)
|
|
|
|
return (EADDRNOTAVAIL);
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2014-08-22 19:08:12 +00:00
|
|
|
* Find address for this interface, if it exists. If an
|
|
|
|
* address was specified, find that one instead of the
|
|
|
|
* first one on the interface, if possible.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2019-01-09 01:11:19 +00:00
|
|
|
NET_EPOCH_ENTER(et);
|
2018-05-18 20:13:34 +00:00
|
|
|
CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
|
2013-12-29 22:20:06 +00:00
|
|
|
if (ifa->ifa_addr->sa_family != AF_INET)
|
|
|
|
continue;
|
2013-11-06 08:36:08 +00:00
|
|
|
ia = (struct in_ifaddr *)ifa;
|
|
|
|
if (ia->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr)
|
2009-04-23 21:41:37 +00:00
|
|
|
break;
|
|
|
|
}
|
2014-08-22 19:08:12 +00:00
|
|
|
if (ifa == NULL)
|
2018-05-18 20:13:34 +00:00
|
|
|
CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
|
2014-08-22 19:08:12 +00:00
|
|
|
if (ifa->ifa_addr->sa_family == AF_INET) {
|
|
|
|
ia = (struct in_ifaddr *)ifa;
|
|
|
|
if (prison_check_ip4(td->td_ucred,
|
|
|
|
&ia->ia_addr.sin_addr) == 0)
|
|
|
|
break;
|
|
|
|
}
|
2013-11-05 07:44:15 +00:00
|
|
|
|
2013-11-06 08:36:08 +00:00
|
|
|
if (ifa == NULL) {
|
2019-01-09 01:11:19 +00:00
|
|
|
NET_EPOCH_EXIT(et);
|
2013-11-05 07:44:15 +00:00
|
|
|
return (EADDRNOTAVAIL);
|
2001-09-29 04:34:11 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2009-04-25 23:02:57 +00:00
|
|
|
error = 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
switch (cmd) {
|
|
|
|
case SIOCGIFADDR:
|
2013-11-05 07:44:15 +00:00
|
|
|
*addr = ia->ia_addr;
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case SIOCGIFBRDADDR:
|
2009-04-25 23:02:57 +00:00
|
|
|
if ((ifp->if_flags & IFF_BROADCAST) == 0) {
|
|
|
|
error = EINVAL;
|
2013-11-05 07:44:15 +00:00
|
|
|
break;
|
2009-04-25 23:02:57 +00:00
|
|
|
}
|
2013-11-05 07:44:15 +00:00
|
|
|
*addr = ia->ia_broadaddr;
|
|
|
|
break;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
case SIOCGIFDSTADDR:
|
2009-04-25 23:02:57 +00:00
|
|
|
if ((ifp->if_flags & IFF_POINTOPOINT) == 0) {
|
|
|
|
error = EINVAL;
|
2013-11-05 07:44:15 +00:00
|
|
|
break;
|
2009-04-25 23:02:57 +00:00
|
|
|
}
|
2013-11-05 07:44:15 +00:00
|
|
|
*addr = ia->ia_dstaddr;
|
|
|
|
break;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
case SIOCGIFNETMASK:
|
2013-11-05 07:44:15 +00:00
|
|
|
*addr = ia->ia_sockmask;
|
|
|
|
break;
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2019-01-09 01:11:19 +00:00
|
|
|
NET_EPOCH_EXIT(et);
|
2001-11-30 14:00:55 +00:00
|
|
|
|
2013-11-05 07:44:15 +00:00
|
|
|
return (error);
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
|
static int
in_aifaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td)
{
	const struct in_aliasreq *ifra = (struct in_aliasreq *)data;
	const struct sockaddr_in *addr = &ifra->ifra_addr;
	const struct sockaddr_in *broadaddr = &ifra->ifra_broadaddr;
	const struct sockaddr_in *mask = &ifra->ifra_mask;
	const struct sockaddr_in *dstaddr = &ifra->ifra_dstaddr;
	const int vhid = (cmd == SIOCAIFADDR) ? ifra->ifra_vhid : 0;
	struct epoch_tracker et;
	struct ifaddr *ifa;
	struct in_ifaddr *ia;
	bool iaIsFirst;
	int error = 0;

	error = priv_check(td, PRIV_NET_ADDIFADDR);
	if (error)
		return (error);

	/*
	 * ifra_addr must be present and be of INET family.
	 * ifra_broadaddr/ifra_dstaddr and ifra_mask are optional.
	 */
	if (addr->sin_len != sizeof(struct sockaddr_in) ||
	    addr->sin_family != AF_INET)
		return (EINVAL);
	if (broadaddr->sin_len != 0 &&
	    (broadaddr->sin_len != sizeof(struct sockaddr_in) ||
	    broadaddr->sin_family != AF_INET))
		return (EINVAL);
	if (mask->sin_len != 0 &&
	    (mask->sin_len != sizeof(struct sockaddr_in) ||
	    mask->sin_family != AF_INET))
		return (EINVAL);
	if ((ifp->if_flags & IFF_POINTOPOINT) &&
	    (dstaddr->sin_len != sizeof(struct sockaddr_in) ||
	     dstaddr->sin_addr.s_addr == INADDR_ANY))
		return (EDESTADDRREQ);
netinet: prevent NULL pointer dereference in in_aifaddr_ioctl()
It appears that a maliciously crafted ifaliasreq can lead to a NULL
pointer dereference in in_aifaddr_ioctl(). In order to replicate
that, one needs to:
1. Ensure that carp(4) is not loaded.
2. Issue a SIOCAIFADDR call setting the ifra_vhid field of the request
to a negative value.
Reproduction code would look like this:
int main() {
struct ifaliasreq req;
struct sockaddr_in sin, mask;
int fd, error;
bzero(&sin, sizeof(struct sockaddr_in));
bzero(&mask, sizeof(struct sockaddr_in));
sin.sin_len = sizeof(struct sockaddr_in);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = inet_addr("192.168.88.2");
mask.sin_len = sizeof(struct sockaddr_in);
mask.sin_family = AF_INET;
mask.sin_addr.s_addr = inet_addr("255.255.255.0");
fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd < 0)
return (-1);
memset(&req, 0, sizeof(struct ifaliasreq));
strlcpy(req.ifra_name, "lo0", sizeof(req.ifra_name));
memcpy(&req.ifra_addr, &sin, sin.sin_len);
memcpy(&req.ifra_mask, &mask, mask.sin_len);
req.ifra_vhid = -1;
return ioctl(fd, SIOCAIFADDR, (char *)&req);
}
To fix, discard both positive and negative vhid values in
in_aifaddr_ioctl, if carp(4) is not loaded. This prevents NULL pointer
dereference and kernel panic.
Reviewed by: imp@
Pull Request: https://github.com/freebsd/freebsd-src/pull/530
	if (vhid != 0 && carp_attach_p == NULL)
		return (EPROTONOSUPPORT);

	/*
	 * See whether the address already exists.
	 */
	iaIsFirst = true;
	ia = NULL;
	NET_EPOCH_ENTER(et);
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
		struct in_ifaddr *it;

		if (ifa->ifa_addr->sa_family != AF_INET)
			continue;

		it = (struct in_ifaddr *)ifa;
		if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr &&
		    prison_check_ip4(td->td_ucred, &addr->sin_addr) == 0)
			ia = it;
		else
			iaIsFirst = false;
	}
	NET_EPOCH_EXIT(et);

	if (ia != NULL)
		(void)in_difaddr_ioctl(cmd, data, ifp, td);

	ifa = ifa_alloc(sizeof(struct in_ifaddr), M_WAITOK);
	ia = (struct in_ifaddr *)ifa;
	ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr;
	ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr;
	ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask;
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.
Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695
	callout_init_rw(&ia->ia_garp_timer, &ifp->if_addr_lock,
	    CALLOUT_RETURNUNLOCKED);

	ia->ia_ifp = ifp;
	ia->ia_addr = *addr;
	if (mask->sin_len != 0) {
		ia->ia_sockmask = *mask;
		ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr);
	} else {
		in_addr_t i = ntohl(addr->sin_addr.s_addr);

net/route.c:
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.
Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.
This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.
netinet/in.c, netinet/in_rmx.c:
When an address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)
Note that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.
PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)
		/*
		 * Be compatible with network classes: if a netmask isn't
		 * supplied, guess it based on the address class.
		 */
		if (IN_CLASSA(i))
			ia->ia_subnetmask = IN_CLASSA_NET;
		else if (IN_CLASSB(i))
			ia->ia_subnetmask = IN_CLASSB_NET;
		else
			ia->ia_subnetmask = IN_CLASSC_NET;
		ia->ia_sockmask.sin_addr.s_addr = htonl(ia->ia_subnetmask);
	}
	ia->ia_subnet = ntohl(addr->sin_addr.s_addr) & ia->ia_subnetmask;
	in_socktrim(&ia->ia_sockmask);

	if (ifp->if_flags & IFF_BROADCAST) {
		if (broadaddr->sin_len != 0) {
			ia->ia_broadaddr = *broadaddr;
		} else if (ia->ia_subnetmask == IN_RFC3021_MASK) {
			ia->ia_broadaddr.sin_addr.s_addr = INADDR_BROADCAST;
			ia->ia_broadaddr.sin_len = sizeof(struct sockaddr_in);
			ia->ia_broadaddr.sin_family = AF_INET;
		} else {
			ia->ia_broadaddr.sin_addr.s_addr =
			    htonl(ia->ia_subnet | ~ia->ia_subnetmask);
			ia->ia_broadaddr.sin_len = sizeof(struct sockaddr_in);
			ia->ia_broadaddr.sin_family = AF_INET;
		}
	}

	if (ifp->if_flags & IFF_POINTOPOINT)
		ia->ia_dstaddr = *dstaddr;

	if (vhid != 0) {
		error = (*carp_attach_p)(&ia->ia_ifa, vhid);
		if (error)
			return (error);
	}

	/* if_addrhead is already referenced by ifa_alloc() */
	IF_ADDR_WLOCK(ifp);
	CK_STAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link);
	IF_ADDR_WUNLOCK(ifp);

	ifa_ref(ifa);			/* in_ifaddrhead */
	IN_IFADDR_WLOCK();
	CK_STAILQ_INSERT_TAIL(&V_in_ifaddrhead, ia, ia_link);
	LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr), ia, ia_hash);
	IN_IFADDR_WUNLOCK();

	/*
	 * Give the interface a chance to initialize
	 * if this is its first address,
	 * and to validate the address if necessary.
	 */
	if (ifp->if_ioctl != NULL) {
		error = (*ifp->if_ioctl)(ifp, SIOCSIFADDR, (caddr_t)ia);
		if (error)
			goto fail1;
	}

	/*
	 * Add route for the network.
	 */
	if (vhid == 0) {
		error = in_addprefix(ia);
		if (error)
			goto fail1;
	}

	/*
	 * Add a loopback route to self.
	 */
	if (vhid == 0 && ia_need_loopback_route(ia)) {
		struct in_ifaddr *eia;

		eia = in_localip_more(ia);

		if (eia == NULL) {
			error = ifa_add_loopback_route((struct ifaddr *)ia,
			    (struct sockaddr *)&ia->ia_addr);
			if (error)
				goto fail2;
		} else
			ifa_free(&eia->ia_ifa);
	}

	if (iaIsFirst && (ifp->if_flags & IFF_MULTICAST)) {
		struct in_addr allhosts_addr;
		struct in_ifinfo *ii;

		ii = ((struct in_ifinfo *)ifp->if_afdata[AF_INET]);
		allhosts_addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP);

		error = in_joingroup(ifp, &allhosts_addr, NULL,
			&ii->ii_allhosts);
	}

	/*
	 * Note: we don't need an extra reference for ifa, since we are
	 * called with the sx lock held, and the ifaddr cannot be deleted
	 * by a concurrent thread.
	 */
	EVENTHANDLER_INVOKE(ifaddr_event_ext, ifp, ifa, IFADDR_EVENT_ADD);

	return (error);

fail2:
	if (vhid == 0)
		(void)in_scrubprefix(ia, LLE_STATIC);

fail1:
A major overhaul of the CARP implementation. ip_carp.c was started
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
the interface layer has been removed from CARP. Now redundant addresses
are configured exactly on the interfaces they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics have been changed too: now one doesn't need
to clone a carpXX interface, but should directly configure a vhid
on an Ethernet interface.
To supply vhid data from the kernel to an application, the getifaddrs(3)
function has been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
running a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with the inet6 part of the patch, for
the idea of using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
	if (ia->ia_ifa.ifa_carp)
		(*carp_detach_p)(&ia->ia_ifa, false);

	IF_ADDR_WLOCK(ifp);
	CK_STAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifaddr, ifa_link);
	IF_ADDR_WUNLOCK(ifp);
	ifa_free(&ia->ia_ifa);		/* if_addrhead */

	IN_IFADDR_WLOCK();
	CK_STAILQ_REMOVE(&V_in_ifaddrhead, ia, in_ifaddr, ia_link);
	LIST_REMOVE(ia, ia_hash);
	IN_IFADDR_WUNLOCK();
	ifa_free(&ia->ia_ifa);		/* in_ifaddrhead */

	return (error);
}

static int
in_difaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td)
{
	const struct ifreq *ifr = (struct ifreq *)data;
	const struct sockaddr_in *addr = (const struct sockaddr_in *)
	    &ifr->ifr_addr;
	struct ifaddr *ifa;
	struct in_ifaddr *ia;
	bool deleteAny, iaIsLast;
	int error;

	if (td != NULL) {
		error = priv_check(td, PRIV_NET_DELIFADDR);
		if (error)
			return (error);
	}

	if (addr->sin_len != sizeof(struct sockaddr_in) ||
	    addr->sin_family != AF_INET)
		deleteAny = true;
	else
		deleteAny = false;

	iaIsLast = true;
	ia = NULL;
	IF_ADDR_WLOCK(ifp);
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
		struct in_ifaddr *it;

		if (ifa->ifa_addr->sa_family != AF_INET)
			continue;

		it = (struct in_ifaddr *)ifa;
		if (deleteAny && ia == NULL && (td == NULL ||
		    prison_check_ip4(td->td_ucred, &it->ia_addr.sin_addr) == 0))
			ia = it;

		if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr &&
		    (td == NULL || prison_check_ip4(td->td_ucred,
		    &addr->sin_addr) == 0))
			ia = it;

		if (it != ia)
			iaIsLast = false;
	}

	if (ia == NULL) {
		IF_ADDR_WUNLOCK(ifp);
		return (EADDRNOTAVAIL);
	}

	CK_STAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifaddr, ifa_link);
	IF_ADDR_WUNLOCK(ifp);
	ifa_free(&ia->ia_ifa);		/* if_addrhead */

	IN_IFADDR_WLOCK();
	CK_STAILQ_REMOVE(&V_in_ifaddrhead, ia, in_ifaddr, ia_link);
	LIST_REMOVE(ia, ia_hash);
	IN_IFADDR_WUNLOCK();

	/*
	 * in_scrubprefix() kills the interface route.
	 */
	in_scrubprefix(ia, LLE_STATIC);

	/*
	 * in_ifadown gets rid of all the rest of
	 * the routes.  This is not quite the right
	 * thing to do, but at least if we are running
	 * a routing process they will come back.
	 */
	in_ifadown(&ia->ia_ifa, 1);

	if (ia->ia_ifa.ifa_carp)
		(*carp_detach_p)(&ia->ia_ifa, cmd == SIOCAIFADDR);

	/*
	 * If this is the last IPv4 address configured on this
	 * interface, leave the all-hosts group.
	 * No state-change report need be transmitted.
	 */
	if (iaIsLast && (ifp->if_flags & IFF_MULTICAST)) {
		struct in_ifinfo *ii;

		ii = ((struct in_ifinfo *)ifp->if_afdata[AF_INET]);
		if (ii->ii_allhosts) {
			(void)in_leavegroup(ii->ii_allhosts, NULL);
			ii->ii_allhosts = NULL;
		}
	}

	IF_ADDR_WLOCK(ifp);
	if (callout_stop(&ia->ia_garp_timer) == 1) {
		ifa_free(&ia->ia_ifa);
	}
	IF_ADDR_WUNLOCK(ifp);

	EVENTHANDLER_INVOKE(ifaddr_event_ext, ifp, &ia->ia_ifa,
	    IFADDR_EVENT_DEL);
	ifa_free(&ia->ia_ifa);		/* in_ifaddrhead */

	return (0);
}

static int
in_gifaddr_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, struct thread *td)
{
	struct in_aliasreq *ifra = (struct in_aliasreq *)data;
	const struct sockaddr_in *addr = &ifra->ifra_addr;
	struct epoch_tracker et;
	struct ifaddr *ifa;
	struct in_ifaddr *ia;

	/*
	 * ifra_addr must be present and be of INET family.
	 */
	if (addr->sin_len != sizeof(struct sockaddr_in) ||
	    addr->sin_family != AF_INET)
		return (EINVAL);

	/*
	 * See whether the address exists.
	 */
	ia = NULL;
	NET_EPOCH_ENTER(et);
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
		struct in_ifaddr *it;

		if (ifa->ifa_addr->sa_family != AF_INET)
			continue;

		it = (struct in_ifaddr *)ifa;
		if (it->ia_addr.sin_addr.s_addr == addr->sin_addr.s_addr &&
		    prison_check_ip4(td->td_ucred, &addr->sin_addr) == 0) {
			ia = it;
			break;
		}
	}
	if (ia == NULL) {
		NET_EPOCH_EXIT(et);
		return (EADDRNOTAVAIL);
	}

	ifra->ifra_mask = ia->ia_sockmask;
	if ((ifp->if_flags & IFF_POINTOPOINT) &&
	    ia->ia_dstaddr.sin_family == AF_INET)
		ifra->ifra_dstaddr = ia->ia_dstaddr;
	else if ((ifp->if_flags & IFF_BROADCAST) &&
	    ia->ia_broadaddr.sin_family == AF_INET)
		ifra->ifra_broadaddr = ia->ia_broadaddr;
	else
		memset(&ifra->ifra_broadaddr, 0,
		    sizeof(ifra->ifra_broadaddr));

	NET_EPOCH_EXIT(et);
	return (0);
}

Split rtinit() into multiple functions.
rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
in reality.
1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing an interface for maintaining interface routes. Its part
responsible for the actual route table interaction mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. The flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
It makes rtinit() clash with ifa_maintain_loopback_route() for IPv4 interface
aliases.
4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
complicating the rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
- Route attribute construction was moved to the per-address-family functions,
dealing with (2), (3) and (4).
- A function providing net.add_addr_allfibs handling and route rtsock notifications
is the new routing table interface.
- rtsock ifa notification has been moved out as well. The resulting set of functions is only
responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
static int
in_match_ifaddr(const struct rtentry *rt, const struct nhop_object *nh, void *arg)
{

	if (nh->nh_ifa == (struct ifaddr *)arg)
		return (1);

	return (0);
}

static int
in_handle_prefix_route(uint32_t fibnum, int cmd,
    struct sockaddr_in *dst, struct sockaddr_in *netmask, struct ifaddr *ifa,
    struct ifnet *ifp)
{

	NET_EPOCH_ASSERT();

	/* Prepare gateway */
	struct sockaddr_dl_short sdl = {
		.sdl_family = AF_LINK,
		.sdl_len = sizeof(struct sockaddr_dl_short),
		.sdl_type = ifa->ifa_ifp->if_type,
		.sdl_index = ifa->ifa_ifp->if_index,
	};

	struct rt_addrinfo info = {
		.rti_ifa = ifa,
		.rti_ifp = ifp,
2021-01-09 00:19:25 +00:00
		.rti_flags = RTF_PINNED | ((netmask != NULL) ? 0 : RTF_HOST),
		.rti_info = {
			[RTAX_DST] = (struct sockaddr *)dst,
			[RTAX_NETMASK] = (struct sockaddr *)netmask,
			[RTAX_GATEWAY] = (struct sockaddr *)&sdl,
		},
		/* Ensure we delete the prefix IFF prefix ifa matches */
		.rti_filter = in_match_ifaddr,
		.rti_filterdata = ifa,
	};

	return (rib_handle_ifaddr_info(fibnum, cmd, &info));
}
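The attribute setup above comes down to two rules. A missing netmask turns the entry into a host route, and deletion is filtered so that only the route owned by this ifa is removed. Below is a minimal userland sketch of both rules. `toy_route`, `toy_delete()`, `toy_match_ifaddr()` and the `TOY_RTF_*` values are hypothetical stand-ins. They are not the kernel's `rt_addrinfo`, `rib_handle_ifaddr_info()`, `in_match_ifaddr()` or `RTF_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real RTF_* values come from <net/route.h>. */
#define TOY_RTF_HOST	0x1
#define TOY_RTF_PINNED	0x2

/* Hypothetical, heavily simplified routing table entry (host byte order). */
struct toy_route {
	uint32_t	dst;
	uint32_t	mask;	/* 0xffffffff for host routes */
	const void	*ifa;	/* address that installed the route */
	int		flags;
	bool		valid;
};

typedef bool toy_filter_f(const struct toy_route *rt, const void *data);

/* Same rule as .rti_flags above: host route iff no netmask is supplied. */
static int
toy_prefix_flags(const uint32_t *netmask)
{
	return (TOY_RTF_PINNED | ((netmask != NULL) ? 0 : TOY_RTF_HOST));
}

/* In the spirit of in_match_ifaddr(): accept only routes owned by @data. */
static bool
toy_match_ifaddr(const struct toy_route *rt, const void *data)
{
	return (rt->ifa == data);
}

/* Deletes dst/mask only if @filter accepts the entry; true on deletion. */
static bool
toy_delete(struct toy_route *tbl, size_t n, uint32_t dst, uint32_t mask,
    toy_filter_f *filter, const void *data)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct toy_route *rt = &tbl[i];

		if (!rt->valid || rt->dst != dst || rt->mask != mask)
			continue;
		if (filter != NULL && !filter(rt, data))
			return (false);	/* prefix is owned by another ifa */
		rt->valid = false;
		return (true);
	}
	return (false);
}
```

With the filter in place, one address cannot withdraw a prefix route that was installed by a different address with the same prefix.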

/*
2021-01-19 23:50:34 +00:00
 * Routing table interaction with interface addresses.
 *
 * In general, two types of routes need to be installed:
 * a) "interface" or "prefix" route, telling the user that the addresses
 *   behind the ifa prefix are reached directly.
 * b) "loopback" route installed for the ifa address, telling the user that
 *   the address belongs to the local system.
 *
 * Handling for (a) and (b) differs in multi-fib aspects, hence they
 *  are implemented in different functions below.
 *
 * The cases above may intersect: /32 interface aliases result in
 *  the same prefix produced by (a) and (b). This blurs the definition
 *  of the "loopback" route and complicates interactions. The interaction
 *  table is defined below. The case numbers are used in the multiple
 *  functions below to refer to the particular test case.
 *
2021-01-09 00:19:25 +00:00
 * There can be multiple options:
2021-01-19 23:50:34 +00:00
 * 1) Adding address with prefix on non-p2p/non-loopback interface.
 *  Example: 192.0.2.1/24. Action:
 *  * add "prefix" route towards 192.0.2.0/24 via @ia interface,
 *    using @ia as an address source.
 *  * add "loopback" route towards 192.0.2.1 via V_loif, saving
 *   @ia ifp in the gateway and using @ia as an address source.
 *
 * 2) Adding address with /32 mask to non-p2p/non-loopback interface.
 *  Example: 192.0.2.2/32. Action:
 *  * add "prefix" host route via V_loif, using @ia as an address source.
 *
2021-01-09 00:19:25 +00:00
 * 3) Adding address with or without prefix to p2p interface.
2021-01-19 23:50:34 +00:00
 *  Example: 10.0.0.1/24->10.0.0.2. Action:
 *  * add "prefix" host route towards 10.0.0.2 via this interface, using @ia
 *    as an address source. Note: there is no point in installing the full /24,
 *    as the interface is point-to-point.
 *  * add "loopback" route towards 10.0.0.1 via V_loif, saving
 *   @ia ifp in the gateway and using @ia as an address source.
 *
2021-01-09 00:19:25 +00:00
 * 4) Adding address with or without prefix to loopback interface.
2021-01-19 23:50:34 +00:00
 *  Example: 192.0.2.1/24. Action:
 *  * add "prefix" host route via @ia interface, using @ia as an address source.
 *    Note: Skip installing the /24 prefix, as it would introduce a TTL loop
 *    for the traffic destined to these addresses.
*/
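The case table above can be condensed into a small classifier. This sketch assumes that an interface is never both point-to-point and loopback. `TOY_IFF_*` and `toy_ifaddr_case()` are hypothetical names, and the order of the flag checks follows ia_getrtprefix() below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for IFF_POINTOPOINT and IFF_LOOPBACK. */
#define TOY_IFF_P2P		0x1
#define TOY_IFF_LOOPBACK	0x2

/*
 * Returns the case number (1-4) of the table for an address configured
 * with @mask (host byte order) on an interface with flags @ifflags.
 */
static int
toy_ifaddr_case(int ifflags, uint32_t mask)
{
	/* Assumption: no interface is both point-to-point and loopback. */
	assert((ifflags & (TOY_IFF_P2P | TOY_IFF_LOOPBACK)) !=
	    (TOY_IFF_P2P | TOY_IFF_LOOPBACK));

	if (ifflags & TOY_IFF_P2P)
		return (3);	/* host route towards the remote end */
	if (ifflags & TOY_IFF_LOOPBACK)
		return (4);	/* host route only, no full prefix */
	if (mask == UINT32_MAX)
		return (2);	/* /32 alias: prefix route via the loopback */
	return (1);		/* prefix route plus loopback route */
}
```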

/*
 * Checks if @ia needs to install a loopback route to the @ia address via
 * ifa_maintain_loopback_route().
 *
 * Returns true if such a route is needed.
 */
static bool
ia_need_loopback_route(const struct in_ifaddr *ia)
{
	struct ifnet *ifp = ia->ia_ifp;

	/* Case 4: Skip loopback interfaces */
	if ((ifp->if_flags & IFF_LOOPBACK) ||
	    (ia->ia_addr.sin_addr.s_addr == INADDR_ANY))
		return (false);

	/* Clash avoidance: Skip p2p interfaces where both addresses are equal */
	if ((ifp->if_flags & IFF_POINTOPOINT) &&
	    ia->ia_dstaddr.sin_addr.s_addr == ia->ia_addr.sin_addr.s_addr)
		return (false);

	/* Case 2: skip /32 prefixes */
	if (!(ifp->if_flags & IFF_POINTOPOINT) &&
	    (ia->ia_sockmask.sin_addr.s_addr == INADDR_BROADCAST))
		return (false);

	return (true);
}

/*
 * Calculate "prefix" route corresponding to @ia.
 */
static void
ia_getrtprefix(const struct in_ifaddr *ia, struct in_addr *prefix, struct in_addr *mask)
{

	if (ia->ia_ifp->if_flags & IFF_POINTOPOINT) {
		/* Case 3: return host route for dstaddr */
		*prefix = ia->ia_dstaddr.sin_addr;
		mask->s_addr = INADDR_BROADCAST;
	} else if (ia->ia_ifp->if_flags & IFF_LOOPBACK) {
		/* Case 4: return host route for ifaddr */
		*prefix = ia->ia_addr.sin_addr;
		mask->s_addr = INADDR_BROADCAST;
	} else {
		/* Cases 1,2: return actual ia prefix */
		*prefix = ia->ia_addr.sin_addr;
		*mask = ia->ia_sockmask.sin_addr;
		prefix->s_addr &= mask->s_addr;
	}
}

/*
 * Adds or deletes the interface "prefix" route corresponding to @ia.
 * Returns 0 on success or errno.
2021-01-09 00:19:25 +00:00
 */
int
in_handle_ifaddr_route(int cmd, struct in_ifaddr *ia)
{
	struct ifaddr *ifa = &ia->ia_ifa;
	struct in_addr daddr, maddr;
	struct sockaddr_in *pmask;
	struct epoch_tracker et;
	int error;

2021-01-19 23:50:34 +00:00
	ia_getrtprefix(ia, &daddr, &maddr);
2021-01-09 00:19:25 +00:00

	struct sockaddr_in mask = {
		.sin_family = AF_INET,
		.sin_len = sizeof(struct sockaddr_in),
		.sin_addr = maddr,
	};

2021-01-19 23:50:34 +00:00
	pmask = (maddr.s_addr != INADDR_BROADCAST) ? &mask : NULL;
2021-01-09 00:19:25 +00:00

	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_len = sizeof(struct sockaddr_in),
		.sin_addr.s_addr = daddr.s_addr & maddr.s_addr,
	};

2021-01-19 23:50:34 +00:00
	struct ifnet *ifp = ia->ia_ifp;

	if ((maddr.s_addr == INADDR_BROADCAST) &&
	    (!(ia->ia_ifp->if_flags & (IFF_POINTOPOINT|IFF_LOOPBACK)))) {
		/* Case 2: host route on broadcast interface */
		ifp = V_loif;
	}

2021-01-09 00:19:25 +00:00
	uint32_t fibnum = ifa->ifa_ifp->if_fib;
	NET_EPOCH_ENTER(et);
2021-01-19 23:50:34 +00:00
	error = in_handle_prefix_route(fibnum, cmd, &dst, pmask, ifa, ifp);
2021-01-09 00:19:25 +00:00
	NET_EPOCH_EXIT(et);

	return (error);
}
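in_handle_ifaddr_route() turns the prefix into three inputs for in_handle_prefix_route(). These are a masked destination, a netmask pointer that is NULL for host routes, and the interface the route points at. The following is a hypothetical sketch of that preparation step. It uses host byte order and a `toy_prefix_route` result struct in place of sockaddrs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what gets passed to the prefix route helper. */
struct toy_prefix_route {
	uint32_t	dst;		/* prefix & mask, host byte order */
	bool		has_mask;	/* false: netmask pointer would be NULL */
	bool		via_loopback;	/* true: route points at V_loif */
};

static struct toy_prefix_route
toy_prepare(uint32_t prefix, uint32_t mask, bool p2p, bool loopback)
{
	/* Assumption: an interface is never both p2p and loopback. */
	assert(!(p2p && loopback));

	struct toy_prefix_route r = {
		.dst = prefix & mask,
		.has_mask = (mask != UINT32_MAX),
		.via_loopback = false,
	};

	/* Case 2: a /32 on a broadcast interface is routed via the loopback. */
	if (!r.has_mask && !p2p && !loopback)
		r.via_loopback = true;
	return (r);
}
```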

2004-11-12 20:53:51 +00:00
/*
2021-01-07 19:13:52 +00:00
 * Check if we have a route for the given prefix already.
2004-11-12 20:53:51 +00:00
 */
2021-01-07 19:13:52 +00:00
static bool
2021-01-19 23:50:34 +00:00
in_hasrtprefix(struct in_ifaddr *target)
2004-11-12 20:53:51 +00:00
{
2015-07-29 08:12:05 +00:00
	struct rm_priotracker in_ifa_tracker;
2004-11-12 20:53:51 +00:00
	struct in_ifaddr *ia;
2005-10-22 14:50:27 +00:00
	struct in_addr prefix, mask, p, m;
2021-01-07 19:13:52 +00:00
	bool result = false;
2004-11-12 20:53:51 +00:00

2021-01-19 23:50:34 +00:00
	ia_getrtprefix(target, &prefix, &mask);
2004-11-12 20:53:51 +00:00

2015-07-29 08:12:05 +00:00
	IN_IFADDR_RLOCK(&in_ifa_tracker);
2014-04-24 23:56:56 +00:00
	/* Look for an existing address with the same prefix, mask, and fib */
2018-05-18 20:13:34 +00:00
	CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) {
2021-01-19 23:50:34 +00:00
		ia_getrtprefix(ia, &p, &m);
2004-11-12 20:53:51 +00:00

2021-01-19 23:50:34 +00:00
		if (prefix.s_addr != p.s_addr ||
		    mask.s_addr != m.s_addr)
			continue;
2005-10-22 14:50:27 +00:00

2014-04-24 23:56:56 +00:00
		if (target->ia_ifp->if_fib != ia->ia_ifp->if_fib)
			continue;
2004-11-12 20:53:51 +00:00

		/*
		 * If we got a matching prefix route inserted by another
		 * interface address, we are done here.
		 */
2005-08-18 10:34:30 +00:00
		if (ia->ia_flags & IFA_ROUTE) {
2021-01-07 19:13:52 +00:00
			result = true;
			break;
2005-08-18 10:34:30 +00:00
		}
2004-11-12 20:53:51 +00:00
	}
2015-07-29 08:12:05 +00:00
	IN_IFADDR_RUNLOCK(&in_ifa_tracker);
2004-11-12 20:53:51 +00:00

2021-01-07 19:13:52 +00:00
	return (result);
}
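in_hasrtprefix() is a linear scan. An entry only counts if its prefix, mask and fib all match and it already owns the route (IFA_ROUTE). The sketch below models the address list as a plain array of hypothetical `toy_addr` records. It deliberately leaves out the locking (IN_IFADDR_RLOCK) and the CK_STAILQ traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one configured address. */
struct toy_addr {
	uint32_t	prefix;		/* already masked, host byte order */
	uint32_t	mask;
	int		fib;
	bool		has_route;	/* stands in for IFA_ROUTE */
};

/*
 * Returns true if an entry with the same prefix, mask and fib
 * already owns the prefix route.
 */
static bool
toy_hasrtprefix(const struct toy_addr *list, size_t n,
    const struct toy_addr *target)
{
	assert(target != NULL && (list != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		const struct toy_addr *a = &list[i];

		if (a->prefix != target->prefix || a->mask != target->mask)
			continue;
		if (a->fib != target->fib)
			continue;
		if (a->has_route)
			return (true);
	}
	return (false);
}
```

A new address is already on the list when it is checked, but it does not yet carry IFA_ROUTE, so it never matches itself. The sketch assumes the same ordering.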

int
2021-01-19 23:50:34 +00:00
in_addprefix(struct in_ifaddr *target)
2021-01-07 19:13:52 +00:00
{
	int error;

2021-01-19 23:50:34 +00:00
	if (in_hasrtprefix(target)) {
2021-01-07 19:13:52 +00:00
		if (V_nosameprefix)
			return (EEXIST);
		else {
			rt_addrmsg(RTM_ADD, &target->ia_ifa,
			    target->ia_ifp->if_fib);
			return (0);
		}
	}

2004-11-12 20:53:51 +00:00
	/*
	 * No one seems to have this prefix route, so we try to insert it.
	 */
2021-01-09 00:19:25 +00:00
	rt_addrmsg(RTM_ADD, &target->ia_ifa, target->ia_ifp->if_fib);
	error = in_handle_ifaddr_route(RTM_ADD, target);
2004-11-12 20:53:51 +00:00
	if (!error)
		target->ia_flags |= IFA_ROUTE;
2008-10-26 19:17:25 +00:00
	return (error);
2004-11-12 20:53:51 +00:00
}

2015-09-14 16:48:19 +00:00
/*
 * Removes either all lle entries for the given @ia prefix, or only the lle
 * corresponding to the @ia address.
 */
static void
in_scrubprefixlle(struct in_ifaddr *ia, int all, u_int flags)
{
	struct sockaddr_in addr, mask;
	struct sockaddr *saddr, *smask;
	struct ifnet *ifp;

	saddr = (struct sockaddr *)&addr;
	bzero(&addr, sizeof(addr));
	addr.sin_len = sizeof(addr);
	addr.sin_family = AF_INET;
	smask = (struct sockaddr *)&mask;
	bzero(&mask, sizeof(mask));
	mask.sin_len = sizeof(mask);
	mask.sin_family = AF_INET;
	mask.sin_addr.s_addr = ia->ia_subnetmask;
	ifp = ia->ia_ifp;

2015-10-18 12:26:25 +00:00
	if (all) {
		/*
		 * Remove all L2 entries matching given prefix.
		 * Convert address to host representation to avoid
		 * doing this on every callback. ia_subnetmask is already
		 * stored in host representation.
		 */
		addr.sin_addr.s_addr = ntohl(ia->ia_addr.sin_addr.s_addr);
2015-09-14 16:48:19 +00:00
		lltable_prefix_free(AF_INET, saddr, smask, flags);
2015-10-18 12:26:25 +00:00
	} else {
		/* Remove interface address only */
		addr.sin_addr.s_addr = ia->ia_addr.sin_addr.s_addr;
2015-09-14 16:48:19 +00:00
		lltable_delete_addr(LLTABLE(ifp), LLE_IFADDR, saddr);
2015-10-18 12:26:25 +00:00
	}
2015-09-14 16:48:19 +00:00
}
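When `all` is set, in_scrubprefixlle() converts the interface address to host byte order once, because ia_subnetmask is already stored in host order. Masked values can then be compared directly for each entry. The sketch below shows that kind of prefix test on a plain array of neighbour addresses. `toy_lle_in_prefix()` and `toy_count_prefix_lles()` are hypothetical helpers, not the lltable callback code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if @entry_net (network order) lies in @addr_host/@mask_host (host order). */
static bool
toy_lle_in_prefix(uint32_t entry_net, uint32_t addr_host, uint32_t mask_host)
{
	return ((ntohl(entry_net) & mask_host) == (addr_host & mask_host));
}

/* Counts entries inside the prefix; the address is converted only once. */
static size_t
toy_count_prefix_lles(const uint32_t *entries_net, size_t n, uint32_t addr_net,
    uint32_t mask_host)
{
	assert(entries_net != NULL || n == 0);

	uint32_t addr_host = ntohl(addr_net);
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_lle_in_prefix(entries_net[i], addr_host, mask_host))
			count++;
	return (count);
}
```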

2004-11-12 20:53:51 +00:00
/*
 * If there is no other address in the system that can serve a route to the
 * same prefix, remove the route.  Hand over the route to the new address
 * otherwise.
 */
A major overhaul of the CARP implementation. ip_carp.c was started
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
the interface layer has been removed from CARP. Now redundant addresses
are configured exactly on the interfaces they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics have changed too: one no longer needs
to clone a carpXX interface, but should directly configure a vhid
on an Ethernet interface.
To supply vhid data from the kernel to an application, the getifaddrs(3)
function has been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
running a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with the inet6 part of the patch, for
the idea of using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
2011-12-16 12:16:56 +00:00
int
2011-05-20 19:12:20 +00:00
in_scrubprefix(struct in_ifaddr *target, u_int flags)
2004-11-12 20:53:51 +00:00
{
2015-07-29 08:12:05 +00:00
	struct rm_priotracker in_ifa_tracker;
2004-11-12 20:53:51 +00:00
	struct in_ifaddr *ia;
2011-12-13 06:56:43 +00:00
	struct in_addr prefix, mask, p, m;
2014-04-29 14:46:45 +00:00
	int error = 0;
2004-11-12 20:53:51 +00:00

2009-07-27 17:08:06 +00:00
	/*
	 * Remove the loopback route to the interface address.
	 */
2021-01-19 23:50:34 +00:00
	if (ia_need_loopback_route(target) && (flags & LLE_STATIC)) {
2013-11-05 07:44:15 +00:00
		struct in_ifaddr *eia;

		eia = in_localip_more(target);

		if (eia != NULL) {
			error = ifa_switch_loopback_route((struct ifaddr *)eia,
2015-09-16 06:23:15 +00:00
			    (struct sockaddr *)&target->ia_addr);
2013-11-05 07:44:15 +00:00
			ifa_free(&eia->ia_ifa);
		} else {
2009-12-30 21:35:34 +00:00
			error = ifa_del_loopback_route((struct ifaddr *)target,
2012-07-31 11:31:12 +00:00
			    (struct sockaddr *)&target->ia_addr);
2011-05-20 19:12:20 +00:00
		}
2009-05-12 07:41:20 +00:00
	}

2021-01-19 23:50:34 +00:00
	ia_getrtprefix(target, &prefix, &mask);
2004-11-12 20:53:51 +00:00

2009-12-30 22:13:01 +00:00
	if ((target->ia_flags & IFA_ROUTE) == 0) {
2021-01-07 19:13:52 +00:00
		rt_addrmsg(RTM_DELETE, &target->ia_ifa, target->ia_ifp->if_fib);
2020-02-12 13:31:36 +00:00

2015-10-18 12:26:25 +00:00
		/*
		 * Removing address from !IFF_UP interface or
		 * prefix which exists on other interface (along with route).
		 * No entries should exist here except target addr.
		 * Given that, delete this entry only.
		 */
		in_scrubprefixlle(target, 0, flags);
2009-12-30 22:13:01 +00:00
		return (0);
	}

2015-07-29 08:12:05 +00:00
	IN_IFADDR_RLOCK(&in_ifa_tracker);
2018-05-18 20:13:34 +00:00
	CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) {
2021-01-19 23:50:34 +00:00
		ia_getrtprefix(ia, &p, &m);
2011-12-13 06:56:43 +00:00

2021-01-19 23:50:34 +00:00
if (prefix.s_addr != p.s_addr ||
|
|
|
|
mask.s_addr != m.s_addr)
|
|
|
|
continue;
|
2004-11-12 20:53:51 +00:00
|
|
|
|
2011-12-13 06:56:43 +00:00
|
|
|
if ((ia->ia_ifp->if_flags & IFF_UP) == 0)
|
2004-11-12 20:53:51 +00:00
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we got a matching prefix address, move IFA_ROUTE and
|
|
|
|
* the route itself to it. Make sure that routing daemons
|
|
|
|
* get a heads-up.
|
|
|
|
*/
|
2011-12-16 12:16:56 +00:00
|
|
|
if ((ia->ia_flags & IFA_ROUTE) == 0) {
|
2011-03-21 14:19:40 +00:00
|
|
|
ifa_ref(&ia->ia_ifa);
|
2015-07-29 08:12:05 +00:00
|
|
|
IN_IFADDR_RUNLOCK(&in_ifa_tracker);
|
Split rtinit() into multiple functions.
rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. In reality, there are several problems
with this approach.
1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing an interface for maintaining interface routes. The part
responsible for the actual route table interaction mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. The flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
This makes rtinit() clash with ifa_maintain_loopback_route() for IPv4 interface
aliases.
4) rtinit() is the last consumer passing non-masked prefixes to rib_action(),
complicating rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
- Route attribute construction was moved to the per-address-family functions,
dealing with (2), (3) and (4).
- The function providing net.add_addr_allfibs handling and route rtsock notifications
is the new routing table interface.
- rtsock ifa notification has been moved out as well. The resulting set of functions is only
responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
2021-01-09 00:19:25 +00:00
|
|
|
error = in_handle_ifaddr_route(RTM_DELETE, target);
|
2011-05-29 02:21:35 +00:00
|
|
|
if (error == 0)
|
|
|
|
target->ia_flags &= ~IFA_ROUTE;
|
|
|
|
else
|
|
|
|
log(LOG_INFO, "in_scrubprefix: err=%d, old prefix delete failed\n",
|
|
|
|
error);
|
2015-09-14 16:48:19 +00:00
|
|
|
/* Scrub all entries IFF interface is different */
|
|
|
|
in_scrubprefixlle(target, target->ia_ifp != ia->ia_ifp,
|
|
|
|
flags);
|
2021-01-09 00:19:25 +00:00
|
|
|
error = in_handle_ifaddr_route(RTM_ADD, ia);
|
2004-11-12 20:53:51 +00:00
|
|
|
if (error == 0)
|
|
|
|
ia->ia_flags |= IFA_ROUTE;
|
2011-05-29 02:21:35 +00:00
|
|
|
else
|
|
|
|
log(LOG_INFO, "in_scrubprefix: err=%d, new prefix add failed\n",
|
|
|
|
error);
|
2011-03-21 14:19:40 +00:00
|
|
|
ifa_free(&ia->ia_ifa);
|
2008-10-26 19:17:25 +00:00
|
|
|
return (error);
|
2004-11-12 20:53:51 +00:00
|
|
|
}
|
|
|
|
}
|
2015-07-29 08:12:05 +00:00
|
|
|
IN_IFADDR_RUNLOCK(&in_ifa_tracker);
|
2004-11-12 20:53:51 +00:00
|
|
|
|
2009-05-20 21:07:15 +00:00
|
|
|
/*
|
|
|
|
* remove all L2 entries on the given prefix
|
|
|
|
*/
|
2015-09-14 16:48:19 +00:00
|
|
|
in_scrubprefixlle(target, 1, flags);
|
2009-05-20 21:07:15 +00:00
|
|
|
|
2004-11-12 20:53:51 +00:00
|
|
|
/*
|
|
|
|
* As no one seems to have this prefix, we can remove the route.
|
|
|
|
*/
|
2021-01-09 00:19:25 +00:00
|
|
|
rt_addrmsg(RTM_DELETE, &target->ia_ifa, target->ia_ifp->if_fib);
|
|
|
|
error = in_handle_ifaddr_route(RTM_DELETE, target);
|
2011-05-29 02:21:35 +00:00
|
|
|
if (error == 0)
|
|
|
|
target->ia_flags &= ~IFA_ROUTE;
|
|
|
|
else
|
|
|
|
log(LOG_INFO, "in_scrubprefix: err=%d, prefix delete failed\n", error);
|
|
|
|
return (error);
|
2004-11-12 20:53:51 +00:00
|
|
|
}
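The prefix-handover policy in in_scrubprefix() (keep no route, hand the prefix route over to another address with the same prefix on an up interface, or delete the route) can be modelled in user space. This is a minimal sketch under stated assumptions: `struct addr_entry`, `scrub_prefix_decide()` and the `SCRUB_*` codes are hypothetical names, a flat array replaces V_in_ifaddrhead, and locking, LLE scrubbing, rtsock messages and route-operation failures are all omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an IPv4 interface address. */
struct addr_entry {
	uint32_t prefix;	/* network prefix, host byte order */
	uint32_t mask;		/* netmask, host byte order */
	bool	 if_up;		/* owning interface has IFF_UP set */
	bool	 has_route;	/* mirrors IFA_ROUTE */
};

enum scrub_result {
	SCRUB_NO_ROUTE,		/* target never owned the prefix route */
	SCRUB_HANDOVER,		/* route moved to another address */
	SCRUB_DELETE		/* nobody else can serve the prefix */
};

/*
 * Decide what happens to target's prefix route. On SCRUB_HANDOVER,
 * *heir receives the index of the address that takes over the route.
 */
static enum scrub_result
scrub_prefix_decide(struct addr_entry *tbl, size_t n, size_t target,
    size_t *heir)
{
	assert(target < n);
	if (!tbl[target].has_route)
		return (SCRUB_NO_ROUTE);

	for (size_t i = 0; i < n; i++) {
		/* Same prefix and mask only. */
		if (tbl[i].prefix != tbl[target].prefix ||
		    tbl[i].mask != tbl[target].mask)
			continue;
		/* Skip addresses on interfaces that are down. */
		if (!tbl[i].if_up)
			continue;
		/* First candidate without the route inherits it. */
		if (!tbl[i].has_route) {
			tbl[target].has_route = false;
			tbl[i].has_route = true;
			*heir = i;
			return (SCRUB_HANDOVER);
		}
	}
	/* No heir found: the prefix route goes away. */
	tbl[target].has_route = false;
	return (SCRUB_DELETE);
}
```

The target itself is never chosen as heir because it still carries the route flag when the scan runs, which matches the `(ia->ia_flags & IFA_ROUTE) == 0` test in the kernel loop.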
|
|
|
|
|
Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather large
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go away along with (or before) the higher layer protocol when it is shut down.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
clean up their own parts, thus we cannot do so again for each interface as
we would end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enhance helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
|
|
|
void
|
|
|
|
in_ifscrub_all(void)
|
|
|
|
{
|
|
|
|
struct ifnet *ifp;
|
|
|
|
struct ifaddr *ifa, *nifa;
|
|
|
|
struct ifaliasreq ifr;
|
|
|
|
|
|
|
|
IFNET_RLOCK();
|
2018-05-23 21:02:14 +00:00
|
|
|
CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
|
2016-06-21 13:48:49 +00:00
|
|
|
/* Cannot lock here - lock recursion. */
|
2019-01-09 01:11:19 +00:00
|
|
|
/* NET_EPOCH_ENTER(et); */
|
2018-05-18 20:13:34 +00:00
|
|
|
CK_STAILQ_FOREACH_SAFE(ifa, &ifp->if_addrhead, ifa_link, nifa) {
|
2016-06-21 13:48:49 +00:00
|
|
|
if (ifa->ifa_addr->sa_family != AF_INET)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is ugly but the only way for legacy IP to
|
|
|
|
* cleanly remove addresses and everything attached.
|
|
|
|
*/
|
|
|
|
bzero(&ifr, sizeof(ifr));
|
|
|
|
ifr.ifra_addr = *ifa->ifa_addr;
|
|
|
|
if (ifa->ifa_dstaddr)
|
|
|
|
ifr.ifra_broadaddr = *ifa->ifa_dstaddr;
|
|
|
|
(void)in_control(NULL, SIOCDIFADDR, (caddr_t)&ifr,
|
|
|
|
ifp, NULL);
|
|
|
|
}
|
2019-01-09 01:11:19 +00:00
|
|
|
/* NET_EPOCH_EXIT(et); */
|
2016-06-21 13:48:49 +00:00
|
|
|
in_purgemaddrs(ifp);
|
|
|
|
igmp_domifdetach(ifp);
|
|
|
|
}
|
|
|
|
IFNET_RUNLOCK();
|
|
|
|
}
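in_ifscrub_all() walks each address list with CK_STAILQ_FOREACH_SAFE because in_control(SIOCDIFADDR) unlinks and may free the address currently being visited. The safe variant saves the successor before the loop body runs. A minimal sketch of that idea on a plain singly linked list; `struct node` and `remove_family()` are hypothetical, and the family values are arbitrary tags rather than real AF_* constants.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ifaddr list entry. */
struct node {
	int family;		/* arbitrary address-family tag */
	struct node *next;
};

/*
 * Unlink and free every node whose family matches. The successor is
 * saved before the node is freed, as the *_FOREACH_SAFE macros do;
 * reading n->next after free(n) would be a use-after-free.
 */
static int
remove_family(struct node **head, int family)
{
	struct node **link = head;
	struct node *n, *next;
	int removed = 0;

	for (n = *head; n != NULL; n = next) {
		next = n->next;		/* cache successor first */
		if (n->family != family) {
			link = &n->next;
			continue;
		}
		*link = next;		/* unlink n */
		free(n);
		removed++;
	}
	return (removed);
}
```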
|
|
|
|
|
2016-08-18 22:59:00 +00:00
|
|
|
int
|
|
|
|
in_ifaddr_broadcast(struct in_addr in, struct in_ifaddr *ia)
|
|
|
|
{
|
|
|
|
|
|
|
|
return ((in.s_addr == ia->ia_broadaddr.sin_addr.s_addr ||
|
|
|
|
/*
|
2021-09-05 18:14:04 +00:00
|
|
|
* Optionally check for old-style (host 0) broadcast, but
|
2016-08-18 22:59:00 +00:00
|
|
|
* taking into account that RFC 3021 obsoletes it.
|
|
|
|
*/
|
2021-09-05 18:14:04 +00:00
|
|
|
(V_broadcast_lowest && ia->ia_subnetmask != IN_RFC3021_MASK &&
|
2016-08-18 22:59:00 +00:00
|
|
|
ntohl(in.s_addr) == ia->ia_subnet)) &&
|
|
|
|
/*
|
|
|
|
* Check for an all one subnetmask. These
|
|
|
|
* only exist when an interface gets a secondary
|
|
|
|
* address.
|
|
|
|
*/
|
|
|
|
ia->ia_subnetmask != (u_long)0xffffffff);
|
|
|
|
}
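The condition in in_ifaddr_broadcast() can be restated with plain integers. A sketch, assuming all values are in host byte order (the kernel compares `in.s_addr` against `ia_broadaddr` in network order and against `ia_subnet` after ntohl()); `is_ifaddr_broadcast()` is a hypothetical name, `broadcast_lowest` stands in for V_broadcast_lowest, and `RFC3021_MASK` mirrors IN_RFC3021_MASK, the /31 mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RFC3021_MASK	0xfffffffeu	/* /31, as IN_RFC3021_MASK */

/*
 * Host-byte-order model of in_ifaddr_broadcast(). addr is the address
 * under test; bcast, subnet and mask describe one interface address.
 */
static bool
is_ifaddr_broadcast(uint32_t addr, uint32_t bcast, uint32_t subnet,
    uint32_t mask, bool broadcast_lowest)
{
	bool match = addr == bcast ||
	    /* Optional old-style (host part all zeros) broadcast. */
	    (broadcast_lowest && mask != RFC3021_MASK && addr == subnet);

	/* An all-ones subnet mask (e.g. a secondary /32) never matches. */
	return (match && mask != 0xffffffffu);
}
```

For 192.168.1.0/24 only 192.168.1.255 matches by default; 192.168.1.0 matches as well when the host-0 option is enabled, while a /31 link never gets the host-0 interpretation.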
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Return 1 if the address might be a local broadcast address.
|
|
|
|
*/
|
1994-05-25 09:21:21 +00:00
|
|
|
int
|
2007-05-10 15:58:48 +00:00
|
|
|
in_broadcast(struct in_addr in, struct ifnet *ifp)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2017-05-17 00:34:34 +00:00
|
|
|
struct ifaddr *ifa;
|
2016-08-18 22:59:10 +00:00
|
|
|
int found;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
|
|
|
NET_EPOCH_ASSERT();
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
if (in.s_addr == INADDR_BROADCAST ||
|
|
|
|
in.s_addr == INADDR_ANY)
|
2008-10-26 19:17:25 +00:00
|
|
|
return (1);
|
1994-05-24 10:09:53 +00:00
|
|
|
if ((ifp->if_flags & IFF_BROADCAST) == 0)
|
2008-10-26 19:17:25 +00:00
|
|
|
return (0);
|
2016-08-18 22:59:10 +00:00
|
|
|
found = 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Look through the list of addresses for a match
|
|
|
|
* with a broadcast address.
|
|
|
|
*/
|
2018-05-18 20:13:34 +00:00
|
|
|
CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ifa->ifa_addr->sa_family == AF_INET &&
|
2016-08-18 22:59:10 +00:00
|
|
|
in_ifaddr_broadcast(in, (struct in_ifaddr *)ifa)) {
|
|
|
|
found = 1;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return (found);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
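in_broadcast() short-circuits INADDR_BROADCAST and INADDR_ANY, returns 0 for interfaces without IFF_BROADCAST, and otherwise scans the interface's AF_INET addresses. A sketch of that control flow, assuming a hypothetical `struct v4_ifaddr` array in place of the ifnet address list and host-byte-order values; the per-address test is deliberately simplified (exact broadcast match plus the all-ones-mask exclusion, without the optional host-0 check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one AF_INET address on an interface. */
struct v4_ifaddr {
	uint32_t bcast;		/* directed broadcast address, host order */
	uint32_t mask;		/* subnet mask, host order */
};

/*
 * Model of in_broadcast(): returns 1 if addr may be a broadcast
 * address for an interface owning addrs[0..n-1].
 */
static int
model_in_broadcast(uint32_t addr, bool if_broadcast,
    const struct v4_ifaddr *addrs, size_t n)
{
	/* INADDR_BROADCAST and INADDR_ANY always count. */
	if (addr == 0xffffffffu || addr == 0)
		return (1);
	/* Interface without IFF_BROADCAST. */
	if (!if_broadcast)
		return (0);
	for (size_t i = 0; i < n; i++) {
		/* Simplified per-address check: no host-0 option. */
		if (addr == addrs[i].bcast && addrs[i].mask != 0xffffffffu)
			return (1);
	}
	return (0);
}
```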
|
2007-03-20 00:36:10 +00:00
|
|
|
|
2009-03-09 17:53:05 +00:00
|
|
|
/*
|
|
|
|
* On interface removal, clean up IPv4 data structures hung off of the ifnet.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
in_ifdetach(struct ifnet *ifp)
|
|
|
|
{
|
2018-05-02 19:36:29 +00:00
|
|
|
IN_MULTI_LOCK();
|
2009-03-09 17:53:05 +00:00
|
|
|
in_pcbpurgeif0(&V_ripcbinfo, ifp);
|
|
|
|
in_pcbpurgeif0(&V_udbinfo, ifp);
|
2014-04-07 01:53:03 +00:00
|
|
|
in_pcbpurgeif0(&V_ulitecbinfo, ifp);
|
2009-03-09 17:53:05 +00:00
|
|
|
in_purgemaddrs(ifp);
|
2018-05-02 19:36:29 +00:00
|
|
|
IN_MULTI_UNLOCK();
|
2020-08-10 10:46:08 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure all multicast deletions invoking if_ioctl() are
|
|
|
|
* completed before returning. Else we risk accessing a freed
|
|
|
|
* ifnet structure pointer.
|
|
|
|
*/
|
|
|
|
inm_release_wait(NULL);
|
2009-03-09 17:53:05 +00:00
|
|
|
}
|
|
|
|
|
2006-09-28 10:04:07 +00:00
|
|
|
/*
|
2007-03-20 00:36:10 +00:00
|
|
|
* Delete all IPv4 multicast address records, and associated link-layer
|
|
|
|
* multicast address records, associated with ifp.
|
2009-03-09 17:53:05 +00:00
|
|
|
* XXX It looks like domifdetach runs AFTER the link layer cleanup.
|
2009-03-17 14:41:54 +00:00
|
|
|
* XXX This should not race with ifma_protospec being set during
|
|
|
|
* a new allocation; if it does, we have bigger problems.
|
2006-09-28 10:04:07 +00:00
|
|
|
*/
|
2007-03-20 00:36:10 +00:00
|
|
|
static void
|
|
|
|
in_purgemaddrs(struct ifnet *ifp)
|
2006-09-28 10:04:07 +00:00
|
|
|
{
|
2018-05-02 19:36:29 +00:00
|
|
|
struct in_multi_head purgeinms;
|
|
|
|
struct in_multi *inm;
|
2018-05-06 20:34:13 +00:00
|
|
|
struct ifmultiaddr *ifma, *next;
|
2006-09-28 10:04:07 +00:00
|
|
|
|
2018-05-02 19:36:29 +00:00
|
|
|
SLIST_INIT(&purgeinms);
|
|
|
|
IN_MULTI_LIST_LOCK();
|
2009-03-09 17:53:05 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Extract list of in_multi associated with the detaching ifp
|
|
|
|
* which the PF_INET layer is about to release.
|
|
|
|
* We need to do this as IF_ADDR_LOCK() may be re-acquired
|
|
|
|
* by code further down.
|
|
|
|
*/
|
2018-05-06 20:34:13 +00:00
|
|
|
IF_ADDR_WLOCK(ifp);
|
|
|
|
restart:
|
2018-05-18 20:13:34 +00:00
|
|
|
CK_STAILQ_FOREACH_SAFE(ifma, &ifp->if_multiaddrs, ifma_link, next) {
|
2009-03-17 14:41:54 +00:00
|
|
|
if (ifma->ifma_addr->sa_family != AF_INET ||
|
|
|
|
ifma->ifma_protospec == NULL)
|
2009-03-09 17:53:05 +00:00
|
|
|
continue;
|
|
|
|
inm = (struct in_multi *)ifma->ifma_protospec;
|
2018-05-02 19:36:29 +00:00
|
|
|
inm_rele_locked(&purgeinms, inm);
|
2018-05-06 20:34:13 +00:00
|
|
|
if (__predict_false(ifma_restart)) {
|
|
|
|
ifma_restart = false;
|
|
|
|
goto restart;
|
|
|
|
}
|
2006-09-28 10:04:07 +00:00
|
|
|
}
|
2018-05-06 20:34:13 +00:00
|
|
|
IF_ADDR_WUNLOCK(ifp);
|
2005-09-18 17:36:28 +00:00
|
|
|
|
2018-05-02 19:36:29 +00:00
|
|
|
inm_release_list_deferred(&purgeinms);
|
2009-03-09 17:53:05 +00:00
|
|
|
igmp_ifdetach(ifp);
|
2018-05-02 19:36:29 +00:00
|
|
|
IN_MULTI_LIST_UNLOCK();
|
2005-09-18 17:36:28 +00:00
|
|
|
}
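in_purgemaddrs() restarts its walk of if_multiaddrs when the global ifma_restart flag reports that the list changed during inm_rele_locked(). The flag has to be cleared before jumping back; otherwise a stale flag would trigger another restart on a later release. A single-threaded sketch of that restart pattern, where `list_restart`, `release_item()` and `purge_all()` are hypothetical names and a fixed array stands in for the CK_STAILQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NITEMS	4

/* Set by a release hook when it modified the list being walked. */
static bool list_restart;
static int items[NITEMS] = { 1, 2, 3, 4 };
static bool released[NITEMS];
static int visits;

/* Hypothetical release hook: releasing item 2 "changes" the list. */
static void
release_item(size_t i)
{
	released[i] = true;
	if (items[i] == 2)
		list_restart = true;
}

static void
purge_all(void)
{
restart:
	for (size_t i = 0; i < NITEMS; i++) {
		visits++;
		if (released[i])
			continue;	/* handled on an earlier pass */
		release_item(i);
		if (list_restart) {
			list_restart = false;	/* clear before retrying */
			goto restart;
		}
	}
}
```

With four items and one restart after the second release, the walk visits six entries in total: two on the first pass and four on the second.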
|
The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.
The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle; Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion
2008-12-15 06:10:57 +00:00
|
|
|
|
|
|
|
struct in_llentry {
|
|
|
|
struct llentry base;
|
|
|
|
};
|
|
|
|
|
2015-08-10 12:03:59 +00:00
|
|
|
#define IN_LLTBL_DEFAULT_HSIZE 32
|
|
|
|
#define IN_LLTBL_HASH(k, h) \
|
|
|
|
(((((((k >> 8) ^ k) >> 8) ^ k) >> 8) ^ k) & ((h) - 1))
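IN_LLTBL_HASH folds the 32-bit key three times. Working through the shifts shows that the low byte of the result is the XOR of all four bytes of k, so for any power-of-two table size up to 256 (the default IN_LLTBL_DEFAULT_HSIZE is 32) the bucket depends on every octet and is the same whichever byte order the key is stored in. A sketch restating the macro as a function; `in_lltbl_hash()` and `xor_of_bytes()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as IN_LLTBL_HASH, with parenthesized operands.
 * h must be a power of two so that h - 1 is a valid bit mask.
 */
static uint32_t
in_lltbl_hash(uint32_t k, uint32_t h)
{
	assert(h != 0 && (h & (h - 1)) == 0);
	return (((((((k >> 8) ^ k) >> 8) ^ k) >> 8) ^ k) & (h - 1));
}

/* XOR of the four bytes of k, i.e. the low byte of the folded value. */
static uint32_t
xor_of_bytes(uint32_t k)
{
	return ((k ^ (k >> 8) ^ (k >> 16) ^ (k >> 24)) & 0xffu);
}
```

For example, 0xC0A80101 (192.168.1.1 as a host-order integer) folds to a low byte of 0xC0 ^ 0xA8 ^ 0x01 ^ 0x01 = 0x68, which lands in bucket 8 of a 32-bucket table.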
|
|
|
|
|
2012-02-23 18:21:37 +00:00
|
|
|
/*
|
2015-08-10 12:03:59 +00:00
|
|
|
* Do actual deallocation of @lle.
|
2016-04-26 23:13:48 +00:00
|
|
|
*/
|
|
|
|
static void
|
2018-05-23 21:02:14 +00:00
|
|
|
in_lltable_destroy_lle_unlocked(epoch_context_t ctx)
|
2016-04-26 23:13:48 +00:00
|
|
|
{
|
2018-05-23 21:02:14 +00:00
|
|
|
struct llentry *lle;
|
2016-04-26 23:13:48 +00:00
|
|
|
|
2018-05-23 21:02:14 +00:00
|
|
|
lle = __containerof(ctx, struct llentry, lle_epoch_ctx);
|
2016-04-26 23:13:48 +00:00
|
|
|
LLE_LOCK_DESTROY(lle);
|
|
|
|
LLE_REQ_DESTROY(lle);
|
|
|
|
free(lle, M_LLTABLE);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2015-08-10 12:03:59 +00:00
|
|
|
* Called by LLE_FREE_LOCKED when the number of references
|
|
|
|
* drops to zero.
|
2012-02-23 18:21:37 +00:00
|
|
|
*/
|
|
|
|
static void
|
2015-08-10 12:03:59 +00:00
|
|
|
in_lltable_destroy_lle(struct llentry *lle)
|
2012-02-23 18:21:37 +00:00
|
|
|
{
|
2015-08-10 12:03:59 +00:00
|
|
|
|
2012-02-23 18:21:37 +00:00
|
|
|
LLE_WUNLOCK(lle);
|
2020-01-15 06:05:20 +00:00
|
|
|
NET_EPOCH_CALL(in_lltable_destroy_lle_unlocked, &lle->lle_epoch_ctx);
|
2012-02-23 18:21:37 +00:00
|
|
|
}
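in_lltable_destroy_lle() does not free the entry immediately. It drops the write lock and queues in_lltable_destroy_lle_unlocked() with NET_EPOCH_CALL(), and that callback receives only the embedded epoch_context, from which __containerof() recovers the llentry. A single-threaded sketch of the same shape. It is not the epoch(9) API: `struct defer_ctx`, `defer_call()`, `defer_drain()` and `container_of` are hypothetical stand-ins, and `defer_drain()` plays the role of the end of a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct epoch_context. */
struct defer_ctx {
	void (*cb)(struct defer_ctx *);
	struct defer_ctx *next;
};

/* Recover the enclosing object from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct entry {
	int addr;
	struct defer_ctx ctx;	/* like llentry's lle_epoch_ctx */
};

static struct defer_ctx *pending;
static int freed;

/* Queue cb to run later, like NET_EPOCH_CALL(); nothing is freed yet. */
static void
defer_call(void (*cb)(struct defer_ctx *), struct defer_ctx *ctx)
{
	ctx->cb = cb;
	ctx->next = pending;
	pending = ctx;
}

/* Stand-in for the end of a grace period: run all queued callbacks. */
static void
defer_drain(void)
{
	while (pending != NULL) {
		struct defer_ctx *c = pending;

		pending = c->next;	/* read before cb frees c */
		c->cb(c);
	}
}

static void
entry_free_cb(struct defer_ctx *ctx)
{
	struct entry *e = container_of(ctx, struct entry, ctx);

	free(e);
	freed++;
}
```

Until the drain runs, readers that still hold a pointer to the entry can keep dereferencing it safely, which is the property the epoch-based deferral provides in the kernel.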
|
|
|
|
|

static struct llentry *
in_lltable_new(struct in_addr addr4, u_int flags)
{
	struct in_llentry *lle;

	lle = malloc(sizeof(struct in_llentry), M_LLTABLE, M_NOWAIT | M_ZERO);
	if (lle == NULL)	/* NB: caller generates msg */
		return NULL;

	/*
	 * For IPv4 this will trigger "arpresolve" to generate
	 * an ARP request.
	 */
	lle->base.la_expire = time_uptime; /* mark expired */
	lle->base.r_l3addr.addr4 = addr4;
	lle->base.lle_refcnt = 1;
	lle->base.lle_free = in_lltable_destroy_lle;
	LLE_LOCK_INIT(&lle->base);
	LLE_REQ_INIT(&lle->base);
	callout_init(&lle->base.lle_timer, 1);

	return (&lle->base);
}

#define	IN_ARE_MASKED_ADDR_EQUAL(d, a, m)	(			\
	((((d).s_addr ^ (a).s_addr) & (m).s_addr)) == 0 )

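IN_ARE_MASKED_ADDR_EQUAL(d, a, m) is true when d and a agree on every bit set in m, that is, when a lies inside the prefix d/m. XOR and AND work bit by bit, so the outcome does not depend on byte order as long as all three operands use the same one. The sketch below copies the macro into a userland program, parses addresses with inet_pton(3), and checks a /24 and a /8 prefix in both network and host byte order. masked_equal_net() and masked_equal_host() are hypothetical helpers.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <assert.h>
#include <stdbool.h>

/* Copied from the macro above for a standalone userland build. */
#define	IN_ARE_MASKED_ADDR_EQUAL(d, a, m)	(			\
	((((d).s_addr ^ (a).s_addr) & (m).s_addr)) == 0 )

/* Parse three dotted-quad strings; false if any of them is malformed. */
static bool
parse3(const char *d, const char *a, const char *m,
    struct in_addr *dst, struct in_addr *addr, struct in_addr *mask)
{
	return (inet_pton(AF_INET, d, dst) == 1 &&
	    inet_pton(AF_INET, a, addr) == 1 &&
	    inet_pton(AF_INET, m, mask) == 1);
}

/* Compare in network byte order, as produced by inet_pton(). */
bool
masked_equal_net(const char *d, const char *a, const char *m)
{
	struct in_addr dst, addr, mask;

	if (!parse3(d, a, m, &dst, &addr, &mask))
		return (false);
	return (IN_ARE_MASKED_ADDR_EQUAL(dst, addr, mask));
}

/* Same comparison after converting all three operands to host order. */
bool
masked_equal_host(const char *d, const char *a, const char *m)
{
	struct in_addr dst, addr, mask;

	if (!parse3(d, a, m, &dst, &addr, &mask))
		return (false);
	dst.s_addr = ntohl(dst.s_addr);
	addr.s_addr = ntohl(addr.s_addr);
	mask.s_addr = ntohl(mask.s_addr);
	return (IN_ARE_MASKED_ADDR_EQUAL(dst, addr, mask));
}
```

For example, masked_equal_net("192.168.1.0", "192.168.1.77", "255.255.255.0") is true, and the same call with "192.168.2.1" is false. The host-order variant gives identical answers. Mixing byte orders among the three operands generally does not work. In in_lltable_match_prefix() below, the entry address is converted with ntohl() before the comparison, which suggests that the caller-supplied prefix and mask are in host order there. This is an inference from that function alone.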
static int
in_lltable_match_prefix(const struct sockaddr *saddr,
    const struct sockaddr *smask, u_int flags, struct llentry *lle)
{
	struct in_addr addr, mask, lle_addr;

	addr = ((const struct sockaddr_in *)saddr)->sin_addr;
	mask = ((const struct sockaddr_in *)smask)->sin_addr;
	lle_addr.s_addr = ntohl(lle->r_l3addr.addr4.s_addr);

	if (IN_ARE_MASKED_ADDR_EQUAL(lle_addr, addr, mask) == 0)
		return (0);

	if (lle->la_flags & LLE_IFADDR) {
		/*
		 * Delete LLE_IFADDR records IFF address & flag matches.
		 * Note that addr is the interface address within prefix
		 * being matched.
		 * Note also we should handle 'ifdown' cases without removing
		 * ifaddr macs.
		 */
		if (addr.s_addr == lle_addr.s_addr && (flags & LLE_STATIC) != 0)
			return (1);
		return (0);
	}

	/* flags & LLE_STATIC means deleting both dynamic and static entries */
	if ((flags & LLE_STATIC) || !(lle->la_flags & LLE_STATIC))
		return (1);

	return (0);
}

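Once the prefix test passes, in_lltable_match_prefix() decides whether to delete based on two flag checks. An entry marked LLE_IFADDR is removed only when the caller's address equals the entry address and the caller passed LLE_STATIC. Any other entry is removed if the caller passed LLE_STATIC or the entry itself is not static. The sketch below restates those branches as a pure function for experimentation. toy_should_delete() and the TOY_LLE_* bit values are hypothetical and are not the kernel's LLE_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this sketch only; not the kernel's LLE_* flags. */
#define	TOY_LLE_STATIC	0x01
#define	TOY_LLE_IFADDR	0x02

/*
 * Mirror of the decision in in_lltable_match_prefix() after the prefix
 * comparison has already succeeded.
 *   req_flags: flags supplied by the caller requesting the purge
 *   lle_flags: flags stored on the candidate entry
 *   same_addr: true if the caller's address equals the entry address
 */
bool
toy_should_delete(unsigned int req_flags, unsigned int lle_flags,
    bool same_addr)
{
	if (lle_flags & TOY_LLE_IFADDR) {
		/* Interface-address entries: exact address and static purge only. */
		return (same_addr && (req_flags & TOY_LLE_STATIC) != 0);
	}

	/* A static purge removes static and dynamic entries alike. */
	return ((req_flags & TOY_LLE_STATIC) != 0 ||
	    (lle_flags & TOY_LLE_STATIC) == 0);
}
```

In this form it is easy to see that a purge without LLE_STATIC never removes a static entry or an interface-address entry.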
static void
in_lltable_free_entry(struct lltable *llt, struct llentry *lle)
{
	size_t pkts_dropped;

	LLE_WLOCK_ASSERT(lle);
	KASSERT(llt != NULL, ("lltable is NULL"));

	/* Unlink entry from table if not already */
	if ((lle->la_flags & LLE_LINKED) != 0) {
		IF_AFDATA_WLOCK_ASSERT(llt->llt_ifp);
		lltable_unlink_entry(llt, lle);
	}

	/* Drop hold queue */
	pkts_dropped = llentry_free(lle);
	ARPSTAT_ADD(dropped, pkts_dropped);
}

static int
in_lltable_rtcheck(struct ifnet *ifp, u_int flags, const struct sockaddr *l3addr)
{
lltable: do not require prefix lookup when checking lle allocation rules.
With the new FIB_ALGO infrastructure, nearly all subsystems use
fib[46]_lookup() functions, which provide lockless lookups.
A number of places remain that use old-style lookup functions, which
still require the RIB read lock to return the result. One such place
is the arp processing code.
The FIB_ALGO implementation makes some tradeoffs, resulting in (relatively)
prolonged periods of holding RIB_WLOCK. If the lock is held and the datapath
competes for it, the RX ring may get blocked, resulting in traffic delays and losses.
As arp processing is currently performed directly in the interrupt handler,
handling ARP replies triggers the problem described above when the number of
ARP replies is high.
To be more specific, prior to creating a new ARP entry, a routing lookup for the entry
address in the interface fib is executed. The following conditions are then verified:
1. If the lookup returns an empty result, or the resulting prefix is not directly reachable,
   failure is returned. The only exception is host routes w/ gateway==address.
2. If the routing lookup returns a different interface and a non-host route,
   we want to support the use case of having multiple interfaces with the same prefix.
   In fact, the current code just checks if the returned prefix covers the target address
   (always true) and effectively allows allocating ARP entries for any directly-reachable prefix,
   regardless of its interface.
Change the code to perform the following:
1) Use fib4_lookup() to get the nexthop, instead of requesting the exact prefix.
2) Rewrite the first condition check using nexthop flags (1:1 match).
3) Rewrite the second condition to check for interface addresses matching the target address on
   the input interface.
Differential Revision: https://reviews.freebsd.org/D31824
Reviewed by: ae
MFC after: 1 week
PR: 257965

	struct nhop_object *nh;
	struct in_addr addr;

	KASSERT(l3addr->sa_family == AF_INET,
	    ("sin_family %d", l3addr->sa_family));

	addr = ((const struct sockaddr_in *)l3addr)->sin_addr;

	nh = fib4_lookup(ifp->if_fib, addr, 0, NHR_NONE, 0);
	if (nh == NULL)
		return (EINVAL);

	/*
	 * If the gateway for an existing host route matches the target L3
	 * address, which is a special route inserted by some implementation
	 * such as MANET, and the interface is of the correct type, then
	 * allow for ARP to proceed.
	 */
	if (nh->nh_flags & NHF_GATEWAY) {
		if (!(nh->nh_flags & NHF_HOST) || nh->nh_ifp->if_type != IFT_ETHER ||
		    (nh->nh_ifp->if_flags & (IFF_NOARP | IFF_STATICARP)) != 0 ||
		    memcmp(nh->gw_sa.sa_data, l3addr->sa_data,
		    sizeof(in_addr_t)) != 0) {
			return (EINVAL);
		}
	}

	/*
	 * Make sure that at least the destination address is covered
	 * by the route. This is for handling the case where 2 or more
	 * interfaces have the same prefix. An incoming packet arrives
	 * on one interface and the corresponding outgoing packet leaves
	 * another interface.
	 */
	if ((nh->nh_ifp != ifp) && (nh->nh_flags & NHF_HOST) == 0) {
		struct in_ifaddr *ia = (struct in_ifaddr *)ifaof_ifpforaddr(l3addr, ifp);
		struct in_addr dst_addr, mask_addr;

		if (ia == NULL)
			return (EINVAL);

		/*
		 * ifaof_ifpforaddr() returns _best matching_ IFA.
		 * It is possible that ifa prefix does not cover our address.
		 * Explicitly verify and fail if that's the case.
		 */
		dst_addr = IA_SIN(ia)->sin_addr;
		mask_addr.s_addr = htonl(ia->ia_subnetmask);

		if (!IN_ARE_MASKED_ADDR_EQUAL(dst_addr, addr, mask_addr))
			return (EINVAL);
The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.
The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle; Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion
2008-12-15 06:10:57 +00:00
	}

	return (0);
}

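The interface-prefix test described in the commit message above can be illustrated with a small userland sketch. That is the test that the `IN_ARE_MASKED_ADDR_EQUAL()` check implements. The sketch makes these assumptions:

- IPv4 addresses and netmasks are plain `uint32_t` values in one consistent byte order.
- `struct sketch_ifaddr`, `masked_addr_equal()` and `target_on_interface()` are hypothetical names, not kernel APIs.
- The XOR form is an equivalent way to compare two addresses under a mask. It is not necessarily how the macro is written.
- How the kernel chooses the interface address is not modeled; the sketch simply scans an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface address: IPv4 address and netmask, same byte order. */
struct sketch_ifaddr {
	uint32_t	addr;
	uint32_t	mask;
};

/* True when a and b agree on every bit that is set in mask. */
static bool
masked_addr_equal(uint32_t a, uint32_t b, uint32_t mask)
{
	return (((a ^ b) & mask) == 0);
}

/*
 * True when target falls inside the prefix of any address configured on
 * the interface. Simplification: scans a plain array of addresses.
 */
static bool
target_on_interface(const struct sketch_ifaddr *ifas, size_t n,
    uint32_t target)
{
	assert(ifas != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (masked_addr_equal(ifas[i].addr, target, ifas[i].mask))
			return (true);
	}
	return (false);
}
```

For example, with 192.168.1.1/24 configured, 192.168.1.254 is accepted and 192.168.2.1 is rejected. This matches the intent of refusing ARP entries for prefixes that do not belong to the input interface.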
static inline uint32_t
in_lltable_hash_dst(const struct in_addr dst, uint32_t hsize)
{

	return (IN_LLTBL_HASH(dst.s_addr, hsize));
}

static uint32_t
in_lltable_hash(const struct llentry *lle, uint32_t hsize)
{

	return (in_lltable_hash_dst(lle->r_l3addr.addr4, hsize));
}

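`in_lltable_hash_dst()` reduces an IPv4 address to a bucket index through the `IN_LLTBL_HASH()` macro, whose definition is outside this excerpt. Below is a minimal sketch of a byte-folding hash of that general kind. The shift-and-XOR folding is an assumption about the macro's shape, and `sketch_lltable_hash()` is a hypothetical name. The sketch requires a power-of-two table size so that `hsize - 1` can serve as a bit mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical bucket hash: fold the address bytes together with
 * shift-and-XOR, then keep the low bits. hsize must be a power of two.
 */
static uint32_t
sketch_lltable_hash(uint32_t addr, uint32_t hsize)
{
	uint32_t k;

	assert(hsize != 0 && (hsize & (hsize - 1)) == 0);
	k = (addr >> 8) ^ addr;
	k = (k >> 8) ^ addr;
	k = (k >> 8) ^ addr;
	return (k & (hsize - 1));
}
```

Masking with `hsize - 1` instead of using `%` keeps the index computation cheap, at the cost of restricting the table to power-of-two sizes. Folding all four bytes in lets every octet of the address influence the bucket choice.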
static void
in_lltable_fill_sa_entry(const struct llentry *lle, struct sockaddr *sa)
{
	struct sockaddr_in *sin;

	sin = (struct sockaddr_in *)sa;
	bzero(sin, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_len = sizeof(*sin);
	sin->sin_addr = lle->r_l3addr.addr4;
}

static inline struct llentry *
in_lltable_find_dst(struct lltable *llt, struct in_addr dst)
{
	struct llentry *lle;
	struct llentries *lleh;
	u_int hashidx;

	hashidx = in_lltable_hash_dst(dst, llt->llt_hsize);
	lleh = &llt->lle_head[hashidx];
	CK_LIST_FOREACH(lle, lleh, lle_next) {
		if (lle->la_flags & LLE_DELETED)
			continue;
		if (lle->r_l3addr.addr4.s_addr == dst.s_addr)
			break;
	}

	return (lle);
}

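`in_lltable_find_dst()` works in four steps:

1. Hash the destination to a single bucket.
2. Walk that bucket's chain.
3. Skip entries already marked `LLE_DELETED`.
4. Stop at the first live entry whose address matches.

The single-threaded sketch below mirrors that walk with a plain singly linked list. All `sk_*` names and the `SK_DELETED` flag are hypothetical. The sketch does not model the `CK_LIST` and epoch machinery that lets the kernel walk the chain concurrently with updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DELETED	0x1u	/* hypothetical stand-in for LLE_DELETED */

struct sk_entry {
	uint32_t	 addr;
	unsigned	 flags;
	struct sk_entry	*next;
};

struct sk_table {
	struct sk_entry	**buckets;
	uint32_t	  hsize;	/* must be a power of two */
};

/* Hypothetical hash; any function that stays below hsize would do. */
static uint32_t
sk_hash(uint32_t addr, uint32_t hsize)
{
	return ((addr ^ (addr >> 16)) & (hsize - 1));
}

static void
sk_insert(struct sk_table *t, struct sk_entry *e)
{
	struct sk_entry **head = &t->buckets[sk_hash(e->addr, t->hsize)];

	e->next = *head;	/* push onto the front of the chain */
	*head = e;
}

/* Walk one bucket, skip deleted entries, stop at the first live match. */
static struct sk_entry *
sk_find(const struct sk_table *t, uint32_t dst)
{
	struct sk_entry *e;

	for (e = t->buckets[sk_hash(dst, t->hsize)]; e != NULL; e = e->next) {
		if (e->flags & SK_DELETED)
			continue;
		if (e->addr == dst)
			break;
	}
	return (e);	/* NULL if the chain ended without a match */
}
```

Because deleted entries can remain on the chain until they are reclaimed, the lookup must filter them explicitly, as the kernel loop does.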
static void
in_lltable_delete_entry(struct lltable *llt, struct llentry *lle)
{

	lle->la_flags |= LLE_DELETED;
	EVENTHANDLER_INVOKE(lle_event, lle, LLENTRY_DELETED);
#ifdef DIAGNOSTIC
	log(LOG_INFO, "ifaddr cache = %p is deleted\n", lle);
#endif
	llentry_free(lle);
}

static struct llentry *
in_lltable_alloc(struct lltable *llt, u_int flags, const struct sockaddr *l3addr)
{
	const struct sockaddr_in *sin = (const struct sockaddr_in *)l3addr;
	struct ifnet *ifp = llt->llt_ifp;
	struct llentry *lle;
	char linkhdr[LLE_MAX_LINKHDR];
	size_t linkhdrsize;
	int lladdr_off;

	KASSERT(l3addr->sa_family == AF_INET,
	    ("sin_family %d", l3addr->sa_family));

	/*
	 * A route that covers the given address must have been
	 * installed first, because we are doing a resolution;
	 * verify this.
	 */
	if (!(flags & LLE_IFADDR) &&
	    in_lltable_rtcheck(ifp, flags, l3addr) != 0)
		return (NULL);

	lle = in_lltable_new(sin->sin_addr, flags);
	if (lle == NULL) {
		log(LOG_INFO, "lla_lookup: new lle malloc failed\n");
		return (NULL);
	}
	lle->la_flags = flags;
	if (flags & LLE_STATIC)
		lle->r_flags |= RLLE_VALID;
	if ((flags & LLE_IFADDR) == LLE_IFADDR) {
		linkhdrsize = LLE_MAX_LINKHDR;
		if (lltable_calc_llheader(ifp, AF_INET, IF_LLADDR(ifp),
		    linkhdr, &linkhdrsize, &lladdr_off) != 0) {
			NET_EPOCH_CALL(in_lltable_destroy_lle_unlocked, &lle->lle_epoch_ctx);
			return (NULL);
		}
		lltable_set_entry_addr(ifp, lle, linkhdr, linkhdrsize,
		    lladdr_off);
		lle->la_flags |= LLE_STATIC;
		lle->r_flags |= (RLLE_VALID | RLLE_IFADDR);
	}

	return (lle);
}

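The flag handling in `in_lltable_alloc()` follows three rules:

- Entries not created for an interface address must pass the route check first.
- `LLE_STATIC` entries start out valid.
- `LLE_IFADDR` entries are forced static and marked both valid and interface-owned.

The sketch below models only that logic. The flag values are made up, and a caller-supplied callback stands in for `in_lltable_rtcheck()`. The link-layer header construction and the epoch-deferred destruction done in the kernel are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical values standing in for LLE_STATIC, LLE_IFADDR, RLLE_*. */
#define SK_LLE_STATIC	0x1u
#define SK_LLE_IFADDR	0x2u
#define SK_RLLE_VALID	0x1u
#define SK_RLLE_IFADDR	0x2u

struct sk_entry {
	unsigned	la_flags;
	unsigned	r_flags;
};

/* Hypothetical route check: returns 0 when a covering route exists. */
typedef int (*sk_rtcheck_fn)(unsigned flags);

static struct sk_entry *
sk_alloc(unsigned flags, sk_rtcheck_fn rtcheck)
{
	struct sk_entry *e;

	assert(rtcheck != NULL);
	/* Interface-address entries skip the route check; others must pass. */
	if (!(flags & SK_LLE_IFADDR) && rtcheck(flags) != 0)
		return (NULL);
	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return (NULL);
	e->la_flags = flags;
	if (flags & SK_LLE_STATIC)
		e->r_flags |= SK_RLLE_VALID;
	if (flags & SK_LLE_IFADDR) {
		/* The kernel also builds the link-layer header here (omitted). */
		e->la_flags |= SK_LLE_STATIC;
		e->r_flags |= SK_RLLE_VALID | SK_RLLE_IFADDR;
	}
	return (e);
}

static int
sk_route_ok(unsigned flags)
{
	(void)flags;
	return (0);
}

static int
sk_no_route(unsigned flags)
{
	(void)flags;
	return (1);
}
```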
/*
 * Return NULL if not found or marked for deletion.
 * If found, return lle read locked, or write locked when LLE_EXCLUSIVE
 * is set, or unlocked when LLE_UNLOCKED is set.
 */
static struct llentry *
in_lltable_lookup(struct lltable *llt, u_int flags, const struct sockaddr *l3addr)
{
	const struct sockaddr_in *sin = (const struct sockaddr_in *)l3addr;
	struct llentry *lle;

	IF_AFDATA_LOCK_ASSERT(llt->llt_ifp);
	KASSERT(l3addr->sa_family == AF_INET,
	    ("sin_family %d", l3addr->sa_family));
	KASSERT((flags & (LLE_UNLOCKED | LLE_EXCLUSIVE)) !=
	    (LLE_UNLOCKED | LLE_EXCLUSIVE),
	    ("wrong lle request flags: %#x", flags));

	lle = in_lltable_find_dst(llt, sin->sin_addr);
	if (lle == NULL)
		return (NULL);
	if (flags & LLE_UNLOCKED)
		return (lle);

	if (flags & LLE_EXCLUSIVE)
		LLE_WLOCK(lle);
	else
		LLE_RLOCK(lle);

	/*
	 * If the afdata lock is not held, the LLE may have been unlinked while
	 * we were blocked on the LLE lock. Check for this case.
	 */
	if (__predict_false((lle->la_flags & LLE_LINKED) == 0)) {
		if (flags & LLE_EXCLUSIVE)
			LLE_WUNLOCK(lle);
		else
			LLE_RUNLOCK(lle);
		return (NULL);
	}
	return (lle);
}

static int
in_lltable_dump_entry(struct lltable *llt, struct llentry *lle,
    struct sysctl_req *wr)
{
	struct ifnet *ifp = llt->llt_ifp;
	/* XXX stack use */
	struct {
		struct rt_msghdr	rtm;
		struct sockaddr_in	sin;
		struct sockaddr_dl	sdl;
	} arpc;
	struct sockaddr_dl *sdl;
	int error;

	bzero(&arpc, sizeof(arpc));
	/* skip deleted entries */
	if ((lle->la_flags & LLE_DELETED) == LLE_DELETED)
		return (0);
	/* Skip if jailed and not a valid IP of the prison. */
	lltable_fill_sa_entry(lle, (struct sockaddr *)&arpc.sin);
	if (prison_if(wr->td->td_ucred, (struct sockaddr *)&arpc.sin) != 0)
		return (0);
	/*
	 * produce a msg made of:
	 *  struct rt_msghdr;
	 *  struct sockaddr_in; (IPv4)
	 *  struct sockaddr_dl;
	 */
	arpc.rtm.rtm_msglen = sizeof(arpc);
	arpc.rtm.rtm_version = RTM_VERSION;
	arpc.rtm.rtm_type = RTM_GET;
	arpc.rtm.rtm_flags = RTF_UP;
	arpc.rtm.rtm_addrs = RTA_DST | RTA_GATEWAY;

	/* publish */
	if (lle->la_flags & LLE_PUB)
		arpc.rtm.rtm_flags |= RTF_ANNOUNCE;

	sdl = &arpc.sdl;
	sdl->sdl_family = AF_LINK;
	sdl->sdl_len = sizeof(*sdl);
	sdl->sdl_index = ifp->if_index;
	sdl->sdl_type = ifp->if_type;
	if ((lle->la_flags & LLE_VALID) == LLE_VALID) {
		sdl->sdl_alen = ifp->if_addrlen;
		bcopy(lle->ll_addr, LLADDR(sdl), ifp->if_addrlen);
	} else {
		sdl->sdl_alen = 0;
		bzero(LLADDR(sdl), ifp->if_addrlen);
	}

	arpc.rtm.rtm_rmx.rmx_expire =
	    lle->la_flags & LLE_STATIC ? 0 : lle->la_expire;
	arpc.rtm.rtm_flags |= (RTF_HOST | RTF_LLDATA);
	if (lle->la_flags & LLE_STATIC)
		arpc.rtm.rtm_flags |= RTF_STATIC;
	if (lle->la_flags & LLE_IFADDR)
		arpc.rtm.rtm_flags |= RTF_PINNED;
	arpc.rtm.rtm_index = ifp->if_index;
	error = SYSCTL_OUT(wr, &arpc, sizeof(arpc));

	return (error);
}

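`in_lltable_dump_entry()` translates link-layer entry flags into routing-message flags:

- Every dumped entry is reported as an up host route carrying link-layer data.
- Published entries gain `RTF_ANNOUNCE`.
- Static entries gain `RTF_STATIC` and report an expiry of 0.
- Interface-address entries gain `RTF_PINNED`.

The sketch below captures only that mapping. The `SK_*` constants are hypothetical placeholders with made-up values, not the real `LLE_*` or `RTF_*` definitions. `struct sk_rtm` is a stand-in for the relevant `rt_msghdr` fields.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical placeholder values, not the real LLE_* and RTF_* values. */
#define SK_LLE_STATIC	0x01u
#define SK_LLE_PUB	0x02u
#define SK_LLE_IFADDR	0x04u

#define SK_RTF_UP	0x01u
#define SK_RTF_HOST	0x02u
#define SK_RTF_LLDATA	0x04u
#define SK_RTF_ANNOUNCE	0x08u
#define SK_RTF_STATIC	0x10u
#define SK_RTF_PINNED	0x20u

struct sk_rtm {
	uint32_t	flags;
	time_t		expire;
};

static struct sk_rtm
sk_lle_to_rtm(unsigned lle_flags, time_t lle_expire)
{
	struct sk_rtm m;

	/* Every dumped entry is an up host route carrying link-layer data. */
	m.flags = SK_RTF_UP | SK_RTF_HOST | SK_RTF_LLDATA;
	if (lle_flags & SK_LLE_PUB)
		m.flags |= SK_RTF_ANNOUNCE;
	if (lle_flags & SK_LLE_STATIC)
		m.flags |= SK_RTF_STATIC;
	if (lle_flags & SK_LLE_IFADDR)
		m.flags |= SK_RTF_PINNED;
	/* Static entries never expire, so report 0 instead of a timestamp. */
	m.expire = (lle_flags & SK_LLE_STATIC) ? 0 : lle_expire;
	return (m);
}
```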
static struct lltable *
in_lltattach(struct ifnet *ifp)
{
	struct lltable *llt;

	llt = lltable_allocate_htbl(IN_LLTBL_DEFAULT_HSIZE);
	llt->llt_af = AF_INET;
	llt->llt_ifp = ifp;

	llt->llt_lookup = in_lltable_lookup;
	llt->llt_alloc_entry = in_lltable_alloc;
	llt->llt_delete_entry = in_lltable_delete_entry;
	llt->llt_dump_entry = in_lltable_dump_entry;
	llt->llt_hash = in_lltable_hash;
	llt->llt_fill_sa_entry = in_lltable_fill_sa_entry;
	llt->llt_free_entry = in_lltable_free_entry;
	llt->llt_match_prefix = in_lltable_match_prefix;
	llt->llt_mark_used = llentry_mark_used;
	lltable_link(llt);

	return (llt);
}

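`in_lltattach()` fills a per-table set of function pointers. This lets protocol-independent lltable code call back into the IPv4-specific lookup, hashing and dump routines without knowing the address family. Below is a reduced sketch of that dispatch pattern. `struct sk_lltable`, the `sk_inet_*` callbacks and `sk_bucket_of()` are hypothetical, and they stand in for only two of the `llt_*` hooks.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical reduced table with two per-family hooks. */
struct sk_lltable {
	int		  af;
	uint32_t	  hsize;
	uint32_t	(*hash)(uint32_t key, uint32_t hsize);
	const char	*(*name)(void);
};

static uint32_t
sk_inet_hash(uint32_t key, uint32_t hsize)
{
	return (key & (hsize - 1));	/* trivial placeholder hash */
}

static const char *
sk_inet_name(void)
{
	return ("inet");
}

/* Fill in the family-specific hooks, as in_lltattach() does for IPv4. */
static void
sk_inet_attach(struct sk_lltable *llt, uint32_t hsize)
{
	llt->af = AF_INET;
	llt->hsize = hsize;
	llt->hash = sk_inet_hash;
	llt->name = sk_inet_name;
}

/* Generic code sees only the table and calls through its hooks. */
static uint32_t
sk_bucket_of(const struct sk_lltable *llt, uint32_t key)
{
	return (llt->hash(key, llt->hsize));
}
```

An IPv6 attach routine would fill the same slots with its own callbacks, and the generic code would stay unchanged.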
void *
in_domifattach(struct ifnet *ifp)
{
	struct in_ifinfo *ii;

	ii = malloc(sizeof(struct in_ifinfo), M_IFADDR, M_WAITOK|M_ZERO);

	ii->ii_llt = in_lltattach(ifp);
	ii->ii_igmp = igmp_domifattach(ifp);

	return (ii);
}

void
in_domifdetach(struct ifnet *ifp, void *aux)
{
	struct in_ifinfo *ii = (struct in_ifinfo *)aux;

	igmp_domifdetach(ifp);
	lltable_free(ii->ii_llt);
	free(ii, M_IFADDR);
}