Kill custom in_matroute() radix matching function, removing one rte mutex lock per route lookup.

Initially, in_matroute() and in_clsroute() in their current form were introduced
by r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in the route table, setting the RTPRF_OURS flag and an expiration time. After that,
either the GC deleted the route or RTPRF_OURS was cleared on the first lookup that
matched it. It was a good solution in those days (and probably for another decade)
for keeping TCP metrics.
However, after TCP metrics moved to the TCP hostcache in r122922, most of the in_rmx
functionality became unused. It might have been used for flushing ICMP-originated
routes before rte mutexes/refcounting, but I'm not sure about that.
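
The lifecycle described above can be modeled in a few lines of C. This is a user-space sketch, not kernel code: `struct model_rt`, `M_RTPRF_OURS` and the `model_*` functions are hypothetical stand-ins for struct rtentry, RTPRF_OURS, in_clsroute(), in_matroute() and in_rtqkill(). The early-return checks of in_clsroute() and its rtq_reallyold == 0 fast path are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag value for this model only; in the kernel,
 * RTPRF_OURS was defined as RTF_PROTO3. */
#define M_RTPRF_OURS	0x4

/* Hypothetical, heavily reduced stand-in for struct rtentry. */
struct model_rt {
	int	flags;
	time_t	expire;		/* 0: no expiration scheduled */
};

/* Last reference dropped (cf. in_clsroute()): hand the route to the GC. */
static void
model_close(struct model_rt *rt, time_t now, time_t reallyold)
{
	assert(reallyold > 0);	/* the reallyold == 0 path is not modeled */
	if (rt->flags & M_RTPRF_OURS)
		return;
	rt->flags |= M_RTPRF_OURS;
	rt->expire = now + reallyold;
}

/* Lookup hit (cf. in_matroute()): take the route back from the GC. */
static void
model_match(struct model_rt *rt)
{
	if (rt->flags & M_RTPRF_OURS) {
		rt->flags &= ~M_RTPRF_OURS;
		rt->expire = 0;
	}
}

/* GC pass (cf. in_rtqkill()): would this route be deleted at time now? */
static bool
model_gc_should_kill(const struct model_rt *rt, time_t now)
{
	return (rt->flags & M_RTPRF_OURS) != 0 && rt->expire <= now;
}
```

In the model, a route that is looked up again before its expiry is silently taken back from the GC; that per-match check and write is why in_matroute() had to lock every route it returned.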

So it looks like it is nearly impossible to make the GC do its work nowadays:

* in_rtqkill() ignores non-RTPRF_OURS routes.
* A route can only become RTPRF_OURS after its last reference is dropped via
  rtfree(), which calls in_clsroute(), which, in turn, ignores non-UP and
  non-RTF_DYNAMIC routes.

Dynamic routes can still be installed via received redirects, but they have
the default lifetime (no specific rt_expire), and there is no other trie walker
that would call RTFREE() on them.
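
The argument above reduces to two predicates. In this hypothetical sketch the `M_RTF_*` values are illustrative stand-ins, not the kernel's RTF_UP / RTF_DYNAMIC / RTF_PROTO3 constants; `clsroute_would_mark()` mirrors the early returns of in_clsroute(), `rtqkill_would_consider()` mirrors the first check in in_rtqkill(), and `flags_after_last_rtfree()` is an invented helper combining them.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; they do not match the kernel's. */
#define M_RTF_UP	0x1
#define M_RTF_DYNAMIC	0x2
#define M_RTPRF_OURS	0x4

/* Early returns of in_clsroute(), reached only when rtfree() drops the
 * last reference: only an UP, RTF_DYNAMIC route not yet owned is marked. */
static bool
clsroute_would_mark(int flags)
{
	if (!(flags & M_RTF_UP))
		return false;
	if (flags & M_RTPRF_OURS)
		return false;
	if (!(flags & M_RTF_DYNAMIC))
		return false;
	return true;
}

/* in_rtqkill() skips every route that lacks RTPRF_OURS. */
static bool
rtqkill_would_consider(int flags)
{
	return (flags & M_RTPRF_OURS) != 0;
}

/* Flags a route ends up with after its last reference is dropped. */
static int
flags_after_last_rtfree(int flags)
{
	assert((flags & ~(M_RTF_UP | M_RTF_DYNAMIC | M_RTPRF_OURS)) == 0);
	return clsroute_would_mark(flags) ? (flags | M_RTPRF_OURS) : flags;
}
```

In this model a static (non-dynamic) route can never be collected, and a redirect-installed dynamic route becomes collectable only if something actually drops its last reference.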

So, the changelist:
* remove the custom rnh_matchaddr / rnh_close functions (in_matroute() and
  in_clsroute()).
* remove all GC functions.
* partially revert r256695 (proto3 is no longer used inside the kernel,
  rt_expire cannot be used from the user's point of view, and proto3 support
  is incomplete).
* finish r241884 (similar to this commit) and remove the remaining IPv6 parts.

MFC after:	1 month
melifaro 2014-11-11 02:52:40 +00:00
parent 42e544acc7
commit 12580bcaa8
14 changed files with 10 additions and 395 deletions

@@ -40,7 +40,6 @@ osi
prefixlen
proto1
proto2
proto3
proxy
recvpipe
reject

@@ -28,7 +28,7 @@
.\" @(#)route.8 8.3 (Berkeley) 3/19/94
.\" $FreeBSD$
.\"
.Dd January 11, 2014
.Dd November 11, 2014
.Dt ROUTE 8
.Os
.Sh NAME
@@ -315,7 +315,6 @@ by indicating the following corresponding modifiers:
-blackhole RTF_BLACKHOLE - silently discard pkts (during updates)
-proto1 RTF_PROTO1 - set protocol specific routing flag #1
-proto2 RTF_PROTO2 - set protocol specific routing flag #2
-proto3 RTF_PROTO3 - set protocol specific routing flag #3
.Ed
.Pp
The optional modifiers

@@ -847,9 +847,6 @@ newroute(int argc, char **argv)
case K_PROTO2:
flags |= RTF_PROTO2;
break;
case K_PROTO3:
flags |= RTF_PROTO3;
break;
case K_PROXY:
nrflags |= F_PROXY;
break;

@@ -211,21 +211,6 @@ Boolean: enable/disable accepting of source-routed IP packets (default false).
.It Dv IPCTL_SOURCEROUTE
.Pq ip.sourceroute
Boolean: enable/disable forwarding of source-routed IP packets (default false).
.It Dv IPCTL_RTEXPIRE
.Pq ip.rtexpire
Integer: lifetime in seconds of protocol-cloned
.Tn IP
routes after the last reference drops (default one hour).
This value varies dynamically as described above.
.It Dv IPCTL_RTMINEXPIRE
.Pq ip.rtminexpire
Integer: minimum value of ip.rtexpire (default ten seconds).
This value has no effect on user modifications, but restricts the dynamic
adaptation described above.
.It Dv IPCTL_RTMAXCACHE
.Pq ip.rtmaxcache
Integer: trigger level of cached, unreferenced, protocol-cloned routes
which initiates dynamic adaptation (default 128).
.It Va ip.process_options
Integer: control IP options processing.
By setting this variable to 0, all IP options in the incoming packets

@@ -312,21 +312,6 @@ mapped address on
.Dv AF_INET6
sockets.
Defaults to on.
.It Dv IPV6CTL_RTEXPIRE
.Pq ip6.rtexpire
Integer: lifetime in seconds of protocol-cloned
.Tn IP
routes after the last reference drops (default one hour).
.\"This value varies dynamically as described above.
.It Dv IPV6CTL_RTMINEXPIRE
.Pq ip6.rtminexpire
Integer: minimum value of ip.rtexpire (default ten seconds).
.\"This value has no effect on user modifications, but restricts the dynamic
.\"adaptation described above.
.It Dv IPV6CTL_RTMAXCACHE
.Pq ip6.rtmaxcache
Integer: trigger level of cached, unreferenced, protocol-cloned routes
which initiates dynamic adaptation (default 128).
.El
.Ss Interaction between IPv4/v6 sockets
By default,

@@ -894,41 +894,6 @@ A competent sysadmin will turn off all
of these
.Xr inetd 8 Ns -internal
test services.
.Pp
Spoofed packet attacks may also be used to overload the kernel route cache.
Refer to the
.Va net.inet.ip.rtexpire , net.inet.ip.rtminexpire ,
and
.Va net.inet.ip.rtmaxcache
.Xr sysctl 8
variables.
A spoofed packet attack that uses a random source IP will cause
the kernel to generate a temporary cached route in the route table, viewable
with
.Dq Li "netstat -rna | fgrep W3" .
These routes typically timeout in 1600
seconds or so.
If the kernel detects that the cached route table has gotten
too big it will dynamically reduce the
.Va rtexpire
but will never decrease it to
less than
.Va rtminexpire .
There are two problems: (1) The kernel does not react
quickly enough when a lightly loaded server is suddenly attacked, and (2) The
.Va rtminexpire
is not low enough for the kernel to survive a sustained attack.
If your servers are connected to the internet via a T3 or better it may be
prudent to manually override both
.Va rtexpire
and
.Va rtminexpire
via
.Xr sysctl 8 .
Never set either parameter to zero
(unless you want to crash the machine :-)).
Setting both parameters to 2 seconds should be sufficient to protect the route
table from attack.
.Sh ACCESS ISSUES WITH KERBEROS AND SSH
There are a few issues with both Kerberos and SSH that need to be addressed
if you intend to use them.
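
The dynamic reduction of rtexpire described above is a repeated integer shrink by one third, clamped at rtminexpire. The sketch below reproduces that arithmetic, assuming the defaults from the removed kernel code (rtexpire 3600 s, rtminexpire 10 s); `rtexpire_step()` and `rtexpire_steps_to_min()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* One adjustment, as done in in_rtqtimo_one(): shrink by a third,
 * then clamp at the configured minimum. */
static int
rtexpire_step(int reallyold, int minreallyold)
{
	assert(minreallyold >= 0);
	if (reallyold <= minreallyold)
		return (reallyold);	/* the kernel skips adjustment here */
	reallyold = 2 * reallyold / 3;
	if (reallyold < minreallyold)
		reallyold = minreallyold;
	return (reallyold);
}

/* Number of overload-triggered adjustments needed to reach the minimum. */
static int
rtexpire_steps_to_min(int reallyold, int minreallyold)
{
	int steps = 0;

	while (reallyold > minreallyold) {
		reallyold = rtexpire_step(reallyold, minreallyold);
		steps++;
	}
	return (steps);
}
```

From the one-hour default, the second step already yields 1600 seconds, the figure quoted above, and 15 adjustments are needed to reach 10 seconds. Since in_rtqtimo_one() adjusts at most once per rtq_timeout (600 s by default), that takes over two hours of sustained overload, which is consistent with the complaint that the kernel does not react quickly enough.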

@@ -619,9 +619,9 @@ int getsourcefilter(int, uint32_t, struct sockaddr *, socklen_t,
#ifdef notyet
#define IPCTL_DEFMTU 4 /* default MTU */
#endif
#define IPCTL_RTEXPIRE 5 /* cloned route expiration time */
#define IPCTL_RTMINEXPIRE 6 /* min value for expiration time */
#define IPCTL_RTMAXCACHE 7 /* trigger level for dynamic expire */
/* IPCTL_RTEXPIRE 5 deprecated */
/* IPCTL_RTMINEXPIRE 6 deprecated */
/* IPCTL_RTMAXCACHE 7 deprecated */
#define IPCTL_SOURCEROUTE 8 /* may perform source routes */
#define IPCTL_DIRECTEDBROADCAST 9 /* may re-broadcast received packets */
#define IPCTL_INTRQMAXLEN 10 /* max length of netisr queue */

@@ -36,8 +36,6 @@ __FBSDID("$FreeBSD$");
#include <sys/sysctl.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <sys/syslog.h>
#include <sys/callout.h>
#include <net/if.h>
#include <net/if_var.h>
@@ -55,8 +53,6 @@ extern int in_inithead(void **head, int off);
extern int in_detachhead(void **head, int off);
#endif
#define RTPRF_OURS RTF_PROTO3 /* set on routes we manage */
/*
* Do what we need to do when inserting a route.
*/
@@ -110,238 +106,6 @@ in_addroute(void *v_arg, void *n_arg, struct radix_node_head *head,
return (rn_addroute(v_arg, n_arg, head, treenodes));
}
/*
* This code is the inverse of in_clsroute: on first reference, if we
* were managing the route, stop doing so and set the expiration timer
* back off again.
*/
static struct radix_node *
in_matroute(void *v_arg, struct radix_node_head *head)
{
struct radix_node *rn = rn_match(v_arg, head);
struct rtentry *rt = (struct rtentry *)rn;
if (rt) {
RT_LOCK(rt);
if (rt->rt_flags & RTPRF_OURS) {
rt->rt_flags &= ~RTPRF_OURS;
rt->rt_expire = 0;
}
RT_UNLOCK(rt);
}
return rn;
}
static VNET_DEFINE(int, rtq_reallyold) = 60*60; /* one hour is "really old" */
#define V_rtq_reallyold VNET(rtq_reallyold)
SYSCTL_INT(_net_inet_ip, IPCTL_RTEXPIRE, rtexpire, CTLFLAG_VNET | CTLFLAG_RW,
&VNET_NAME(rtq_reallyold), 0,
"Default expiration time on dynamically learned routes");
/* never automatically crank down to less */
static VNET_DEFINE(int, rtq_minreallyold) = 10;
#define V_rtq_minreallyold VNET(rtq_minreallyold)
SYSCTL_INT(_net_inet_ip, IPCTL_RTMINEXPIRE, rtminexpire, CTLFLAG_VNET | CTLFLAG_RW,
&VNET_NAME(rtq_minreallyold), 0,
"Minimum time to attempt to hold onto dynamically learned routes");
/* 128 cached routes is "too many" */
static VNET_DEFINE(int, rtq_toomany) = 128;
#define V_rtq_toomany VNET(rtq_toomany)
SYSCTL_INT(_net_inet_ip, IPCTL_RTMAXCACHE, rtmaxcache, CTLFLAG_VNET | CTLFLAG_RW,
&VNET_NAME(rtq_toomany), 0,
"Upper limit on dynamically learned routes");
/*
* On last reference drop, mark the route as belong to us so that it can be
* timed out.
*/
static void
in_clsroute(struct radix_node *rn, struct radix_node_head *head)
{
struct rtentry *rt = (struct rtentry *)rn;
RT_LOCK_ASSERT(rt);
if (!(rt->rt_flags & RTF_UP))
return; /* prophylactic measures */
if (rt->rt_flags & RTPRF_OURS)
return;
if (!(rt->rt_flags & RTF_DYNAMIC))
return;
/*
* If rtq_reallyold is 0, just delete the route without
* waiting for a timeout cycle to kill it.
*/
if (V_rtq_reallyold != 0) {
rt->rt_flags |= RTPRF_OURS;
rt->rt_expire = time_uptime + V_rtq_reallyold;
} else
rt_expunge(head, rt);
}
struct rtqk_arg {
struct radix_node_head *rnh;
int draining;
int killed;
int found;
int updating;
time_t nextstop;
};
/*
* Get rid of old routes. When draining, this deletes everything, even when
* the timeout is not expired yet. When updating, this makes sure that
* nothing has a timeout longer than the current value of rtq_reallyold.
*/
static int
in_rtqkill(struct radix_node *rn, void *rock)
{
struct rtqk_arg *ap = rock;
struct rtentry *rt = (struct rtentry *)rn;
int err;
RADIX_NODE_HEAD_WLOCK_ASSERT(ap->rnh);
if (rt->rt_flags & RTPRF_OURS) {
ap->found++;
if (ap->draining || rt->rt_expire <= time_uptime) {
if (rt->rt_refcnt > 0)
panic("rtqkill route really not free");
err = in_rtrequest(RTM_DELETE,
(struct sockaddr *)rt_key(rt),
rt->rt_gateway, rt_mask(rt),
rt->rt_flags | RTF_RNH_LOCKED, 0,
rt->rt_fibnum);
if (err) {
log(LOG_WARNING, "in_rtqkill: error %d\n", err);
} else {
ap->killed++;
}
} else {
if (ap->updating &&
(rt->rt_expire - time_uptime > V_rtq_reallyold))
rt->rt_expire = time_uptime + V_rtq_reallyold;
ap->nextstop = lmin(ap->nextstop, rt->rt_expire);
}
}
return 0;
}
#define RTQ_TIMEOUT 60*10 /* run no less than once every ten minutes */
static VNET_DEFINE(int, rtq_timeout) = RTQ_TIMEOUT;
static VNET_DEFINE(struct callout, rtq_timer);
#define V_rtq_timeout VNET(rtq_timeout)
#define V_rtq_timer VNET(rtq_timer)
static void in_rtqtimo_one(void *rock);
static void
in_rtqtimo(void *rock)
{
CURVNET_SET((struct vnet *) rock);
int fibnum;
void *newrock;
struct timeval atv;
for (fibnum = 0; fibnum < rt_numfibs; fibnum++) {
newrock = rt_tables_get_rnh(fibnum, AF_INET);
if (newrock != NULL)
in_rtqtimo_one(newrock);
}
atv.tv_usec = 0;
atv.tv_sec = V_rtq_timeout;
callout_reset(&V_rtq_timer, tvtohz(&atv), in_rtqtimo, rock);
CURVNET_RESTORE();
}
static void
in_rtqtimo_one(void *rock)
{
struct radix_node_head *rnh = rock;
struct rtqk_arg arg;
static time_t last_adjusted_timeout = 0;
arg.found = arg.killed = 0;
arg.rnh = rnh;
arg.nextstop = time_uptime + V_rtq_timeout;
arg.draining = arg.updating = 0;
RADIX_NODE_HEAD_LOCK(rnh);
rnh->rnh_walktree(rnh, in_rtqkill, &arg);
RADIX_NODE_HEAD_UNLOCK(rnh);
/*
* Attempt to be somewhat dynamic about this:
* If there are ``too many'' routes sitting around taking up space,
* then crank down the timeout, and see if we can't make some more
* go away. However, we make sure that we will never adjust more
* than once in rtq_timeout seconds, to keep from cranking down too
* hard.
*/
if ((arg.found - arg.killed > V_rtq_toomany) &&
(time_uptime - last_adjusted_timeout >= V_rtq_timeout) &&
V_rtq_reallyold > V_rtq_minreallyold) {
V_rtq_reallyold = 2 * V_rtq_reallyold / 3;
if (V_rtq_reallyold < V_rtq_minreallyold) {
V_rtq_reallyold = V_rtq_minreallyold;
}
last_adjusted_timeout = time_uptime;
#ifdef DIAGNOSTIC
log(LOG_DEBUG, "in_rtqtimo: adjusted rtq_reallyold to %d\n",
V_rtq_reallyold);
#endif
arg.found = arg.killed = 0;
arg.updating = 1;
RADIX_NODE_HEAD_LOCK(rnh);
rnh->rnh_walktree(rnh, in_rtqkill, &arg);
RADIX_NODE_HEAD_UNLOCK(rnh);
}
}
void
in_rtqdrain(void)
{
VNET_ITERATOR_DECL(vnet_iter);
struct radix_node_head *rnh;
struct rtqk_arg arg;
int fibnum;
VNET_LIST_RLOCK_NOSLEEP();
VNET_FOREACH(vnet_iter) {
CURVNET_SET(vnet_iter);
for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) {
rnh = rt_tables_get_rnh(fibnum, AF_INET);
arg.found = arg.killed = 0;
arg.rnh = rnh;
arg.nextstop = 0;
arg.draining = 1;
arg.updating = 0;
RADIX_NODE_HEAD_LOCK(rnh);
rnh->rnh_walktree(rnh, in_rtqkill, &arg);
RADIX_NODE_HEAD_UNLOCK(rnh);
}
CURVNET_RESTORE();
}
VNET_LIST_RUNLOCK_NOSLEEP();
}
void
in_setmatchfunc(struct radix_node_head *rnh, int val)
{
rnh->rnh_matchaddr = (val != 0) ? rn_match : in_matroute;
}
static int _in_rt_was_here;
/*
* Initialize our routing tree.
@@ -358,11 +122,7 @@ in_inithead(void **head, int off)
RADIX_NODE_HEAD_LOCK_INIT(rnh);
rnh->rnh_addaddr = in_addroute;
in_setmatchfunc(rnh, V_drop_redirect);
rnh->rnh_close = in_clsroute;
if (_in_rt_was_here == 0 ) {
callout_init(&V_rtq_timer, CALLOUT_MPSAFE);
callout_reset(&V_rtq_timer, 1, in_rtqtimo, curvnet);
_in_rt_was_here = 1;
}
return 1;
@@ -373,7 +133,6 @@ int
in_detachhead(void **head, int off)
{
callout_drain(&V_rtq_timer);
return (1);
}
#endif

@@ -407,7 +407,6 @@ int in_leavegroup_locked(struct in_multi *,
/*const*/ struct in_mfilter *);
int in_control(struct socket *, u_long, caddr_t, struct ifnet *,
struct thread *);
void in_rtqdrain(void);
int in_addprefix(struct in_ifaddr *, int);
int in_scrubprefix(struct in_ifaddr *, u_int);
void ip_input(struct mbuf *);
@@ -426,7 +425,6 @@ void in_rtredirect(struct sockaddr *, struct sockaddr *,
struct sockaddr *, int, struct sockaddr *, u_int);
int in_rtrequest(int, struct sockaddr *,
struct sockaddr *, struct sockaddr *, int, struct rtentry **, u_int);
void in_setmatchfunc(struct radix_node_head *, int);
#if 0
int in_rt_getifa(struct rt_addrinfo *, u_int fibnum);

@@ -115,6 +115,9 @@ SYSCTL_UINT(_net_inet_icmp, OID_AUTO, maskfake, CTLFLAG_VNET | CTLFLAG_RW,
"Fake reply to ICMP Address Mask Request packets.");
VNET_DEFINE(int, drop_redirect) = 0;
#define V_drop_redirect VNET(drop_redirect)
SYSCTL_INT(_net_inet_icmp, OID_AUTO, drop_redirect, CTLFLAG_VNET | CTLFLAG_RW,
&VNET_NAME(drop_redirect), 0, "Ignore ICMP redirects");
static VNET_DEFINE(int, log_redirect) = 0;
#define V_log_redirect VNET(log_redirect)
@@ -163,39 +166,6 @@ static void icmp_send(struct mbuf *, struct mbuf *);
extern struct protosw inetsw[];
static int
sysctl_net_icmp_drop_redir(SYSCTL_HANDLER_ARGS)
{
int error, new;
int i;
struct radix_node_head *rnh;
new = V_drop_redirect;
error = sysctl_handle_int(oidp, &new, 0, req);
if (error == 0 && req->newptr) {
new = (new != 0) ? 1 : 0;
if (new == V_drop_redirect)
return (0);
for (i = 0; i < rt_numfibs; i++) {
if ((rnh = rt_tables_get_rnh(i, AF_INET)) == NULL)
continue;
RADIX_NODE_HEAD_LOCK(rnh);
in_setmatchfunc(rnh, new);
RADIX_NODE_HEAD_UNLOCK(rnh);
}
V_drop_redirect = new;
}
return (error);
}
SYSCTL_PROC(_net_inet_icmp, OID_AUTO, drop_redirect,
CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, 0, 0,
sysctl_net_icmp_drop_redir, "I", "Ignore ICMP redirects");
/*
* Kernel module interface for updating icmpstat. The argument is an index
* into icmpstat treated as an array of u_long. While this encodes the

@@ -1330,7 +1330,6 @@ ip_drain(void)
}
IPQ_UNLOCK();
VNET_LIST_RUNLOCK_NOSLEEP();
in_rtqdrain();
}
/*

@@ -593,9 +593,9 @@ struct ip6_mtuinfo {
#define IPV6CTL_MAPPED_ADDR 23
#endif
#define IPV6CTL_V6ONLY 24
#define IPV6CTL_RTEXPIRE 25 /* cloned route expiration time */
#define IPV6CTL_RTMINEXPIRE 26 /* min value for expiration time */
#define IPV6CTL_RTMAXCACHE 27 /* trigger level for dynamic expire */
/* IPV6CTL_RTEXPIRE 25 deprecated */
/* IPV6CTL_RTMINEXPIRE 26 deprecated */
/* IPV6CTL_RTMAXCACHE 27 deprecated */
#define IPV6CTL_USETEMPADDR 32 /* use temporary addresses (RFC3041) */
#define IPV6CTL_TEMPPLTIME 33 /* preferred lifetime for tmpaddrs */

@@ -66,7 +66,6 @@ __FBSDID("$FreeBSD$");
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/sysctl.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
@@ -179,24 +178,6 @@ in6_addroute(void *v_arg, void *n_arg, struct radix_node_head *head,
return (ret);
}
SYSCTL_DECL(_net_inet6_ip6);
static VNET_DEFINE(int, rtq_toomany6) = 128;
/* 128 cached routes is ``too many'' */
#define V_rtq_toomany6 VNET(rtq_toomany6)
SYSCTL_INT(_net_inet6_ip6, IPV6CTL_RTMAXCACHE, rtmaxcache, CTLFLAG_VNET | CTLFLAG_RW,
&VNET_NAME(rtq_toomany6) , 0, "");
struct rtqk_arg {
struct radix_node_head *rnh;
int mode;
int updating;
int draining;
int killed;
int found;
time_t nextstop;
};
/*
* Age old PMTUs.
*/

@@ -1324,28 +1324,6 @@ bool
Controls the sending of ICMP redirects in response to unforwardable IP
packets.
---
net.inet.ip.rtexpire
int
Lifetime in seconds of protocol-cloned IP routes after the last
reference drops (default one hour).
---
net.inet.ip.rtmaxcache
int
Trigger level of cached, unreferenced, protocol-cloned
routes which initiates dynamic adaptation.
---
net.inet.ip.rtminexpire
int
See
.Xr inet 4
for more information.
---
net.inet.ip.sourceroute
bool