rt_check() in its original form proved to be sufficient and
rt_check_fib() can go away (as can its evil twin in_rt_check()).
I believe this does NOT address the crashes people have been seeing
in rt_check.
MFC after: 1 week
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
(Other more specific related options will follow)
This allows one to set multiple p2p links to the same place
and select which to use by having each in different FIBS.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from anay to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,
route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2
The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
Multiple default routes can also be inserted. Here is the netstat
output:
default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0
When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,
route delete default
would fail and trigger the following error message:
"route: writing to routing socket: No such process"
"delete net default: not in table"
On the other hand,
route delete default 10.2.5.2
would be successful: "delete net default: gateway 10.2.5.2"
One does not have to specify a gateway if there is only a single
route for a particular destination.
I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.
Reviewed by: robert, sam, gnn, julian, kmacy
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.
Reviewed by: net@
MFC after: 1 week
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route. The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow. Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock. Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread. In practice this would probably just
fix a lost reference that would result in a route never being freed.
This fixes panics observed in rt_check() and rtexpunge().
MFC after: 1 week
PR: kern/112490
Insight from: mehuljv at yahoo.com
Reviewed by: ru (found the "not-setting it to NULL" part)
Tested by: several
- In rt_check() remove the senderr() macro and the "bad" label. They
used to simplify code, but now aren't.
- Remove extra RT_LOCK_ASSERT() in rt_setgate(). The RT_REMREF macro
does this.
- In rtfree() convert panics to KASSERTs.
- Strict the routing API: rtfree() should be called only in a case
when we are completely sure we've got the last reference on the
rtentry. In all other cases RTFREE_LOCKED() macro should be used.
If the reference isn't the last one spit out a warning printf.
Correct the only(?) case for this in rt_check().
- Fix typos in comments.
at the start of rtalloc1(). This backs out part of revs 1.83 and 1.85.
Profiling on an i386 showed that that for sending tiny packets using
bge, -current takes 7 bzero()s where RELENG_4 takes only 1, and that
bzero()ing is now the dominant overhead (10-12%, up from 1%, but
profiling overestimated this a bit). This commit backs out 2 of the
6 extra bzero()s (1 in each of 2 calls per packet to rtalloc1()). They
were the largest ones by byte count (48 bytes each) but perhaps not
by time (small misaligned ones might take longer).
255.255.255.0, and a default route with gateway x.x.x.1. Now if
the address mask is changed to something more specific, e.g.,
255.255.255.128, then after the mask change the default gateway
is no longer reachable.
Since the default route is still present in the routing table,
when the output code tries to resolve the address of the default
gateway in function rt_check(), again, the default route will be
returned by rtalloc1(). Because the lock is currently held on the
rtentry structure, one more attempt to hold the lock will trigger
a crash due to "lock recursed on non-recursive mutex ..."
This is a general problem. The fix checks for the above condition
so that an existing route entry is not mistaken for a new cloned
route. Approriately, an ENETUNREACH error is returned back to the
caller
Approved by: andre
gateways which are unreachable except through the default router. For
example, assuming there is a default route configured, and inserting
a route
"route add 64.102.54.0/24 60.80.1.1"
is currently allowed even when 60.80.1.1 is only reachable through
the default route. However, an error is thrown when this route is
utilized, say,
"ping 64.102.54.1" will return an error
This type of route insertion should be disallowed becasue:
1) Let's say that somehow our code allowed this packet to flow to
the default router, and the default router knows the next hop is
60.80.1.1, then the question is why bother inserting this route in
the 1st place, just simply use the default route.
2) Since we're not talking about source routing here, the default
router could very well choose a different path than using 60.80.1.1
for the next hop, again it defeats the purpose of adding this route.
Reviewed by: ru, gnn, bz
Approved by: andre
destination. These checks are needed so we do not install
a route looking like this:
(0) 192.0.2.200 UH tun0 =>
When removing this route the kernel will start to walk
the address space which looks like a hang on 64bit platforms
because it'll take ages while on 32bit you should see a panic
when kernel debugging options are turned on.
The problem is in rtrequest1:
if (netmask) {
rt_maskedcopy(dst, ndst, netmask);
} else
bcopy(dst, ndst, dst->sa_len);
In both cases the len might be 0 if the application forgot to
set it. If so ndst will be all-zero leading to above
mentioned strange routes.
This is an application error but we must not fail/hang/panic
because of this.
Looks ok: gnn
No objections: net@ (silence)
MFC after: 8 weeks
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
- Rearrange code so that in a case of failure the affected
route is not changed. Otherwise, a bogus rtentry will be
left and later rt_check() can recurse on its lock. [1]
- Remove comment about protocol cloning.
- Fix two places where rtentry mutex was recursed on, because
accessed via two different pointers, that were actually pointing
to the same rtentry in some cases. [1]
- Return EADDRINUSE instead of bogus EDQUOT, in case when gateway
uses the same route. [2]
Reported & tested by: ps, Andrej Zverev <az inec.ru> [1]
PR: kern/64090 [2]
- rt0 passed to rt_check() must not be NULL, assert this.
- rt returned by rt_check() must be valid locked rtentry,
if no error occured.
o Modify callers, so that they never pass NULL rt0
to rt_check().
Reviewed by: sam, ume (nd6.c)
route itself.
It fixes a bug where an IPv4 route for example has an IPv6 gateway
specified:
route add 10.1.1.1 -inet6 fe80::1%fxp0
Destination Gateway Flags Refs Use Netif Expire
10.1.1.1 fe80::1%fxp0 UGHS 0 0 fxp0
The fix rejects these illegal combinations:
route: writing to routing socket: Invalid argument
add host 10.1.1.1: gateway fe80::1%fxp0: Invalid argument
Reviewed by: KAME jinmei@isl.rdc.toshiba.co.jp
Reviewed by: andre (mentor)
Approved by: re
MFC after: 5
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
This may be a MFC candidate for RELENG_5.
Reviewed by: rwatson
Approved by: bmilekic (mentor)
called "rtentry".
This saves a considerable amount of kernel memory. R_Zmalloc previously
used 256 byte blocks (plus kmalloc overhead) whereas UMA only needs 132
bytes.
Idea from: OpenBSD
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.
Slightly simplify (and document) the logic for memory allocation
in rt_setgate().
The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.
of an interface. No functional change.
On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed
the space occupied by a struct sockaddr when passed through a
routing socket.
Use it to replace the macro ROUNDUP(int), that does the same but
is redefined by every file which uses it, courtesy of
the School of Cut'n'Paste Programming(TM).
(partial) userland changes to follow.
accordingly. The define is left intact for ABI compatibility
with userland.
This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send
Support by: FreeBSD Foundation
when julian@ added it, but the commented out code had at least
one bug -- not freeing the allocated mbuf.
Anyway, this comment no longer applies as of revision 1.67, so
remove it.
the entry being removed (ret_nrt != NULL), increment the entry's
rt_refcnt like we do it for RTM_ADD and RTM_RESOLVE, rather than
messing around with 1->0 transitions for rtfree() all over.
to current leaves because function may vanish the current node.
If parent RTA_GENMASK route has a clone (a "cloning clone"), an
rn_walktree_from() starting from parent will cause another walk
starting from clone. If a function is either rt_fixdelete() or
rt_fixchange(), this recursive walk may vanish the leaf that is
remembered by an outer walk (the "next leaf" above), panicing a
system when it resumes with an outer walk.
The following script paniced my single-user mode booted system:
: sysctl net.inet.ip.forwarding=1
: ipfw add 1 allow ip from any to any
: ifconfig lo0 127.1
: route add -net 10 -genmask 255.255.255.0 127.1
: telnet 10.1 # rt_fixchange() panic
: telnet 10.2
: telnet 10.1
: route delete -net 10 # rt_fixdelete() panic
For the time being, avoid these races by disallowing recursive
walks in rt_fixchange() and rt_fixdelete().
Also, make a slight optimization in the rtrequest(RTM_RESOLVE)
case: there is no reason to call rt_fixchange() in this case.
PR: kern/37606
MFC after: 5 days
No functional changes, but:
+ the mrouting module now should behave the same as the compiled-in
version (it did not before, some of the rsvp code was not loaded
properly);
+ netinet/ip_mroute.c is now truly optional;
+ removed some redundant/unused code;
+ changed many instances of '0' to NULL and INADDR_ANY as appropriate;
+ removed several static variables to make the code more SMP-friendly;
+ fixed some minor bugs in the mrouting code (mostly, incorrect return
values from functions).
This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).
Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.
Detailed changes:
--------------------
netinet/ip_mroute.c all the above.
conf/files make ip_mroute.c optional
net/route.c fix mrt_ioctl hook
netinet/ip_input.c fix ip_mforward hook, move rsvp_input() here
together with other rsvp code, and a couple
of indentation fixes.
netinet/ip_output.c fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h rsvp function hooks
netinet/raw_ip.c hooks for mrouting and rsvp functions, plus
interface cleanup.
netinet/ip_mroute.h remove an unused and optional field from a struct
Most of the code is from Pavlin Radoslavov and the XORP project
Reviewed by: sam
MFC after: 1 week
a route is cloned. Previously, they took on the count
of their parent route (which was sometimes nonzero.)
Submitted by: Andre Oppermann <oppermann@pipeline.ch>
MFC after: 5 days
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).
Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0
Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.
Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360
effect, which would cause unnecessary route deletion:
* Unfortunately, this has the obnoxious
* property of also triggering for insertion /above/ a pre-existing network
* route and clones. Sigh. This may be fixed some day.
The effect has been even worse, because recent versions of route.c set
the parent rtentry for cloned routes from an interface-direct route.
For example, suppose that we have an interface "ne0" that has an IPv4
subnet "10.0.0.0/24". Then we may have a cloned route like 10.0.0.1
on the interface, whose parent route is 10.0.0.0/24 (to the interface
ne0). Now, when we add the default route (i.e. 0.0.0.0/0),
rt_fixchange() will remove the cloned route 10.0.0.1. The (bad) effect
also prevents rt_setgate from configuring rt_gwroute, which would not
be an intended behavior.
As suggested in the comments to rt_fixchange(), we need stricter check
in the function, to prevent unintentional route deletion.
This fix also solve the "IPV6 panic?" problem in nd6_timer().
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
MFC after: 4 days
route in ifa_ifwithroute(), as the last resort, look up the route to
the gateway, not destination (to derive the interface from).
PR: kern/27852
Submitted by: Iasen Kostoff <tbyte@tbyte.org>
MFC after: 2 weeks
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.
Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.
This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.
netinet/in.c, netinet/in_rmx.c:
When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)
Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.
PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)
search routine, and scratching our heads over why it was so obfuscated.
This delta fixes a number of confusing style bugs and renames several
structure members to have more meaningful names. There remain a number
of odd control-flow structures. These changes do not affect the generated
code.
Pleases let me make sure that no one touch the invalid ro_rt pointer,
after splx(s) and before next ro_rt initialization.
Though usually this seems to be already called at splnet,
I still sometime experience kernel crash at rtfree() in my
INET6 enabled environment where IPv6 connection is frequently used.
(Off-course, it might be just due to another bug.)
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
possible for ro->ro_rt to be non-NULL even though the RTF_UP flag
is cleared. (Example: a routing daemon or the "route" command
deletes a cloned route in active use by a TCP connection.) In that
case, the code was clobbering a reference to the routing table
entry without decrementing the entry's reference count.
The splnet() call probably isn't needed, but I haven't been able
to prove that yet. It isn't significant from a performance standpoint
since it is executed very rarely.
Reviewed by: wollman and others in the freebsd-current mailing list
is neither IFF_LOOPBACK or IFF_POINTOPOINT. It's quite common
(and probably more correct) to route local IP numbers via lo0
and it makes configuration easier to assign the hostname address
to local POINTOPOINT links too.
This message usually remains hidden because the loopback interface
gets the highest interface number at boot time, but when the
ethernet interface is added later, the message can get pretty
annoying.
Also, fix a typo.
Not objected to by: freebsd-net
for IPv6 yet)
With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
This will not make any of object files that LINT create change; there
might be differences with INET disabled, but hardly anything compiled
before without INET anyway. Now the 'obvious' things will give a
proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The
only thing that _should_ work (but can't be made to compile reasonably
easily) is sppp :-(
This commit move struct arpcom from <netinet/if_ether.h> to
<net/if_arp.h>.
This is some of the worst code I've had to wade through in
ages and I don't want to have to start from scratch again next time.
(I have a 2.2 version of these comments, can I commit them?)
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Sorry if this makes it harder to merge in lite2 stuff but hey..
At least I can figure out what is going on whenever I end up going through those
files again..
do we have a policy regarding commenting existing code?
is non-null before trying to delete it in rt_setgate(), which then
allows removal of the special-case code from the RTM_ADD case.
This should fix the panics that joerg and Phil Karn have been seeing.
like it does elsewhere. This is probably only happens when incorrect
args are given to route(8), or when running with non-IPv4 stacks but
incorrect args to the route command is no excuse for panicing!
Submitted by: Michael Clay <mclay@weareb.org>, PR#1532
purpose, other than to get in the way of the ARP table and cause
"can't allocate llinfo" errors.
This change may cause gated or routed to start complaining when adding
such routes. If so, these programs will need to be fixed to not try
to add these routes.
Reviewed by: wollman
when rt == rt->rt_gwroute . rt == rt->gwroute shouldn't happen
in the first place, but that's another problem.
(try "route add -host <hostonmynet> <hostonmynet>; ping <hostonmynet>;
route delete <hostonmynet>")
(mask,value) in the tree, don't immediately return EEXIST. Instead, check
to see if the pre-existing route was generated by protcol-cloning. If so,
then it is OK to simply blow away the old route and re-attempt the insertion.
If not, then fall back to the same error code as before.
go away whenever a clone's parent is changed, or a route is added in a
certain set of circumstances.
This also includes code to forbid setting a route's gateway to an
address which can only be reached through that route, thus (hopefully)
eliminating one class of cloning bottomless-recursion bugs.
delete them when the ``parent'' goes away
route.h: add glue to track this to rtentry structure. WARNING WILL ROBINSON!
This will be yet another incompatible change in your route-using binaries.
I apologize, but this was the only way to do it. I took this opportunity
to increase the size of the metrics to what I believe will be the final
length for 2.1, so that when the T/TCP stuff is done, this won't happen
again.
and one set by the protocol family. Also add another parameter to
rtalloc1() to allow for any interface flags to be ignored; currently
this is only useful for RTF_PRCLONING. Get rid of rt_prflags and re-unite
with rt_flags. Add T/TCP ``route metrics''.
NB: YOU MUST RECOMPILE `route' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.
This also adds a new interface parameter, `ifi_physical', which will
eventually replace IFF_ALTPHYS as the mechanism for specifying the
particular physical connection desired on a multiple-connection card.
NB: YOU MUST RECOMPILE `ifconfig' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.
NB: You will have to recompile programs which use the `rt_use' member in
order to get the correct values. This should not cause incorrect operation,
but the statistics may look a little confusing.
a route. (This still doesn't work, but it doesn't panic now.) It looks
like there may be a number of incipient bugs in this code.
Also, get ready for the time when all IP gateway routes are cloning, which
is necessary to keep proper TCP statistics.
to something more recent than the ancient 1.2 release contained in
4.4. This code has the following advantages as compared to
previous versions (culled from the README file for the SunOS release):
- True multicast delivery
- Configurable rate-limiting of forwarded multicast traffic on each
physical interface or tunnel, using a token-bucket limiter.
- Simplistic classification of packets for prioritized dropping.
- Administrative scoping of multicast address ranges.
- Faster detection of hosts leaving groups.
- Support for multicast traceroute (code not yet available).
- Support for RSVP, the Resource Reservation Protocol.
What still needs to be done:
- The multicast forwarder needs testing.
- The multicast routing daemon needs to be ported.
- Network interface drivers need to have the `#ifdef MULTICAST' goop ripped
out of them.
- The IGMP code should probably be bogon-tested.
Some notes about the porting process:
In some cases, the Berkeley people decided to incorporate functionality from
later releases of the multicast code, but then had to do things differently.
As a result, if you look at Deering's patches, and then look at
our code, it is not always obvious whether the patch even applies. Let
the reader beware.
I ran ip_mroute.c through several passes of `unifdef' to get rid of
useless grot, and to permanently enable the RSVP support, which we will
include as standard.
Ported by: Garrett Wollman
Submitted by: Steve Deering and Ajit Thyagarajan (among others)