Commit Graph

136 Commits

Author SHA1 Message Date
cem
f2d427692e rtentry: Initialize rt_mtx with MTX_NEW
The "rtentry" zone does not use UMA_ZONE_ZINIT, so it is invalid to assume the
mutex's memory will be zero.  Without MTX_NEW, garbage backing memory may
trigger the "re-initializing a mutex" assertion.

PR:		200991
Submitted by:	Chang-Hsien Tsai <luke.tw AT gmail.com>
2016-08-01 23:07:31 +00:00
bz
8757b6342b Provide a public interface to rt_flushifroutes which takes the address
family as an argument as well.
This will be used to cleanup individual protocols during VNET teardown.

Obtained from:	projects/vnet
Sponsored by:	The FreeBSD Foundation
2016-06-06 12:49:47 +00:00
gnn
d75e0c471e This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by:	Mike Karels
Differential Revision:	https://reviews.freebsd.org/D6262
2016-06-02 17:51:29 +00:00
bz
a460d01567 Fix compile errors after r297225:
- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architecture
  using gcc happy [-Wreturn-type]
  "type qualifiers ignored on function return type"
  I am not entirely happy with this solution putting the u_int there
  but it will do for now.
2016-03-24 11:40:10 +00:00
gnn
c3d5404bbe FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
melifaro
23582454c7 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
melifaro
6342484c52 Remove now-unused wrappers for various routing functions. 2016-01-14 08:54:44 +00:00
melifaro
7fd2ccf67e Remove RTF_RNH_LOCKED support from rtalloc1_fib().
Last caller using it was eliminated in r293471.

Sponsored by:	Yandex LLC
2016-01-13 14:32:48 +00:00
melifaro
2d1014754e Do not rewrite all ro_flags. 2016-01-11 08:00:13 +00:00
melifaro
8975645335 Fix userland build broken by r293470.
Pointy hat to:	melifaro
2016-01-09 18:42:12 +00:00
melifaro
e4b451498b Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
  to check if this particular mbuf needs to be dropped (and what
  error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
  regardless of RTF_HOST flag existence. This is due to upcoming routing
  changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
  to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
  error value. In fact, the only place where this result is propagated
  is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
  RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
  rte flags to ro_flags. Call this function in ip_output() after looking up/
  verifying rte.

Reviewed by:	ae
2016-01-09 16:34:37 +00:00
melifaro
14cf7637d1 Remove sys/eventhandler.h from net/route.h
Reviewed by:	ae
2016-01-09 09:34:39 +00:00
melifaro
334ff06bce (Temporarily) remove route_redirect_event eventhandler.
Such handler should pass different set of variables, instead
  of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
  current code.
Will re-add it after finishing most major routing-related changes.

Discussed with:	np
2016-01-09 06:26:40 +00:00
melifaro
31d78f6810 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ ther pointers saved in passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor().
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
melifaro
c0fd3127f0 Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu().
Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to
  the caller.

Currently, ip6_getpmtu() has 2 totally different use cases:
1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU
  and return it, w/o any reusability.
2) Actual ip6_output() data path where we (nearly) always use the provided
  route lookup data. If this data is not 'valid' we need to perform another
  lookup and save the result (which cannot be re-used by ip6_output()).

Given that, handle 1) by calling separate function doing rte lookup itself.
  Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both
  ip6_getpmtu_ctl() and ip6_getpmtu().
For 2) instead of storing ref'ed rte, store mtu (the only needed data
  from the lookup result) inside newly-added ro_mtu field.
  'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again
  by 4 bytes. New ro_mtu field will be used in other places like
  ip/tcp_output (EMSGSIZE handling from output routines).

Reviewed by:	ae
2016-01-03 09:54:03 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
melifaro
fb12a509fe Provide additional lle data in IPv6 lltable dump used by ndp(8).
Before the change, things like lle state were queried via
  SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump.
This ioctl was added in 1999, probably to avoid touching rtsock code.

This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the
 following way:
  expire (already) maps to rtm_rmx.rmx_expire
  isrouter -> rtm_flags & RTF_GATEWAY
  asked -> rtm_rmx.rmx_pksent
  state -> rtm_rmx.rmx_state (maps to rmx_weight via define)

Reviewed by:	ae
2015-12-16 10:14:16 +00:00
melifaro
ca13483a3c Merge helper fib* functions used for basic lookups.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
  I have?". "what is the MTU?", "Give me the IPv4 source address to use",
  etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
  dealing with IPv6 mtu, finding "address" ifp (almost never done right),
  provide easy-to-use API hiding all the complexity and returning the
  needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
  rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
  locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
  and rtentry WLOCK).

Sponsored by:	Yandex LLC
2015-12-08 10:50:03 +00:00
melifaro
e198456483 Add new rt_foreach_fib_walk_del() function for deleting route entries
by filter function instead of picking into routing table details in
  each consumer.
Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED
 user).
This simplifies future nexthops/mulitipath changes and rtrequest1_fib()
  locking refactoring.

Actual changes:
Add "rt_chain" field to permit rte grouping while doing batched delete
  from routing table (thus growing rte 200->208 on amd64).
Add "rti_filter" /  "rti_filterdata" / "rti_spare" fields to rt_addrinfo
  to pass filter function to various routing subsystems in standard way.
Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate
  rt_expunge().
2015-11-30 05:51:14 +00:00
melifaro
45d403ed23 Remove several compat functions from pre-fib era. 2015-10-17 17:26:44 +00:00
melifaro
bbab608243 Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
melifaro
ba06112c24 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
melifaro
33c52eed18 MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
melifaro
f8d64c469a Finish r274175: do control plane MTU tracking.
Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR:		194238
MFC after:	1 month
CR:		D1125
2014-11-17 01:05:29 +00:00
hrs
3eeeb7c9a3 Fix build. 2014-09-21 07:16:51 +00:00
hrs
5e2751fde9 Make net.add_addr_allfibs vnet-local. 2014-09-21 03:48:20 +00:00
melifaro
a4407f98c0 Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().
2014-05-03 16:28:54 +00:00
melifaro
1883ddc524 Move rt_setmetrics() from rtsock.c to route.c.
All rtsock-initiated rte creation/modification are now
performed in route.c holding radix tree write lock.
This reduces the need for per-rte mutex.

Sponsored by:	Yandex LLC
MFC after:	1 month
2014-04-29 19:14:42 +00:00
melifaro
7b860c446e Unify sa_equal() macro usage.
MFC after:	2 weeks
2014-04-26 14:52:03 +00:00
jmg
072b24d95e garbage collect something that hasn't been triggered in almost 5 years...
the last consumer was removed a couple years ago...
2014-04-19 19:08:08 +00:00
glebius
8293a6c1cc Garbage collect long time obsoleted (or never used) stuff from routing API. 2014-03-15 06:49:32 +00:00
glebius
dddd70a112 The route code used to mtx_destroy() a locked mutex before rtentry free. Now,
after r262763 it started to return locked mutexes to UMA. To fix that,
conditionally unlock the mutex in the destructor.

Tested by:	"Sergey V. Dyatko" <sergey.dyatko@gmail.com>
2014-03-05 21:16:46 +00:00
glebius
2d3e25388b Hide struct rtentry from userland. 2014-03-05 01:47:08 +00:00
glebius
8a3e4bbebb - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
melifaro
cd97f8bba8 Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

  got message of size 116 on Fri Jan 10 14:13:15 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  got message of size 192 on Fri Jan 10 14:13:15 2014
  RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

  got message of size 116 on Fri Jan 10 13:56:26 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, wihdrawal:

  got message of size 192 on Fri Jan 10 13:58:59 2014
  RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

  got message of size 116 on Fri Jan 10 13:58:59 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

adter commit, withdrawal:

  got message of size 116 on Fri Jan 10 14:14:11 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there will be no complains about behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed by glebius and bde).

Sponsored by:	Yandex LLC
MFC after:	4 weeks
2014-01-10 12:13:55 +00:00
melifaro
dfba7fd9ef Split rt_newaddrmsg_fib() into two different functions.
Adding/deleting interface addresses involves access to 3 different subsystems,
int different parts of code. Each call can fail, so reporting successful
operation by rtsock in the middle of the process error-prone.

Further split routing notification API and actual rtsock calls via creating
public-available rt_addrmsg() / rt_routemsg() functions with "private"
rtsock_* backend.

MFC after:	2 weeks
2014-01-09 18:13:25 +00:00
melifaro
6e726b4922 Constanly use RT_ALL_FIBS everywhere instead of -1.
MFC after:	2 weeks
2014-01-08 23:09:02 +00:00
qingli
2160365ab5 Due to the routing related networking kernel redesign work
in FBSD 8.0, interface routes have been returened to the
applications without the RTF_GATEWAY bit. This incompatibility
has caused some issues with Zebra, Qugga and the like.
This patch provides the RTF_GATEWAY flag bit in returned interface
routes so to behave similarly to pre 8.0 systems.

Reviewed by:	    hrs
Verified by:	    mackn at opendns dot com
2013-06-25 00:10:49 +00:00
melifaro
fccbb24392 Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00
glebius
418a04b467 When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
  not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
  depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
  but empty route should use RO_RTFREE() to free results of
  lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
  ro->ro_rt == NULL.

Tested by:	tuexen (SCTP part)
2012-07-04 07:37:53 +00:00
bz
a45ce69b30 Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file.  With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore.  This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by:	julian, hrs, melifaro
MFC after:	2 weeks
2012-03-18 11:23:40 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
bz
a13ffdabcc Pass the fibnum where we need filtering of the message on the
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.

Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.

PR:		kern/134931
Submitted by:	multiple (see PR)
Suggested by:	rwatson [1]
Reviewed by:	rwatson
MFC after:	3 days
2011-09-28 13:48:36 +00:00
kmacy
e3079e1350 Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.
Reviewed by:	rwatson, bz
Approved by:	re (kib)
2011-09-20 20:27:26 +00:00
bz
a8e6967ab3 Garbage collect never used global, sysctl, externs.
MFC after:	1 week
2011-06-21 07:19:03 +00:00
dchagin
0f55e947bd Remove dead code.
MFC after:	1 Week
2011-03-20 08:35:00 +00:00
qingli
4ff4954e4e Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by:	rwatson, imp, juli
MFC after:	3 days
2010-03-16 17:59:12 +00:00
qingli
cde322640a The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.

Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.

This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. Similary approach is
already done in the if_tun device.

MFC after:	3 days
2010-03-11 17:56:46 +00:00
qingli
93013817b0 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
qingli
ed965a92bc The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00