Provide wrapper for the rnh_walktree_from() rib callback.
As currently `struct rib_head` is considered internal to the
routing subsystem, this wrapper is necessary to maintain isolation
from the external code.
Differential Revision: https://reviews.freebsd.org/D29971
MFC after: 1 week
Currently, if the immutable algorithm like bsearch or radix_lockless
receives rtable update notification, it schedules algorithm rebuild.
This rebuild is executed by the callout after ~50 milliseconds.
It is possible that a script adding an interface address and than route
with the gateway bound to that address will fail. It can happen due
to the fact that fib is not updated by the time the route addition
request arrives.
Fix this by allowing synchronous algorithm rebuilds based on certain
conditions. By default, these conditions assume:
1) less than net.route.algo.fib_sync_limit=100 routes
2) routes without gateway.
* Move algo instance build entirely under rib WLOCK.
Rib lock is only used for control plane (except radix algo, but there
are no rebuilds).
* Add rib_walk_ext_locked() function to allow RIB iteration with
rib lock already held.
* Fix rare potential callout use-after-free for fds by binding fd
callout to the relevant rib rmlock. In that case, callout_stop()
under rib WLOCK guarantees no callout will be executed afterwards.
MFC after: 3 days
subscriptions during RIB modifications.
Add new subscriptions to the beginning of the lists instead of
the end. This fixes the situation when new subscription is created
int the callback for the existing subscription, leading to the
subscription notification handler pick it.
MFC after: 3 days
rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
in reality.
1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing interface for maintaining interface routes. Its part,
responsible for the actual route table interaction, mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface
aliases.
4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
complicating rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
- Route attribute construction were moved to the per-address-family functions,
dealing with (2), (3) and (4).
- funnction providing net.add_addr_allfibs handling and route rtsock notificaions
is the new routing table inteface.
- rtsock ifa notificaion has been moved out as well. resulting set of funcion are only
responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
Multiple consumers like ipfw, netflow or new route lookup algorithms
need to get the prefix data out of struct rtentry.
Instead of providing direct access to the rtentry, create IPv4/IPv6
accessors to abstract struct rtentry internals and avoid including
internal routing headers for external consumers.
While here, move struct route_nhop_data to the public header, so external
customers can actually use lookup functions returning rt&nhop data.
Differential Revision: https://reviews.freebsd.org/D27416
The resulting KPI can be used by routing table consumers to estimate the required
scale for route table export.
* Add tracking for rib routes
* Add accessors for number of nexthops/nexthop objects
* Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up
again on destruction. This helps in the cases when rnh is not linked yet/already unlinked.
Differential Revision: https://reviews.freebsd.org/D27404
* Make rib_walk() order of arguments consistent with the rest of RIB api
* Add rib_walk_ext() allowing to exec callback before/after iteration.
* Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del
* Rename rt_forach_fib_walk -> rib_foreach_table_walk
* Move rib_foreach_table_walk{_del} to route/route_helpers.c
* Slightly refactor rib_foreach_table_walk{_del} to make the implementation
consistent and prepare for upcoming iterator optimizations.
Differential Revision: https://reviews.freebsd.org/D27219
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20
netstat -6On
Nexthop groups data
Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
directly.
Differential Revision: https://reviews.freebsd.org/D26053
This simplifies the code and allows to further split rtentry and nexthop,
removing one of the blockers for multipath code introduction, described in
D24141.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25192
Currently there is no easy way of subscribing for the routing table changes.
The only existing way is to set ifa_rtrequest callback in the each protocol
ifaddr, which is not convenient or extandable.
This change provides generic notification subscription mechanism, that will
replace current ifa_rtrequest one and allow other applications such as
accelerated routing lookup modules subscribe for the changes.
In particular, this change provides 2 hooks: 1) synchronous one
(RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact
ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED)
that is called after the change w/o holding locks. The latter one does not
provide any notification ordering guarantee.
Differential Revision: https://reviews.freebsd.org/D25070
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
returned in case of successful operation result. There are two problems with
this appoach. First is that it doesn't provide enough information for the
upcoming multipath changes, where rtentry refers to a new nexthop group,
and there is no way of guessing which paths were added during the change.
Second is that some rtentry fields can change during notification and
protecting from it by requiring customers to unlock rtentry is not desired.
Additionally, as the consumers such as rtsock do know which operation they
request in advance, making explicit add/change/del versions of the functions
makes sense, especially given the functions don't share a lot of code.
With that in mind, introduce rib_cmd_info notification structure and
rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
It will be used in upcoming generalized notifications.
* Move definitions of the new functions and some other functions/structures
used for the routing table manipulation to a separate header file,
net/route/route_ctl.h. net/route.h is a frequently used file included in
~140 places in kernel, and 90% of the users don't need these definitions.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25067