Currently, PCB caching mechanism relies on the rib generation
counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is now possible that the
datapath lookup state applies RIB changes with some delay.
In that scenario, PCB cache will invalidate on the RIB change,
but the new lookup may result in the same nexthop being returned.
When fib algo finally gets in sync with the RIB changes, PCB cache
will not receive any notification and will end up caching the stale data.
To fix this, introduce additional counter, rnh_gen_rib, which is used
only when FIB_ALGO is enabled.
This counter is incremented by the control plane. Each time when fib algo
synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
Differential Revision: https://reviews.freebsd.org/D29812
Reviewed by: donner
MFC after: 2 weeks
In case with batch route delete via rib_walk_del(), when
some paths from the multipath route gets deleted, old
multipath group were not freed.
PR: 254496
Reported by: Zhenlei Huang <zlei.huang@gmail.com>
MFC after: 1 day
Summary:
This fixes rtentry leak for the cloned interfaces created inside the
VNET.
PR: 253998
Reported by: rashey at superbox.pl
MFC after: 3 days
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown).
Thus, any route table operations are too late to schedule.
As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`.
It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Test Plan:
```
set_skip:set_skip_group_lo -> passed [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```
Reviewers: #network, kp, bz
Reviewed By: kp
Subscribers: imp, ae
Differential Revision: https://reviews.freebsd.org/D29116
The routing stack control depends on quite a tree of functions to
determine the proper attributes of a route such as a source address (ifa)
or transmit ifp of a route.
When actually inserting a route, the stack needs to ensure that ifa and ifp
points to the entities that are still valid.
Validity means slightly more than just pointer validity - stack need guarantee
that the provided objects are not scheduled for deletion.
Currently, callers either ignore it (most ifp parts, historically) or try to
use refcounting (ifa parts). Even in case of ifa refcounting it's not always
implemented in fully-safe manner. For example, some codepaths inside
rt_getifa_fib() are referencing ifa while not holding any locks, resulting in
possibility of referencing scheduled-for-deletion ifa.
Instead of trying to fix all of the callers by enforcing proper refcounting,
switch to a different model.
As the rib_action() already requires epoch, do not require any stability guarantees
other than the epoch-provided one.
Use newly-added conditional versions of the refcounting functions
(ifa_try_ref(), if_try_ref()) and fail if any of these fails.
Reviewed by: donner
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28837
ROUTE_MPATH was added to the GENERIC kernel in r368648.
According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0.
Given enough time has passed, this change enables route multipath by default.
The goal is to ship FreeBSD 13 with multipath turned on.
Reviewed By: donner, olivier
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28423
subscriptions during RIB modifications.
Add new subscriptions to the beginning of the lists instead of
the end. This fixes the situation when new subscription is created
int the callback for the existing subscription, leading to the
subscription notification handler pick it.
MFC after: 3 days
Multiple consumers like ipfw, netflow or new route lookup algorithms
need to get the prefix data out of struct rtentry.
Instead of providing direct access to the rtentry, create IPv4/IPv6
accessors to abstract struct rtentry internals and avoid including
internal routing headers for external consumers.
While here, move struct route_nhop_data to the public header, so external
customers can actually use lookup functions returning rt&nhop data.
Differential Revision: https://reviews.freebsd.org/D27416
ROUTE_MPATH is the new config option controlling new multipath routing
implementation. Remove the last pieces of RADIX_MPATH-related code and
the config option.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D27244
The resulting KPI can be used by routing table consumers to estimate the required
scale for route table export.
* Add tracking for rib routes
* Add accessors for number of nexthops/nexthop objects
* Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up
again on destruction. This helps in the cases when rnh is not linked yet/already unlinked.
Differential Revision: https://reviews.freebsd.org/D27404
* Make rib_walk() order of arguments consistent with the rest of RIB api
* Add rib_walk_ext() allowing to exec callback before/after iteration.
* Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del
* Rename rt_forach_fib_walk -> rib_foreach_table_walk
* Move rib_foreach_table_walk{_del} to route/route_helpers.c
* Slightly refactor rib_foreach_table_walk{_del} to make the implementation
consistent and prepare for upcoming iterator optimizations.
Differential Revision: https://reviews.freebsd.org/D27219
Nexthop lookup was not consireding rt_flags when doing
structure comparison, which lead to an original nexthop
selection when changing flags. Fix the case by adding
rt_flags field into comparison and rearranging nhop_priv
fields to allow for efficient matching.
Fix `route change X/Y flags` case - recent changes
disallowed specifying RTF_GATEWAY flag without actual gateway.
It turns out, route(8) fills in RTF_GATEWAY by default, unless
-interface flag is specified. Fix regression by clearing
RTF_GATEWAY flag instead of failing.
Fix route flag reporting in RTM_CHANGE messages by explicitly
updating rtm_flags after operation competion.
Add IPv4/IPv6 tests for flag-only route changes.
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20
netstat -6On
Nexthop groups data
Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
that rtentry can contain multiple nexthops.
Differential Revision: https://reviews.freebsd.org/D26497
* Zero gw_sdl if switching to interface route - the assumption
that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.
Reported by: vangyzen
Currently kernel requests deletion for the certain routes with specified gateway,
but this gateway is not actually checked. With multipath routes, internal gateway
checking becomes mandatory. Add the logic performing this check.
Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic.
Add lookup_prefix() function to perform exact match search based on data in @info.
Differential Revision: https://reviews.freebsd.org/D26356
No functional changes.
net/route/shared.h was created in the inital phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
of functions and structures between the routing subsystem components. At
that time route_var.h was included by many files external to the routing
subsystem, which largerly defeats its purpose.
As currently this is not the case anymore and amount of route_var.h includes
is roughly the same as shared.h, retire the latter in favour of the former.
As nexthops are immutable, some operations such as route attribute changes
require nexthop fetching, forking, modification and route switching.
These operations are not atomic, so they may need to be retried multiple
times in presence of multiple speakers changing the same route.
This change introduces "synchronisation" primitive: route_update_conditional(),
simplifying logic for route changes and upcoming multipath operations.
Differential Revision: https://reviews.freebsd.org/D26216
rtentry lock traditionally served 2 purposed: first was protecting refcounts,
the second was assuring consistent field access/changes.
Since route nexthop introduction, the need for the former disappeared and
the need for the latter reduced.
To be more precise, the following rte field are mutable:
rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info)
rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal)
rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info)
rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK)
rt_chain (deletion chain pointer, updated with RIB_WLOCK)
All of them are updated under RIB_WLOCK, so the only remaining concern is the reading.
rt_nhop and rt_weight (addressed in this review) are read under rib lock and
stored in the rib_cmd_info, so the caller has no problem with consitency.
rte_flags is currently read unlocked in rtsock reporting (however the scope
is only RTF_UP flag, which is pretty static).
rt_expire is currently read unlocked in rtsock reporting.
rt_chain accesses are safe, as this is only used at route deletion.
rt_expire and rte_flags reads will be dealt in a separate reviews soon.
Differential Revision: https://reviews.freebsd.org/D26162
No functional changes.
Most of the routing flags are stored in the netxtop instead of rtentry.
Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code
checking routing flags.
In the new multipath code, rt->rt_nhop may actually point to nexthop group
instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->...
accesses.
Differential Revision: https://reviews.freebsd.org/D26156
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
directly.
Differential Revision: https://reviews.freebsd.org/D26053
After moving the route control plane code from net/route.c,
all rtzone users ended up being in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
to have everything in a single place.
While here, remove custom initializers from the zone.
It was added originally to avoid setup/teardown of costy per-cpu couters.
With these counters removed, the only remaining job was avoiding rte mutex
setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
will soon be removed. With that in mind, there is no sense in keeping
custom zone callbacks.
Differential Revision: https://reviews.freebsd.org/D26051
Old subscription model allowed only single customer.
Switch inet6 to the new subscription api and eliminate the old model.
Differential Revision: https://reviews.freebsd.org/D25615
Subscriptions are planned to be used by modules such as route lookup engines.
In that case that's the module task to properly unsibscribe before detach.
However, the in-kernel customer - inet6 wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
destruction internally.
Differential Revision: https://reviews.freebsd.org/D25614
This simplifies the code and allows to further split rtentry and nexthop,
removing one of the blockers for multipath code introduction, described in
D24141.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25192
Currently there is no easy way of subscribing for the routing table changes.
The only existing way is to set ifa_rtrequest callback in the each protocol
ifaddr, which is not convenient or extandable.
This change provides generic notification subscription mechanism, that will
replace current ifa_rtrequest one and allow other applications such as
accelerated routing lookup modules subscribe for the changes.
In particular, this change provides 2 hooks: 1) synchronous one
(RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact
ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED)
that is called after the change w/o holding locks. The latter one does not
provide any notification ordering guarantee.
Differential Revision: https://reviews.freebsd.org/D25070
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
returned in case of successful operation result. There are two problems with
this appoach. First is that it doesn't provide enough information for the
upcoming multipath changes, where rtentry refers to a new nexthop group,
and there is no way of guessing which paths were added during the change.
Second is that some rtentry fields can change during notification and
protecting from it by requiring customers to unlock rtentry is not desired.
Additionally, as the consumers such as rtsock do know which operation they
request in advance, making explicit add/change/del versions of the functions
makes sense, especially given the functions don't share a lot of code.
With that in mind, introduce rib_cmd_info notification structure and
rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
It will be used in upcoming generalized notifications.
* Move definitions of the new functions and some other functions/structures
used for the routing table manipulation to a separate header file,
net/route/route_ctl.h. net/route.h is a frequently used file included in
~140 places in kernel, and 90% of the users don't need these definitions.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25067
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
returned in case of successful operation result. There are two problems with
this appoach. First is that it doesn't provide enough information for the
upcoming multipath changes, where rtentry refers to a new nexthop group,
and there is no way of guessing which paths were added during the change.
Second is that some rtentry fields can change during notification and
protecting from it by requiring customers to unlock rtentry is not desired.
Additionally, as the consumers such as rtsock do know which operation they
request in advance, making explicit add/change/del versions of the functions
makes sense, especially given the functions don't share a lot of code.
With that in mind, introduce rib_cmd_info notification structure and
rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
It will be used in upcoming generalized notifications.
* Move definitions of the new functions and some other functions/structures
used for the routing table manipulation to a separate header file,
net/route/route_ctl.h. net/route.h is a frequently used file included in
~140 places in kernel, and 90% of the users don't need these definitions.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D25067
multipath control plane changed described in D24141.
Currently route.c contains core routing init/teardown functions, route table
manipulation functions and various helper functions, resulting in >2KLOC
file in total. This change moves most of the route table manipulation parts
to a dedicated file, simplifying planned multipath changes and making
route.c more manageable.
Differential Revision: https://reviews.freebsd.org/D24870
Nexthop objects implementation, defined in r359823,
introduced sys/net/route directory intended to hold all
routing-related code. Move recently-introduced route_temporal.c and
private route_var.h header there.
Differential Revision: https://reviews.freebsd.org/D24597
This is the foundational change for the routing subsytem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .
This patch introduces concept of nexthop objects and new nexthop-based
routing KPI.
Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address goes
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows to store more information in the nexthop.
New KPI:
struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
These 2 function are intended to replace all all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous
fib[46]-generation functions.
Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.
Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:
int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.
Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.
Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.
More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232
Reviewed by: ae,glebius(initial version)
Differential Revision: https://reviews.freebsd.org/D24232