Commit Graph

60 Commits

Author SHA1 Message Date
Alexander V. Chernikov
d1d941c5b9 Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing
 implementation. Remove the last pieces of RADIX_MPATH-related code and
 the config option.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D27244
2020-11-29 19:43:33 +00:00
Alexander V. Chernikov
3b1654cb14 Introduce rib_walk_ext_internal() to allow iteration with rnh pointer.
This solves the case when rib is not yet attached/detached to/from the
 system rib array.

Differential Revision:	https://reviews.freebsd.org/D27406
2020-11-29 13:54:49 +00:00
Alexander V. Chernikov
f47fa26065 Add nhop_ref_any() to unify referencing nhop or nexthop group.
It allows code within routing subsystem to transparently reference nexthops
 and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH
 details.

Differential Revision:	https://reviews.freebsd.org/D27410
2020-11-29 13:52:06 +00:00
Alexander V. Chernikov
98d5c4e5c8 Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors.
The resulting KPI can be used by routing table consumers to estimate the required
 scale for route table export.

* Add tracking for rib routes
* Add accessors for number of nexthops/nexthop objects
* Simplify rib_unsubscribe: store the rnh we're attached to instead of looking it up
 again on destruction. This helps in cases when the rnh is not yet linked or already unlinked.

Differential Revision:	https://reviews.freebsd.org/D27404
2020-11-29 13:27:24 +00:00
Alexander V. Chernikov
ef6ef7e5da Add nhgrp_get_idx() as a counterpart for nhop_get_idx().
It allows the routing-related code to reference nexthop groups by index
 instead of storing a pointer.
2020-11-28 15:46:40 +00:00
Alexander V. Chernikov
7511a63825 Refactor rib iterator functions.
* Make rib_walk() order of arguments consistent with the rest of the RIB API
* Add rib_walk_ext(), which allows executing a callback before/after iteration.
* Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del
* Rename rt_foreach_fib_walk -> rib_foreach_table_walk
* Move rib_foreach_table_walk{_del} to route/route_helpers.c
* Slightly refactor rib_foreach_table_walk{_del} to make the implementation
 consistent and prepare for upcoming iterator optimizations.

Differential Revision:	https://reviews.freebsd.org/D27219
2020-11-22 20:21:10 +00:00
Alexander V. Chernikov
2d39824195 Switch net.add_addr_allfibs default to 0.
The goal of the fib support is to provide multiple independent
 routing tables, isolated from each other.
net.add_addr_allfibs default tries to shift gears in the opposite
 direction, unconditionally inserting all addresses to all of the fibs.

There are use cases when this is necessary, however this is not the
 expected default behaviour, especially compared to other implementations.

Provide a WARNING message for setups with multiple fibs to notify
 potential users of the feature.

Differential Revision:	https://reviews.freebsd.org/D26076
2020-11-08 18:27:49 +00:00
Alexander V. Chernikov
76e6b37f6b Temporarily revert setting net.add_addr_allfibs to 0.
It was accidentally swept in with r367486.
Revert to allow for proper commit message & warning.
2020-11-08 18:11:12 +00:00
Alexander V. Chernikov
770495f4c0 Fix build broken by r367484: add route_ifaddrs.c.
Pointy hat to: melifaro
Reported by:	jenkins
2020-11-08 13:30:44 +00:00
Alexander V. Chernikov
0c325f53f1 Implement flowid calculation for outbound connections to balance
connections over multiple paths.

Multipath routing relies on mbuf flowid data for both transit
 and outbound traffic. Current code fills mbuf flowid from inp_flowid
 for connection-oriented sockets. However, inp_flowid is currently
 not calculated for outbound connections.

This change creates simple hashing functions and starts calculating hashes
 for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
 system.

Reviewed by:	glebius (previous version),ae
Differential Revision:	https://reviews.freebsd.org/D26523
2020-10-18 17:15:47 +00:00
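Illustration for 0c325f53f1: a minimal sketch of deriving a flow ID for an outbound connection from its addresses, ports and protocol, so that packets of one connection always hash to the same value. The commit message only says that "simple hashing functions" were added; the FNV-1a hash used here, as well as the names `flow_tuple4`, `fnv1a_update()` and `calc_flowid4()`, are assumptions made for this example and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 flow key; not a kernel structure. */
struct flow_tuple4 {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  proto;
};

/* Fold len bytes into a running 32-bit FNV-1a hash. */
static uint32_t
fnv1a_update(uint32_t hash, const void *data, size_t len)
{
	const uint8_t *p = data;

	for (size_t i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= 16777619u;	/* 32-bit FNV prime */
	}
	return (hash);
}

/*
 * Compute a flow ID for the tuple. Each field is hashed separately so
 * that struct padding bytes never influence the result.
 */
uint32_t
calc_flowid4(const struct flow_tuple4 *ft)
{
	uint32_t hash = 2166136261u;	/* 32-bit FNV offset basis */

	assert(ft != NULL);
	hash = fnv1a_update(hash, &ft->src_addr, sizeof(ft->src_addr));
	hash = fnv1a_update(hash, &ft->dst_addr, sizeof(ft->dst_addr));
	hash = fnv1a_update(hash, &ft->src_port, sizeof(ft->src_port));
	hash = fnv1a_update(hash, &ft->dst_port, sizeof(ft->dst_port));
	hash = fnv1a_update(hash, &ft->proto, sizeof(ft->proto));
	return (hash);
}
```

The resulting value would play the role of inp_flowid: computed once per connection and then copied into each outgoing mbuf, where the multipath code can use it to pick a path.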
Alexander V. Chernikov
1b95005e95 Fix route flags update during RTM_CHANGE.
Nexthop lookup was not considering rt_flags when doing
 structure comparison, which led to the original nexthop
 selection when changing flags. Fix the case by adding
 rt_flags field into comparison and rearranging nhop_priv
 fields to allow for efficient matching.
Fix `route change X/Y flags` case - recent changes
 disallowed specifying RTF_GATEWAY flag without actual gateway.
 It turns out, route(8) fills in RTF_GATEWAY by default, unless
 -interface flag is specified. Fix regression by clearing
 RTF_GATEWAY flag instead of failing.
Fix route flag reporting in RTM_CHANGE messages by explicitly
 updating rtm_flags after operation completion.
Add IPv4/IPv6 tests for flag-only route changes.
2020-10-04 13:24:58 +00:00
Alexander V. Chernikov
9c584fa4bc Remove ROUTE_MPATH-related warnings introduced in r366390.
Reported by:	mjg
2020-10-03 14:37:54 +00:00
Alexander V. Chernikov
fedeb08b6a Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
 relative weights and a dataplane-optimized structure to enable
 efficient nexthop selection.

Similar to the nexthops, nexthop groups are immutable. Dataplane part
 gets compiled during group creation and is basically an array of
 nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
 nexthop or nexthop group. They are distinguished by the presence of
 the NHF_MULTIPATH flag.
All dataplane lookup functions return a pointer to the nexthop object,
leaving nexthop group details inside the routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
 should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now come with a weight; the default weight is 1 and the maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
 This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
1         ------- ------- ------- --------------------------------------- ---------       1
              13      10       1                             2001:db8::2     vlan2
              14      20       2                             2001:db8::3     vlan2

Next steps:
* Land outbound hashing for locally-originated routes (D26523).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by:	olivier
Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26449
2020-10-03 10:47:17 +00:00
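Illustration for fedeb08b6a: a minimal sketch of how a nexthop group could be "compiled" into an array of nexthop pointers according to relative weights, and how a flow ID could then select a slot. Reducing the weights by their greatest common divisor is an assumption; it happens to reproduce the Slots column in the netstat example above (weights 10 and 20 give 1 and 2 slots), but the commit message does not describe the actual algorithm. All names (`nhop`, `weighted_nhop`, `nhgrp`, `nhgrp_compile()`, `nhgrp_select()`, `MAX_SLOTS`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 64	/* arbitrary cap on compiled slots for this sketch */

struct nhop {		/* stand-in for a nexthop object */
	int idx;
};

struct weighted_nhop {
	const struct nhop *nh;
	uint32_t weight;
};

struct nhgrp {		/* the compiled, read-only dataplane part */
	const struct nhop *slots[MAX_SLOTS];
	uint32_t nslots;
};

static uint32_t
gcd32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

/*
 * Fill grp->slots so that each nexthop appears weight/gcd times.
 * Returns 0 on success, -1 on an empty list, a zero weight, or overflow.
 */
int
nhgrp_compile(struct nhgrp *grp, const struct weighted_nhop *wn, size_t count)
{
	uint32_t g = 0;
	uint64_t total = 0;

	assert(grp != NULL && wn != NULL);
	if (count == 0)
		return (-1);
	for (size_t i = 0; i < count; i++) {
		if (wn[i].weight == 0)
			return (-1);
		g = gcd32(g, wn[i].weight);
	}
	for (size_t i = 0; i < count; i++)
		total += wn[i].weight / g;
	if (total > MAX_SLOTS)
		return (-1);
	grp->nslots = 0;
	for (size_t i = 0; i < count; i++) {
		for (uint32_t j = 0; j < wn[i].weight / g; j++)
			grp->slots[grp->nslots++] = wn[i].nh;
	}
	return (0);
}

/* Pick the nexthop for a flow: a single array index, no weight math. */
const struct nhop *
nhgrp_select(const struct nhgrp *grp, uint32_t flowid)
{
	assert(grp != NULL && grp->nslots > 0);
	return (grp->slots[flowid % grp->nslots]);
}
```

Because the weights are baked into the slot array at creation time, the per-packet cost is one modulo and one load, which is the point of keeping the group immutable.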
Alexander V. Chernikov
2259a03020 Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
 as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
 that rtentry can contain multiple nexthops.

Differential Revision:	https://reviews.freebsd.org/D26497
2020-09-21 20:02:26 +00:00
Alexander V. Chernikov
1440f62266 Remove unused nhop_ref_any() function.
Remove "opt_mpath.h" header where not needed.

No functional changes.
2020-09-20 21:32:52 +00:00
Alexander V. Chernikov
c4bcfe98e2 Fix gw updates / flag updates during route changes.
* Zero gw_sdl if switching to interface route - the assumption
 that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.

Reported by:	vangyzen
2020-09-20 12:31:48 +00:00
Alexander V. Chernikov
2b32d93e55 Fix RADIX_MPATH build broken by r365521.
Reported by:	jenkins, Hartmann, O. <ohartmann at walstatt.org>
2020-09-10 07:05:31 +00:00
Alexander V. Chernikov
aa8f9f90ff Update nexthop handling for route addition/deletion in preparation for mpath.
Currently the kernel requests deletion of certain routes with a specified gateway,
 but this gateway is not actually checked. With multipath routes, internal gateway
 checking becomes mandatory. Add the logic performing this check.

Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic.

Add lookup_prefix() function to perform exact match search based on data in @info.

Differential Revision:	https://reviews.freebsd.org/D26356
2020-09-09 22:07:54 +00:00
Alexander V. Chernikov
cd6298d5c5 Retain marking net.fibs sysctl as a tunable.
Suggested by:	avg
2020-09-09 21:45:18 +00:00
Alexander V. Chernikov
4a8201c13a Fix panic with net.fibs tunable set in loader.conf.
Fix by removing forgotten CTLFLAG_RWTUN flag from the sysctl,
 loader variable will be read later in vnet_rtables_init().

Reported by:	mav
2020-09-08 21:39:34 +00:00
Alexander V. Chernikov
8f07963360 Fix regression for IPv6 loopback routes.
After nexthop introduction, loopback routes for the interface addresses
 were created without embedding actual interface index in the gateway.
 The latter is needed to pass the IPv6 scope during transmission via loopback.

Fix the regression by actually using passed gateway data with interface index.

Differential Revision:	https://reviews.freebsd.org/D26306
2020-09-03 22:24:52 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Alexander V. Chernikov
b8d2d479cd Revert uma zone alignment cache inadvertently committed in r364950. 2020-08-29 12:04:13 +00:00
Alexander V. Chernikov
6498f66f7c Fix build with RADIX_MPATH.
Reported by:	Hartmann, O <ohartmann@walstatt.org>
2020-08-29 11:04:24 +00:00
Alexander V. Chernikov
7c89a3b63f Move fib_rte_to_nh_flags() from net/route_var.h to net/route/nhop_ctl.c.
No functional changes.
Initially this function was created to perform runtime flag conversions
 for the previous incarnation of fib lookup functions. As these functions
 got deprecated, move the function to the file with the only remaining
 caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the
 naming notation.
2020-08-28 23:01:56 +00:00
Alexander V. Chernikov
a624ca3dff Move net/route/shared.h definitions to net/route/route_var.h.
No functional changes.

net/route/shared.h was created in the initial phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
 of functions and structures between the routing subsystem components. At
 that time route_var.h was included by many files external to the routing
 subsystem, which largely defeated its purpose.

As this is no longer the case and the number of route_var.h includes
 is roughly the same as for shared.h, retire the latter in favour of the former.
2020-08-28 22:50:20 +00:00
Alexander V. Chernikov
b122304f6a Further split nhop creation and rtable operations.
As nexthops are immutable, some operations such as route attribute changes
 require nexthop fetching, forking, modification and route switching.
These operations are not atomic, so they may need to be retried multiple
 times in presence of multiple speakers changing the same route.

This change introduces "synchronisation" primitive: route_update_conditional(),
 simplifying logic for route changes and upcoming multipath operations.

Differential Revision:	https://reviews.freebsd.org/D26216
2020-08-28 21:59:10 +00:00
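Illustration for b122304f6a: a minimal sketch of the fetch, fork, modify and switch cycle that immutable nexthops require, with a retry when another writer changed the route in between. A C11 compare-and-exchange stands in here for the conditional update; that substitution is an assumption, as are the names `nhop`, `route`, `route_swap_if_unchanged()` and `route_change_mtu()`. The commit message does not describe how route_update_conditional() is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct nhop {			/* never modified once published */
	uint32_t flags;
	uint32_t mtu;
};

struct route {
	_Atomic(struct nhop *) nh;
};

/* Switch to new_nh only if the route still points at expected. */
static int
route_swap_if_unchanged(struct route *rt, struct nhop *expected,
    struct nhop *new_nh)
{
	return (atomic_compare_exchange_strong(&rt->nh, &expected, new_nh));
}

/*
 * Change one attribute by forking the current nexthop.
 * On success returns 0 and hands the replaced nexthop back via *oldp so
 * the caller can reclaim it; returns -1 if allocation fails.
 */
int
route_change_mtu(struct route *rt, uint32_t mtu, struct nhop **oldp)
{
	assert(rt != NULL && oldp != NULL);
	for (;;) {
		struct nhop *old = atomic_load(&rt->nh);	/* fetch */
		struct nhop *new_nh = malloc(sizeof(*new_nh));

		if (new_nh == NULL)
			return (-1);
		*new_nh = *old;					/* fork */
		new_nh->mtu = mtu;				/* modify */
		if (route_swap_if_unchanged(rt, old, new_nh)) {	/* switch */
			*oldp = old;
			return (0);
		}
		/* Someone else changed the route first: drop our copy, retry. */
		free(new_nh);
	}
}
```

The sketch leaves reclamation of the old nexthop to the caller; in a real system readers may still hold it, so it cannot simply be freed at the moment of the switch.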
Alexander V. Chernikov
592d300e34 Remove RT_LOCK mutex from rte.
rtentry lock traditionally served two purposes: the first was protecting refcounts,
 the second was assuring consistent field access/changes.
Since route nexthop introduction, the need for the former disappeared and
 the need for the latter reduced.
To be more precise, the following rte fields are mutable:

rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info)
rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal)
rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info)
rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK)
rt_chain (deletion chain pointer, updated with RIB_WLOCK)
All of them are updated under RIB_WLOCK, so the only remaining concern is the reading.

rt_nhop and rt_weight (addressed in this review) are read under rib lock and
 stored in the rib_cmd_info, so the caller has no problem with consistency.
rte_flags is currently read unlocked in rtsock reporting (however the scope
 is only RTF_UP flag, which is pretty static).
rt_expire is currently read unlocked in rtsock reporting.
rt_chain accesses are safe, as this is only used at route deletion.

rt_expire and rte_flags reads will be dealt with in separate reviews soon.

Differential Revision:	https://reviews.freebsd.org/D26162
2020-08-24 20:23:34 +00:00
Alexander V. Chernikov
93bfd365d2 Rename rt_flags to rte_flags && reduce number of rt_nhop accesses.
No functional changes.

Most of the routing flags are stored in the nexthop instead of rtentry.
Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code
 checking routing flags.

In the new multipath code, rt->rt_nhop may actually point to nexthop group
 instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->...
 accesses.

Differential Revision:	https://reviews.freebsd.org/D26156
2020-08-22 19:30:56 +00:00
Mateusz Guzik
c93d310f87 Fix tinderbox build after r364465 2020-08-22 07:43:38 +00:00
Alexander V. Chernikov
f5247a232a Make net.fibs growable.
Allow the number of fibs in each vnet to grow dynamically.

This change alters current behavior. Currently, if one defines
 ROUTETABLES > 1 in the kernel config, each vnet will be created
 with the number of fibs defined in the kernel config.
 After this commit vnets will be created with fibs=1.

Dynamic net.fibs is not compatible with net.add_addr_allfibs.
 The plan is to deprecate the latter and make
 net.add_addr_allfibs=0 default behaviour.

Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26062
2020-08-21 21:34:52 +00:00
Alexander V. Chernikov
2f23f45b20 Simplify dom_<rtattach|rtdetach>.
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
  them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
  to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
  directly.

Differential Revision:	https://reviews.freebsd.org/D26053
2020-08-14 21:29:56 +00:00
Alexander V. Chernikov
6cbadc4234 Move rtzone handling code to net/route_ctl.c
After moving the route control plane code from net/route.c,
 all rtzone users ended up being in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
 to have everything in a single place.

While here, remove custom initializers from the zone.
They were added originally to avoid setup/teardown of costly per-cpu counters.
With these counters removed, the only remaining job was avoiding rte mutex
 setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
 will soon be removed. With that in mind, there is no sense in keeping
 custom zone callbacks.

Differential Revision:	https://reviews.freebsd.org/D26051
2020-08-13 18:35:29 +00:00
Alexander V. Chernikov
8a0917c35b Do not enter epoch in add_route(), as it is already called in epoch.
Reviewed by:	glebius
2020-08-11 07:23:07 +00:00
Alexander V. Chernikov
136a1f8da8 Make <add|del|change>_route() static to finish the transition to the new kpi.
Discussed with:	glebius
2020-08-11 07:21:32 +00:00
Alexander V. Chernikov
9a00f6d067 Fix rib_subscribe() waitok flag by performing allocation outside epoch.
Make in6_inithead() use rib_subscribe with waitok to achieve reliable
 subscription allocation.

Reviewed by:	glebius
2020-08-11 07:05:30 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
The old subscription model allowed only a single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
edc37a66e3 Add destructor for the rib subscription system to simplify users code.
Subscriptions are planned to be used by modules such as route lookup engines.
In that case it is the module's task to properly unsubscribe before detach.
However, the in-kernel customer, inet6, wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
 destruction internally.

Differential Revision:	https://reviews.freebsd.org/D25614
2020-07-12 11:18:09 +00:00
Mark Johnston
26dd427800 Split nhop_ref_object().
Now nhop_ref_object() unconditionally acquires a reference, and the new
nhop_try_ref_object() uses refcount_acquire_if_not_zero() to
conditionally acquire a reference.  Since the former is cheaper, use it
when we know that the initial counter value is non-zero.  No functional
change intended.

Reviewed by:	melifaro
Differential Revision:	https://reviews.freebsd.org/D25535
2020-07-06 21:20:57 +00:00
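Illustration for 26dd427800: a minimal sketch of the difference between an unconditional reference and a "try" reference that refuses to revive an object whose count has already reached zero. It models the idea with C11 atomics rather than the kernel's refcount(9) primitives; the names `obj`, `obj_ref()` and `obj_try_ref()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refcnt;
};

/*
 * Unconditional acquire: a single atomic add. Only valid when the caller
 * already knows the count is non-zero (e.g. it holds a reference itself).
 */
void
obj_ref(struct obj *o)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old > 0);
	(void)old;	/* silence unused warning when NDEBUG is defined */
}

/*
 * Conditional acquire: succeeds only while the count is non-zero.
 * Needs a compare-and-exchange loop, so it costs more than obj_ref().
 */
bool
obj_try_ref(struct obj *o)
{
	unsigned int old;

	old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
	while (old != 0) {
		if (atomic_compare_exchange_weak_explicit(&o->refcnt, &old,
		    old + 1, memory_order_relaxed, memory_order_relaxed))
			return (true);
		/* A failed exchange reloaded old; loop and re-check it. */
	}
	return (false);
}
```

A lookup path that finds an object without holding a reference would use the conditional form, while code that already owns a reference can take another one with the cheaper unconditional form.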
Alexander V. Chernikov
a287a973e3 Switch rtsock code to using newly-created rib_action() KPI call.
This simplifies the code and allows further splitting of rtentry and nexthop,
 removing one of the blockers for multipath code introduction, described in
 D24141.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D25192
2020-06-10 07:46:22 +00:00
Alexander V. Chernikov
41e66f4eca Add rib subscription API.
Currently there is no easy way of subscribing to routing table changes.
The only existing way is to set the ifa_rtrequest callback in each protocol
 ifaddr, which is neither convenient nor extensible.

This change provides a generic notification subscription mechanism that will
 replace the current ifa_rtrequest one and allow other applications, such as
 accelerated routing lookup modules, to subscribe to the changes.

In particular, this change provides 2 hooks: 1) synchronous one
 (RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact
 ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED)
 that is called after the change w/o holding locks. The latter one does not
 provide any notification ordering guarantee.

Differential Revision:  https://reviews.freebsd.org/D25070
2020-06-01 21:52:24 +00:00
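Illustration for 41e66f4eca: a minimal sketch of a table that supports the two kinds of hooks described above, one invoked while the write lock is held and one invoked after it is released. The types and functions here (`table`, `subscription`, `table_subscribe()`, `table_change()`, `record_cb()`) are hypothetical and much simpler than the real rib subscription KPI; the `write_locked` flag merely stands in for RIB_WLOCK, and `NOTIFY_IMMEDIATE` / `NOTIFY_DELAYED` mirror RIB_NOTIFY_IMMEDIATE / RIB_NOTIFY_DELAYED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum notify_type { NOTIFY_IMMEDIATE, NOTIFY_DELAYED };

struct table;
typedef void (*notify_cb)(const struct table *t, int cmd, void *arg);

struct subscription {
	enum notify_type type;
	notify_cb cb;
	void *arg;
};

#define MAX_SUBS 8	/* arbitrary limit for this sketch */

struct table {
	struct subscription subs[MAX_SUBS];
	size_t nsubs;
	bool write_locked;	/* stands in for holding RIB_WLOCK */
};

/* Register a callback; returns 0 on success, -1 if the table is full. */
int
table_subscribe(struct table *t, enum notify_type type, notify_cb cb, void *arg)
{
	assert(t != NULL && cb != NULL);
	if (t->nsubs == MAX_SUBS)
		return (-1);
	t->subs[t->nsubs++] = (struct subscription){ type, cb, arg };
	return (0);
}

static void
table_notify(const struct table *t, enum notify_type type, int cmd)
{
	for (size_t i = 0; i < t->nsubs; i++) {
		if (t->subs[i].type == type)
			t->subs[i].cb(t, cmd, t->subs[i].arg);
	}
}

/* Apply a change: immediate hooks run under the lock, delayed ones after. */
void
table_change(struct table *t, int cmd)
{
	t->write_locked = true;
	/* ... the table itself would be modified here ... */
	table_notify(t, NOTIFY_IMMEDIATE, cmd);
	t->write_locked = false;
	table_notify(t, NOTIFY_DELAYED, cmd);
}

/* Example callback: records what it saw so a caller can inspect it. */
struct record {
	int calls;
	int last_cmd;
	bool saw_lock;
};

void
record_cb(const struct table *t, int cmd, void *arg)
{
	struct record *r = arg;

	r->calls++;
	r->last_cmd = cmd;
	r->saw_lock = t->write_locked;
}
```

In this single-threaded sketch the delayed hook runs right after the lock is dropped; with concurrent writers nothing would order delayed callbacks from different changes, which matches the lack of an ordering guarantee noted in the commit message.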
Alexander V. Chernikov
46cc6153d4 Finish r361706: add sys/net/route/route_ctl.h, missed in previous commit. 2020-06-01 21:51:20 +00:00
Alexander V. Chernikov
da187ddb3d * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this approach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
Alexander V. Chernikov
e7403d0230 Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
Alexander V. Chernikov
79674562b8 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this approach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
Alexander V. Chernikov
4d2c2509f2 Move <add|del|change>_route() functions to route_ctl.c in preparation of
multipath control plane changes described in D24141.

Currently route.c contains core routing init/teardown functions, route table
 manipulation functions and various helper functions, resulting in >2KLOC
 file in total. This change moves most of the route table manipulation parts
 to a dedicated file, simplifying planned multipath changes and making
 route.c more manageable.

Differential Revision:	https://reviews.freebsd.org/D24870
2020-05-23 19:06:57 +00:00
Alexander V. Chernikov
a82f62ec2d Remove refcounting from rtentry.
After making rtentry reclamation backed by epoch(9) in r361409, there is
 no reason to keep the reference counting code.

Differential Revision:	https://reviews.freebsd.org/D24867
2020-05-23 12:15:47 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason for refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation makes it possible to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
The rest of the reference counting code will be removed in D24867.

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Warner Losh
83b4342743 Kill trailing newline while I'm here... 2020-05-12 23:46:52 +00:00
Alexander V. Chernikov
4a6ee281d9 Remove unused rnh_close callback from rtable & cleanup depends.
The rnh_close callback was used by the in[6]_clsroute() handlers,
 doing cleanup in the route cloning code. Route cloning was eliminated
 somewhere around r186119. Last callback user was eliminated in r186215,
 11 years ago.

Differential Revision:	https://reviews.freebsd.org/D24793
2020-05-11 06:09:18 +00:00