freebsd-dev

Author	SHA1	Message	Date
Alexander V. Chernikov	33cb3cb2e3	Fix rib generation count for fib algo. Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries. With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data. To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time when fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value. Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks	2021-04-20 22:02:41 +00:00
Alexander V. Chernikov	0abb6ff590	fib algo: do not reallocate datapath index for datapath ptr update. Fib algo uses a per-family array indexed by the fibnum to store lookup function pointers and per-fib data. Each algorithm rebuild currently requires re-allocating this array to support atomic change of two pointers. As in reality most of the changes actually involve changing only data pointer, add a shortcut performing in-flight pointer update. MFC after: 2 weeks	2021-04-18 16:12:13 +01:00
Alexander V. Chernikov	e2f79d9e51	Fib algo: extend KPI by allowing algo to set datapath pointers. Some algorithms may require updating datapath and control plane algo pointers after the (batched) updates. Export fib_set_datapath_ptr() to allow setting the new datapath function or data pointer from the algo. Add fib_set_algo_ptr() to allow updating algo control plane pointer from the algo. Add fib_epoch_call() epoch(9) wrapper to simplify freeing old datapath state. Reviewed by: zec Differential Revision: https://reviews.freebsd.org/D29799 MFC after: 1 week	2021-04-18 16:12:12 +01:00
Alexander V. Chernikov	6b8ef0d428	Add batched update support for the fib algo. Initial fib algo implementation was build on a very simple set of principles w.r.t updates: 1) algorithm is ether able to apply the change synchronously (DIR24-8) or requires full rebuild (bsearch, lradix). 2) framework falls back to rebuild on every error (memory allocation, nhg limit, other internal algo errors, etc). This changes brings the new "intermediate" concept - batched updates. Algotirhm can indicate that the particular update has to be handled in batched fashion (FLM_BATCH). The framework will write this update and other updates to the temporary buffer instead of pushing them to the algo callback. Depending on the update rate, the framework will batch 50..1024 ms of updates and submit them to a different algo callback. This functionality is handy for the slow-to-rebuild algorithms like DXR. Differential Revision: https://reviews.freebsd.org/D29588 Reviewed by: zec MFC after: 2 weeks	2021-04-14 23:54:11 +01:00
Alexander V. Chernikov	ee2cf2b360	Implement better rebuild-delay fib algo policy. The intent is to better handle time intervals with large amount of RIB updates (e.g. BGP peer going up or down), while still keeping low sync delay for the rest scenarios. The implementation is the following: updates are bucketed into the buckets of size 50ms. If the number of updates within a current bucket exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket interval), the update is delayed for another 50ms. This can be repeated until the maximum update delay (1 sec) is reached. All 3 variables are runtime tunables: * net.route.algo.fib_max_sync_delay_ms: 1000 * net.route.algo.bucket_change_threshold_rate: 500 * net.route.algo.bucket_time_ms: 50 Differential Review: https://reviews.freebsd.org/D29588 MFC after: 2 weeks	2021-04-09 21:33:03 +01:00
Alexander V. Chernikov	0c2a0e0380	Fix typo in the `9fa8d1582b`. Reported by: cy	2021-03-29 23:42:48 +00:00
Alexander V. Chernikov	9fa8d1582b	Put bandaid for nhgrp_dump_sysctl() malloc KASSERT(). Recent rtsock changes widened epoch and covered nhgrp_dump_sysctl(), resulting in `netstat -4On` triggering with KASSERT. MFC after: 1 day	2021-03-29 23:12:11 +00:00
Alexander V. Chernikov	0f30a36ded	Rename variables inside nexhtop group consider_resize() code. No functional changes. MFC after: 3 days	2021-03-29 23:06:13 +00:00
Alexander V. Chernikov	9095dc7da4	Fix nexhtop group index array scaling. The current code has the limit of 127 nexthop groups due to the wrongly-checked bitmask_copy() return value. PR: 254303 Reported by: Aleks <a.ivanov at veesp.com> MFC after: 1 day	2021-03-29 23:00:17 +00:00
Alexander V. Chernikov	6f43c72b47	Zero `struct weightened_nhop` fields in nhgrp_get_addition_group(). `struct weightened_nhop` has spare 32bit between the fields due to the alignment (on amd64). Not zeroing these spare bits results in duplicating nhop groups in the kernel due to the way how comparison works. MFC after: 1 day	2021-03-20 08:26:03 +00:00
Alexander V. Chernikov	24cd2796cf	Fix !VNET build broken by `66f138563b`.	2021-03-25 00:31:08 +00:00
Alexander V. Chernikov	66f138563b	Plug nexthop group refcount leak. In case with batch route delete via rib_walk_del(), when some paths from the multipath route gets deleted, old multipath group were not freed. PR: 254496 Reported by: Zhenlei Huang <zlei.huang@gmail.com> MFC after: 1 day	2021-03-24 23:52:18 +00:00
Alexander V. Chernikov	c00e2f573b	Fix build for non-vnet non-multipath kernels broken by `a0308e48ec`.	2021-03-23 23:35:23 +00:00
Alexander V. Chernikov	a0308e48ec	Fix panic when destroying interface with ECMP routes. Reported by: Zhenlei Huang <zlei.huang at gmail.com> PR: 254496 MFC after: immediately	2021-03-23 22:03:20 +00:00
Alexander V. Chernikov	2476178e6b	Fix kassert panic when inserting multipath routes from multiple threads. Reported by: Marco Zec <zec at fer.hr> MFC after: immediately	2021-03-21 18:15:29 +00:00
Alexander V. Chernikov	e4ac3f7463	Fix fib algo rebuild delay calculation. Submitted by: Marco Zec <zec at fer.hr> MFC after: 3 days	2021-03-15 21:09:07 +00:00
Alexander V. Chernikov	b1d63265ac	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages \| grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116	2021-03-10 21:10:14 +00:00
Alexander V. Chernikov	5964172837	Simplify ifa/ifp refcounting in the routing stack. The routing stack control depends on quite a tree of functions to determine the proper attributes of a route such as a source address (ifa) or transmit ifp of a route. When actually inserting a route, the stack needs to ensure that ifa and ifp points to the entities that are still valid. Validity means slightly more than just pointer validity - stack need guarantee that the provided objects are not scheduled for deletion. Currently, callers either ignore it (most ifp parts, historically) or try to use refcounting (ifa parts). Even in case of ifa refcounting it's not always implemented in fully-safe manner. For example, some codepaths inside rt_getifa_fib() are referencing ifa while not holding any locks, resulting in possibility of referencing scheduled-for-deletion ifa. Instead of trying to fix all of the callers by enforcing proper refcounting, switch to a different model. As the rib_action() already requires epoch, do not require any stability guarantees other than the epoch-provided one. Use newly-added conditional versions of the refcounting functions (ifa_try_ref(), if_try_ref()) and fail if any of these fails. Reviewed by: donner MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28837	2021-02-22 23:37:59 +00:00
Alexander V. Chernikov	a375ec52a7	Fix ifa refcount leak during route addition. Reported by: rstone Reviewed by: rstone MFC after: 1 day	2021-02-13 00:06:14 +00:00
Alexander V. Chernikov	8170a7d438	Fix interface route addition with net/bird. The case of adding interface route by specifying interface address as the gateway was missed during code refactoring. Re-add it back by copying non-AF_LINK gateway data when RTF_GATEWAY is not set. Reviewed by: donner MFC after: 3 days	2021-02-12 19:45:35 +00:00
Alexander V. Chernikov	adc4ea97bd	Turn off forgotten multipath debug messages Reported by: mike tancsa<mike at sentex.net> MFC after: 3 days	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	eb0b1b33d5	Enable multipath routing by default. ROUTE_MPATH was added to the GENERIC kernel in r368648. According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0. Given enough time has passed, this change enables route multipath by default. The goal is to ship FreeBSD 13 with multipath turned on. Reviewed By: donner, olivier MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28423	2021-02-03 08:49:58 +00:00
Alexander V. Chernikov	78c93a1721	Use process fib for inet/inet6 fib_algo sysctls. This allows to set/query fib algo for non-default fibs. MFC after: 3 days	2021-01-31 10:50:08 +00:00
Alexander V. Chernikov	151ec796a2	Fix the design problem with delayed algorithm sync. Currently, if the immutable algorithm like bsearch or radix_lockless receives rtable update notification, it schedules algorithm rebuild. This rebuild is executed by the callout after ~50 milliseconds. It is possible that a script adding an interface address and than route with the gateway bound to that address will fail. It can happen due to the fact that fib is not updated by the time the route addition request arrives. Fix this by allowing synchronous algorithm rebuilds based on certain conditions. By default, these conditions assume: 1) less than net.route.algo.fib_sync_limit=100 routes 2) routes without gateway. * Move algo instance build entirely under rib WLOCK. Rib lock is only used for control plane (except radix algo, but there are no rebuilds). * Add rib_walk_ext_locked() function to allow RIB iteration with rib lock already held. * Fix rare potential callout use-after-free for fds by binding fd callout to the relevant rib rmlock. In that case, callout_stop() under rib WLOCK guarantees no callout will be executed afterwards. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	dd9163003c	Add rib_subscribe_locked() and rib_unsubsribe_locked() to support subscriptions during RIB modifications. Add new subscriptions to the beginning of the lists instead of the end. This fixes the situation when new subscription is created int the callback for the existing subscription, leading to the subscription notification handler pick it. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	ab6d9aaed7	Move business logic from rebuild_fd_callout() into rebuild_fd(). This simplifies code a bit and allows for future non-callout callers to request rebuild. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	f8b7ebea49	Improve fib_algo debug messages. * Move per-prefix debug lines under LOG_DEBUG2 * Create fib instance counter to distingush log messages between instances * Add more messages on rebuild reason. MFC after: 3 days	2021-01-30 23:25:56 +00:00
Alexander V. Chernikov	9d6567bc30	Fix panic on vnet creation if fib algo has been set to fixed value. Make fixed algo property per-VNET instead of global.	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Alexander V. Chernikov	685de460bc	Use static initializers for fib algo to shift initialization to ealier stage. This allows to register modules loaded at boot time. Reported by: olivier	2021-01-11 00:16:54 +00:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Alexander V. Chernikov	9c0ff6a8bb	Remove now-unused RT_GATEWAY* definitions. They were used to simplify nexthop transition, hence not needed anymore.	2021-01-04 21:45:46 +00:00
Ryan Libby	833dbf1e22	route: quiet -Wredundant-decls Remove declaration duplicated in `f5baf8bb12` Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27790	2020-12-27 16:32:27 -08:00
Alexander V. Chernikov	f733d9701b	Fix default route handling in radix4_lockless algo. Improve nexthop debugging. Reported by: Florian Smeets <flo at smeets.xyz>	2020-12-26 22:51:02 +00:00
Alexander V. Chernikov	4e19e0d92a	Use light-weight versions of routing lookup functions in ng_netflow. Use recently-added combination of `fib[46]_lookup_rt()` which returns rtentry & raw nexthop with `rt_get_inet[6]_plen()` which returns address/prefix length of prefix inside `rt`. Add `nhop_select_func()` wrapper around inlined `nhop_select()` to allow callers external to the routing subsystem select the proper nexthop from the multipath group without including internal headers. New calls does not require reference counting objects and reduce the amount of copied/processed rtentry data. Differential Revision: https://reviews.freebsd.org/D27675	2020-12-26 11:27:38 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Alexander V. Chernikov	df9053920f	Add IPv4/IPv6 rtentry prefix accessors. Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers. While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data. Differential Revision: https://reviews.freebsd.org/D27416	2020-12-03 22:23:57 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	3b1654cb14	Introduce rib_walk_ext_internal() to allow iteration with rnh pointer. This solves the case when rib is not yet attached/detached to/from the system rib array. Differential Revision: https://reviews.freebsd.org/D27406	2020-11-29 13:54:49 +00:00
Alexander V. Chernikov	f47fa26065	Add nhop_ref_any() to unify referencing nhop or nexthop group. It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH details. Differential Revision: https://reviews.freebsd.org/D27410	2020-11-29 13:52:06 +00:00
Alexander V. Chernikov	98d5c4e5c8	Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors. The resulting KPI can be used by routing table consumers to estimate the required scale for route table export. * Add tracking for rib routes * Add accessors for number of nexthops/nexthop objects * Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up again on destruction. This helps in the cases when rnh is not linked yet/already unlinked. Differential Revision: https://reviews.freebsd.org/D27404	2020-11-29 13:27:24 +00:00
Alexander V. Chernikov	ef6ef7e5da	Add nhgrp_get_idx() as a counterpart for nhop_get_idx(). It allows the routing-related code to reference nexthop groups by index instead of storing a pointer.	2020-11-28 15:46:40 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Alexander V. Chernikov	2d39824195	Switch net.add_addr_allfibs default to 0. The goal of the fib support is to provide multiple independent routing tables, isolated from each other. net.add_addr_allfibs default tries to shift gears in the opposite direction, unconditionally inserting all addresses to all of the fibs. There are use cases when this is necessary, however this is not a default expected behaviour, especially compared to other implementations. Provide WARNING message for the setups with multiple fibs to notify potential users of the feature. Differential Revision: https://reviews.freebsd.org/D26076	2020-11-08 18:27:49 +00:00
Alexander V. Chernikov	76e6b37f6b	Temporarily revert setting net.add_addr_allfibs to 0. It accidentally sweeped in r367486. Revert to allow for proper commit message & warning.	2020-11-08 18:11:12 +00:00
Alexander V. Chernikov	770495f4c0	Fix build broken by r367484: add route_ifaddrs.c. Pointy hat to: melifaro Reported by: jenkins	2020-11-08 13:30:44 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Alexander V. Chernikov	1b95005e95	Fix route flags update during RTM_CHANGE. Nexthop lookup was not consireding rt_flags when doing structure comparison, which lead to an original nexthop selection when changing flags. Fix the case by adding rt_flags field into comparison and rearranging nhop_priv fields to allow for efficient matching. Fix `route change X/Y flags` case - recent changes disallowed specifying RTF_GATEWAY flag without actual gateway. It turns out, route(8) fills in RTF_GATEWAY by default, unless -interface flag is specified. Fix regression by clearing RTF_GATEWAY flag instead of failing. Fix route flag reporting in RTM_CHANGE messages by explicitly updating rtm_flags after operation competion. Add IPv4/IPv6 tests for flag-only route changes.	2020-10-04 13:24:58 +00:00
Alexander V. Chernikov	9c584fa4bc	Remove ROUTE_MPATH-related warnings introduced in r366390. Reported by: mjg	2020-10-03 14:37:54 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00

1 2

97 Commits