freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	aad59c79f5	Fix panic when trying to delete non-existent gateway in multipath route. IF non-existend gateway was specified, the code responsible for calculating an updated nexthop group, returned the same already-used nexthop group. After the route table update, the operation result contained the same old & new nexthop groups. Thus, the code responsible for decomposing the notification to the list of simple nexthop-level notifications, was not able to find any differences. As a result, it hasn't updated any of the "simple" notification fields, resulting in empty rtentry pointer. This empty pointer was the direct reason of a panic. Fix the problem by returning ESRCH when the new nexthop group is the same as the old one after applying gateway filter. Reported by: Michael <michael.adm at gmail.com> PR: 255665 MFC after: 3 days	2021-05-07 20:41:31 +00:00
Alexander V. Chernikov	41ce0e34ea	[fib algo] Update fib_gen counter under FIB_MOD_LOCK. MFC after: 3 days	2021-04-28 20:23:03 +00:00
Alexander V. Chernikov	f9668e42b4	Add rib_walk_from() wrapper for selective rib tree traversal. Provide wrapper for the rnh_walktree_from() rib callback. As currently `struct rib_head` is considered internal to the routing subsystem, this wrapper is necessary to maintain isolation from the external code. Differential Revision: https://reviews.freebsd.org/D29971 MFC after: 1 week	2021-04-28 08:09:45 +00:00
Alexander V. Chernikov	8a0d57baec	[fib algo] Delay algo init at fib growth to to allow to reliably use rib KPI. Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib pointer instead. This works well except for the case when we initialize new rib pointers during fib growth. In that case, there is no mapping between fib/family and the new rib, as an entirely new rib pointer array is populated. Address this by delaying fib algo initialization till after switching to the new pointer array and updating the number of fibs. Set datapath pointer to the dummy function, so the potential callers won't crash the kernel in the brief moment when the rib exists, but no fib algo is attached. This change allows to avoid creating duplicates of existing rib functions, with altered signature. Differential Revision: https://reviews.freebsd.org/D29969 MFC after: 1 week	2021-04-27 22:10:08 +00:00
Alexander V. Chernikov	439d087d0b	[fib algo] always commit static routes synchronously. Modular fib lookup framework features logic that allows route update batching for the algorithms that cannot easily apply the routing change without rebuilding. As a result, dataplane lookups may return old data until the the sync takes place. With the default sync timeout of 50ms, it is possible that new binary like ping(8) executed exactly after route(8) will still use the old fib data. To address some aspects of the problem, framework executes all rtable changes without RTF_GATEWAY synchronously. To fix the aforementioned problem, this diff extends sync execution for all RTF_STATIC routes (e.g. ones maintained by route(8). This fixes a bunch of tests in the networking space. Reported by: ci, arichardson MFC after: 2 weeks	2021-04-27 08:31:40 +00:00
Alexander V. Chernikov	bc5ef45aec	Fix drace CTF for the rib_head. `33cb3cb2e3` introduced an `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do not have `fib_algo` defined. Make dtrace happy by making the field unconditional. Suggested by: markj	2021-04-27 07:47:53 +00:00
Alexander V. Chernikov	7d222ce3c1	Fix NOINET[6],!VIMAGE builds after FIB_ALGO addition to GENERIC Reported by: jbeich PR: 255390	2021-04-21 05:53:42 +01:00
Alexander V. Chernikov	67372fb3e0	Fix NOINET[6] build after enabling FIB_ALGO in GENERIC. Submitted by: jbeich PR: 255389	2021-04-21 02:49:18 +01:00
Alexander V. Chernikov	c23385612d	[fib algo] Do not print algo attach/detach message on boot MFC after: 1 day	2021-04-25 08:58:06 +00:00
Alexander V. Chernikov	a81e2e7890	Make gcc happy by initializing error in rib_handle_ifaddr_info().	2021-04-25 08:44:59 +00:00
Stefan Eßer	6409e59427	Fix build with gcc Correctly declare function without arguments as f(void) instead of f().	2021-04-25 10:15:17 +02:00
Alexander V. Chernikov	33cb3cb2e3	Fix rib generation count for fib algo. Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries. With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data. To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time when fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value. Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks	2021-04-20 22:02:41 +00:00
Alexander V. Chernikov	0abb6ff590	fib algo: do not reallocate datapath index for datapath ptr update. Fib algo uses a per-family array indexed by the fibnum to store lookup function pointers and per-fib data. Each algorithm rebuild currently requires re-allocating this array to support atomic change of two pointers. As in reality most of the changes actually involve changing only data pointer, add a shortcut performing in-flight pointer update. MFC after: 2 weeks	2021-04-18 16:12:13 +01:00
Alexander V. Chernikov	e2f79d9e51	Fib algo: extend KPI by allowing algo to set datapath pointers. Some algorithms may require updating datapath and control plane algo pointers after the (batched) updates. Export fib_set_datapath_ptr() to allow setting the new datapath function or data pointer from the algo. Add fib_set_algo_ptr() to allow updating algo control plane pointer from the algo. Add fib_epoch_call() epoch(9) wrapper to simplify freeing old datapath state. Reviewed by: zec Differential Revision: https://reviews.freebsd.org/D29799 MFC after: 1 week	2021-04-18 16:12:12 +01:00
Alexander V. Chernikov	6b8ef0d428	Add batched update support for the fib algo. Initial fib algo implementation was build on a very simple set of principles w.r.t updates: 1) algorithm is ether able to apply the change synchronously (DIR24-8) or requires full rebuild (bsearch, lradix). 2) framework falls back to rebuild on every error (memory allocation, nhg limit, other internal algo errors, etc). This changes brings the new "intermediate" concept - batched updates. Algotirhm can indicate that the particular update has to be handled in batched fashion (FLM_BATCH). The framework will write this update and other updates to the temporary buffer instead of pushing them to the algo callback. Depending on the update rate, the framework will batch 50..1024 ms of updates and submit them to a different algo callback. This functionality is handy for the slow-to-rebuild algorithms like DXR. Differential Revision: https://reviews.freebsd.org/D29588 Reviewed by: zec MFC after: 2 weeks	2021-04-14 23:54:11 +01:00
Alexander V. Chernikov	ee2cf2b360	Implement better rebuild-delay fib algo policy. The intent is to better handle time intervals with large amount of RIB updates (e.g. BGP peer going up or down), while still keeping low sync delay for the rest scenarios. The implementation is the following: updates are bucketed into the buckets of size 50ms. If the number of updates within a current bucket exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket interval), the update is delayed for another 50ms. This can be repeated until the maximum update delay (1 sec) is reached. All 3 variables are runtime tunables: * net.route.algo.fib_max_sync_delay_ms: 1000 * net.route.algo.bucket_change_threshold_rate: 500 * net.route.algo.bucket_time_ms: 50 Differential Review: https://reviews.freebsd.org/D29588 MFC after: 2 weeks	2021-04-09 21:33:03 +01:00
Alexander V. Chernikov	0c2a0e0380	Fix typo in the `9fa8d1582b`. Reported by: cy	2021-03-29 23:42:48 +00:00
Alexander V. Chernikov	9fa8d1582b	Put bandaid for nhgrp_dump_sysctl() malloc KASSERT(). Recent rtsock changes widened epoch and covered nhgrp_dump_sysctl(), resulting in `netstat -4On` triggering with KASSERT. MFC after: 1 day	2021-03-29 23:12:11 +00:00
Alexander V. Chernikov	0f30a36ded	Rename variables inside nexhtop group consider_resize() code. No functional changes. MFC after: 3 days	2021-03-29 23:06:13 +00:00
Alexander V. Chernikov	9095dc7da4	Fix nexhtop group index array scaling. The current code has the limit of 127 nexthop groups due to the wrongly-checked bitmask_copy() return value. PR: 254303 Reported by: Aleks <a.ivanov at veesp.com> MFC after: 1 day	2021-03-29 23:00:17 +00:00
Alexander V. Chernikov	6f43c72b47	Zero `struct weightened_nhop` fields in nhgrp_get_addition_group(). `struct weightened_nhop` has spare 32bit between the fields due to the alignment (on amd64). Not zeroing these spare bits results in duplicating nhop groups in the kernel due to the way how comparison works. MFC after: 1 day	2021-03-20 08:26:03 +00:00
Alexander V. Chernikov	24cd2796cf	Fix !VNET build broken by `66f138563b`.	2021-03-25 00:31:08 +00:00
Alexander V. Chernikov	66f138563b	Plug nexthop group refcount leak. In case with batch route delete via rib_walk_del(), when some paths from the multipath route gets deleted, old multipath group were not freed. PR: 254496 Reported by: Zhenlei Huang <zlei.huang@gmail.com> MFC after: 1 day	2021-03-24 23:52:18 +00:00
Alexander V. Chernikov	c00e2f573b	Fix build for non-vnet non-multipath kernels broken by `a0308e48ec`.	2021-03-23 23:35:23 +00:00
Alexander V. Chernikov	a0308e48ec	Fix panic when destroying interface with ECMP routes. Reported by: Zhenlei Huang <zlei.huang at gmail.com> PR: 254496 MFC after: immediately	2021-03-23 22:03:20 +00:00
Alexander V. Chernikov	2476178e6b	Fix kassert panic when inserting multipath routes from multiple threads. Reported by: Marco Zec <zec at fer.hr> MFC after: immediately	2021-03-21 18:15:29 +00:00
Alexander V. Chernikov	e4ac3f7463	Fix fib algo rebuild delay calculation. Submitted by: Marco Zec <zec at fer.hr> MFC after: 3 days	2021-03-15 21:09:07 +00:00
Alexander V. Chernikov	b1d63265ac	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages \| grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116	2021-03-10 21:10:14 +00:00
Alexander V. Chernikov	5964172837	Simplify ifa/ifp refcounting in the routing stack. The routing stack control depends on quite a tree of functions to determine the proper attributes of a route such as a source address (ifa) or transmit ifp of a route. When actually inserting a route, the stack needs to ensure that ifa and ifp points to the entities that are still valid. Validity means slightly more than just pointer validity - stack need guarantee that the provided objects are not scheduled for deletion. Currently, callers either ignore it (most ifp parts, historically) or try to use refcounting (ifa parts). Even in case of ifa refcounting it's not always implemented in fully-safe manner. For example, some codepaths inside rt_getifa_fib() are referencing ifa while not holding any locks, resulting in possibility of referencing scheduled-for-deletion ifa. Instead of trying to fix all of the callers by enforcing proper refcounting, switch to a different model. As the rib_action() already requires epoch, do not require any stability guarantees other than the epoch-provided one. Use newly-added conditional versions of the refcounting functions (ifa_try_ref(), if_try_ref()) and fail if any of these fails. Reviewed by: donner MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28837	2021-02-22 23:37:59 +00:00
Alexander V. Chernikov	a375ec52a7	Fix ifa refcount leak during route addition. Reported by: rstone Reviewed by: rstone MFC after: 1 day	2021-02-13 00:06:14 +00:00
Alexander V. Chernikov	8170a7d438	Fix interface route addition with net/bird. The case of adding interface route by specifying interface address as the gateway was missed during code refactoring. Re-add it back by copying non-AF_LINK gateway data when RTF_GATEWAY is not set. Reviewed by: donner MFC after: 3 days	2021-02-12 19:45:35 +00:00
Alexander V. Chernikov	adc4ea97bd	Turn off forgotten multipath debug messages Reported by: mike tancsa<mike at sentex.net> MFC after: 3 days	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	eb0b1b33d5	Enable multipath routing by default. ROUTE_MPATH was added to the GENERIC kernel in r368648. According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0. Given enough time has passed, this change enables route multipath by default. The goal is to ship FreeBSD 13 with multipath turned on. Reviewed By: donner, olivier MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28423	2021-02-03 08:49:58 +00:00
Alexander V. Chernikov	78c93a1721	Use process fib for inet/inet6 fib_algo sysctls. This allows to set/query fib algo for non-default fibs. MFC after: 3 days	2021-01-31 10:50:08 +00:00
Alexander V. Chernikov	151ec796a2	Fix the design problem with delayed algorithm sync. Currently, if the immutable algorithm like bsearch or radix_lockless receives rtable update notification, it schedules algorithm rebuild. This rebuild is executed by the callout after ~50 milliseconds. It is possible that a script adding an interface address and than route with the gateway bound to that address will fail. It can happen due to the fact that fib is not updated by the time the route addition request arrives. Fix this by allowing synchronous algorithm rebuilds based on certain conditions. By default, these conditions assume: 1) less than net.route.algo.fib_sync_limit=100 routes 2) routes without gateway. * Move algo instance build entirely under rib WLOCK. Rib lock is only used for control plane (except radix algo, but there are no rebuilds). * Add rib_walk_ext_locked() function to allow RIB iteration with rib lock already held. * Fix rare potential callout use-after-free for fds by binding fd callout to the relevant rib rmlock. In that case, callout_stop() under rib WLOCK guarantees no callout will be executed afterwards. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	dd9163003c	Add rib_subscribe_locked() and rib_unsubsribe_locked() to support subscriptions during RIB modifications. Add new subscriptions to the beginning of the lists instead of the end. This fixes the situation when new subscription is created int the callback for the existing subscription, leading to the subscription notification handler pick it. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	ab6d9aaed7	Move business logic from rebuild_fd_callout() into rebuild_fd(). This simplifies code a bit and allows for future non-callout callers to request rebuild. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	f8b7ebea49	Improve fib_algo debug messages. * Move per-prefix debug lines under LOG_DEBUG2 * Create fib instance counter to distingush log messages between instances * Add more messages on rebuild reason. MFC after: 3 days	2021-01-30 23:25:56 +00:00
Alexander V. Chernikov	9d6567bc30	Fix panic on vnet creation if fib algo has been set to fixed value. Make fixed algo property per-VNET instead of global.	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Alexander V. Chernikov	685de460bc	Use static initializers for fib algo to shift initialization to ealier stage. This allows to register modules loaded at boot time. Reported by: olivier	2021-01-11 00:16:54 +00:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Alexander V. Chernikov	9c0ff6a8bb	Remove now-unused RT_GATEWAY* definitions. They were used to simplify nexthop transition, hence not needed anymore.	2021-01-04 21:45:46 +00:00
Ryan Libby	833dbf1e22	route: quiet -Wredundant-decls Remove declaration duplicated in `f5baf8bb12` Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27790	2020-12-27 16:32:27 -08:00
Alexander V. Chernikov	f733d9701b	Fix default route handling in radix4_lockless algo. Improve nexthop debugging. Reported by: Florian Smeets <flo at smeets.xyz>	2020-12-26 22:51:02 +00:00
Alexander V. Chernikov	4e19e0d92a	Use light-weight versions of routing lookup functions in ng_netflow. Use recently-added combination of `fib[46]_lookup_rt()` which returns rtentry & raw nexthop with `rt_get_inet[6]_plen()` which returns address/prefix length of prefix inside `rt`. Add `nhop_select_func()` wrapper around inlined `nhop_select()` to allow callers external to the routing subsystem select the proper nexthop from the multipath group without including internal headers. New calls does not require reference counting objects and reduce the amount of copied/processed rtentry data. Differential Revision: https://reviews.freebsd.org/D27675	2020-12-26 11:27:38 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Alexander V. Chernikov	df9053920f	Add IPv4/IPv6 rtentry prefix accessors. Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers. While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data. Differential Revision: https://reviews.freebsd.org/D27416	2020-12-03 22:23:57 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	3b1654cb14	Introduce rib_walk_ext_internal() to allow iteration with rnh pointer. This solves the case when rib is not yet attached/detached to/from the system rib array. Differential Revision: https://reviews.freebsd.org/D27406	2020-11-29 13:54:49 +00:00

1 2 3

108 Commits