freebsd-dev

Author	SHA1	Message	Date
John Baldwin	f7236dd068	change_mpath_route: Remove write-only nh variable. While here, cleanup the style of the function prologue by moving an assignment out of the middle of two variable declaration blocks.	2022-04-06 16:45:28 -07:00
John Baldwin	371c917b0b	unlink_nhgrp: Remove write-only variable. Possibly one could assert that ret should always be 0 here (that is, that there was always an index found in the bitmask). That should be true since a bitmask index is allocated before the nhgrp is inserted in the ctl->gr_head list in link_nhgrp.	2022-04-06 16:45:27 -07:00
Warner Losh	5de5b5a34d	route_ctl: eliminate write only variables ifa and nh Sponsored by: Netflix	2022-04-04 22:30:48 -06:00
Warner Losh	7f9c3339a4	get_nhop: eliminate write only variable gateway Sponsored by: Netflix	2022-04-04 22:30:47 -06:00
Alexander V. Chernikov	1b8b69508b	routing: copy nexthop fib when changing existing nexthop MFC after: 1 day	2022-03-28 11:32:30 +00:00
Ed Maste	a6668e31aa	Fix kernel build without INET and INET6 Reviewed by: brooks, melifaro Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33718	2022-01-05 09:41:38 -05:00
Alexander V. Chernikov	63f7f3921b	routing: Add unified level-based logging support for the routing subsystem. Summary: MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33664	2021-12-29 21:30:18 +00:00
Alexander V. Chernikov	823a08d740	nhops: split nh_family into nh_upper_family and nh_neigh_family. With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need to distinguish "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway address family. Store them explicitly in the private part of the nexthop data. While here, store nhop fibnum in nhop_prip datastructure to make it self-contained. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33663	2021-12-29 21:03:19 +00:00
Gleb Smirnoff	ad2a0aec29	nhop: hash ifnet pointer instead of if_index Yet another problem created by VIMAGE/if_vmove/epair design that relocates ifnet between vnets and changes if_index. Since if_index changes, nhop hash values also changes, unlink_nhop() isn't able to find entry in hash and leaks the nhop. Since nhop references ifnet, the latter is also leaked. As result running network tests leaks memory on every single test that creates vnet jail. While here, rewrite whole hash_priv() to use static initializer, per Alexander's suggestion. Reviewed by: melifaro	2021-12-04 10:05:46 -08:00
Alexander V. Chernikov	7e64580b5f	routing: Use the same index space for both nexthop and nexthop groups. This simplifies userland object handling along with kernel-level nexthop handling in fib algo framework. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32342	2021-10-08 07:58:55 +00:00
Alexander V. Chernikov	0a3a377aee	routing: Disallow zero nexthop weights in nexthop groups. Adding such nexthops breaks calc_min_mpath_slots() assumptions, thus resulting in the incorrect nexthop group creation and eventually leading to panic. Reported by: avg MFC after: 1 week	2021-09-01 07:16:24 +00:00
Alexander V. Chernikov	639d7abec6	routing: simplify malloc flags in alloc_nhgrp(). MFC after: 1 week	2021-08-31 08:14:16 +00:00
Alexander V. Chernikov	f84c30106e	routing: Fix newly-added rt_get_inet[6]_parent() api. Correctly handle the case when no default route is present. Reported by: Konrad <konrad.kreciwilk at korbank.pl>	2021-08-30 21:10:37 +00:00
Alexander V. Chernikov	d98954e229	routing: Bring back the ability to specify transmit interface via its name. Some software references outgoing interfaces by specifying name instead of index. Use rti_ifp from rt_addrinfo if provided instead of always using address interface when constructing nexthop. PR: 255678 Reported by: martin.larsson2 at gmail.com MFC after: 1 week	2021-08-29 20:05:14 +00:00
Zhenlei Huang	62e1a437f3	routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549). Implement kernel support for RFC 5549/8950. * Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default). * Always pass final destination in ro->ro_dst in ip_forward(). * Use ro->ro_dst to extract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case. * Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in `c541bd368f`. Presence of the functionality can be checked using ipv4_rfc5549_support feature(3). Example usage: route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0 Differential Revision: https://reviews.freebsd.org/D30398 MFC after: 2 weeks	2021-08-22 22:56:08 +00:00
Alexander V. Chernikov	36e15b717e	routing: Fix crashes with dpdk_lpm[46] algo. When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know the nexthop of the "parent" prefix to update its internal state. The glue code, which utilises RIB as a backing route store, uses fib[46]_lookup_rt() for the prefix destination after its deletion to fetch the desired nexthop. This approach does not work when deleting less-specific prefixes while more-specific ones are still present. For example, if 10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting 10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search result instead of 10.0.0.0/22. This, in turn, results in the failed datastructure update: part of the deleted /23 prefix will still contain the reference to an old nexthop. This leads to the use-after-free behaviour, ending with the eventual crashes. Fix the logic flaw by properly fetching the prefix "parent" via newly-created rt_get_inet[6]_parent() helpers. Differential Revision: https://reviews.freebsd.org/D31546 PR: 256882,256833 MFC after: 1 week	2021-08-17 20:46:22 +00:00
Alexander V. Chernikov	9748eb7427	Simplify nhop operations in ip_output(). Consistently use `nh` instead of always dereferencing ro->ro_nh inside the if block. Always use nexthop mtu, as it provides guarantee that mtu is accurate. Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses of updating ro flags based on different nexthop. Differential Revision: https://reviews.freebsd.org/D31451 Reviewed by: kp MFC after: 2 weeks	2021-08-08 09:19:27 +00:00
Alexander V. Chernikov	5b42b494d5	Fix typo in rib_unsibscribe<_locked>(). Submitted by: Zhenlei Huang<zlei.huang at gmail.com> Differential Revision: https://reviews.freebsd.org/D31356	2021-08-01 13:29:52 +00:00
Alexander V. Chernikov	054948bd81	[multipath][nhops] Fix random crashes with high route churn rate. When a certain multipath route begins flapping really fast, it may result in creating multiple identical nexthop groups. The code responsible for unlinking unused nexthop groups had an implicit assumption that there could be only one nexthop group for the same combination of nexthops with weights. This assumption resulted in always unlinking the first "identical" group, instead of the desired one. Such action, in turn, produced a used-but-unlinked nhg along with freed-and-linked nhg, ending up in random crashes. Similarly, it is possible that multiple identical nexthops get created in the case of high route churn, resulting in the same problem when deleting one of such nexthops. Fix by matching the nexthop/nexthop group pointer when deleting the item. Reported by: avg MFC after: 1 week	2021-08-01 10:07:37 +00:00
Alexander V. Chernikov	aad59c79f5	Fix panic when trying to delete non-existent gateway in multipath route. If a non-existent gateway was specified, the code responsible for calculating an updated nexthop group returned the same already-used nexthop group. After the route table update, the operation result contained the same old & new nexthop groups. Thus, the code responsible for decomposing the notification to the list of simple nexthop-level notifications was not able to find any differences. As a result, it hasn't updated any of the "simple" notification fields, resulting in empty rtentry pointer. This empty pointer was the direct reason of a panic. Fix the problem by returning ESRCH when the new nexthop group is the same as the old one after applying gateway filter. Reported by: Michael <michael.adm at gmail.com> PR: 255665 MFC after: 3 days	2021-05-07 20:41:31 +00:00
Alexander V. Chernikov	41ce0e34ea	[fib algo] Update fib_gen counter under FIB_MOD_LOCK. MFC after: 3 days	2021-04-28 20:23:03 +00:00
Alexander V. Chernikov	f9668e42b4	Add rib_walk_from() wrapper for selective rib tree traversal. Provide wrapper for the rnh_walktree_from() rib callback. As currently `struct rib_head` is considered internal to the routing subsystem, this wrapper is necessary to maintain isolation from the external code. Differential Revision: https://reviews.freebsd.org/D29971 MFC after: 1 week	2021-04-28 08:09:45 +00:00
Alexander V. Chernikov	8a0d57baec	[fib algo] Delay algo init at fib growth to allow to reliably use rib KPI. Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib pointer instead. This works well except for the case when we initialize new rib pointers during fib growth. In that case, there is no mapping between fib/family and the new rib, as an entirely new rib pointer array is populated. Address this by delaying fib algo initialization till after switching to the new pointer array and updating the number of fibs. Set datapath pointer to the dummy function, so the potential callers won't crash the kernel in the brief moment when the rib exists, but no fib algo is attached. This change allows to avoid creating duplicates of existing rib functions, with altered signature. Differential Revision: https://reviews.freebsd.org/D29969 MFC after: 1 week	2021-04-27 22:10:08 +00:00
Alexander V. Chernikov	439d087d0b	[fib algo] always commit static routes synchronously. Modular fib lookup framework features logic that allows route update batching for the algorithms that cannot easily apply the routing change without rebuilding. As a result, dataplane lookups may return old data until the sync takes place. With the default sync timeout of 50ms, it is possible that new binary like ping(8) executed exactly after route(8) will still use the old fib data. To address some aspects of the problem, framework executes all rtable changes without RTF_GATEWAY synchronously. To fix the aforementioned problem, this diff extends sync execution for all RTF_STATIC routes (e.g. ones maintained by route(8)). This fixes a bunch of tests in the networking space. Reported by: ci, arichardson MFC after: 2 weeks	2021-04-27 08:31:40 +00:00
Alexander V. Chernikov	bc5ef45aec	Fix dtrace CTF for the rib_head. `33cb3cb2e3` introduced an `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do not have `fib_algo` defined. Make dtrace happy by making the field unconditional. Suggested by: markj	2021-04-27 07:47:53 +00:00
Alexander V. Chernikov	7d222ce3c1	Fix NOINET[6],!VIMAGE builds after FIB_ALGO addition to GENERIC Reported by: jbeich PR: 255390	2021-04-21 05:53:42 +01:00
Alexander V. Chernikov	67372fb3e0	Fix NOINET[6] build after enabling FIB_ALGO in GENERIC. Submitted by: jbeich PR: 255389	2021-04-21 02:49:18 +01:00
Alexander V. Chernikov	c23385612d	[fib algo] Do not print algo attach/detach message on boot MFC after: 1 day	2021-04-25 08:58:06 +00:00
Alexander V. Chernikov	a81e2e7890	Make gcc happy by initializing error in rib_handle_ifaddr_info().	2021-04-25 08:44:59 +00:00
Stefan Eßer	6409e59427	Fix build with gcc Correctly declare function without arguments as f(void) instead of f().	2021-04-25 10:15:17 +02:00
Alexander V. Chernikov	33cb3cb2e3	Fix rib generation count for fib algo. Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries. With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data. To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time when fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value. Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks	2021-04-20 22:02:41 +00:00
Alexander V. Chernikov	0abb6ff590	fib algo: do not reallocate datapath index for datapath ptr update. Fib algo uses a per-family array indexed by the fibnum to store lookup function pointers and per-fib data. Each algorithm rebuild currently requires re-allocating this array to support atomic change of two pointers. As in reality most of the changes actually involve changing only data pointer, add a shortcut performing in-flight pointer update. MFC after: 2 weeks	2021-04-18 16:12:13 +01:00
Alexander V. Chernikov	e2f79d9e51	Fib algo: extend KPI by allowing algo to set datapath pointers. Some algorithms may require updating datapath and control plane algo pointers after the (batched) updates. Export fib_set_datapath_ptr() to allow setting the new datapath function or data pointer from the algo. Add fib_set_algo_ptr() to allow updating algo control plane pointer from the algo. Add fib_epoch_call() epoch(9) wrapper to simplify freeing old datapath state. Reviewed by: zec Differential Revision: https://reviews.freebsd.org/D29799 MFC after: 1 week	2021-04-18 16:12:12 +01:00
Alexander V. Chernikov	6b8ef0d428	Add batched update support for the fib algo. Initial fib algo implementation was built on a very simple set of principles w.r.t updates: 1) algorithm is either able to apply the change synchronously (DIR24-8) or requires full rebuild (bsearch, lradix). 2) framework falls back to rebuild on every error (memory allocation, nhg limit, other internal algo errors, etc). This change brings the new "intermediate" concept - batched updates. Algorithm can indicate that the particular update has to be handled in batched fashion (FLM_BATCH). The framework will write this update and other updates to the temporary buffer instead of pushing them to the algo callback. Depending on the update rate, the framework will batch 50..1024 ms of updates and submit them to a different algo callback. This functionality is handy for the slow-to-rebuild algorithms like DXR. Differential Revision: https://reviews.freebsd.org/D29588 Reviewed by: zec MFC after: 2 weeks	2021-04-14 23:54:11 +01:00
Alexander V. Chernikov	ee2cf2b360	Implement better rebuild-delay fib algo policy. The intent is to better handle time intervals with large amount of RIB updates (e.g. BGP peer going up or down), while still keeping low sync delay for the rest scenarios. The implementation is the following: updates are bucketed into the buckets of size 50ms. If the number of updates within a current bucket exceeds the threshold of 500 routes/sec (e.g. 10 updates per bucket interval), the update is delayed for another 50ms. This can be repeated until the maximum update delay (1 sec) is reached. All 3 variables are runtime tunables: * net.route.algo.fib_max_sync_delay_ms: 1000 * net.route.algo.bucket_change_threshold_rate: 500 * net.route.algo.bucket_time_ms: 50 Differential Review: https://reviews.freebsd.org/D29588 MFC after: 2 weeks	2021-04-09 21:33:03 +01:00
Alexander V. Chernikov	0c2a0e0380	Fix typo in the `9fa8d1582b`. Reported by: cy	2021-03-29 23:42:48 +00:00
Alexander V. Chernikov	9fa8d1582b	Put bandaid for nhgrp_dump_sysctl() malloc KASSERT(). Recent rtsock changes widened epoch and covered nhgrp_dump_sysctl(), resulting in `netstat -4On` triggering with KASSERT. MFC after: 1 day	2021-03-29 23:12:11 +00:00
Alexander V. Chernikov	0f30a36ded	Rename variables inside nexthop group consider_resize() code. No functional changes. MFC after: 3 days	2021-03-29 23:06:13 +00:00
Alexander V. Chernikov	9095dc7da4	Fix nexthop group index array scaling. The current code has the limit of 127 nexthop groups due to the wrongly-checked bitmask_copy() return value. PR: 254303 Reported by: Aleks <a.ivanov at veesp.com> MFC after: 1 day	2021-03-29 23:00:17 +00:00
Alexander V. Chernikov	6f43c72b47	Zero `struct weightened_nhop` fields in nhgrp_get_addition_group(). `struct weightened_nhop` has spare 32bit between the fields due to the alignment (on amd64). Not zeroing these spare bits results in duplicating nhop groups in the kernel due to the way how comparison works. MFC after: 1 day	2021-03-20 08:26:03 +00:00
Alexander V. Chernikov	24cd2796cf	Fix !VNET build broken by `66f138563b`.	2021-03-25 00:31:08 +00:00
Alexander V. Chernikov	66f138563b	Plug nexthop group refcount leak. In case with batch route delete via rib_walk_del(), when some paths from the multipath route gets deleted, old multipath group were not freed. PR: 254496 Reported by: Zhenlei Huang <zlei.huang@gmail.com> MFC after: 1 day	2021-03-24 23:52:18 +00:00
Alexander V. Chernikov	c00e2f573b	Fix build for non-vnet non-multipath kernels broken by `a0308e48ec`.	2021-03-23 23:35:23 +00:00
Alexander V. Chernikov	a0308e48ec	Fix panic when destroying interface with ECMP routes. Reported by: Zhenlei Huang <zlei.huang at gmail.com> PR: 254496 MFC after: immediately	2021-03-23 22:03:20 +00:00
Alexander V. Chernikov	2476178e6b	Fix kassert panic when inserting multipath routes from multiple threads. Reported by: Marco Zec <zec at fer.hr> MFC after: immediately	2021-03-21 18:15:29 +00:00
Alexander V. Chernikov	e4ac3f7463	Fix fib algo rebuild delay calculation. Submitted by: Marco Zec <zec at fer.hr> MFC after: 3 days	2021-03-15 21:09:07 +00:00
Alexander V. Chernikov	b1d63265ac	Flush remaining routes from the routing table during VNET shutdown. Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET. PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish. Test Plan: ``` set_skip:set_skip_group_lo -> passed [0.053s] tail -n 200 /var/log/messages \| grep rtentry ``` Reviewers: #network, kp, bz Reviewed By: kp Subscribers: imp, ae Differential Revision: https://reviews.freebsd.org/D29116	2021-03-10 21:10:14 +00:00
Alexander V. Chernikov	5964172837	Simplify ifa/ifp refcounting in the routing stack. The routing stack control depends on quite a tree of functions to determine the proper attributes of a route such as a source address (ifa) or transmit ifp of a route. When actually inserting a route, the stack needs to ensure that ifa and ifp points to the entities that are still valid. Validity means slightly more than just pointer validity - stack need guarantee that the provided objects are not scheduled for deletion. Currently, callers either ignore it (most ifp parts, historically) or try to use refcounting (ifa parts). Even in case of ifa refcounting it's not always implemented in fully-safe manner. For example, some codepaths inside rt_getifa_fib() are referencing ifa while not holding any locks, resulting in possibility of referencing scheduled-for-deletion ifa. Instead of trying to fix all of the callers by enforcing proper refcounting, switch to a different model. As the rib_action() already requires epoch, do not require any stability guarantees other than the epoch-provided one. Use newly-added conditional versions of the refcounting functions (ifa_try_ref(), if_try_ref()) and fail if any of these fails. Reviewed by: donner MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28837	2021-02-22 23:37:59 +00:00
Alexander V. Chernikov	a375ec52a7	Fix ifa refcount leak during route addition. Reported by: rstone Reviewed by: rstone MFC after: 1 day	2021-02-13 00:06:14 +00:00
Alexander V. Chernikov	8170a7d438	Fix interface route addition with net/bird. The case of adding interface route by specifying interface address as the gateway was missed during code refactoring. Re-add it back by copying non-AF_LINK gateway data when RTF_GATEWAY is not set. Reviewed by: donner MFC after: 3 days	2021-02-12 19:45:35 +00:00
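
Note on 36e15b717e (dpdk_lpm[46] crash fix): the 10.0.0.0/22, /23, /24 example from that commit message can be modelled with a small standalone sketch. Everything below is hypothetical illustration code: `struct prefix`, `lpm_lookup()` and `parent_lookup()` are made-up names, and the linear scan is an assumption chosen for brevity. It is not the kernel's rt_get_inet[6]_parent() implementation; it only shows why a plain longest-prefix lookup on the deleted prefix's address and a "parent" lookup can give different answers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 prefix record; not the kernel's rtentry. */
struct prefix {
	uint32_t addr;	/* network address, host byte order */
	uint8_t plen;	/* prefix length, 0..32 */
};

/* Build a netmask from a prefix length, avoiding an undefined 32-bit shift. */
static uint32_t
plen_to_mask(uint8_t plen)
{
	assert(plen <= 32);
	return (plen == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - plen)));
}

/* Return non-zero if prefix p covers address addr. */
static int
prefix_covers(const struct prefix *p, uint32_t addr)
{
	return ((addr & plen_to_mask(p->plen)) == p->addr);
}

/* Plain longest-prefix match on a destination address. */
static const struct prefix *
lpm_lookup(const struct prefix *tbl, size_t n, uint32_t dst)
{
	const struct prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (prefix_covers(&tbl[i], dst) &&
		    (best == NULL || tbl[i].plen > best->plen))
			best = &tbl[i];
	}
	return (best);
}

/*
 * Parent lookup: the most specific remaining prefix that is strictly
 * shorter than the child and still covers the child's network address.
 */
static const struct prefix *
parent_lookup(const struct prefix *tbl, size_t n, const struct prefix *child)
{
	const struct prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].plen < child->plen &&
		    prefix_covers(&tbl[i], child->addr) &&
		    (best == NULL || tbl[i].plen > best->plen))
			best = &tbl[i];
	}
	return (best);
}
```

With 10.0.0.0/23 already removed and only 10.0.0.0/24 and 10.0.0.0/22 left in the table, `lpm_lookup()` on 10.0.0.0 returns the /24, while `parent_lookup()` for 10.0.0.0/23 returns the /22, which is the prefix whose nexthop the commit message says the algorithm needs.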

127 Commits