freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	a375ec52a7	Fix ifa refcount leak during route addition. Reported by: rstone Reviewed by: rstone MFC after: 1 day	2021-02-13 00:06:14 +00:00
Alexander V. Chernikov	8170a7d438	Fix interface route addition with net/bird. The case of adding interface route by specifying interface address as the gateway was missed during code refactoring. Re-add it back by copying non-AF_LINK gateway data when RTF_GATEWAY is not set. Reviewed by: donner MFC after: 3 days	2021-02-12 19:45:35 +00:00
Alexander V. Chernikov	adc4ea97bd	Turn off forgotten multipath debug messages Reported by: mike tancsa<mike at sentex.net> MFC after: 3 days	2021-02-08 21:42:20 +00:00
Alexander V. Chernikov	eb0b1b33d5	Enable multipath routing by default. ROUTE_MPATH was added to the GENERIC kernel in r368648. According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0. Given enough time has passed, this change enables route multipath by default. The goal is to ship FreeBSD 13 with multipath turned on. Reviewed By: donner, olivier MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28423	2021-02-03 08:49:58 +00:00
Alexander V. Chernikov	78c93a1721	Use process fib for inet/inet6 fib_algo sysctls. This allows to set/query fib algo for non-default fibs. MFC after: 3 days	2021-01-31 10:50:08 +00:00
Alexander V. Chernikov	151ec796a2	Fix the design problem with delayed algorithm sync. Currently, if the immutable algorithm like bsearch or radix_lockless receives rtable update notification, it schedules algorithm rebuild. This rebuild is executed by the callout after ~50 milliseconds. It is possible that a script adding an interface address and than route with the gateway bound to that address will fail. It can happen due to the fact that fib is not updated by the time the route addition request arrives. Fix this by allowing synchronous algorithm rebuilds based on certain conditions. By default, these conditions assume: 1) less than net.route.algo.fib_sync_limit=100 routes 2) routes without gateway. * Move algo instance build entirely under rib WLOCK. Rib lock is only used for control plane (except radix algo, but there are no rebuilds). * Add rib_walk_ext_locked() function to allow RIB iteration with rib lock already held. * Fix rare potential callout use-after-free for fds by binding fd callout to the relevant rib rmlock. In that case, callout_stop() under rib WLOCK guarantees no callout will be executed afterwards. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	dd9163003c	Add rib_subscribe_locked() and rib_unsubsribe_locked() to support subscriptions during RIB modifications. Add new subscriptions to the beginning of the lists instead of the end. This fixes the situation when new subscription is created int the callback for the existing subscription, leading to the subscription notification handler pick it. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	ab6d9aaed7	Move business logic from rebuild_fd_callout() into rebuild_fd(). This simplifies code a bit and allows for future non-callout callers to request rebuild. MFC after: 3 days	2021-01-30 23:25:57 +00:00
Alexander V. Chernikov	f8b7ebea49	Improve fib_algo debug messages. * Move per-prefix debug lines under LOG_DEBUG2 * Create fib instance counter to distingush log messages between instances * Add more messages on rebuild reason. MFC after: 3 days	2021-01-30 23:25:56 +00:00
Alexander V. Chernikov	9d6567bc30	Fix panic on vnet creation if fib algo has been set to fixed value. Make fixed algo property per-VNET instead of global.	2021-01-17 20:32:25 +00:00
Alexander V. Chernikov	81728a538d	Split rtinit() into multiple functions. rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality. 1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code. 2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity. 3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface aliases. 4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation. 5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases. To address all these points, the following has been done: * rtinit() has been split into multiple functions: - Route attribute construction were moved to the per-address-family functions, dealing with (2), (3) and (4). - funnction providing net.add_addr_allfibs handling and route rtsock notificaions is the new routing table inteface. - rtsock ifa notificaion has been moved out as well. resulting set of funcion are only responsible for the actual route notifications. Side effects: * /32 alias does not result in interface routes (/32 route and "host" route) * RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses Differential revision: https://reviews.freebsd.org/D28186	2021-01-16 22:42:41 +00:00
Alexander V. Chernikov	685de460bc	Use static initializers for fib algo to shift initialization to ealier stage. This allows to register modules loaded at boot time. Reported by: olivier	2021-01-11 00:16:54 +00:00
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Alexander V. Chernikov	9c0ff6a8bb	Remove now-unused RT_GATEWAY* definitions. They were used to simplify nexthop transition, hence not needed anymore.	2021-01-04 21:45:46 +00:00
Ryan Libby	833dbf1e22	route: quiet -Wredundant-decls Remove declaration duplicated in `f5baf8bb12` Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27790	2020-12-27 16:32:27 -08:00
Alexander V. Chernikov	f733d9701b	Fix default route handling in radix4_lockless algo. Improve nexthop debugging. Reported by: Florian Smeets <flo at smeets.xyz>	2020-12-26 22:51:02 +00:00
Alexander V. Chernikov	4e19e0d92a	Use light-weight versions of routing lookup functions in ng_netflow. Use recently-added combination of `fib[46]_lookup_rt()` which returns rtentry & raw nexthop with `rt_get_inet[6]_plen()` which returns address/prefix length of prefix inside `rt`. Add `nhop_select_func()` wrapper around inlined `nhop_select()` to allow callers external to the routing subsystem select the proper nexthop from the multipath group without including internal headers. New calls does not require reference counting objects and reduce the amount of copied/processed rtentry data. Differential Revision: https://reviews.freebsd.org/D27675	2020-12-26 11:27:38 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Alexander V. Chernikov	df9053920f	Add IPv4/IPv6 rtentry prefix accessors. Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers. While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data. Differential Revision: https://reviews.freebsd.org/D27416	2020-12-03 22:23:57 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	3b1654cb14	Introduce rib_walk_ext_internal() to allow iteration with rnh pointer. This solves the case when rib is not yet attached/detached to/from the system rib array. Differential Revision: https://reviews.freebsd.org/D27406	2020-11-29 13:54:49 +00:00
Alexander V. Chernikov	f47fa26065	Add nhop_ref_any() to unify referencing nhop or nexthop group. It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH details. Differential Revision: https://reviews.freebsd.org/D27410	2020-11-29 13:52:06 +00:00
Alexander V. Chernikov	98d5c4e5c8	Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors. The resulting KPI can be used by routing table consumers to estimate the required scale for route table export. * Add tracking for rib routes * Add accessors for number of nexthops/nexthop objects * Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up again on destruction. This helps in the cases when rnh is not linked yet/already unlinked. Differential Revision: https://reviews.freebsd.org/D27404	2020-11-29 13:27:24 +00:00
Alexander V. Chernikov	ef6ef7e5da	Add nhgrp_get_idx() as a counterpart for nhop_get_idx(). It allows the routing-related code to reference nexthop groups by index instead of storing a pointer.	2020-11-28 15:46:40 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Alexander V. Chernikov	2d39824195	Switch net.add_addr_allfibs default to 0. The goal of the fib support is to provide multiple independent routing tables, isolated from each other. net.add_addr_allfibs default tries to shift gears in the opposite direction, unconditionally inserting all addresses to all of the fibs. There are use cases when this is necessary, however this is not a default expected behaviour, especially compared to other implementations. Provide WARNING message for the setups with multiple fibs to notify potential users of the feature. Differential Revision: https://reviews.freebsd.org/D26076	2020-11-08 18:27:49 +00:00
Alexander V. Chernikov	76e6b37f6b	Temporarily revert setting net.add_addr_allfibs to 0. It accidentally sweeped in r367486. Revert to allow for proper commit message & warning.	2020-11-08 18:11:12 +00:00
Alexander V. Chernikov	770495f4c0	Fix build broken by r367484: add route_ifaddrs.c. Pointy hat to: melifaro Reported by: jenkins	2020-11-08 13:30:44 +00:00
Alexander V. Chernikov	0c325f53f1	Implement flowid calculation for outbound connections to balance connections over multiple paths. Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections. This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system. Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523	2020-10-18 17:15:47 +00:00
Alexander V. Chernikov	1b95005e95	Fix route flags update during RTM_CHANGE. Nexthop lookup was not consireding rt_flags when doing structure comparison, which lead to an original nexthop selection when changing flags. Fix the case by adding rt_flags field into comparison and rearranging nhop_priv fields to allow for efficient matching. Fix `route change X/Y flags` case - recent changes disallowed specifying RTF_GATEWAY flag without actual gateway. It turns out, route(8) fills in RTF_GATEWAY by default, unless -interface flag is specified. Fix regression by clearing RTF_GATEWAY flag instead of failing. Fix route flag reporting in RTM_CHANGE messages by explicitly updating rtm_flags after operation competion. Add IPv4/IPv6 tests for flag-only route changes.	2020-10-04 13:24:58 +00:00
Alexander V. Chernikov	9c584fa4bc	Remove ROUTE_MPATH-related warnings introduced in r366390. Reported by: mjg	2020-10-03 14:37:54 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	1440f62266	Remove unused nhop_ref_any() function. Remove "opt_mpath.h" header where not needed. No functional changes.	2020-09-20 21:32:52 +00:00
Alexander V. Chernikov	c4bcfe98e2	Fix gw updates / flag updates during route changes. * Zero gw_sdl if switching to interface route - the assumption that underlying storage is zeroed is incorrect with route changes. * Apply proper flag mask to rte. Reported by: vangyzen	2020-09-20 12:31:48 +00:00
Alexander V. Chernikov	2b32d93e55	Fix RADIX_MPATH build broken by r365521. Reported by: jenkins, Hartmann, O. <ohartmann at walstatt.org>	2020-09-10 07:05:31 +00:00
Alexander V. Chernikov	aa8f9f90ff	Update nexthop handling for route addition/deletion in preparation for mpath. Currently kernel requests deletion for the certain routes with specified gateway, but this gateway is not actually checked. With multipath routes, internal gateway checking becomes mandatory. Add the logic performing this check. Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic. Add lookup_prefix() function to perform exact match search based on data in @info. Differential Revision: https://reviews.freebsd.org/D26356	2020-09-09 22:07:54 +00:00
Alexander V. Chernikov	cd6298d5c5	Retain marking net.fibs sysctl as a tunable. Suggested by: avg	2020-09-09 21:45:18 +00:00
Alexander V. Chernikov	4a8201c13a	Fix panic with net.fibs tunable set in loader.conf. Fix by removing forgotten CTLFLAG_RWTUN flag from the sysctl, loader variable will be read later in vnet_rtables_init(). Reported by: mav	2020-09-08 21:39:34 +00:00
Alexander V. Chernikov	8f07963360	Fix regression for IPv6 loopback routes. After nexthop introduction, loopback routes for the interface addresses were created without embedding actual interface index in the gateway. The latter is needed to pass the IPv6 scope during transmission via loopback.. Fix the regression by actually using passed gateway data with interface index. Differential Revision: https://reviews.freebsd.org/D26306	2020-09-03 22:24:52 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Alexander V. Chernikov	b8d2d479cd	Revert uma zone alignemnt cache unadvertenly committed in r364950.	2020-08-29 12:04:13 +00:00
Alexander V. Chernikov	6498f66f7c	Fix build with RADIX_MPATH. Reported by: Hartmann, O <ohartmann@walstatt.org>	2020-08-29 11:04:24 +00:00
Alexander V. Chernikov	7c89a3b63f	Move fib_rte_to_nh_flags() from net/route_var.h to net/route/nhop_ctl.c. No functional changes. Initially this function was created to perform runtime flag conversions for the previous incarnation of fib lookup functions. As these functions got deprecated, move the function to the file with the only remaining caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the naming notation.	2020-08-28 23:01:56 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Alexander V. Chernikov	b122304f6a	Further split nhop creation and rtable operations. As nexthops are immutable, some operations such as route attribute changes require nexthop fetching, forking, modification and route switching. These operations are not atomic, so they may need to be retried multiple times in presence of multiple speakers changing the same route. This change introduces "synchronisation" primitive: route_update_conditional(), simplifying logic for route changes and upcoming multipath operations. Differential Revision: https://reviews.freebsd.org/D26216	2020-08-28 21:59:10 +00:00
Alexander V. Chernikov	592d300e34	Remove RT_LOCK mutex from rte. rtentry lock traditionally served 2 purposed: first was protecting refcounts, the second was assuring consistent field access/changes. Since route nexthop introduction, the need for the former disappeared and the need for the latter reduced. To be more precise, the following rte field are mutable: rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info) rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal) rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info) rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK) rt_chain (deletion chain pointer, updated with RIB_WLOCK) All of them are updated under RIB_WLOCK, so the only remaining concern is the reading. rt_nhop and rt_weight (addressed in this review) are read under rib lock and stored in the rib_cmd_info, so the caller has no problem with consitency. rte_flags is currently read unlocked in rtsock reporting (however the scope is only RTF_UP flag, which is pretty static). rt_expire is currently read unlocked in rtsock reporting. rt_chain accesses are safe, as this is only used at route deletion. rt_expire and rte_flags reads will be dealt in a separate reviews soon. Differential Revision: https://reviews.freebsd.org/D26162	2020-08-24 20:23:34 +00:00
Alexander V. Chernikov	93bfd365d2	Rename rt_flags to rte_flags && reduce number of rt_nhop accesses. No functional changes. Most of the routing flags are stored in the netxtop instead of rtentry. Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code checking routing flags. In the new multipath code, rt->rt_nhop may actually point to nexthop group instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->... accesses. Differential Revision: https://reviews.freebsd.org/D26156	2020-08-22 19:30:56 +00:00
Mateusz Guzik	c93d310f87	Fix tinderbox build after r364465	2020-08-22 07:43:38 +00:00
Alexander V. Chernikov	f5247a232a	Make net.fibs growable. Allow to dynamically grow the amount of fibs in each vnet. This change alters current behavior. Currently, if one defines ROUTETABLES > 1 in the kernel config, each vnet will be created with the number of fibs defined in the kernel config. After this commit vnets will be created with fibs=1. Dynamic net.fibs is not compatible with net.add_addr_allfibs. The plan is to deprecate the latter and make net.add_addr_allfibs=0 default behaviour. Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26062	2020-08-21 21:34:52 +00:00

1 2

79 Commits