freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	d68cf57b7f	Refactor rt_addrmsg() and rt_routemsg(). Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function. Reviewers: #network Subscribers: imp, ae, bz Differential Revision: https://reviews.freebsd.org/D28011	2021-01-07 19:38:19 +00:00
Alexander V. Chernikov	f5baf8bb12	Add modular fib lookup framework. This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups. Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table. Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks. The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6) Control: Framwork adds the following runtime sysctls: List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future. Differential Revision: https://reviews.freebsd.org/D27401	2020-12-25 11:33:17 +00:00
Alexander V. Chernikov	d1d941c5b9	Remove RADIX_MPATH config option. ROUTE_MPATH is the new config option controlling new multipath routing implementation. Remove the last pieces of RADIX_MPATH-related code and the config option. Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D27244	2020-11-29 19:43:33 +00:00
Alexander V. Chernikov	7511a63825	Refactor rib iterator functions. * Make rib_walk() order of arguments consistent with the rest of RIB api * Add rib_walk_ext() allowing to exec callback before/after iteration. * Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del * Rename rt_forach_fib_walk -> rib_foreach_table_walk * Move rib_foreach_table_walk{_del} to route/route_helpers.c * Slightly refactor rib_foreach_table_walk{_del} to make the implementation consistent and prepare for upcoming iterator optimizations. Differential Revision: https://reviews.freebsd.org/D27219	2020-11-22 20:21:10 +00:00
Alexander V. Chernikov	bad6b23606	Move all ifaddr route creation business logic to net/route/route_ifaddr.c Differential Revision: https://reviews.freebsd.org/D26318	2020-11-08 11:12:00 +00:00
Alexander V. Chernikov	fedeb08b6a	Introduce scalable route multipath. This change is based on the nexthop objects landed in D24232. The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection. Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights. With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem. User-visible changes: The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1. All routes now comes with weight, default weight is 1, maximum is 2^24-1. Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes. Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1 route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20 netstat -6On Nexthop groups data Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2 Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449	2020-10-03 10:47:17 +00:00
Alexander V. Chernikov	2259a03020	Rework part of routing code to reduce difference to D26449. * Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops. Differential Revision: https://reviews.freebsd.org/D26497	2020-09-21 20:02:26 +00:00
Alexander V. Chernikov	05aca418f4	Consistently use the same gateway when adding/deleting interface routes. Use the same link-level gateway when adding or deleting interface routes. This helps nexthop checking in the upcoming multipath changes. Differential Revision: https://reviews.freebsd.org/D26317	2020-09-07 10:13:54 +00:00
Mateusz Guzik	662c13053f	net: clean up empty lines in .c and .h files	2020-09-01 21:19:14 +00:00
Alexander V. Chernikov	a624ca3dff	Move net/route/shared.h definitions to net/route/route_var.h. No functional changes. net/route/shared.h was created in the inital phases of nexthop conversion. It was intended to serve the same purpose as route_var.h - share definitions of functions and structures between the routing subsystem components. At that time route_var.h was included by many files external to the routing subsystem, which largerly defeats its purpose. As currently this is not the case anymore and amount of route_var.h includes is roughly the same as shared.h, retire the latter in favour of the former.	2020-08-28 22:50:20 +00:00
Alexander V. Chernikov	592d300e34	Remove RT_LOCK mutex from rte. rtentry lock traditionally served 2 purposed: first was protecting refcounts, the second was assuring consistent field access/changes. Since route nexthop introduction, the need for the former disappeared and the need for the latter reduced. To be more precise, the following rte field are mutable: rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info) rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal) rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info) rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK) rt_chain (deletion chain pointer, updated with RIB_WLOCK) All of them are updated under RIB_WLOCK, so the only remaining concern is the reading. rt_nhop and rt_weight (addressed in this review) are read under rib lock and stored in the rib_cmd_info, so the caller has no problem with consitency. rte_flags is currently read unlocked in rtsock reporting (however the scope is only RTF_UP flag, which is pretty static). rt_expire is currently read unlocked in rtsock reporting. rt_chain accesses are safe, as this is only used at route deletion. rt_expire and rte_flags reads will be dealt in a separate reviews soon. Differential Revision: https://reviews.freebsd.org/D26162	2020-08-24 20:23:34 +00:00
Alexander V. Chernikov	eb1c7adb70	Finish r364492 by renaming rt_flags to rte_flags for multipath code.	2020-08-22 20:02:40 +00:00
Alexander V. Chernikov	93bfd365d2	Rename rt_flags to rte_flags && reduce number of rt_nhop accesses. No functional changes. Most of the routing flags are stored in the netxtop instead of rtentry. Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code checking routing flags. In the new multipath code, rt->rt_nhop may actually point to nexthop group instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->... accesses. Differential Revision: https://reviews.freebsd.org/D26156	2020-08-22 19:30:56 +00:00
Alexander V. Chernikov	f5247a232a	Make net.fibs growable. Allow to dynamically grow the amount of fibs in each vnet. This change alters current behavior. Currently, if one defines ROUTETABLES > 1 in the kernel config, each vnet will be created with the number of fibs defined in the kernel config. After this commit vnets will be created with fibs=1. Dynamic net.fibs is not compatible with net.add_addr_allfibs. The plan is to deprecate the latter and make net.add_addr_allfibs=0 default behaviour. Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26062	2020-08-21 21:34:52 +00:00
Alexander V. Chernikov	2f23f45b20	Simplify dom_<rtattach\|rtdetach>. Remove unused arguments from dom_rtattach/dom_rtdetach functions and make them return/accept 'struct rib_head' instead of 'void **'. Declare inet/inet6 implementations in the relevant _var.h headers similar to domifattach / domifdetach. Add rib_subscribe_internal() function to accept subscriptions to the rnh directly. Differential Revision: https://reviews.freebsd.org/D26053	2020-08-14 21:29:56 +00:00
Alexander V. Chernikov	6cbadc4234	Move rtzone handling code to net/route_ctl.c After moving the route control plane code from net/route.c, all rtzone users ended up being in net/route_ctl.c. Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well to have everything in a single place. While here, remove custom initializers from the zone. It was added originally to avoid setup/teardown of costy per-cpu couters. With these counters removed, the only remaining job was avoiding rte mutex setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex will soon be removed. With that in mind, there is no sense in keeping custom zone callbacks. Differential Revision: https://reviews.freebsd.org/D26051	2020-08-13 18:35:29 +00:00
Mitchell Horne	f7d79f6c6d	Correctly set error in rt_mpath_unlink It is possible for rn_delete() to return NULL. If this happens, then set *perror to ESRCH, as is done in the rest of the function. Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D25871	2020-08-12 16:43:20 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Alexander V. Chernikov	725871230d	Temporarly revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Alexander V. Chernikov	edc37a66e3	Add destructor for the rib subscription system to simplify users code. Subscriptions are planned to be used by modules such as route lookup engines. In that case that's the module task to properly unsibscribe before detach. However, the in-kernel customer - inet6 wants to track default route changes. To avoid having inet6 store per-fib subscriptions, handle automatic destruction internally. Differential Revision: https://reviews.freebsd.org/D25614	2020-07-12 11:18:09 +00:00
Alexander V. Chernikov	da187ddb3d	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:49:42 +00:00
Alexander V. Chernikov	e7403d0230	Revert r361704, it accidentally committed merged D25067 and D25070.	2020-06-01 20:40:40 +00:00
Alexander V. Chernikov	79674562b8	* Add rib_<add\|del\|change>_route() functions to manipulate the routing table. The main driver for the change is the need to improve notification mechanism. Currently callers guess the operation data based on the rtentry structure returned in case of successful operation result. There are two problems with this appoach. First is that it doesn't provide enough information for the upcoming multipath changes, where rtentry refers to a new nexthop group, and there is no way of guessing which paths were added during the change. Second is that some rtentry fields can change during notification and protecting from it by requiring customers to unlock rtentry is not desired. Additionally, as the consumers such as rtsock do know which operation they request in advance, making explicit add/change/del versions of the functions makes sense, especially given the functions don't share a lot of code. With that in mind, introduce rib_cmd_info notification structure and rib_<add\|del\|change>_route() functions, with mandatory rib_cmd_info pointer. It will be used in upcoming generalized notifications. * Move definitions of the new functions and some other functions/structures used for the routing table manipulation to a separate header file, net/route/route_ctl.h. net/route.h is a frequently used file included in ~140 places in kernel, and 90% of the users don't need these definitions. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D25067	2020-06-01 20:32:02 +00:00
Alexander V. Chernikov	cb86ca48bf	Unlock rtentry before calling for epoch(9) destruction as the destruction may happen immediately, leading to panic. Reported by: bdragon	2020-05-28 07:23:27 +00:00
Alexander V. Chernikov	4d2c2509f2	Move <add\|del\|change>_route() functions to route_ctl.c in preparation of multipath control plane changed described in D24141. Currently route.c contains core routing init/teardown functions, route table manipulation functions and various helper functions, resulting in >2KLOC file in total. This change moves most of the route table manipulation parts to a dedicated file, simplifying planned multipath changes and making route.c more manageable. Differential Revision: https://reviews.freebsd.org/D24870	2020-05-23 19:06:57 +00:00
Alexander V. Chernikov	a82f62ec2d	Remove refcounting from rtentry. After making rtentry reclamation backed by epoch(9) in r361409, there is no reason in keeping reference counting code. Differential Revision: https://reviews.freebsd.org/D24867	2020-05-23 12:15:47 +00:00
Alexander V. Chernikov	2bbab0af6d	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason of refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows to stop refcounting and simplify the code. Additionally, this change allows to reimplement rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in more efficient way. The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. Rest of the reference counting code will be removed in the D24867 . Differential Revision: https://reviews.freebsd.org/D24866	2020-05-23 10:21:02 +00:00
Alexander V. Chernikov	4a6ee281d9	Remove unused rnh_close callback from rtable & cleanup depends. rnh_close callbackes was used by the in[6]_clsroute() handlers, doing cleanup in the route cloning code. Route cloning was eliminated somewhere around r186119. Last callback user was eliminated in r186215, 11 years ago. Differential Revision: https://reviews.freebsd.org/D24793	2020-05-11 06:09:18 +00:00
Alexander V. Chernikov	d223372545	Remove rtalloc1(_fib) KPI. Last user of rtalloc1() KPI has been eliminated in rS360631. As kernel is now fully switched to use new routing KPI defined in rS359823, remove old lookup functions. Differential Revision: https://reviews.freebsd.org/D24776	2020-05-10 09:34:48 +00:00
Alexander V. Chernikov	656442a718	Embed dst sockaddr into rtentry and remove rte packet counter Currently each rtentry has dst&gateway allocated separately from another zone, bloating cache accesses. Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning, leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64). Fields needed for the datapath are destination sockaddr and rt_nhop. So far it doesn't look like there is other routable addressing protocol other than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes. With that in mind, embed dst into struct rtentry, making the first 24 bytes of rtentry within 128 bytes. That is enough to make IPv6 address within first 128 bytes. It is still pretty easy to add code for supporting separately-allocated dst, however it doesn't make a lot of sense in having such code without a use case. As rS359823 moved the gateway to the nexthop structure, the dst embedding change removes the need for any additional allocations done by rt_setgate(). Lastly, as a part of cleanup, remove counter(9) allocation code, as this field is not used in packet processing anymore. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24669	2020-05-08 21:06:10 +00:00
Alexander V. Chernikov	682b902daf	Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it. Create rib_lookup() wrapper around per-af dataplane lookup functions. This will help in the cases of having control plane af-agnostic code. Switch ifa_ifwithroute() to use this function instead of rtalloc1(). Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24731	2020-05-07 08:11:36 +00:00
Alexander V. Chernikov	9e02229580	Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields. After converting routing subsystem customers to use nexthop objects defined in r359823, some fields in struct rtentry became unused. This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry along with the code initializing and updating these fields. Cleanup of the remaining fields will be addressed by D24669. This commit also changes the implementation of the RTM_CHANGE handling. Old implementation tried to perform the whole operation under radix WLOCK, resulting in slow performance and hacks like using RTF_RNH_LOCKED flag. New implementation looks up the route nexthop under radix RLOCK, creates new nexthop and tries to update rte nhop pointer. Only last part is done under WLOCK. In the hypothetical scenarious where multiple rtsock clients repeatedly issue RTM_CHANGE requests for the same route, route may get updated between read and update operation. This is addressed by retrying the operation multiple (3) times before returning failure back to the caller. Differential Revision: https://reviews.freebsd.org/D24666	2020-05-04 14:31:45 +00:00
Alexander V. Chernikov	8c61eb2107	Convert more rtentry field accesses into nhop fields accesses. Continue routing subsystem conversion to nhop objects defined in r359823. Use fields from nhop structure instead of "struct rtentry" fields. This is one of the last changes prior to removing rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry. Differential Revision: https://reviews.freebsd.org/D24609	2020-04-29 21:54:09 +00:00
Alexander V. Chernikov	74787ef47b	Add nhop to the ifa_rtrequest() callback. With the upcoming multipath changes described in D24141, rt->rt_nhop can potentially point to a nexthop group instead of an individual nhop. To simplify caller handling of such cases, change ifa_rtrequest() callback to pass changed nhop directly. Differential Revision: https://reviews.freebsd.org/D24604	2020-04-29 19:28:56 +00:00
Alexander V. Chernikov	e7d8af4f65	Move route_temporal.c and route_var.h to net/route. Nexthop objects implementation, defined in r359823, introduced sys/net/route directory intended to hold all routing-related code. Move recently-introduced route_temporal.c and private route_var.h header there. Differential Revision: https://reviews.freebsd.org/D24597	2020-04-28 19:14:09 +00:00
Alexander V. Chernikov	1b0051bada	Eliminate now-unused parts of old routing KPI. r360292 switched most of the remaining routing customers to a new KPI, leaving a bunch of wrappers for old routing lookup functions unused. Remove them from the tree as a part of routing cleanup. Differential Revision: https://reviews.freebsd.org/D24569	2020-04-28 07:25:34 +00:00
Alexander V. Chernikov	983066f05b	Convert route caching to nexthop caching. This change is build on top of nexthop objects introduced in r359823. Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms. Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn. Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same. Differential Revision: https://reviews.freebsd.org/D24340	2020-04-25 09:06:11 +00:00
Alexander V. Chernikov	aaad3c4fca	Convert rtentry field accesses into nhop field accesses. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This commit is mostly mechanical change to eliminate direct struct rtentry field accesses. The only notable difference is AF_LINK gateway encoding. AF_LINK gw is used in routing stack for operations with interface routes and host loopback routes. In the former case it indicates _some_ non-NULL gateway, as the interface is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting. In the latter case the interface index inside gateway was used by the IPv6 datapath to verify address scope for link-local interfaces. Kernel uses struct sockaddr_dl for this type of gateway. This structure allows for specifying rich interface data, such as mac address and interface name. However, this results in relatively large structure size - 52 bytes. Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside in the first 8 bytes of the structure. In the new KPI, struct nhop_object tries to be cache-efficient, hence embodies gateway address inside the structure. In the AF_LINK case it stores stortened version of the structure - struct sockaddr_dl_short, which occupies 16 bytes. After D24340 changes, the data inside AF_LINK gateway will not be used in the kernel at all, leaving rtsock as the only potential concern. The difference in rtsock reporting: (old) got message of size 240 on Thu Apr 16 03:12:13 2020 RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 (new) got message of size 200 on Sun Apr 19 09:46:32 2020 RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 link#5 255.255.255.0 Note 40 bytes different (52-16 + alignment). However, gateway is still a valid AF_LINK gateway with proper data filled in. It is worth noting that these particular messages (interface routes) are mostly ignored by routing daemons: * bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages. * quagga/frr ignores routes without gateway More detailed overview on how rtsock messages are used by the routing daemons to reconstruct the kernel view, can be found in D22974. Differential Revision: https://reviews.freebsd.org/D24519	2020-04-23 08:04:20 +00:00
Alexander V. Chernikov	539642a29d	Add nhop parameter to rti_filter callback. One of the goals of the new routing KPI defined in r359823 is to entirely hide`struct rtentry` from the consumers. It will allow to improve routing subsystem internals and deliver more features much faster. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. Additionally, with the followup multipath changes, single rtentry can point to multiple nexthops. With that in mind, convert rti_filter callback used when traversing the routing table to accept pair (rt, nhop) instead of nexthop. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24440	2020-04-16 17:20:18 +00:00
Alexander V. Chernikov	dd4776f0cc	Reorganise nd6 notification code to avoid direct rtentry field access. One of the goals of the new routing KPI defined in r359823 is to entirely hide `struct rtentry` from the consumers. Doing so will allow to improve routing subsystem internals and deliver features more easily. This change is one of the ongoing changes to eliminate direct struct rtentry field accesses. It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification code to avoid accessing most of the rtentry fields. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24404	2020-04-14 22:48:33 +00:00
Alexander V. Chernikov	a666325282	Introduce nexthop objects and new routing KPI. This is the foundational change for the routing subsytem rearchitecture. More details and goals are available in https://reviews.freebsd.org/D24141 . This patch introduces concept of nexthop objects and new nexthop-based routing KPI. Nexthops are objects, containing all necessary information for performing the packet output decision. Output interface, mtu, flags, gw address goes there. For most of the cases, these objects will serve the same role as the struct rtentry is currently serving. Typically there will be low tens of such objects for the router even with multiple BGP full-views, as these objects will be shared between routing entries. This allows to store more information in the nexthop. New KPI: struct nhop_object fib4_lookup(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, uint32_t flowid); struct nhop_object fib6_lookup(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, uint32_t flowid); These 2 function are intended to replace all all flavours of <in_\|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous fib[46]-generation functions. Upon successful lookup, they return nexthop object which is guaranteed to exist within current NET_EPOCH. If longer lifetime is desired, one can specify NHR_REF as a flag and get a referenced version of the nexthop. Reference semantic closely resembles rtentry one, allowing sed-style conversion. Additionally, another 2 functions are introduced to support uRPF functionality inside variety of our firewalls. Their primary goal is to hide the multipath implementation details inside the routing subsystem, greatly simplifying firewalls implementation: int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr dst6, uint32_t scopeid, uint32_t flags, const struct ifnet src_if); All functions have a separate scopeid argument, paving way to eliminating IPv6 scope embedding and allowing to support IPv4 link-locals in the future. Structure changes: * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size. * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz. Old KPI: During the transition state old and new KPI will coexists. As there are another 4-5 decent-sized conversion patches, it will probably take a couple of weeks. To support both KPIs, fields not required by the new KPI (most of rtentry) has to be kept, resulting in the temporary size increase. Once conversion is finished, rtentry will notably shrink. More details: * architectural overview: https://reviews.freebsd.org/D24141 * list of the next changes: https://reviews.freebsd.org/D24232 Reviewed by: ae,glebius(initial version) Differential Revision: https://reviews.freebsd.org/D24232	2020-04-12 14:30:00 +00:00
Alexander V. Chernikov	aef2d5fb0e	Split rtrequest1_fib() into smaller manageable chunks. No functional changes. * Move route addition / route deletion code from rtrequest1_fib() to add_route() and del_route() respectively. * Rename rtrequest1_fib_change() to change_route() for consistency. * Shrink the scope of ugly info #defines. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24349	2020-04-10 16:27:27 +00:00
Alexander V. Chernikov	ea2773323c	Fix dynamic redrects by adding forgotten RTF_HOST flag. Improve tests to verify the generated route flags. Reported by: jtl MFC after: 2 weeks	2020-03-03 15:33:43 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Alexander V. Chernikov	34a5582c47	Bring back redirect route expiration. Redirect (and temporal) route expiration was broken a while ago. This change brings route expiration back, with unified IPv4/IPv6 handling code. It introduces net.inet.icmp.redirtimeout sysctl, allowing to set an expiration time for redirected routes. It defaults to 10 minutes, analogues with net.inet6.icmp6.redirtimeout. Implementation uses separate file, route_temporal.c, as route.c is already bloated with tons of different functions. Internally, expiration is implemented as an per-rnh callout scheduled when route with non-zero rt_expire time is added or rt_expire is changed. It does not add any overhead when no temporal routes are present. Callout traverses entire routing tree under wlock, scheduling expired routes for deletion and calculating the next time it needs to be run. The rationale for such implemention is the following: typically workloads requiring large amount of routes have redirects turned off already, while the systems with small amount of routes will not inhibit large overhead during tree traversal. This changes also fixes netstat -rn display of route expiration time, which has been broken since the conversion from kread() to sysctl. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23075	2020-01-22 13:53:18 +00:00
Gleb Smirnoff	97168be809	Mechanically substitute assertion of in_epoch(net_epoch_preempt) to NET_EPOCH_ASSERT(). NFC	2020-01-15 05:45:27 +00:00
Alexander V. Chernikov	ead85fe415	Add fibnum, family and vnet pointer to each rib head. Having metadata such as fibnum or vnet in the struct rib_head is handy as it eases building functionality in the routing space. This change is required to properly bring back route redirect support. Reviewed by: bz MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D23047	2020-01-09 17:21:00 +00:00
Alexander V. Chernikov	e02d3fe70c	Fix rtsock route message generation for interface addresses. Reviewed by: olivier MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D22974	2020-01-07 21:16:30 +00:00
Mark Johnston	6b5d8e30f1	Plug some ifaddr refcount leaks. - Only take an ifaddr ref in in rt_exportinfo() if the caller explicitly requests it. Take care to release it in this case. - Don't unconditionally take a ref in rtrequest1_fib(). rt_getifa_fib() will acquire a reference, in which case we would previously acquire two references. - Stop taking a reference in rtinit1() before calling rtrequest1_fib(). rtrequest1_fib() will acquire a reference for the RTM_ADD case. PR: 242746 Reviewed by: melifaro (previous version) Tested by: ghuckriede@blackberry.com MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22912	2019-12-27 01:12:54 +00:00

1 2 3 4 5 ...

342 Commits