Commit Graph

38 Commits

Author SHA1 Message Date
Alexander V. Chernikov
b8d2d479cd Revert uma zone alignemnt cache unadvertenly committed in r364950. 2020-08-29 12:04:13 +00:00
Alexander V. Chernikov
6498f66f7c Fix build with RADIX_MPATH.
Reported by:	Hartmann, O <ohartmann@walstatt.org>
2020-08-29 11:04:24 +00:00
Alexander V. Chernikov
7c89a3b63f Move fib_rte_to_nh_flags() from net/route_var.h to net/route/nhop_ctl.c.
No functional changes.
Initially this function was created to perform runtime flag conversions
 for the previous incarnation of fib lookup functions. As these functions
 got deprecated, move the function to the file with the only remaining
 caller. Lastly, rename it to convert_rt_to_nh_flags() to follow the
 naming notation.
2020-08-28 23:01:56 +00:00
Alexander V. Chernikov
a624ca3dff Move net/route/shared.h definitions to net/route/route_var.h.
No functional changes.

net/route/shared.h was created in the inital phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
 of functions and structures between the routing subsystem components. At
 that time route_var.h was included by many files external to the routing
 subsystem, which largerly defeats its purpose.

As currently this is not the case anymore and amount of route_var.h includes
 is roughly the same as shared.h, retire the latter in favour of the former.
2020-08-28 22:50:20 +00:00
Alexander V. Chernikov
b122304f6a Further split nhop creation and rtable operations.
As nexthops are immutable, some operations such as route attribute changes
 require nexthop fetching, forking, modification and route switching.
These operations are not atomic, so they may need to be retried multiple
 times in presence of multiple speakers changing the same route.

This change introduces "synchronisation" primitive: route_update_conditional(),
 simplifying logic for route changes and upcoming multipath operations.

Differential Revision:	https://reviews.freebsd.org/D26216
2020-08-28 21:59:10 +00:00
Alexander V. Chernikov
592d300e34 Remove RT_LOCK mutex from rte.
rtentry lock traditionally served 2 purposed: first was protecting refcounts,
 the second was assuring consistent field access/changes.
Since route nexthop introduction, the need for the former disappeared and
 the need for the latter reduced.
To be more precise, the following rte field are mutable:

rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info)
rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal)
rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info)
rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK)
rt_chain (deletion chain pointer, updated with RIB_WLOCK)
All of them are updated under RIB_WLOCK, so the only remaining concern is the reading.

rt_nhop and rt_weight (addressed in this review) are read under rib lock and
 stored in the rib_cmd_info, so the caller has no problem with consitency.
rte_flags is currently read unlocked in rtsock reporting (however the scope
 is only RTF_UP flag, which is pretty static).
rt_expire is currently read unlocked in rtsock reporting.
rt_chain accesses are safe, as this is only used at route deletion.

rt_expire and rte_flags reads will be dealt in a separate reviews soon.

Differential Revision:	https://reviews.freebsd.org/D26162
2020-08-24 20:23:34 +00:00
Alexander V. Chernikov
93bfd365d2 Rename rt_flags to rte_flags && reduce number of rt_nhop accesses.
No functional changes.

Most of the routing flags are stored in the netxtop instead of rtentry.
Rename rt->rt_flags to rt->rte_flags to simplify reading/modifying code
 checking routing flags.

In the new multipath code, rt->rt_nhop may actually point to nexthop group
 instead of nhop. To ease transition, reduce the amount of rt->rt_nhop->...
 accesses.

Differential Revision:	https://reviews.freebsd.org/D26156
2020-08-22 19:30:56 +00:00
Mateusz Guzik
c93d310f87 Fix tinderbox build after r364465 2020-08-22 07:43:38 +00:00
Alexander V. Chernikov
f5247a232a Make net.fibs growable.
Allow to dynamically grow the amount of fibs in each vnet.

This change alters current behavior. Currently, if one defines
 ROUTETABLES > 1 in the kernel config, each vnet will be created
 with the number of fibs defined in the kernel config.
 After this commit vnets will be created with fibs=1.

Dynamic net.fibs is not compatible with net.add_addr_allfibs.
 The plan is to deprecate the latter and make
 net.add_addr_allfibs=0 default behaviour.

Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26062
2020-08-21 21:34:52 +00:00
Alexander V. Chernikov
2f23f45b20 Simplify dom_<rtattach|rtdetach>.
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
  them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
  to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
  directly.

Differential Revision:	https://reviews.freebsd.org/D26053
2020-08-14 21:29:56 +00:00
Alexander V. Chernikov
6cbadc4234 Move rtzone handling code to net/route_ctl.c
After moving the route control plane code from net/route.c,
 all rtzone users ended up being in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
 to have everything in a single place.

While here, remove custom initializers from the zone.
It was added originally to avoid setup/teardown of costy per-cpu couters.
With these counters removed, the only remaining job was avoiding rte mutex
 setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
 will soon be removed. With that in mind, there is no sense in keeping
 custom zone callbacks.

Differential Revision:	https://reviews.freebsd.org/D26051
2020-08-13 18:35:29 +00:00
Alexander V. Chernikov
8a0917c35b Do not enter epoch in add_route(), as it is already called in epoch.
Reviewed by:	glebius
2020-08-11 07:23:07 +00:00
Alexander V. Chernikov
136a1f8da8 Make <add|del|change>_route() static to finish the transition to the new kpi.
Discussed with:	glebius
2020-08-11 07:21:32 +00:00
Alexander V. Chernikov
9a00f6d067 Fix rib_subscribe() waitok flag by performing allocation outside epoch.
Make in6_inithead() use rib_subscribe with waitok to achieve reliable
 subscription allocation.

Reviewed by:	glebius
2020-08-11 07:05:30 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
edc37a66e3 Add destructor for the rib subscription system to simplify users code.
Subscriptions are planned to be used by modules such as route lookup engines.
In that case that's the module task to properly unsibscribe before detach.
However, the in-kernel customer - inet6 wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
 destruction internally.

Differential Revision:	https://reviews.freebsd.org/D25614
2020-07-12 11:18:09 +00:00
Mark Johnston
26dd427800 Split nhop_ref_object().
Now nhop_ref_object() unconditionally acquires a reference, and the new
nhop_try_ref_object() uses refcount_acquire_if_not_zero() to
conditionally acquire a reference.  Since the former is cheaper, use it
when we know that the initial counter value is non-zero.  No functional
change intended.

Reviewed by:	melifaro
Differential Revision:	https://reviews.freebsd.org/D25535
2020-07-06 21:20:57 +00:00
Alexander V. Chernikov
a287a973e3 Switch rtsock code to using newly-create rib_action() KPI call.
This simplifies the code and allows to further split rtentry and nexthop,
 removing one of the blockers for multipath code introduction, described in
 D24141.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D25192
2020-06-10 07:46:22 +00:00
Alexander V. Chernikov
41e66f4eca Add rib subscription API.
Currently there is no easy way of subscribing for the routing table changes.
The only existing way is to set ifa_rtrequest callback in the each protocol
 ifaddr, which is not convenient or extandable.

This change provides generic notification subscription mechanism, that will
 replace current ifa_rtrequest one and allow other applications such as
 accelerated routing lookup modules subscribe for the changes.

In particular, this change provides 2 hooks: 1) synchronous one
 (RIB_NOTIFY_IMMEDIATE), called under RIB_WLOCK, which ensures exact
 ordering of the changes and 2) async one, (RIB_NOTIFY_DELAYED)
 that is called after the change w/o holding locks. The latter one does not
 provide any notification ordering guarantee.

Differential Revision:  https://reviews.freebsd.org/D25070
2020-06-01 21:52:24 +00:00
Alexander V. Chernikov
46cc6153d4 Finish r361706: add sys/net/route/route_ctl.h, missed in previous commit. 2020-06-01 21:51:20 +00:00
Alexander V. Chernikov
da187ddb3d * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
Alexander V. Chernikov
e7403d0230 Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
Alexander V. Chernikov
79674562b8 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
Alexander V. Chernikov
4d2c2509f2 Move <add|del|change>_route() functions to route_ctl.c in preparation of
multipath control plane changed described in D24141.

Currently route.c contains core routing init/teardown functions, route table
 manipulation functions and various helper functions, resulting in >2KLOC
 file in total. This change moves most of the route table manipulation parts
 to a dedicated file, simplifying planned multipath changes and making
 route.c more manageable.

Differential Revision:	https://reviews.freebsd.org/D24870
2020-05-23 19:06:57 +00:00
Alexander V. Chernikov
a82f62ec2d Remove refcounting from rtentry.
After making rtentry reclamation backed by epoch(9) in r361409, there is
 no reason in keeping reference counting code.

Differential Revision:	https://reviews.freebsd.org/D24867
2020-05-23 12:15:47 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason of refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
Rest of the reference counting code will be removed in the D24867 .

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Warner Losh
83b4342743 Kill trailing newline while I'm here... 2020-05-12 23:46:52 +00:00
Alexander V. Chernikov
4a6ee281d9 Remove unused rnh_close callback from rtable & cleanup depends.
rnh_close callbackes was used by the in[6]_clsroute() handlers,
 doing cleanup in the route cloning code. Route cloning was eliminated
 somewhere around r186119. Last callback user was eliminated in r186215,
 11 years ago.

Differential Revision:	https://reviews.freebsd.org/D24793
2020-05-11 06:09:18 +00:00
Alexander V. Chernikov
656442a718 Embed dst sockaddr into rtentry and remove rte packet counter
Currently each rtentry has dst&gateway allocated separately from another zone,
 bloating cache accesses.

Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning,
 leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64).
Fields needed for the datapath are destination sockaddr and rt_nhop.

So far it doesn't look like there is other routable addressing protocol other
 than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes.
With that in mind, embed dst into struct rtentry, making the first 24 bytes
 of rtentry within 128 bytes. That is enough to make IPv6 address within first
 128 bytes.

It is still pretty easy to add code for supporting separately-allocated dst,
 however it doesn't make a lot of sense in having such code without a use case.

As rS359823 moved the gateway to the nexthop structure, the dst embedding change
 removes the need for any additional allocations done by rt_setgate().

Lastly, as a part of cleanup, remove counter(9) allocation code, as this field
 is not used in packet processing anymore.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24669
2020-05-08 21:06:10 +00:00
Alexander V. Chernikov
682b902daf Add rib_lookup() sockaddr lookup wrapper and make ifa_ifwithroute use it.
Create rib_lookup() wrapper around per-af dataplane lookup functions.
This will help in the cases of having control plane af-agnostic code.

Switch ifa_ifwithroute() to use this function instead of rtalloc1().

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24731
2020-05-07 08:11:36 +00:00
Alexander V. Chernikov
d94be4ccaa Switch DDB show route to direct rnh_matchaddr() call instead of rtalloc1().
Eliminate the last rtalloc1() call to finish transition to the new routing
KPI defined in r359823.

Differential Revision:	https://reviews.freebsd.org/D24663
2020-05-04 15:07:57 +00:00
Alexander V. Chernikov
4f08f052ad Simplify address parsing in DDB show route command.
Use db_get_line() to overcome parser limitation.

Differential Revision:	https://reviews.freebsd.org/D24662
2020-05-04 15:00:19 +00:00
Alexander V. Chernikov
9e02229580 Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields.
After converting routing subsystem customers to use nexthop objects
 defined in r359823, some fields in struct rtentry became unused.

This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry
 along with the code initializing and updating these fields.

Cleanup of the remaining fields will be addressed by D24669.

This commit also changes the implementation of the RTM_CHANGE handling.
Old implementation tried to perform the whole operation under radix WLOCK,
 resulting in slow performance and hacks like using RTF_RNH_LOCKED flag.
New implementation looks up the route nexthop under radix RLOCK, creates new
 nexthop and tries to update rte nhop pointer. Only last part is done under
 WLOCK.
In the hypothetical scenarious where multiple rtsock clients
 repeatedly issue RTM_CHANGE requests for the same route, route may get
 updated between read and update operation. This is addressed by retrying
 the operation multiple (3) times before returning failure back to the
 caller.

Differential Revision:	https://reviews.freebsd.org/D24666
2020-05-04 14:31:45 +00:00
Alexander V. Chernikov
8c61eb2107 Convert more rtentry field accesses into nhop fields accesses.
Continue routing subsystem conversion to nhop objects defined in r359823.
Use fields from nhop structure instead of "struct rtentry" fields.
This is one of the last changes prior to removing rt_ifp, rt_ifa,
 rt_gateway and rt_mtu from struct rtentry.

Differential Revision:	https://reviews.freebsd.org/D24609
2020-04-29 21:54:09 +00:00
Alexander V. Chernikov
fc88ecd31a Move route-specific ddb commands to route/route_ddb.c
Currently functionality resides in rtsock.c, which is a controlling
 interface, partially external to the routing subsystem.
Additionally, DDB-supporting functionality is > 100SLOC, which deserves
 a separate file.

Given that, move this functionality to a newly-created net/route/ subdir.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D24561
2020-04-28 20:00:17 +00:00
Alexander V. Chernikov
e7d8af4f65 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
Adrian Chadd
2538a4b1b4 Remove an duplicate definition of nhops_dump_sysctl()
One of the source files included both nhop.h and shared.h, leading to this
clashing.

Tested with: mips-gcc cross toolchain
2020-04-16 23:28:47 +00:00
Alexander V. Chernikov
a666325282 Introduce nexthop objects and new routing KPI.
This is the foundational change for the routing subsytem rearchitecture.
 More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
 routing KPI.

Nexthops are objects, containing all necessary information for performing
 the packet output decision. Output interface, mtu, flags, gw address goes
 there. For most of the cases, these objects will serve the same role as
 the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
 multiple BGP full-views, as these objects will be shared between routing
 entries. This allows to store more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 function are intended to replace all all flavours of
 <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions  and the previous
 fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
 exist within current NET_EPOCH. If longer lifetime is desired, one can
 specify NHR_REF as a flag and get a referenced version of the nexthop.
 Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
 inside variety of our firewalls. Their primary goal is to hide the multipath
 implementation details inside the routing subsystem, greatly simplifying
 firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
 embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
 * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
 * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
 decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
 kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by:	ae,glebius(initial version)
Differential Revision:	https://reviews.freebsd.org/D24232
2020-04-12 14:30:00 +00:00