172 Commits

Author SHA1 Message Date
Alexander V. Chernikov
7e5bf68495 netlink: add netlink support
Netlink is a communication protocol currently used in the Linux kernel to modify,
 read and subscribe to nearly all networking state. Interfaces, addresses, routes,
 firewall, fibs, vnets, etc. are controlled via netlink.
It is an async, TLV-based protocol, providing 1-1 and 1-many communications.
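
To illustrate what "TLV-based" means in practice, here is a minimal sketch of walking a buffer of type-length-value attributes. The names `tlv_hdr`, `TLV_ALIGN` and `tlv_find` are hypothetical and not part of this commit; the layout (16-bit length that includes the header, 16-bit type, 4-byte alignment) follows the usual netlink attribute convention and is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: length covers header + payload. */
struct tlv_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are assumed to start on 4-byte boundaries. */
#define TLV_ALIGN(n)	(((n) + 3u) & ~3u)

/*
 * Return a pointer to the payload of the first attribute of the given
 * type, or NULL if it is absent or the buffer is malformed.
 * On success *payload_len receives the payload size in bytes.
 */
static const void *
tlv_find(const uint8_t *buf, size_t buflen, uint16_t type, size_t *payload_len)
{
	size_t off = 0;

	while (off + sizeof(struct tlv_hdr) <= buflen) {
		struct tlv_hdr h;

		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned reads */
		if (h.len < sizeof(h) || h.len > buflen - off)
			return (NULL);			/* malformed attribute */
		if (h.type == type) {
			*payload_len = h.len - sizeof(h);
			return (buf + off + sizeof(h));
		}
		off += TLV_ALIGN(h.len);		/* skip payload and padding */
	}
	return (NULL);
}
```

A receiver that does not recognise an attribute type can simply skip it, which is what makes this kind of format easy to extend.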

The current implementation supports a subset of the NETLINK_ROUTE
family. To be more specific, the following is supported:
* Dumps:
 - routes
 - nexthops / nexthop groups
 - interfaces
 - interface addresses
 - neighbors (arp/ndp)
* Notifications:
 - interface arrival/departure
 - interface address arrival/departure
 - route addition/deletion
* Modifications:
 - adding/deleting routes
 - adding/deleting nexthops/nexthop groups
 - adding/deleting neighbors
 - adding/deleting interfaces (basic support only)
* Rtsock interaction
 - route events are bridged both ways

The implementation also supports the NETLINK_GENERIC family framework.

Implementation notes:
Netlink is implemented as a loadable/unloadable kernel module,
 not touching many kernel parts.
Each netlink socket uses a dedicated taskqueue to support async operations
 that can sleep, such as interface creation. All message processing is
 performed within these taskqueues.

Compatibility:
Most of the Netlink data models specified above map to FreeBSD concepts
 nicely. An unmodified ip(8) binary works correctly with
interfaces, addresses, routes, nexthops and nexthop groups. Some
software such as net/bird requires only header modifications to compile
and work with FreeBSD netlink.

Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after:	2 months
2022-10-01 14:15:35 +00:00
Alexander V. Chernikov
e762417077 routing: constantify nh/nhg argument in <nhop|nhgrp>_get_origin().
MFC after:	1 month
2022-09-08 10:21:25 +00:00
Alexander V. Chernikov
000250be0d routing: add ability to set the protocol that installed route/nexthop.
Routing daemons such as bird need to know whether they installed a certain route
 so they can clean it up on startup, as a way of restoring consistent
 state during crash recovery.
Currently they use a combination of routing flags (RTF_PROTO1) to detect
 these routes when interacting via the route(4) rtsock protocol.
The netlink protocol has a special "rtm_protocol" field that is filled and
 checked by the route originator. To prepare for the upcoming netlink
 introduction, add the ability to record the origin in both nexthops and
 nexthop groups via the <nhop|nhgrp>_<get|set>_origin() KPI. The actual
 calls will be used in the followup commits.

MFC after:	1 month
2022-09-08 09:18:32 +00:00
Alexander V. Chernikov
4bccbf03d8 routing: allow logging framework to be used outside of the subsystem
MFC after:	2 weeks
2022-09-05 10:44:27 +00:00
Gleb Smirnoff
e18c5816ea domains: use queue(9) SLIST for linked list of domains
2022-08-29 19:15:01 -07:00
Alexander V. Chernikov
177f04d57f routing: constantify @rc in rib_decompose_notification().
Clarify the @rc immutability by explicitly marking @rc const.

MFC after:	2 weeks
2022-08-29 18:12:24 +00:00
Alexander V. Chernikov
7b3440fc30 Revert "routing: install prefix and loopback routes using new nhop-based KPI."
Temporarily revert the commit to unblock testing.

This reverts commit a1b59379db.
2022-08-29 16:20:42 +00:00
Alexander V. Chernikov
578a99c939 routing: improve multiline debug
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
 is run only if the current debug level is sufficient. Consistently
 use it within routing subsystem.
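
The idea of guarding debug output preparation behind a level check can be sketched as follows. This is only an illustration: `SKETCH_IF_DEBUG_LEVEL`, the level constants and the helper functions are hypothetical names, not the definitions added by this commit.

```c
#include <stdio.h>

/* Hypothetical severity levels, higher means more verbose. */
#define DBG_LEVEL_INFO		1
#define DBG_LEVEL_DEBUG2	3

static int sketch_debug_level = DBG_LEVEL_INFO;
static int sketch_format_calls;	/* counts expensive formatting calls */

/* The guarded block only runs when the current level is high enough. */
#define SKETCH_IF_DEBUG_LEVEL(_lvl)	if (sketch_debug_level >= (_lvl))

/* Stand-in for an expensive object printer. */
static const char *
sketch_format_route(char *buf, size_t len, int prefix_len)
{
	sketch_format_calls++;
	snprintf(buf, len, "10.0.0.0/%d", prefix_len);
	return (buf);
}

static void
sketch_log_route(int prefix_len)
{
	SKETCH_IF_DEBUG_LEVEL(DBG_LEVEL_DEBUG2) {
		char buf[64];

		/* Buffer setup and formatting are skipped at lower levels. */
		printf("route: %s\n",
		    sketch_format_route(buf, sizeof(buf), prefix_len));
	}
}
```

With the level left at `DBG_LEVEL_INFO`, calling `sketch_log_route()` never invokes the formatter, so multiline debug preparation costs nothing in the common case.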

MFC after:	2 weeks
2022-08-29 15:14:49 +00:00
Alexander V. Chernikov
fe05d1dd0f routing: extend nhop(9) kpi
* add nhop_get_unlinked() used to prepare referenced but not
 linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
  combinations between the rib family and neighbor family (useful
  for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
 IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
 weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
  iteration.
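
As an illustration of why a "safe" iterator is needed for deletion during iteration, here is a minimal sketch on a plain singly-linked list. `ITEM_FOREACH_SAFE` and `struct item` are hypothetical and only mirror the general pattern; they are not the CHT_SLIST_FOREACH_SAFE() implementation.

```c
#include <stddef.h>
#include <stdlib.h>

struct item {
	int		 value;
	struct item	*next;
};

/* Cache the successor before the body runs, so the body may free _it. */
#define ITEM_FOREACH_SAFE(_it, _head, _tmp)				\
	for ((_it) = (_head);						\
	    (_it) != NULL && ((_tmp) = (_it)->next, 1);			\
	    (_it) = (_tmp))

/* Remove and free every item with an odd value; return the new head. */
static struct item *
remove_odd(struct item *head)
{
	struct item *it, *tmp, **prevp = &head;

	ITEM_FOREACH_SAFE(it, head, tmp) {
		if (it->value % 2 != 0) {
			*prevp = tmp;		/* unlink the current item */
			free(it);
		} else
			prevp = &it->next;	/* keep it, advance link pointer */
	}
	return (head);
}
```

A non-safe loop would read `it->next` after `free(it)`, which is exactly the use-after-free the safe variant avoids.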

MFC after:	2 weeks
2022-08-29 14:46:03 +00:00
Alexander V. Chernikov
c24a8f19c5 routing: fix rib_add_route_px()
Fix a panic in the newly-added rib_add_route_px() by removing an unlocked
 prefix lookup.

MFC after:	2 weeks
2022-08-29 12:57:47 +00:00
Alexander V. Chernikov
db4ca19002 routing: add ability to store opaque identifiers in nhops/nhgs
This is a pre-requisite for the direct nexthop/nexthop group operations
 via netlink.

MFC after:	2 weeks
2022-08-29 12:20:28 +00:00
Alexander V. Chernikov
d8b2693414 routing: add rib_add_default_route() wrapper
Multiple consumers in the kernel space want to install an IPv4 or IPv6
 default route. Provide a convenient wrapper to simplify the code
 inside the consumers.

MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36167
2022-08-29 10:08:24 +00:00
Alexander V. Chernikov
a1b59379db routing: install prefix and loopback routes using new nhop-based KPI.
Construct the desired nexthops directly instead of using the
 "translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
 rib_copy_route() to propagate the routes to the non-primary address
 fibs.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D36166
2022-08-29 10:07:58 +00:00
Alexander V. Chernikov
730bfa2805 routing: add rib_match_gw() helper
Finish 02e05b8fae:
* add gateway matcher function that can be used in rib_del_route_px()
 or any rib_walk-family functions. It will be used in the upcoming
 migration to the new KPI
* rename gw_filter_func to match_gw_one() to better signal the
 function purpose / semantic.

MFC after:	1 month
2022-08-12 09:31:21 +00:00
Mateusz Guzik
69077c81e5 routing: fix non-debug build
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-11 14:12:59 +00:00
Alexander V. Chernikov
40503b792f routing: populate fibs with interface routes after growing net.fibs.
Currently it is possible to extend the number of fibs at runtime, but this
 functionality is of limited use when net.add_addr_allfibs is
 non-zero, as the routing tables are created empty.

This change automatically populates newly-created fibs with the kernel-originated
 interface routes (filtered by RTF_PINNED flag) if net.add_addr_allfibs
 is set.

```
-> sysctl net.add_addr_allfibs=1
net.add_addr_allfibs: 0 -> 1
-> sysctl net.fibs
net.fibs: 2
-> sysctl net.fibs=3
net.fibs: 2 -> 3

BEFORE:
-> setfib 2 netstat -rn
Routing tables (fib: 2)

AFTER:
-> setfib 2 netstat -rn
Routing tables (fib: 2)

Internet:
Destination        Gateway            Flags     Netif Expire
10.0.0.0/24        link#1             U        vtnet0
10.0.0.5           link#1             UHS         lo0
127.0.0.1          link#2             UH          lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::1                               link#2                        UHS         lo0
2a01:4f9:3a:fa00::/64             link#1                        U        vtnet0
2a01:4f9:3a:fa00:5054:ff:fe15:4a3b link#1                       UHS         lo0
fe80::%vtnet0/64                  link#1                        U        vtnet0
fe80::5054:ff:fe15:4a3b%vtnet0    link#1                        UHS         lo0
fe80::%lo0/64                     link#2                        U           lo0
fe80::1%lo0                       link#2                        UHS         lo0
```

Differential Revision: https://reviews.freebsd.org/D36075
MFC after:	1 month
2022-08-11 12:48:08 +00:00
Alexander V. Chernikov
02e05b8fae routing: fixup empty mask prefix handling after 2ce553854c.
MFC after: 1 month
2022-08-11 12:48:04 +00:00
Alexander V. Chernikov
258828d03b routing: fix build warning without ROUTE_MPATH
Reported by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	1 month
2022-08-11 09:47:26 +00:00
Alexander V. Chernikov
685866bbe1 routing: fix build without ROUTE_MPATH
MFC after:	1 month
2022-08-10 20:45:22 +00:00
Alexander V. Chernikov
5c4d2252d7 routing: move rtentry and subscription code out of route_ctl.c
route_ctl.c size has grown considerably since initial introduction.
Factor out non-relevant parts:
* all rtentry logic, such as creation/destruction and accessors
 goes to net/route/route_rtentry.c
* all rtable subscription logic goes to net/route/route_subscription.c

Differential Revision: https://reviews.freebsd.org/D36074
MFC after:	1 month
2022-08-10 18:56:01 +00:00
Alexander V. Chernikov
2ce553854c routing: add rib_<add|del>_route_px() functions operating with nexthops.
This change adds public KPI to work with routes using pre-created
 nexthops, instead of using data from addrinfo structures. These
 functions will be later used for adding/deleting kernel-originated
 routes and upcoming netlink protocol.

As a part of providing this KPI, low-level route addition code has been
 reworked to provide more control over route creation or change.
 Specifically, a number of operation flags
 (RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the
 desired behaviour if the route already exists (or does not exist). This
 change required some changes in the multipath addition code, resulting
 in moving this code to route_ctl.c, rendering mpath_ctl.c empty.
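
One way to picture how such operation flags could steer an add request is sketched below. The flag names `OP_F_*`, the `decide_action()` helper and the exact precedence are assumptions modelled on the familiar netlink create/exclusive/replace/append semantics; the commit itself does not spell out these rules.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define OP_F_CREATE	0x01	/* create the route if it does not exist */
#define OP_F_EXCL	0x02	/* fail if the route already exists */
#define OP_F_REPLACE	0x04	/* replace an existing route */
#define OP_F_APPEND	0x08	/* add another path to an existing route */

enum op_action { OP_ADD, OP_REPLACE, OP_APPEND, OP_FAIL };

/* Decide what to do given whether the prefix already exists. */
static enum op_action
decide_action(bool exists, int flags, int *error)
{
	*error = 0;
	if (!exists) {
		if (flags & OP_F_CREATE)
			return (OP_ADD);
		*error = ENOENT;
		return (OP_FAIL);
	}
	if (flags & OP_F_EXCL) {
		*error = EEXIST;
		return (OP_FAIL);
	}
	if (flags & OP_F_REPLACE)
		return (OP_REPLACE);
	if (flags & OP_F_APPEND)
		return (OP_APPEND);
	*error = EEXIST;
	return (OP_FAIL);
}
```

Under these assumptions, "add only if missing" is `OP_F_CREATE | OP_F_EXCL`, while "add or overwrite" is `OP_F_CREATE | OP_F_REPLACE`.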

Differential Revision: https://reviews.freebsd.org/D36073
MFC after:	1 month
2022-08-10 18:56:01 +00:00
Alexander V. Chernikov
66230639ce routing: split nexthop creation and rtentry creation.
This change is required for the upcoming introduction of the new
 nexthop-based operations KPI, as it will create rtentry and nexthops
 at different stages of route table modification.

Differential Revision: https://reviews.freebsd.org/D36072
MFC after:	2 weeks
2022-08-10 18:27:13 +00:00
Alexander V. Chernikov
dedeec1143 routing: refactor #2
* Use the same filter func (rib_filter_f_t) for nexthop groups to
 simplify callbacks.
* simplify conditional route deletion & remove the need to pass
 rt_addrinfo to the low-level deletion functions
* speedup rib_walk_del() by removing an additional per-prefix lookup

Differential Revision: https://reviews.freebsd.org/D36071
MFC after:	1 month
2022-08-10 18:20:21 +00:00
Alexander V. Chernikov
0d60e88b41 routing: refactor control cmds #1
This and the follow-up routing-related changes aim to remove or
 reduce `struct rt_addrinfo` usage and use the recently-landed nhop(9)
 KPI instead.
Traditionally the `rt_addrinfo` structure has been used to propagate all necessary
information between the protocol/rtsock and the routing layer. Many
functions inside the routing subsystem use it internally. However, using
this structure became somewhat complicated, as there are too many ways
of specifying a single state and verifying data consistency is hard.
For example, are routing flags consistent with mask/gateway sockaddr pointers?
Is the mask really a host mask? Are sockaddrs "valid" (e.g. properly zeroed, masked,
have proper length)? Are they mutable? Is the suggested interface specified
 by the interface index embedded into the sockaddr_dl gateway, or passed
 as the RTAX_IFP parameter, or directly provided by rti_ifp, or does it need to
 be derived from the ifa?
These (and other similar) questions have to be considered every time
 a function has an `rt_addrinfo` pointer as an argument.

The new approach is to bring more control back to the protocols and
let them construct the desired routing objects themselves - in the end, it's the
protocol/subsystem that knows the desired outcome.

This specific diff changes the following:
* add explicit basic low-level radix operations:
 add_route() (renamed from add_route_nhop())
 delete_route() (factored from change_route_nhop())
 change_route() (renamed from change_route_nhop())
* remove "info" parameter from change_route_conditional() as a part
 of reducing rt_addrinfo usage in the internal KPIs
* add lookup_prefix_rt() wrapper for doing re-lookups after
 RIB lock/unlock

Differential Revision: https://reviews.freebsd.org/D36070
MFC after:	2 weeks
2022-08-10 18:20:20 +00:00
Alexander V. Chernikov
93dd3adac7 fib_algo: set vnet when destroying algo instance
Reported by:	Konrad Kręciwilk <konrad.kreciwilk@korbank.pl>
MFC after:	2 weeks
2022-08-06 12:51:22 +00:00
Alexander V. Chernikov
d46b000ecc routing: remove duplicate error message after 5c23343b8c.
MFC after:	2 weeks
2022-08-04 09:53:58 +00:00
Mateusz Guzik
412bdb5a46 route: fix NOIP builds
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-03 21:23:32 +00:00
Alexander V. Chernikov
ae6bfd12c8 routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
 indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
 group+weight.

MFC after:	2 weeks
2022-08-01 10:02:12 +00:00
Alexander V. Chernikov
5c23343b8c routing: convert remnants of DPRINTF to FIB_CTL_LOG().
Convert the last remaining pieces of old-style debug messages
 to the new debugging framework.

Differential Revision: https://reviews.freebsd.org/D35994
MFC after:	2 weeks
2022-08-01 08:55:07 +00:00
Alexander V. Chernikov
800c68469b routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985
MFC after:	1 month
2022-08-01 08:52:26 +00:00
Alexander V. Chernikov
29029b06a6 routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving
 state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the consumers, it
 is problematic to maintain consistency, resulting in repeated checks
 inside many functions. Additionally, there are multiple ways of
 specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
 required state in the nexthops in the consistent fashion, reducing the
 need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
 format, which is different from other kernel routing users or netlink.

This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.

Reviewed by:	zlei.huang@gmail.com
Differential Revision: https://reviews.freebsd.org/D35972
MFC after:	2 weeks
2022-08-01 07:41:07 +00:00
Alexander V. Chernikov
2717e958df routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
 and netlink introduction.

Differential Revision: https://reviews.freebsd.org/D35970
MFC after:	2 weeks
2022-08-01 07:26:53 +00:00
Alexander V. Chernikov
27f107e1b4 routing: add debug printing helpers for rtentry and RTM* cmds.
MFC after:	2 weeks
2022-07-31 09:01:42 +00:00
Zhenlei Huang
150486f6a9 Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro
Reviewed by:	melifaro, kp
Differential Revision:	https://reviews.freebsd.org/D35968
2022-07-29 21:21:10 +02:00
Dimitry Andric
5e1097f83c Adjust function definitions in route_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warnings are produced:

    sys/net/route/route_ctl.c:130:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_init()
                    ^
                     void
    sys/net/route/route_ctl.c:139:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_destroy()
                       ^
                        void

This is because vnet_rtzone_init() and vnet_rtzone_destroy() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.
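
The mismatch can be reproduced in a few lines. The function name `zone_init` below is a hypothetical stand-in, not the code from route_ctl.c; the point is only that the definition should repeat the `(void)` from the declaration.

```c
/* Declaration with an explicit (void) parameter list: a real prototype. */
static int zone_init(void);

/*
 * The definition repeats (void). Writing "zone_init()" here instead would,
 * before C23, define the function without a prototype, which is what
 * -Wstrict-prototypes reports.
 */
static int
zone_init(void)
{
	return (0);
}
```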

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Dimitry Andric
a8adf13a63 Adjust function definition in nhop_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warning is produced:

    sys/net/route/nhop_ctl.c:508:21: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    alloc_nhop_structure()
                        ^
                         void

This is because alloc_nhop_structure() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Mateusz Guzik
db4b40213a routing: hide notify_add and notify_del behind ROUTE_MPATH
Fixes a warning about unused routines without the option.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-04 08:38:13 +00:00
Alexander V. Chernikov
8010b7a78a routing: simplify decompose_change_notification().
The function's goal is to compare the old/new nhop/nexthop group for the route
 and decompose the change into a series of RTM_ADD/RTM_DELETE single-nhop
 events, calling the specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
 are sorted nhop-# ascending.
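
When both groups are sorted by nexthop index, the decomposition reduces to a single merge-style pass. The following is a simplified sketch of that idea: it compares bare indices only, and `diff_sorted()`, `ev_cb_t` and `count_cb()` are hypothetical names rather than the kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

enum ev_type { EV_ADD, EV_DELETE };
typedef void ev_cb_t(enum ev_type type, uint32_t nh_idx, void *arg);

/* Emit one event per nexthop index present in only one of the arrays. */
static void
diff_sorted(const uint32_t *old, size_t n_old,
    const uint32_t *new_, size_t n_new, ev_cb_t *cb, void *arg)
{
	size_t i = 0, j = 0;

	while (i < n_old && j < n_new) {
		if (old[i] == new_[j]) {
			i++, j++;			/* unchanged nexthop */
		} else if (old[i] < new_[j]) {
			cb(EV_DELETE, old[i++], arg);	/* only in old group */
		} else {
			cb(EV_ADD, new_[j++], arg);	/* only in new group */
		}
	}
	while (i < n_old)
		cb(EV_DELETE, old[i++], arg);
	while (j < n_new)
		cb(EV_ADD, new_[j++], arg);
}

/* Example callback: count the events of each kind. */
struct ev_count {
	int adds;
	int dels;
};

static void
count_cb(enum ev_type type, uint32_t nh_idx, void *arg)
{
	struct ev_count *c = arg;

	(void)nh_idx;
	if (type == EV_ADD)
		c->adds++;
	else
		c->dels++;
}
```

The pass is linear in the combined group size, whereas unsorted input would need a lookup in the other group for every nexthop.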

Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35598
MFC after: 2 weeks
2022-06-27 17:30:52 +00:00
Alexander V. Chernikov
76f1ab8eff routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups need to be deterministically sorted
 by some property of theirs to reduce the reporting cost when changing
 large nexthop groups.

Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, all nexthops normally
have the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.
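
Insertion sort suits this case because already-sorted input, such as a group where all weights are equal, is handled in a single linear pass with no extra allocation. A minimal sketch, with the hypothetical helper `sort_weights()` operating on a bare array of weights rather than the kernel structures:

```c
#include <stddef.h>
#include <stdint.h>

/* Sort weights ascending in place using insertion sort. */
static void
sort_weights(uint32_t *w, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		uint32_t key = w[i];
		size_t j = i;

		/* Shift larger elements right; no iterations if in order. */
		while (j > 0 && w[j - 1] > key) {
			w[j] = w[j - 1];
			j--;
		}
		w[j] = key;
	}
}
```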

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35599
MFC after:	2 weeks
2022-06-27 17:30:52 +00:00
Alexander V. Chernikov
0e87bab6b4 routing: fix debug headers added in 6fa8ed43ee.
- move debug headers out of COMPAT_FREEBSD32 in rtsock.c
- remove accidentally-added LOG_ defines from syslog.h

MFC after:	2 weeks
2022-06-25 23:05:25 +00:00
Alexander V. Chernikov
6fa8ed43ee routing: improve debugging.
Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere to the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.

MFC after:	2 weeks
2022-06-25 19:53:31 +00:00
Alexander V. Chernikov
c38da70c28 routing: fix RTM_CHANGE nhgroup updates.
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
The search for this nexthop is performed by iterating over each component of the multipath (nexthop)
 group, using check_info_match_nhop(). The problem with the current code is that it incorrectly
 assumes that `check_info_match_nhop()` returns a true value on match, while in reality it
 returns 0 on match and an error code on failure. Fix this by properly comparing the result with 0.
Additionally, the followup code modified the original nexthop group instead of the new one.
Fix this by targeting the new nexthop group instead.
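
The class of bug described here, treating an errno-style return value as a boolean, can be shown with a small sketch. `struct hop`, `check_match()` and `find_match()` are hypothetical stand-ins and not the kernel code.

```c
#include <errno.h>
#include <stddef.h>

struct hop {
	int gw;		/* simplified gateway identifier */
};

/* errno-style convention: 0 means "matches", non-zero is an error code. */
static int
check_match(const struct hop *h, int wanted_gw)
{
	return (h->gw == wanted_gw ? 0 : ESRCH);
}

/* Return the index of the first matching hop, or -1 if none matches. */
static int
find_match(const struct hop *hops, size_t n, int wanted_gw)
{
	for (size_t i = 0; i < n; i++) {
		/*
		 * Compare with 0 explicitly. Writing
		 * "if (check_match(...))" would pick the first MISMATCH.
		 */
		if (check_match(&hops[i], wanted_gw) == 0)
			return ((int)i);
	}
	return (-1);
}
```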

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35526
MFC after: 2 weeks
2022-06-25 18:54:57 +00:00
Alexander V. Chernikov
5d6894bd66 routing: improve debug logging
Use standard logging (FIB_XX_LOG) across nhg code instead of using
 old-style DPRINTFs.
 Add debug object printer for nhgs (`nhgrp_print_buf`).

Example:

```
Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done
Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2

Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3
Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1]
```

Differential Revision: https://reviews.freebsd.org/D35525
MFC after: 2 weeks
2022-06-22 15:59:21 +00:00
John Baldwin
2174f0f2f2 net/route: Use __diagused for variables only used in KASSERT().
2022-04-13 16:08:19 -07:00
John Baldwin
f7236dd068 change_mpath_route: Remove write-only nh variable.
While here, clean up the style of the function prologue by moving an
assignment out of the middle of two variable declaration blocks.
2022-04-06 16:45:28 -07:00
John Baldwin
371c917b0b unlink_nhgrp: Remove write-only variable.
Possibly one could assert that ret should always be 0 here (that is,
that there was always an index found in the bitmask).  That should be
true since a bitmask index is allocated before the nhgrp is inserted
in the ctl->gr_head list in link_nhgrp.
2022-04-06 16:45:27 -07:00
Warner Losh
5de5b5a34d route_ctl: eliminate write only variables ifa and nh
Sponsored by:		Netflix
2022-04-04 22:30:48 -06:00
Warner Losh
7f9c3339a4 get_nhop: eliminate write only variable gateway
Sponsored by:		Netflix
2022-04-04 22:30:47 -06:00
Alexander V. Chernikov
1b8b69508b routing: copy nexthop fib when changing existing nexthop
MFC after:	1 day
2022-03-28 11:32:30 +00:00