5127 Commits

Author SHA1 Message Date
Kristof Provost
6905fd01cb if_ovpn: ensure we're in vnet context when calling sorele()
We reference count to ensure we don't release the socket while we still
have data in flight. That means that we can end up releasing the socket
from ovpn_encrypt_tx_cb().

We must have a vnet context set when calling sorele() (which asserts
this from within sofree()), so move the CURVNET_SET()/CURVNET_RESTORE()
to ensure this is the case.

While here also add a couple of assertions to make this more obvious,
and to ease future debugging.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D37326
2022-11-14 09:36:44 +01:00
Kristof Provost
2c58d0cb3b if_ovpn: fix AES-128-GCM support
We need to explicitly list AES-128-GCM as an allowed cipher for that
mode to work. While here also add AES-192-GCM. That brings our supported
cipher list in line with other openvpn/dco platforms.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-11-11 11:17:39 +01:00
Luiz Amaral
813c5b75e6 pfsync: prepare code to accommodate AF_INET6 family
Work is ongoing to add support for pfsync over IPv6. This required some
changes to allow for differentiating between the two families in a more
generic way.

This patch converts the relevant ioctls to using nvlists, making future
extensions (such as supporting IPv6 addresses) easier.

Sponsored by:	InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D36277
2022-11-09 21:06:07 +01:00
Kristof Provost
8a8af94240 pf: bridge-to
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.

The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D37193
2022-11-02 15:45:23 +01:00
Kristof Provost
9f8f3a8e9a ipsec: add support for CHACHA20POLY1305
Based on a patch by ae@.

Reviewed by:	gbe (man page), pauamma (man page)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D37180
2022-11-02 14:19:04 +01:00
John Baldwin
744bfb2131 Import the WireGuard driver from zx2c4.com.
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by:	pauamma, gbe, kevans, emaste
Obtained from:	git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36909
2022-10-28 13:36:12 -07:00
Gordon Bergling
8ba2beacfd netmap(4): Fix a typo in a source code comment
- s/microsconds/microseconds/

MFC after:	3 days
2022-10-25 14:56:25 +02:00
Kristof Provost
13b1d6f0c9 if_ovpn: avoid netisr_queue name conflicts
Rename the netisr_queue variable in if_ovpn.c to avoid naming conflicts.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-10-24 20:45:39 +02:00
Kristof Provost
22893e5840 bridge: default to not filtering L3
Change the default for net.link.bridge.pfil_member and
net.link.bridge.pfil_bridge to zero.

That is, default to not calling layer 3 firewalls on the bridge or its
member interfaces.

With either of these enabled the bridge will, during L2 processing,
remove the Ethernet header from packets, feed them to L3 firewalls,
re-add the Ethernet header and send them out.

Not only does this interact very poorly with firewalls which defer
packets, or reassemble and refragment IPv6, it also causes considerable
confusion for users, because the firewall gets called in unexpected
ways.

For example, a bridge which contains a bhyve tap and the host's LAN
interface. We'd expect traffic between the LAN and bhyve VM to pass, no
matter what (layer 3) firewall rules are set on the host. That's not the
case as long as pfil_bridge or pfil_member are set.

Reviewed by:	Zhenlei Huang
MFC:		never
Differential Revision:	https://reviews.freebsd.org/D37009
2022-10-24 08:52:21 +02:00
Kristof Provost
dc12ee39b7 if_ovpn: add sysctls for netisr_queue() and crypto_dispatch_async()
Allow the choice between asynchronous and synchronous netisr and crypto
calls. These have performance implications, but depend on the specific
setup and OCF back-end.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D37017
2022-10-24 10:08:35 +02:00
Eric Joyner
9c95013905 iflib: Introduce v2 of TX Queue Select Functionality
For v2, iflib will parse packet headers before queueing a packet.

This commit also adds a new field in the structure that holds parsed
header information from packets; it stores the IP ToS/traffic class
field found in the IPv4/IPv6 header.

To help, it will only partially parse packet headers before queueing
them by using a new header parsing function that does less than the
current header parsing function; for our purposes we only need up to the
minimal IP header in order to get the IP ToS information and don't need
to pull up more data.
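
As a rough illustration of why the minimal IP header is enough here, the sketch below reads the ToS/traffic class value from the first two bytes of an IPv4 or IPv6 header. It is written in Python rather than the driver's C, `ip_tos` is a hypothetical helper name (not an iflib function), and it assumes the buffer starts at the IP header.

```python
def ip_tos(header: bytes) -> int:
    """Return the IPv4 ToS or IPv6 traffic class value.

    Hypothetical helper for illustration only; assumes `header` starts
    at the IP header (any link-layer header already skipped).
    """
    if len(header) < 2:
        raise ValueError("need at least two bytes of IP header")
    version = header[0] >> 4
    if version == 4:
        # IPv4: ToS is the second byte of the header.
        return header[1]
    if version == 6:
        # IPv6: traffic class straddles the first two bytes
        # (low nibble of byte 0, high nibble of byte 1).
        return ((header[0] & 0x0F) << 4) | (header[1] >> 4)
    raise ValueError("not an IPv4 or IPv6 header")
```

In both address families the value sits within the first two bytes, so nothing beyond the fixed header has to be pulled up.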

For now, v1 and v2 co-exist in this patch; v1 still offers a
less-invasive method where none of the packet is parsed in iflib before
queueing.

This also bumps the sys/param.h version.

Signed-off-by:	Eric Joyner <erj@FreeBSD.org>
Tested by:	IntelNetworking
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision: 	https://reviews.freebsd.org/D34742
2022-10-17 14:59:55 -07:00
Gert Doering
2e797555f7 if_ovpn(4): implement ioctl() to set if_flags
Fully working openvpn(8) --iroute support needs real subnet config
on ovpn(4) interfaces (IFF_BROADCAST), while client-side/p2p
configs need IFF_POINTOPOINT setting.  So make this configurable.

Reviewed by:	kp
2022-10-17 15:33:45 +02:00
Kristof Provost
b136983a8a if_ovpn: fix use-after-free
ovpn_encrypt_tx_cb() calls ovpn_encap() to transmit a packet, then adds
the length of the packet to the "tunnel_bytes_sent" counter.  However,
after ovpn_encap() returns 0, the mbuf chain may have been freed, so the
load of m->m_pkthdr.len may be a use-after-free.
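
One common way to avoid this class of bug is to read the length before handing the buffer off. The sketch below models that ordering in Python with a toy `Mbuf` class; `Mbuf`, `encap` and `send_and_count` are hypothetical stand-ins, not the if_ovpn code.

```python
class Mbuf:
    """Toy stand-in for an mbuf chain; not the kernel structure."""

    def __init__(self, length):
        self._len = length
        self.freed = False

    @property
    def pkt_len(self):
        if self.freed:
            raise RuntimeError("use after free")
        return self._len


def encap(m):
    """Hypothetical transmit step: consumes (frees) the buffer on success."""
    m.freed = True
    return 0


def send_and_count(m, counters):
    length = m.pkt_len  # read the length while we still own the buffer
    if encap(m) == 0:
        counters["tunnel_bytes_sent"] += length
    return counters
```

Reading `m.pkt_len` after `encap()` would raise in this model, which mirrors the stale load described above.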

Reported by:	markj
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-10-17 09:24:41 +02:00
Zhenlei Huang
43f8c763cd if_me: Use dedicated network privilege
Separate if_me privileges from if_gif.

Reviewed by:		kp
Differential Revision:	https://reviews.freebsd.org/D36691
2022-10-15 17:05:36 +02:00
Kristof Provost
133935d26f pf: atomically increment state ids
Rather than using a per-cpu state counter and adding in the CPU id, we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.
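
The difference between the two schemes can be sketched as follows. This is a Python model, not pf code: the bit layout of the per-CPU scheme (CPU id in the top 8 bits of a 64-bit id) is an assumption, and `AtomicCounter` merely stands in for an atomic 64-bit fetch-and-add.

```python
import threading

CPU_BITS = 8                  # assumed width of the CPU field
CPU_SHIFT = 64 - CPU_BITS


def percpu_state_id(cpu, counter):
    """Per-CPU scheme (assumed layout): CPU id stored in the top bits."""
    if cpu >= (1 << CPU_BITS):
        raise OverflowError("CPU id does not fit in %d bits" % CPU_BITS)
    return (cpu << CPU_SHIFT) | counter


class AtomicCounter:
    """Stand-in for an atomic 64-bit fetch-and-add."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def fetch_add(self, n=1):
        with self._lock:
            old = self._value
            self._value = (old + n) & 0xFFFFFFFFFFFFFFFF
            return old
```

With a single shared counter there is no CPU field at all, so the number of CPUs no longer constrains the id format.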

Event:		Aberdeen Hackathon 2022
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D36915
2022-10-08 18:27:29 +02:00
Kristof Provost
61ab88d873 if_ovpn: remove an incorrect assertion
netisr_dispatch() can fail, especially when under high traffic loads.
This isn't a fatal error, so simply don't check the return value.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-10-07 19:10:01 +02:00
Zhenlei Huang
de1ea2d517 if_vxlan(4): Correct the statistic for output bytes
The vxlan interface encapsulates the Ethernet frame by prepending IP/UDP
and vxlan headers. For statistics, only the payload, i.e. the
encapsulated (inner) frame should be counted.
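
As a small worked example of that accounting, the sketch below assumes an IPv4 underlay with no IP options; `account_output` and the `stats` dictionary are hypothetical names, not the driver's.

```python
IPV4_HDR = 20   # minimal IPv4 header, no options
UDP_HDR = 8
VXLAN_HDR = 8


def account_output(inner_frame_len, stats):
    """Count one transmitted packet; only the inner frame adds to obytes.

    Returns the encapsulated length (excluding the outer Ethernet header)
    to show how it differs from the counted value.
    """
    encap_len = inner_frame_len + IPV4_HDR + UDP_HDR + VXLAN_HDR
    stats["opackets"] += 1
    stats["obytes"] += inner_frame_len
    return encap_len
```

For a 1000-byte inner frame the encapsulated packet is 1036 bytes, but the interface counter grows by 1000.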

Event:		Aberdeen Hackathon 2022
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D36855
2022-10-07 13:45:16 +02:00
Kristof Provost
4f756295e0 if_ovpn: ensure we're in net_epoch when calling ovpn_encap()
If the crypto callback is asynchronous we're no longer in net_epoch,
which ovpn_encap() (and ip_output() it calls) expect.

Ensure we've entered the epoch.

Do the same thing for the rx path.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-10-06 17:42:12 +02:00
Kristof Provost
1d090028d3 pf: use time_t for timestamps
Use time_t rather than uint32_t to represent the timestamps. That means
we have 64 bits rather than 32 on all platforms except i386, avoiding
the Y2K38 issues on most platforms.

Reviewed by:	Zhenlei Huang
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36837
2022-10-05 17:52:27 +02:00
Zhenlei Huang
1fc839f489 if_vxlan(4): Add missing statistic for input packets
Event:		Aberdeen hackathon 2022
Reviewed by:	bryanv, kp
Differential Revision:	https://reviews.freebsd.org/D36841
2022-10-05 12:38:30 +02:00
Jung-uk Kim
56cdab3372 bpf: obtain timestamps from controller via pkthdr if available
r325506 (3cf8254f1e) extended struct pkthdr to add packet timestamp in
mbuf(9) chain.  For example, cxgbe(4) and mlx5en(4) support this feature.
Use the timestamp for bpf(4) if it is available.

Reviewed by:	hselasky, kib, np
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36868
2022-10-03 18:53:40 -04:00
Kristof Provost
8a299958c1 if_epair: fix build with RSS
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-10-03 17:02:55 +02:00
Alexander V. Chernikov
7e5bf68495 netlink: add netlink support
Netlink is a communication protocol currently used in the Linux kernel to modify,
 read and subscribe to nearly all networking state. Interfaces, addresses, routes,
 firewall, fibs, vnets, etc are controlled via netlink.
It is an async, TLV-based protocol, providing 1-1 and 1-many communications.

The current implementation supports the subset of NETLINK_ROUTE
family. To be more specific, the following is supported:
* Dumps:
 - routes
 - nexthops / nexthop groups
 - interfaces
 - interface addresses
 - neighbors (arp/ndp)
* Notifications:
 - interface arrival/departure
 - interface address arrival/departure
 - route addition/deletion
* Modifications:
 - adding/deleting routes
 - adding/deleting nexthops/nexthop groups
 - adding/deleting neighbors
 - adding/deleting interfaces (basic support only)
* Rtsock interaction
 - route events are bridged both ways

The implementation also supports the NETLINK_GENERIC family framework.

Implementation notes:
Netlink is implemented as a loadable/unloadable kernel module,
 not touching many kernel parts.
Each netlink socket uses a dedicated taskqueue to support async operations
 that can sleep, such as interface creation. All message processing is
 performed within these taskqueues.

Compatibility:
Most of the Netlink data models specified above map to FreeBSD concepts
 nicely. An unmodified ip(8) binary works correctly with
interfaces, addresses, routes, nexthops and nexthop groups. Some
software such as net/bird requires header-only modifications to compile
and work with FreeBSD netlink.

Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after:	2 months
2022-10-01 14:15:35 +00:00
Zhenlei Huang
8707cb19e6 if_vxlan(4): Check the size of data available in mbuf before using them
PR:		261711
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D36794
2022-09-30 09:56:15 +02:00
Alexander V. Chernikov
04a32b802e if_epair: refactor interface creation and enqueue code.
* Factor out queue selection (epair_select_queue()) and mbuf
 preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
 epair_menq() implementation and reduces the amount of dependencies
 on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
 epair_clone_create(). It simplifies the clone creation logic.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689
2022-09-27 13:34:19 +00:00
Kristof Provost
76e1c9c671 if_ovpn: fix address family check when traffic class bits are set
When the tunneled (IPv6) traffic had traffic class bits set (but only >=
16) the packet got lost on the receive side.

This happened because the address family check in ovpn_get_af() failed
to mask correctly, so the version check didn't match, causing us to drop
the packet.
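
The failure mode can be reproduced with a few lines of arithmetic. The sketch below is a Python model of the general class of bug, not the actual ovpn_get_af() code; the two constants mirror the usual IPv6 header definitions (version nibble 6, mask 0xf0), and the function names are hypothetical.

```python
IPV6_VERSION = 0x60
IPV6_VERSION_MASK = 0xF0


def first_byte(traffic_class):
    """First byte of an IPv6 header: the version nibble followed by the
    top four bits of the traffic class."""
    return IPV6_VERSION | (traffic_class >> 4)


def is_ipv6_unmasked(vfc):
    # Compares the whole byte, so it fails once traffic class >= 16.
    return vfc == IPV6_VERSION


def is_ipv6_masked(vfc):
    # Looks at the version nibble only.
    return (vfc & IPV6_VERSION_MASK) == IPV6_VERSION
```

Traffic class values below 16 leave the low nibble of the first byte at zero, which is why only values of 16 and above triggered the problem.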

While here also extend the existing 6-in-6 test case to trigger this
issue.

PR:		266598
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-09-26 13:54:20 +02:00
Alexander V. Chernikov
9a7c520a78 ifp: add if_setdescr() / if_freedescr() methods
Add methods for setting and removing the description from the interface,
 so the external users can manage it without using ioctl API.

MFC after:      2 weeks
2022-09-24 19:42:42 +00:00
Alexander V. Chernikov
26c190d280 if_clone: add ifc_link_ifp() / ifc_unlink_ifp() to the KPI
Factor cloner ifp addition/deletion into separate functions and
 make them public. This change simplifies the current cloner code
 and paves the way to the other upcoming cloner / epair changes.

MFC after:	2 weeks
2022-09-24 19:42:42 +00:00
Alexander V. Chernikov
91ebcbe02a if_clone: migrate some consumers to the new KPI.
Convert most of the cloner consumers who require custom params
 to the new if_clone KPI.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D36636
MFC after:	2 weeks
2022-09-22 12:30:09 +00:00
Alexander V. Chernikov
09ee0fc023 if_clone: rework cloning KPI
The current cloning KPI does not provide a way of creating interfaces
 with parameters from within the kernel. The reason is that those parameters
 are passed as an opaque pointer and it is not possible to specify whether
 this pointer references kernel-space or user-space.
Instead of just adding a flag, generalise the KPI to simplify the
 extension process. Unify current notion of `SIMPLE` and `ADVANCED` users
 by leveraging newly-added IFC_C_AUTOUNIT flag to automatically pick
 unit number, which is a primary feature of the "SIMPLE" KPI.
Use extendable structures everywhere instead of passing function
 pointers or parameters.
Isolate all parts of the old KPI under `CLONE_COMPAT_13` so it can be safely
 merged back to 13. Old KPI will be removed after the merge.

Differential Revision: https://reviews.freebsd.org/D36632
MFC after:	2 weeks
2022-09-22 10:18:31 +00:00
Alexander V. Chernikov
12aeeb9190 epair: deduplicate interface allocation code #1
Simplify epair_clone_create() and epair_clone_destroy() by
 factoring out epair softc allocation / destruction and
 interface setup/teardown into separate functions.

Reviewed By: kp, zlei.huang_gmail.com
Differential Revision: https://reviews.freebsd.org/D36614
MFC after:	2 weeks
2022-09-22 09:06:06 +00:00
Kristof Provost
d99d59a79f if_ovpn: fix memory leak on unload
When we're unloading the if_ovpn module we sometimes end up only freeing
the softc after the module is unloaded and the M_OVPN malloc type no
longer exists.

Don't return from ovpn_clone_destroy() until the epoch callbacks have
been called, which ensures that we've freed the softc before we destroy
M_OVPN.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-09-21 21:44:59 +02:00
Kristof Provost
9dfbbc919f if_ovpn: remove incorrect rounding up of packet sizes
The ciphers used by OpenVPN (DCO) do not require data to be block-sized.
Do not round up to AES_BLOCK_LEN, as this can lead to issues with
fragmented packets.

Reported by:	Gert Doering <gert@greenie.muc.de>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-09-21 21:44:59 +02:00
Mateusz Guzik
3212ad15ab Add getsock
All but one consumer of getsock_cap pass only 4 arguments.
Take advantage of it.
2022-09-10 19:47:47 +00:00
Mateusz Guzik
0b70e3e78b net: add pfil_mbuf_{in,out}
This shaves a lot of branching due to MEMPTR flag.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36454
2022-09-08 16:20:43 +00:00
Alexander V. Chernikov
e762417077 routing: constantify nh/nhg argument in <nhop|nhgrp>_get_origin().
MFC after:	1 month
2022-09-08 10:21:25 +00:00
Alexander V. Chernikov
000250be0d routing: add ability to set the protocol that installed route/nexthop.
Routing daemons such as bird need to know whether they installed a certain route
 so they can clean it up on startup, as a form of achieving consistent
 state during the crash recovery.
Currently they use a combination of routing flags (RTF_PROTO1) to detect
 these routes when interacting via route(4) rtsock protocol.
Netlink protocol has a special "rtm_protocol" field that is filled and
 checked by the route originator. To prepare for the upcoming netlink
 introduction, add the ability to record origin to both nexthops and
 nexthop groups via <nhop|nhgrp>_<get|set>_origin() KPI. The actual
 calls will be used in the followup commits.

MFC after:	1 month
2022-09-08 09:18:32 +00:00
Mateusz Guzik
14c9a2dbfb net: retire PFIL_FWD
It is now unused and not having it allows further clean ups.

Reviewed by:	cy, glebius, kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36452
2022-09-07 10:04:31 +00:00
Mateusz Guzik
223a73a1c4 net: remove stale altq_input reference
Code setting it was removed in:
commit 325fab802e
Author: Eric van Gyzen <vangyzen@FreeBSD.org>
Date:   Tue Dec 4 23:46:43 2018 +0000

    altq: remove ALTQ3_COMPAT code

Reviewed by:	glebius, kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36471
2022-09-07 10:03:12 +00:00
Gleb Smirnoff
ddc0ed5836 loop(4): make interface methods static
2022-09-05 10:29:10 -07:00
Alexander V. Chernikov
4bccbf03d8 routing: allow logging framework to be used outside of the subsystem
MFC after:	2 weeks
2022-09-05 10:44:27 +00:00
Gordon Bergling
8a153724cd bpf(3): Grammar fix for a source code comment
- s/that that/that the/

MFC after:	3 days
2022-09-04 17:30:05 +02:00
Gordon Bergling
028ecc7aa1 netisr(9): Fix a typo in a source code comment
- s/overriden/overridden/

MFC after:	3 days
2022-09-03 15:04:15 +02:00
Gleb Smirnoff
e18c5816ea domains: use queue(9) SLIST for linked list of domains
2022-08-29 19:15:01 -07:00
Alexander V. Chernikov
177f04d57f routing: constantify @rc in rib_decompose_notification().
Clarify the @rc immutability by explicitly marking @rc const.

MFC after:	2 weeks
2022-08-29 18:12:24 +00:00
Alexander V. Chernikov
7b3440fc30 Revert "routing: install prefix and loopback routes using new nhop-based KPI."
Temporarily revert the commit to unblock testing.

This reverts commit a1b59379db.
2022-08-29 16:20:42 +00:00
Alexander V. Chernikov
578a99c939 routing: improve multiline debug
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
 is run only if the current debug level is sufficient. Consistently
 use it within routing subsystem.
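
The idea behind such a guard can be sketched in Python. This is not the kernel macro: `debug_log` and its parameters are hypothetical, and the level value follows the syslog convention where 7 means debug.

```python
LOG_DEBUG = 7   # syslog-style debug level


def debug_log(current_level, required_level, build_message, sink):
    """Call build_message() only when the current level is high enough,
    so expensive formatting is skipped when debugging is off."""
    if current_level >= required_level:
        sink.append(build_message())
```

Passing the message as a callable rather than a ready-made string is what keeps the preparation cost out of the non-debug path.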

MFC after:	2 weeks
2022-08-29 15:14:49 +00:00
Alexander V. Chernikov
fe05d1dd0f routing: extend nhop(9) kpi
* add nhop_get_unlinked() used to prepare referenced but not
 linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
  combinations between the rib family and neighbor family (useful
  for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
 IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
 weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
  iteration.

MFC after:	2 weeks
2022-08-29 14:46:03 +00:00
Alexander V. Chernikov
c24a8f19c5 routing: fix rib_add_route_px()
Fix panic in newly-added rib_add_route_px() by removing unlocked
 prefix lookup.

MFC after:	2 weeks
2022-08-29 12:57:47 +00:00
Alexander V. Chernikov
db4ca19002 routing: add ability to store opaque identifiers in nhops/nhgs
This is a pre-requisite for the direct nexthop/nexthop group operations
 via netlink.

MFC after:	2 weeks
2022-08-29 12:20:28 +00:00
Alexander V. Chernikov
6d4f6e4c70 routing: make rib_add_redirect() use new nhop-based KPI
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36169
2022-08-29 10:23:26 +00:00
Alexander V. Chernikov
d8b2693414 routing: add rib_add_default_route() wrapper
Multiple consumers in the kernel space want to install an IPv4 or IPv6
 default route. Provide a convenient wrapper to simplify the code
 inside the consumers.

MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36167
2022-08-29 10:08:24 +00:00
Alexander V. Chernikov
a1b59379db routing: install prefix and loopback routes using new nhop-based KPI.
Construct the desired nexthops directly instead of using the
 "translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
 rib_copy_route() to propagate the routes to the non-primary address
 fibs.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D36166
2022-08-29 10:07:58 +00:00
Luiz Amaral
485be9798a pfsync: replace struct pfsync_pkt with int flags
Get rid of struct pfsync_pkt. It was used to store data on the stack to
pass to all the submessage handlers, but only the flags part of it was
ever used. Just pass the flags directly instead.

Reviewed by:		kp
Obtained from:		OpenBSD
Sponsored by:		InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D36294
2022-08-22 23:46:50 +02:00
Mateusz Guzik
497240def8 Retire clone_drain_lock
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36268
2022-08-20 09:44:05 +00:00
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
Gleb Smirnoff
1922eb3e9c protosw: retire pr_slowtimo and pr_fasttimo
They were useful many years ago, when the callwheel was not efficient,
and the kernel tried to have as few callout entries scheduled as
possible.

Reviewed by:		tuexen, melifaro
Differential revision:	https://reviews.freebsd.org/D36163
2022-08-17 11:50:31 -07:00
Mateusz Guzik
88a782fc84 routing: G/C rt_exportinfo declaration
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-15 20:38:31 +00:00
Alexander V. Chernikov
036f1bc613 routing: retire rib_lookup_info()
This function was added in pre-epoch era ( 9a1b64d5a0 ) to
 provide public rtentry access interface & hide rtentry internals.
The implementation is based on the large on-stack copying and
 refcounting of the referenced objects (ifa/ifp).
It has become obsolete after epoch & nexthop introduction. Convert
 the last remaining user and remove the function itself.

Differential Revision: https://reviews.freebsd.org/D36197
2022-08-15 06:46:30 +00:00
Alexander V. Chernikov
730bfa2805 routing: add rib_match_gw() helper
Finish 02e05b8fae:
* add gateway matcher function that can be used in rib_del_route_px()
 or any rib_walk-family functions. It will be used in the upcoming
 migration to the new KPI
* rename gw_filter_func to match_gw_one() to better signal the
 function purpose / semantic.

MFC after:	1 month
2022-08-12 09:31:21 +00:00
Mateusz Guzik
f73e4f6c58 routing: unbreak the build of a bunch of kernels
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-11 21:50:37 +00:00
Alexander V. Chernikov
d8b42ddcac rtsock: subscribe to ifnet eventhandlers instead of direct calls.
Stop treating rtsock as a "special" consumer and use already-provided
 ifaddr arrival/departure notifications.

MFC after:	2 weeks

Test Plan:
```
21:05 [0] m@devel0 route -n monitor

-> ifconfig vtnet0.2 create

got message of size 24 on Tue Aug  9 21:05:44 2022
RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: arrival

got message of size 168 on Tue Aug  9 21:05:54 2022
RTM_IFINFO: iface status change: len 168, if# 3, link: up, flags:<BROADCAST,RUNNING,SIMPLEX,MULTICAST>

-> ifconfig vtnet0.2 destroy

got message of size 24 on Tue Aug  9 21:05:54 2022
RTM_IFANNOUNCE: interface arrival/departure: len 24, if# 3, what: departure

```

Reviewed By: glebius
Differential Revision: https://reviews.freebsd.org/D36095
MFC after:	2 weeks
2022-08-11 20:36:59 +00:00
Gleb Smirnoff
f63cb32c19 Retire 4.4BSD raw sockets
Until today the remnants of the original code had provided some aid
in the implementation of the routing socket and the IPSEC key socket.  There was
more obfuscation than generalisation with this aid.

A historical reference on the original idea of the raw sockets can
be found in chapter 11 of 4.4BSD System Manager Manual:

https://raw.githubusercontent.com/sergev/4.4BSD-Lite2/master/usr/share/doc/smm/18.net.pdf

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36124
2022-08-11 09:19:36 -07:00
Gleb Smirnoff
36b10ac2cd rtsock: do not use raw socket code
This makes routing socket implementation self contained and removes one
of the last dependencies on the raw socket code and pr_output method.

There are very subtle API visible changes:
- now routing socket would return EOPNOTSUPP instead of EINVAL on
  syscalls that are not supposed to be called on a routing socket.
- routing socket buffer sizes are now controlled by net.rtsock
  sysctls instead of net.raw.  The latter were not documented
  anywhere, and even Internet search doesn't find any references
  or discussions related to these sysctls.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36122
2022-08-11 09:19:36 -07:00
Gleb Smirnoff
d94ec7490d rtsock: do not allocate mbuf_tags(9) just to store an 8-bit value
Use local storage of the mbuf packet header instead.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36121
2022-08-11 09:19:36 -07:00
Gleb Smirnoff
b8103ca76d netinet: get interface event notifications directly via EVENTHANDLER(9)
The old mechanism of getting them via domains/protocols control input
is a relic from the previous century, when nothing like EVENTHANDLER(9)
existed yet.  Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one
to use them.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36116
2022-08-11 09:19:36 -07:00
Mateusz Guzik
69077c81e5 routing: fix non-debug build
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-11 14:12:59 +00:00
Alexander V. Chernikov
40503b792f routing: populate fibs with interface routes after growing net.fibs.
Currently it is possible to extend the number of fibs at runtime, but this
 functionality is of limited use when net.add_addr_allfibs is
 non-zero, as the routing tables are created empty.

This change automatically populates newly-created fibs with the kernel-originated
 interface routes (filtered by RTF_PINNED flag) if net.add_addr_allfibs
 is set.

```
-> sysctl net.add_addr_allfibs=1
net.add_addr_allfibs: 0 -> 1
-> sysctl net.fibs
net.fibs: 2
-> sysctl net.fibs=3
net.fibs: 2 -> 3

BEFORE:
-> setfib 2 netstat -rn
Routing tables (fib: 2)

AFTER:
-> setfib 2 netstat -rn
Routing tables (fib: 2)

Internet:
Destination        Gateway            Flags     Netif Expire
10.0.0.0/24        link#1             U        vtnet0
10.0.0.5           link#1             UHS         lo0
127.0.0.1          link#2             UH          lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::1                               link#2                        UHS         lo0
2a01:4f9:3a:fa00::/64             link#1                        U        vtnet0
2a01:4f9:3a:fa00:5054:ff:fe15:4a3b link#1                       UHS         lo0
fe80::%vtnet0/64                  link#1                        U        vtnet0
fe80::5054:ff:fe15:4a3b%vtnet0    link#1                        UHS         lo0
fe80::%lo0/64                     link#2                        U           lo0
fe80::1%lo0                       link#2                        UHS         lo0
```

Differential Revision: https://reviews.freebsd.org/D36075
MFC after:	1 month
2022-08-11 12:48:08 +00:00
Alexander V. Chernikov
02e05b8fae routing: fixup empty mask prefix handling after 2ce553854c.
MFC after: 1 month
2022-08-11 12:48:04 +00:00
Alexander V. Chernikov
258828d03b routing: fix build warning without ROUTE_MPATH
Reported by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	1 month
2022-08-11 09:47:26 +00:00
Kristof Provost
fd6b3bede5 if_ovpn: reject non-UDP sockets
We must ensure that the fd provided by userspace is really for a UDP
socket. If it's not we'll panic in udp_set_kernel_tunneling().

Reported by:	Gert Doering <gert@greenie.muc.de>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-11 10:40:03 +02:00
Alexander V. Chernikov
685866bbe1 routing: fix build without ROUTE_MPATH
MFC after:	1 month
2022-08-10 20:45:22 +00:00
Alexander V. Chernikov
5c4d2252d7 routing: move rtentry and subscription code out of route_ctl.c
route_ctl.c size has grown considerably since initial introduction.
Factor out non-relevant parts:
* all rtentry logic, such as creation/destruction and accessors
 goes to net/route/route_rtentry.c
* all rtable subscription logic goes to net/route/route_subscription.c

Differential Revision: https://reviews.freebsd.org/D36074
MFC after:	1 month
2022-08-10 18:56:01 +00:00
Alexander V. Chernikov
2ce553854c routing: add rib_<add|del>_route_px() functions operating with nexthops.
This change adds public KPI to work with routes using pre-created
 nexthops, instead of using data from addrinfo structures. These
 functions will be later used for adding/deleting kernel-originated
 routes and upcoming netlink protocol.

As a part of providing this KPI, low-level route addition code has been
 reworked to provide more control over route creation or change.
 Specifically, a number of operation flags
 (RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the
 desired behaviour when the route already exists (or does not exist). This
 change required some changes in the multipath addition code, resulting
 in moving this code to route_ctl.c, rendering mpath_ctl.c empty.
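
How such operation flags might combine can be sketched with a toy routing table. The flag names come from the text above; the numeric values and the exact semantics are assumptions modelled on netlink's NLM_F_* flags, and `add_route` here is a hypothetical Python function, not the kernel KPI.

```python
import errno

# Values and semantics are assumptions for this sketch only.
RTM_F_CREATE, RTM_F_EXCL, RTM_F_REPLACE, RTM_F_APPEND = 0x1, 0x2, 0x4, 0x8


def add_route(table, prefix, nhop, flags):
    """Toy route insertion over a dict of prefix -> list of nexthops.

    Returns 0 on success or an errno value on failure.
    """
    existing = table.get(prefix)
    if existing is None:
        if not flags & RTM_F_CREATE:
            return errno.ENOENT      # nothing to change, creation not allowed
        table[prefix] = [nhop]
        return 0
    if flags & RTM_F_EXCL:
        return errno.EEXIST          # caller insisted on a fresh route
    if flags & RTM_F_REPLACE:
        table[prefix] = [nhop]
        return 0
    if flags & RTM_F_APPEND:
        existing.append(nhop)        # multipath: add another nexthop
        return 0
    return errno.EEXIST
```

The append case is where multipath handling enters, which gives some intuition for why the flag logic and the multipath addition code end up in the same place.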

Differential Revision: https://reviews.freebsd.org/D36073
MFC after:	1 month
2022-08-10 18:56:01 +00:00
Alexander V. Chernikov
66230639ce routing: split nexthop creation and rtentry creation.
This change is required for the upcoming introduction of the next
 nexthop-based operations KPI, as it will create rtentry and nexthops
 at different stages of route table modification.

Differential Revision: https://reviews.freebsd.org/D36072
MFC after:	2 weeks
2022-08-10 18:27:13 +00:00
Alexander V. Chernikov
dedeec1143 routing: refactor #2
* Use same filter func (rib_filter_f_t) for nexthop groups to
 simplify callbacks.
* simplify conditional route deletion & remove the need to pass
 rt_addrinfo to the low-level deletion functions
* speedup rib_walk_del() by removing an additional per-prefix lookup

Differential Revision: https://reviews.freebsd.org/D36071
MFC after:	1 month
2022-08-10 18:20:21 +00:00
Alexander V. Chernikov
0d60e88b41 routing: refactor control cmds #1
This and the follow-up routing-related changes target to remove or
 reduce `struct rt_addrinfo` usage and use recently-landed nhop(9)
 KPI instead.
Traditionally `rt_addrinfo` structure has been used to propagate all necessary
information between the protocol/rtsock and a routing layer. Many
functions inside routing subsystem uses it internally. However, using
this structure became somewhat complicated, as there are too many ways
of specifying a single state and verifying data consistency is hard.
For example, are routing flags consistent with mask/gateway sockaddr pointers?
Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked,
have proper length)? Are they mutable? Is the suggested interface specified
 by the interface index embedded into the sockaddr_dl gateway, or passed
 as RTAX_IFP parameter, or directly provided by rti_ifp, or does it need to
 be derived from the ifa?
These (and other similar) questions have to be considered every time when
 a function has `rt_addrinfo` pointer as an argument.

The new approach is to bring more control back to the protocols and
construct the desired routing objects themselves - in the end, it's the
protocol/subsystem who knows the desired outcome.

This specific diff changes the following:
* add explicit basic low-level radix operations:
 add_route() (renamed from add_route_nhop())
 delete_route() (factored from change_route_nhop())
 change_route() (renamed from change_route_nhop)
* remove "info" parameter from change_route_conditional() as a part
 of reducing rt_addrinfo usage in the internal KPIs
* add lookup_prefix_rt() wrapper for doing re-lookups after
 RIB lock/unlock

Differential Revision: https://reviews.freebsd.org/D36070
MFC after:	2 weeks
2022-08-10 18:20:20 +00:00
Gordon Bergling
b2b1bb0410 debugnet: Fix a typo in a source code comment
- s/paramaters/parameters/

MFC after:	3 days
2022-08-07 16:07:01 +02:00
Alexander V. Chernikov
93dd3adac7 fib_algo: set vnet when destroying algo instance
Reported by:	Konrad Kręciwilk <konrad.kreciwilk@korbank.pl>
MFC after:	2 weeks
2022-08-06 12:51:22 +00:00
Mark Johnston
220818ac03 bpf: Fix BIOCPROMISC locking
BPF might put an interface in promiscuous mode when handling the
BIOCSDLT ioctl.  When this happens, a flag is set in the BPF descriptor
so that the old interface can be restored when the BPF descriptor is
destroyed.

The BIOCPROMISC ioctl can also be used to put a BPF descriptor's
interface into promiscuous mode, but there was nothing synchronizing the
flag.  Fix this by modifying the ioctl handler to acquire the global BPF
mutex, which is used to synchronize ifpromisc() calls elsewhere in BPF.

Reviewed by:	kp, melifaro
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36045
2022-08-05 16:26:34 -04:00
Kristof Provost
8449762738 if_ovpn: fix unused functions with NOINET / NOINET6
ovpn_find_peer_by_ip() is not used if INET is not defined. Do not
define the function in that case. Same for ovpn_find_peer_by_ip6().

Fix these warnings:

	/usr/src/sys/net/if_ovpn.c:1580:1: warning: unused function 'ovpn_find_peer_by_ip' [-Wunused-function]
	ovpn_find_peer_by_ip(struct ovpn_softc *sc, const struct in_addr addr)
	^
	/usr/src/sys/net/if_ovpn.c:1599:1: warning: unused function 'ovpn_find_peer_by_ip6' [-Wunused-function]
	ovpn_find_peer_by_ip6(struct ovpn_softc *sc, const struct in6_addr *addr)
	^

Reported by:	mjg
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-04 14:00:32 +02:00
Alexander V. Chernikov
d46b000ecc routing: remove duplicate error message after 5c23343b8c.
MFC after:	2 weeks
2022-08-04 09:53:58 +00:00
Mateusz Guzik
412bdb5a46 route: fix NOIP builds
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-08-03 21:23:32 +00:00
Alexander V. Chernikov
ae6bfd12c8 routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
 indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
 group+weight.

MFC after:	2 weeks
2022-08-01 10:02:12 +00:00
Alexander V. Chernikov
5c23343b8c routing: convert remnants of DPRINTF to FIB_CTL_LOG().
Convert the last remaining pieces of old-style debug messages
 to the new debugging framework.

Differential Revision: https://reviews.freebsd.org/D35994
MFC after:	2 weeks
2022-08-01 08:55:07 +00:00
Alexander V. Chernikov
800c68469b routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985
MFC after:	1 month
2022-08-01 08:52:26 +00:00
Alexander V. Chernikov
29029b06a6 routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving
 state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
 is problematic to maintain consistency, resulting in repeated checks
 inside many functions. Additionally, there are multiple ways of
 specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
 required state in the nexthops in the consistent fashion, reducing the
 need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
 format, which is different from other kernel routing users or netlink.

This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.

Reviewed by:	zlei.huang@gmail.com
Differential Revision: https://reviews.freebsd.org/D35972
MFC after:	2 weeks
2022-08-01 07:41:07 +00:00
Alexander V. Chernikov
97ffaff859 net: constantify radix.c functions
Mark dst/mask public API functions fields as const to clearly
 indicate that these parameters are not modified or stored in
 the datastructure.

Differential Revision: https://reviews.freebsd.org/D35971
MFC after:	2 weeks
2022-08-01 07:32:40 +00:00
Alexander V. Chernikov
2717e958df routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
 and netlink introduction.

Differential Revision: https://reviews.freebsd.org/D35970
MFC after:	2 weeks
2022-08-01 07:26:53 +00:00
Alexander V. Chernikov
27f107e1b4 routing: add debug printing helpers for rtentry and RTM* cmds.
MFC after:	2 weeks
2022-07-31 09:01:42 +00:00
Zhenlei Huang
150486f6a9 Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro
Reviewed by:	melifaro, kp
Differential Revision:	https://reviews.freebsd.org/D35968
2022-07-29 21:21:10 +02:00
James Skon
13890d30f8 altq: improve pfctl config time for large numbers of queues
In the current implementation of altq_hfsc.c, when new queues are being
added (by pfctl), each queue is added to the tail of the siblings linked
list under the parent queue.

On a system with many queues (50,000+) this leads to very long load
times as the insertion process must scan the entire list for every new
queue.

Since this list is unordered, this change merely adds the new queue to
the head of the list rather than the tail.

Reviewed by:	kp
MFC after:	3 weeks
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D35964
2022-07-28 22:00:07 +02:00
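The cost difference described in 13890d30f8 can be sketched outside the kernel. This is not the altq_hfsc.c code: `struct queue`, `insert_tail()` and `insert_head()` are hypothetical names, and the returned step count exists only to make the list walk visible.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue class; not the altq_hfsc.c type. */
struct queue {
	int		 id;
	struct queue	*sibling;	/* next entry in the parent's list */
};

/* Tail insertion: walks the whole list, so N inserts cost O(N^2). */
static size_t
insert_tail(struct queue **head, struct queue *q)
{
	size_t steps = 0;

	q->sibling = NULL;
	while (*head != NULL) {
		head = &(*head)->sibling;
		steps++;
	}
	*head = q;
	return (steps);
}

/* Head insertion: constant time, valid because the list is unordered. */
static size_t
insert_head(struct queue **head, struct queue *q)
{
	q->sibling = *head;
	*head = q;
	return (0);
}
```

With 50,000 queues the tail variant performs on the order of a billion pointer steps in total, while the head variant performs none.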
Andrew Gallatin
713ceb99b6 lagg: fix lagg ifioctl after SIOCSIFCAPNV
Lagg was broken by SIOCSIFCAPNV when all underlying devices
support SIOCSIFCAPNV.  This change updates lagg to work with
SIOCSIFCAPNV and if_capabilities2.

Reviewed by: kib, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35865
2022-07-28 10:39:00 -04:00
Dimitry Andric
5e1097f83c Adjust function definitions in route_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warnings are produced:

    sys/net/route/route_ctl.c:130:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_init()
                    ^
                     void
    sys/net/route/route_ctl.c:139:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_destroy()
                       ^
                        void

This is because vnet_rtzone_init() and vnet_rtzone_destroy() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Dimitry Andric
a8adf13a63 Adjust function definition in nhop_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warning is produced:

    sys/net/route/nhop_ctl.c:508:21: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    alloc_nhop_structure()
                        ^
                         void

This is because alloc_nhop_structure() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.

MFC after:	3 days
2022-07-26 21:25:09 +02:00
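Both clang 15 commits above (5e1097f83c and a8adf13a63) apply the same pattern. A minimal sketch, using a hypothetical `next_id()` function rather than the real routing code:

```c
/* Hypothetical function name, used only to illustrate the warning. */
static int counter;

int next_id(void);		/* declaration with a prototype */

/*
 * Had the definition been written "next_id()", C before C23 would treat
 * it as a function without a prototype, which -Wstrict-prototypes flags.
 * Spelling out (void) makes the definition match the declaration.
 */
int
next_id(void)
{
	return (++counter);
}
```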
Kristof Provost
151abc80cd if_vlan: avoid hash table thrashing when adding and removing entries
vlan_remhash() uses incorrect value for b.

When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table
expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth
becomes 5. Say a few more entries are added and there are now 135 entries.
trunk->hwidth will still be 5. If an entry is removed, vlan_remhash() will
calculate a value of 32 for b. refcnt will be decremented to 134. The if
comparison at line 473 will return true and vlan_growhash() will be called. The
VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide.
hwidth will become 4. This is an error, and it can be seen when a new VLAN is
added. The table will again be expanded. If an entry is then removed, again
the table is contracted.

If the number of VLANS stays in the range of 128-512, each time an insert
follows a remove, the table will expand. Each time a remove follows an
insert, the table will be contracted.

The fix is simple. The line 473 should test that the number of entries has
decreased such that the table should be contracted using what would be the new
value of hwidth. line 467 should be:

	b = 1 << (trunk->hwidth - 1);

PR:		265382
Reviewed by:	kp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
2022-07-22 19:18:41 +02:00
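The thrashing described in 151abc80cd can be reproduced with a small model. This is a sketch, not the if_vlan.c code: `struct trunk_model` and its helpers are hypothetical, and the `b * b / 2` threshold is an assumption inferred from the 128-512 range quoted in the message.

```c
#include <stdbool.h>

#define	DEF_HWIDTH	4	/* mirrors the VLAN_DEF_HWIDTH default of 4 */

/* Hypothetical model of the trunk state; not the if_vlan.c structure. */
struct trunk_model {
	int	hwidth;		/* log2 of the number of hash chains */
	int	refcnt;		/* number of entries */
	int	resizes;	/* how many times the table was resized */
};

static void
model_insert(struct trunk_model *t)
{
	int b = 1 << t->hwidth;

	/* Assumed growth rule: expand once entries exceed b*b/2. */
	if (t->refcnt > (b * b) / 2) {
		t->hwidth++;
		t->resizes++;
	}
	t->refcnt++;
}

static void
model_remove(struct trunk_model *t, bool fixed)
{
	/* The fix evaluates the threshold for the smaller table. */
	int b = 1 << (fixed ? t->hwidth - 1 : t->hwidth);

	t->refcnt--;
	if (t->refcnt < (b * b) / 2 && t->hwidth > DEF_HWIDTH) {
		t->hwidth--;
		t->resizes++;
	}
}

/* Fill to 135 entries, then alternate remove/insert 'rounds' times. */
static int
count_resizes(bool fixed, int rounds)
{
	struct trunk_model t = { DEF_HWIDTH, 0, 0 };

	for (int i = 0; i < 135; i++)
		model_insert(&t);
	t.resizes = 0;
	for (int i = 0; i < rounds; i++) {
		model_remove(&t, fixed);
		model_insert(&t);
	}
	return (t.resizes);
}
```

In this model the unfixed variant resizes on every remove and every insert, while the fixed variant leaves the table alone for as long as the entry count hovers around 135.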
Dimitry Andric
0294e95da4 Fix unused variable warning in iflib.c
With clang 15, the following -Werror warning is produced:

    sys/net/iflib.c:993:8: error: variable 'n' set but not used [-Werror,-Wunused-but-set-variable]
            u_int n;
                  ^

The 'n' variable appears to have been a debugging aid that has never
been used for anything, so remove it.

MFC after:	3 days
2022-07-21 21:19:39 +02:00
Dimitry Andric
fa267a329f Fix unused variable warning in if_lagg.c
With clang 15, the following -Werror warning is produced:

    sys/net/if_lagg.c:2413:6: error: variable 'active_ports' set but not used [-Werror,-Wunused-but-set-variable]
            int active_ports = 0;
                ^

The 'active_ports' variable appears to have been a debugging aid that
has never been used for anything (ref https://reviews.freebsd.org/D549),
so remove it.

MFC after:	3 days
2022-07-21 21:05:51 +02:00
Kristof Provost
663f556b03 if_vlan: allow vlan and vlanproto to be changed
It's currently not possible to change the vlan ID or vlan protocol (i.e.
802.1q vs. 802.1ad) without de-configuring the interface (i.e. ifconfig
vlanX -vlandev).
Add a specific flow for this, allowing both the protocol and id (but not
parent interface) to be changed without going through the '-vlandev'
step.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35846
2022-07-21 18:36:01 +02:00
Mitchell Horne
c84c5e00ac ddb: annotate some commands with DB_CMD_MEMSAFE
This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35583
2022-07-18 22:06:09 +00:00
Mike Karels
efe58855f3 IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16).  All changes are disabled by default, and can be
enabled by the following sysctls:

    net.inet.ip.allow_net0=1
    net.inet.ip.allow_net240=1
    net.inet.ip.loopback_prefixlen=16

When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.

Add descriptions of the new sysctls to inet.4.

Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.

The proposals motivating this experimentation can be found in

    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127

Reviewed by:	rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
2022-07-13 09:46:05 -05:00
Kristof Provost
59219dde9a if_ovpn: fix mbuf leak
If the link is down or we can't find a peer we do not transmit the
packet, but also don't free it.

Remember to m_freem() mbufs we can't transmit.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-12 14:19:25 +02:00
Zhenlei Huang
7f7a804ae0 vxlan: Add support for socket ioctls SIOC[SG]TUNFIB
Submitted by: Luiz Amaral <email@luiz.eng.br>
PR: 244004
Differential Revision:	https://reviews.freebsd.org/D32820
MFC after:	2 weeks
2022-07-08 18:14:19 +00:00
Kristof Provost
37f604b49d vnet: make VNET_FOREACH() always be a loop
VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's
not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.

Change VNET_FOREACH() to be a loop in all cases (although one that is
fixed to one iteration if VIMAGE is not set).

Reviewed by:	karels, melifaro, glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35739
2022-07-07 09:52:21 +02:00
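The technique in 37f604b49d, a macro that is always a loop even when it iterates once, can be sketched as follows. `FOREACH_ONCE()` is a hypothetical macro; the actual VNET_FOREACH() definition may differ.

```c
/*
 * Hypothetical single-iteration loop macro. It only illustrates the
 * technique; it is not the VNET_FOREACH() definition.
 */
#define	FOREACH_ONCE(var)	for (int var = 0; var == 0; var++)

/* Returns how many times the body ran to completion. */
static int
run_body(int skip)
{
	int completed = 0;

	FOREACH_ONCE(iter) {
		if (skip)
			continue;	/* legal: we are inside a real loop */
		completed++;
	}
	return (completed);
}
```

Because the macro expands to a genuine `for` statement, `continue` and `break` in the body behave the same way whether the loop visits one element or many.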
Kristof Provost
6ba6c05cb2 if_ovpn: deal with short packets
If we receive a UDP packet (directed towards an active OpenVPN socket)
which is too short to contain an OpenVPN header ('struct
ovpn_wire_header') we wound up making m_copydata() read outside the
mbuf, and panicking the machine.

Explicitly check that the packet is long enough to copy the data we're
interested in. If it's not we will pass the packet to userspace, just
like we'd do for an unknown peer.

Extend a test case to provoke this situation.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-05 19:27:00 +02:00
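The length check described in 6ba6c05cb2 follows a common pattern, shown here as a userspace sketch. `struct wire_header` and `parse_header()` are hypothetical; the real code works on mbufs with m_copydata() and a different header layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; not the real struct ovpn_wire_header. */
struct wire_header {
	uint32_t	opcode;
	uint32_t	seq;
};

/*
 * Copy the header out of a received datagram only if the datagram is
 * long enough. Returns false for short packets so the caller can take
 * a fallback path instead of reading past the end of the buffer.
 */
static bool
parse_header(const uint8_t *pkt, size_t len, struct wire_header *out)
{
	if (len < sizeof(*out))
		return (false);
	memcpy(out, pkt, sizeof(*out));
	return (true);
}
```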
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Mateusz Guzik
db4b40213a routing: hide notify_add and notify_del behind ROUTE_MPATH
Fixes a warning about unused routines without the option.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-04 08:38:13 +00:00
Gordon Bergling
e8b7972cfe if_clone: Fix a typo in a source code comment
- s/fucntions/functions/

MFC after:	3 days
2022-07-03 15:13:32 +02:00
Kristof Provost
6c77f8f0e0 if_ovpn: handle m_pullup() failure
Ensure we correctly handle m_pullup() failing in ovpn_finish_rx().

Reported by:	Coverity (CID 1490340)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
Kristof Provost
9f7c81eb33 if_ovpn: deal with v4 mapped IPv6 addresses
Openvpn defaults to binding to IPv6 sockets (with
setsockopt(IPV6_V6ONLY=0)), which we didn't deal with.
That resulted in us trying to in6_selectsrc_addr() on a v4 mapped v6
address, which does not work.

Instead we translate the mapped address to v4 and treat it as an IPv4
address.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
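The translation described in 9f7c81eb33 can be sketched in portable C with the standard IN6_IS_ADDR_V4MAPPED() macro. `unmap_v4mapped()` is a hypothetical helper, not the if_ovpn function, and it omits the `sin_len` field that a FreeBSD sockaddr would also need.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * If 'in6' is a v4-mapped address (::ffff:a.b.c.d), copy the embedded
 * IPv4 address into 'out' and return true; otherwise leave 'out' alone.
 */
static bool
unmap_v4mapped(const struct sockaddr_in6 *in6, struct sockaddr_in *out)
{
	if (!IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
		return (false);

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = in6->sin6_port;
	/* The IPv4 address occupies the last 4 of the 16 address bytes. */
	memcpy(&out->sin_addr, &in6->sin6_addr.s6_addr[12],
	    sizeof(out->sin_addr));
	return (true);
}
```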
Kristof Provost
b33308db39 if_ovpn: static probe points
Sprinkle a few SDTs around if_ovpn to ease debugging.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-06-28 13:50:54 +02:00
Kristof Provost
ab91feabcc ovpn: Introduce OpenVPN DCO support
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.

In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34340
2022-06-28 11:33:10 +02:00
Alexander V. Chernikov
8010b7a78a routing: simplify decompose_change_notification().
The function's goal is to compare old/new nhop/nexthop group for the route
 and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
 events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
 are sorted nhop-# ascending.

Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35598
MFC after: 2 weeks
2022-06-27 17:30:52 +00:00
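The simplification in 8010b7a78a relies on both groups being sorted, which allows a single merge-style pass. The sketch below works on bare index arrays and ignores weights; `decompose()`, `count_event()` and the `EV_*` names are hypothetical, not the kernel KPI.

```c
#include <stddef.h>

enum ev_type { EV_ADD, EV_DELETE };

typedef void (*event_cb)(enum ev_type type, unsigned int idx, void *arg);

/*
 * Walk two index arrays, both sorted ascending, in one pass. Indices
 * only in 'oldv' produce EV_DELETE, indices only in 'newv' produce
 * EV_ADD, and indices present in both produce nothing.
 */
static void
decompose(const unsigned int *oldv, size_t nold,
    const unsigned int *newv, size_t nnew, event_cb cb, void *arg)
{
	size_t i = 0, j = 0;

	while (i < nold && j < nnew) {
		if (oldv[i] < newv[j])
			cb(EV_DELETE, oldv[i++], arg);
		else if (oldv[i] > newv[j])
			cb(EV_ADD, newv[j++], arg);
		else {
			i++;
			j++;
		}
	}
	while (i < nold)
		cb(EV_DELETE, oldv[i++], arg);
	while (j < nnew)
		cb(EV_ADD, newv[j++], arg);
}

/* Example callback: tallies events into a two-element counter array. */
static void
count_event(enum ev_type type, unsigned int idx, void *arg)
{
	(void)idx;
	((int *)arg)[type]++;
}
```

Without the sort guarantee, each element of one group would have to be searched for in the other, which is quadratic for large groups.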
Alexander V. Chernikov
76f1ab8eff routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups need to be deterministically sorted
 by some property of theirs to simplify reporting cost when changing
 large nexthop groups.

Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthops
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35599
MFC after:	2 weeks
2022-06-27 17:30:52 +00:00
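Commit 76f1ab8eff mentions insertion sort for the nexthop weights. A minimal sketch of that technique on a plain array follows; `sort_weights()` is a hypothetical name and the kernel code operates on its own structures.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Insertion sort, ascending. It needs no extra allocation and runs in
 * close to linear time when the input is already nearly sorted, as is
 * the case when all weights are equal.
 */
static void
sort_weights(uint32_t *w, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		uint32_t key = w[i];
		size_t j = i;

		/* Shift larger elements right to open a slot for 'key'. */
		while (j > 0 && w[j - 1] > key) {
			w[j] = w[j - 1];
			j--;
		}
		w[j] = key;
	}
}
```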
Kristof Provost
1865ebfb12 if_bridge: change MTU for new members
Rather than reject new bridge members because they have the wrong MTU
change it to match the bridge. If that fails, reject the new interface.

PR:	264883
Differential Revision:	https://reviews.freebsd.org/D35597
2022-06-27 08:27:27 +02:00
Alexander V. Chernikov
33a0803f00 routing: fix debug headers added in 6fa8ed43ee #2.
Move debug declaration out of COMPAT_FREEBSD32 in rtsock.c

MFC after: 2 weeks
2022-06-26 07:28:15 +00:00
Alexander V. Chernikov
0e87bab6b4 routing: fix debug headers added in 6fa8ed43ee.
- move debug headers out of COMPAT_FREEBSD32 in rtsock.c
- remove accidentally-added LOG_ defines from syslog.h

MFC after:	2 weeks
2022-06-25 23:05:25 +00:00
Alexander V. Chernikov
76179e400a routing: fix syslog include for rtsock.c
MFC after:	2 weeks
2022-06-25 22:08:10 +00:00
Alexander V. Chernikov
6fa8ed43ee routing: improve debugging.
Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere to the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.

MFC after:	2 weeks
2022-06-25 19:53:31 +00:00
Alexander V. Chernikov
c260d5cd8e routing: fix crash when RTM_CHANGE results in no-op for the multipath
route.

Reporting logic assumed there is always some nhop change for every
 successful modification operation. Explicitly check that the changed
 nexthop indeed exists when reporting back to userland.

MFC after:	2 weeks
Reported by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
2022-06-25 19:35:09 +00:00
Alexander V. Chernikov
c38da70c28 routing: fix RTM_CHANGE nhgroup updates.
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
Search of this nexthop is performed by iterating over each component from multipath (nexthop)
 group, using check_info_match_nhop. The problem with the current code is that it incorrectly
 assumes that `check_info_match_nhop()` returns a true value on match, while in reality it
 returns an error code on failure. Fix this by properly comparing the result with 0.
Additionally, the followup code modified the original nexthop group instead of the new one.
Fix this by targeting the new nexthop group instead.

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35526
MFC after: 2 weeks
2022-06-25 18:54:57 +00:00
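The first bug fixed in c38da70c28 is a common one with functions that return 0 on success. A sketch with hypothetical names (`match_gw()` and `find_component()` stand in for the real matcher and its caller):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical matcher: returns 0 on match, an errno value otherwise. */
static int
match_gw(int candidate_gw, int wanted_gw)
{
	return (candidate_gw == wanted_gw ? 0 : ESRCH);
}

/* Returns the index of the matching component, or -1 if none matches. */
static int
find_component(const int *gws, size_t n, int wanted_gw)
{
	for (size_t i = 0; i < n; i++) {
		/*
		 * Writing "if (match_gw(...))" here would select the first
		 * component that does NOT match, because 0 means success.
		 */
		if (match_gw(gws[i], wanted_gw) == 0)
			return ((int)i);
	}
	return (-1);
}
```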
Alexander V. Chernikov
5d6894bd66 routing: improve debug logging
Use standard logging (FIB_XX_LOG) across nhg code instead of using
 old-style DPRINTFs.
 Add debug object printer for nhgs (`nhgrp_print_buf`).

Example:

```
Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done
Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2

Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3
Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1]
```

Differential Revision: https://reviews.freebsd.org/D35525
MFC after: 2 weeks
2022-06-22 15:59:21 +00:00
Mark Johnston
60b4ad4b6b bpf: Zero pad bytes preceding BPF headers
BPF headers are word-aligned when copied into the store buffer.  Ensure
that pad bytes following the preceding packet are cleared.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston
c88f6908b4 bpf: Correct a comment
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Kristof Provost
1f61367f8d pf: support matching on tags for Ethernet rules
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35362
2022-06-20 10:16:20 +02:00
Mark Johnston
c262d5e877 debugnet: Fix an error handling bug in the DDB command tokenizer
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:05:10 -04:00
Mark Johnston
8414331481 debugnet: Handle batches of packets from if_input
Some drivers will collect multiple mbuf chains, linked by m_nextpkt,
before passing them to upper layers.  debugnet_pkt_in() didn't handle
this and would process only the first packet, typically leading to
retransmits.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:02:00 -04:00
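Handling a batch as described in 8414331481 means walking the next-packet links instead of stopping at the first packet. The sketch below uses a hypothetical `struct pkt` in place of an mbuf chain and marks packets rather than processing them.

```c
#include <stddef.h>

/* Hypothetical packet type; only the next-packet link is modelled. */
struct pkt {
	struct pkt	*nextpkt;	/* next packet in the batch, or NULL */
	int		 handled;
};

/* Process every packet in the batch, not just the first one. */
static int
process_batch(struct pkt *p)
{
	struct pkt *next;
	int n = 0;

	for (; p != NULL; p = next) {
		/* Unlink first: a real handler may consume the packet. */
		next = p->nextpkt;
		p->nextpkt = NULL;
		p->handled = 1;
		n++;
	}
	return (n);
}
```

Saving the link before handling each packet matters in real code, since the handler may free the packet it was given.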
Andrew Gallatin
43c72c45a1 lacp: Remove racy kassert
In lacp_select_tx_port_by_hash(), we assert that the selected port is
DISTRIBUTING. However, the port state is protected by the LACP_LOCK(),
which is not held around lacp_select_tx_port_by_hash().  So this
assertion is racy, and can result in a spurious panic when links
are flapping.

It is certainly possible to fix it by acquiring LACP_LOCK(),
but this seems like an early development assert, and it seems best
to just remove it, rather than add complexity inside an ifdef
INVARIANTS.

Sponsored by: Netflix
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D35396
2022-06-13 11:32:10 -04:00
Hans Petter Selasky
892eded5b8 vlan(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Hans Petter Selasky
1967e31379 lagg(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Gordon Bergling
4f493559b0 if_llatbl: Fix a typo in a debug statement
- s/droped/dropped/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 15:22:09 +02:00
Gordon Bergling
f7faa4ad48 if_bridge(4): Fix a typo in a source code comment
- s/accross/across/

MFC after:	3 days
2022-06-04 11:26:01 +02:00
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() relies on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro
d6cd20cc5c netinet6: fix ndp proxying
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
  in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
  link-level address table when receiving Neighbor Solicitations.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
77001f9b6d lltable: introduce the llt_post_resolved callback
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
3719dedb91 lltable: use sa_family_t instead of int for lltable.llt_af
Reviewed By: melifaro, #network
Differential Revision: https://reviews.freebsd.org/D35323
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
Konrad Sewiłło-Jopek
c9a5c48ae8 arp: Implement sticky ARP mode for interfaces.
Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.

The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).

The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.

Sponsored by:		Conclusive Engineering sp. z o.o.
Reviewed By:		melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after:		2 weeks
2022-05-27 12:41:30 +00:00
Konstantin Belousov
6a311e6fa5 Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities
and corresponding nvlist capabilities name strings.

Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
051e7d78b0 Kernel-side infrastructure to implement nvlist-based set/get ifcaps
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
b96549f057 struct ifnet: add if_capabilities2 and if_capenable2 bitmasks
We are running out of bits in if_capabilities.

Suggested by:	jhb
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Andrey V. Elsukov
f2ab916084 [vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event
use it to update if_baudrate for vlan interfaces created on the LACP lagg.

Differential revision:	https://reviews.freebsd.org/D33405
2022-05-20 06:38:43 +02:00
Mitchell Horne
a84bf5eaa1 debugnet: fix an errant assertion
We may call debugnet_free() before g_debugnet_pcb_inuse is true,
specifically in the cases where the interface is down or does not
support debugnet. pcb->dp_drv_input is used to hold the real driver
if_input callback while debugnet is in use, so we can check the status
of this field in the assertion.

This can be triggered trivially by trying to configure netdump on an
unsupported interface at the ddb prompt.

Initializing the dp_drv_input field to NULL explicitly is not necessary
but helps display the intent.

PR:		263929
Reported by:	Martin Filla <freebsd@sysctl.cz>
Reviewed by:	cem, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35179
2022-05-14 10:27:53 -03:00
Kurosawa Takahiro
9573cc3555 rtsock: fix a stack overflow
struct sockaddr is not sufficient for buffer that can hold any
sockaddr_* structure. struct sockaddr_storage should be used.

Test:
ifconfig epair create
ifconfig epair0a inet6 add 2001:db8::1 up
ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow

Reviewed by:	markj, kp
Differential Revision:	https://reviews.freebsd.org/D35188
2022-05-13 20:05:36 +02:00
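The size mismatch behind 9573cc3555 can be checked in userspace. This is a sketch, not the rtsock code: `store_sin6()` is a hypothetical helper, and the second static assertion reflects the usual 16-byte `struct sockaddr`, which is an assumption about the platform.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Copy an IPv6 socket address into caller-provided generic storage.
 * struct sockaddr_storage is large enough for any supported address
 * family; a plain struct sockaddr is not.
 */
static void
store_sin6(struct sockaddr_storage *dst, const struct sockaddr_in6 *src)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, sizeof(*src));
}

/* Compile-time checks of the size relationships described above. */
_Static_assert(sizeof(struct sockaddr_storage) >=
    sizeof(struct sockaddr_in6), "storage must fit sockaddr_in6");
_Static_assert(sizeof(struct sockaddr) < sizeof(struct sockaddr_in6),
    "plain sockaddr is too small for sockaddr_in6");
```

Copying a `sockaddr_in6` into a stack variable of type `struct sockaddr` would write past its end, which is the kind of overflow the commit describes.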
Kristof Provost
cbbce42345 epair: unbind prior to returning to userspace
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:33 +02:00
Kristof Provost
a6b0c8d04d epair: fix set but not used warning
If 'options RSS' is set.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:32 +02:00
Kristof Provost
868bf82153 if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also cover
if_vmove() (and not just detach).

PR:		262829
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D34704
2022-05-06 13:55:08 +02:00
Gleb Smirnoff
51f798e761 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268

(cherry picked from commit 6871de9363)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
4d7a1361ef ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918

(cherry picked from commit e1882428dc)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
80e60e236d ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For lifetime
of an ifnet its index never changes.  To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet.  Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API-wise, the only change is that the minimum interface index can now be
greater than 1.  Holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672

(cherry picked from commit 91f44749c6)
2022-05-05 14:38:07 -04:00
Marko Zec
d461deeaa4 VNET: Revert "ifnet: make if_index global"
This reverts commit 91f44749c6.

Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:

1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.

2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.

3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.

4) The change effectively led to nonuniform ifnet index allocation
among vnets.

5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one.  The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.

6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.

Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act.  Therefore, I decided to resolve the
status-quo by backing this out myself.  This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:27:57 +02:00