* Add link-state change notifications by subscribing to ifnet_link_event.
In the Linux netlink model, link state is reported in 2 places: first is
the IFLA_OPERSTATE, which stores state per RFC2863.
The second is an IFF_LOWER_UP interface flag. As many applications rely
on the latter, reserve 1 bit from if_flags, named as IFF_NETLINK_1.
This flag is mapped to IFF_LOWER_UP in the netlink headers. This is done
to avoid making applications think this flag is actually
supported / presented in non-netlink outputs.
* Add flag change notifications, by hooking into rt_ifmsg().
In the netlink model, notification should include the bitmask for the
change flags. Update rt_ifmsg() to include such bitmask.
Differential Revision: https://reviews.freebsd.org/D37597
Extend peer deleted notifications (which are the only type right now) to
include the reason the peer was deleted. This can be either because
userspace requested it, or because the peer timed out.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37583
Store user-supplied source protocol in the nexthops and nexthop groups.
Protocol specification help routing daemons like bird to quickly
identify self-originated routes after the crash or restart.
Example:
```
10.2.0.0/24 via 10.0.0.2 dev vtnet0 proto bird
10.3.0.0/24 proto bird
nexthop via 10.0.0.2 dev vtnet0 weight 3
nexthop via 10.0.0.3 dev vtnet0 weight 4
```
There is a need to store client metadata in nexthops and nexthop groups.
This metadata is immutable and participate in nhop/nhg comparison.
Nexthops KPI already supports its: nexthop creation pattern is
```
nhop_alloc()
nhop_set_...()
...
nhop_get_nhop()
```
This change provides a similar pattern for the nexthop groups.
Specifically, it adds nhgrp_alloc(), nhgrp_get_nhgrp() and
nhgrp_set_uidx().
MFC after: 2 weeks
Replace the fixed-sized array by an RB_TREE. This should both speed up
lookups and remove the 128 peer limit.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37524
This avoids having to undo it before invoking NetGraph's orphan input
hook.
Reviewed by: ae, melifaro
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37510
Rather than passing control packets through the ioctl interface allow
them to pass through the normal UDP socket flow.
This simplifies both kernel and userspace, and matches the approach
taken (or the one that will be taken) on the Linux side of things.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37317
We reference count to ensure we don't release the socket while we still
have data in flight. That means that we can end up releasing the socket
from ovpn_encrypt_tx_cb().
We must have a vnet context set when calling sorele() (which asserts
this from within sofree()), so move the CURVNET_SET()/CURVNET_RESTORE()
to ensure this is the case.
While here also add a couple of assertions to make this more obvious,
and to ease future debugging.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37326
We need to explicitly list AES-128-GCM as an allowed cipher for that
mode to work. While here also add AES-192-GCM. That brings our supported
cipher list in line with other openvpn/dco platforms.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Work is ongoing to add support for pfsync over IPv6. This required some
changes to allow for differentiating between the two families in a more
generic way.
This patch converts the relevant ioctls to using nvlists, making future
extensions (such as supporting IPv6 addresses) easier.
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D36277
Allow pf (l2) to be used to redirect ethernet packets to a different
interface.
The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37193
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.
Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.
Reviewed by: pauamma, gbe, kevans, emaste
Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36909
Change the default for net.link.bridge.pfil_member and
net.link.bridge.pfil_bridge to zero.
That is, default to not calling layer 3 firewalls on the bridge or its
member interfaces.
With either of these enabled the bridge will, during L2 processing,
remove the Ethernet header from packets, feed them to L3 firewalls,
re-add the Ethernet header and send them out.
Not only does this interact very poorly with firewalls which defer
packets, or reassemble and refragment IPv6, it also causes considerable
confusion for users, because the firewall gets called in unexpected
ways.
For example, a bridge which contains a bhyve tap and the host's LAN
interface. We'd expect traffic between the LAN and bhyve VM to pass, no
matter what (layer 3) firewall rules are set on the host. That's not the
case as long as pfil_bridge or pfil_member are set.
Reviewed by: Zhenlei Huang
MFC: never
Differential Revision: https://reviews.freebsd.org/D37009
Allow the choice between asynchronous and synchronous netisr and crypto
calls. These have performance implications, but depend on the specific
setup and OCF back-end.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37017
For v2, iflib will parse packet headers before queueing a packet.
This commit also adds a new field in the structure that holds parsed
header information from packets; it stores the IP ToS/traffic class
field found in the IPv4/IPv6 header.
To help, it will only partially parse header packets before queueing
them by using a new header parsing function that does less than the
current parsing header function; for our purposes we only need up to the
minimal IP header in order to get the IP ToS infromation and don't need
to pull up more data.
For now, v1 and v2 co-exist in this patch; v1 still offers a
less-invasive method where none of the packet is parsed in iflib before
queueing.
This also bumps the sys/param.h version.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: IntelNetworking
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D34742
Fully working openvpn(8) --iroute support needs real subnet config
on ovpn(4) interfaces (IFF_BROADCAST), while client-side/p2p
configs need IFF_POINTOPOINT setting. So make this configurable.
Reviewed by: kp
ovpn_encrypt_tx_cb() calls ovpn_encap() to transmit a packet, then adds
the length of the packet to the "tunnel_bytes_sent" counter. However,
after ovpn_encap() returns 0, the mbuf chain may have been freed, so the
load of m->m_pkthdr.len may be a use-after-free.
Reported by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Rather than using a per-cpu state counter, and adding in the CPU id we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.
Event: Aberdeen Hackathon 2022
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36915
netisr_dispatch() can fail, especially when under high traffic loads.
This isn't a fatal error, so simply don't check the return value.
Sponsored by: Rubicon Communications, LLC ("Netgate")
The vxlan interface encapsulates the Ethernet frame by prepending IP/UDP
and vxlan headers. For statistics, only the payload, i.e. the
encapsulated (inner) frame should be counted.
Event: Aberdeen Hackathon 2022
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36855
If the crypto callback is asynchronous we're no longer in net_epoch,
which ovpn_encap() (and ip_output() it calls) expect.
Ensure we've entered the epoch.
Do the same thing for the rx path.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Use time_t rather than uint32_t to represent the timestamps. That means
we have 64 bits rather than 32 on all platforms except i386, avoiding
the Y2K38 issues on most platforms.
Reviewed by: Zhenlei Huang
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36837
r325506 (3cf8254f1e) extended struct pkthdr to add packet timestamp in
mbuf(9) chain. For example, cxgbe(4) and mlx5en(4) support this feature.
Use the timestamp for bpf(4) if it is available.
Reviewed by: hselasky, kib, np
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36868
Netlinks is a communication protocol currently used in Linux kernel to modify,
read and subscribe for nearly all networking state. Interfaces, addresses, routes,
firewall, fibs, vnets, etc are controlled via netlink.
It is async, TLV-based protocol, providing 1-1 and 1-many communications.
The current implementation supports the subset of NETLINK_ROUTE
family. To be more specific, the following is supported:
* Dumps:
- routes
- nexthops / nexthop groups
- interfaces
- interface addresses
- neighbors (arp/ndp)
* Notifications:
- interface arrival/departure
- interface address arrival/departure
- route addition/deletion
* Modifications:
- adding/deleting routes
- adding/deleting nexthops/nexthops groups
- adding/deleting neghbors
- adding/deleting interfaces (basic support only)
* Rtsock interaction
- route events are bridged both ways
The implementation also supports the NETLINK_GENERIC family framework.
Implementation notes:
Netlink is implemented via loadable/unloadable kernel module,
not touching many kernel parts.
Each netlink socket uses dedicated taskqueue to support async operations
that can sleep, such as interface creation. All message processing is
performed within these taskqueues.
Compatibility:
Most of the Netlink data models specified above maps to FreeBSD concepts
nicely. Unmodified ip(8) binary correctly works with
interfaces, addresses, routes, nexthops and nexthop groups. Some
software such as net/bird require header-only modifications to compile
and work with FreeBSD netlink.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D36002
MFC after: 2 months
* Factor out queue selection (epair_select_queue()) and mbuf
preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
epair_menq() implementation and reduces the amount of dependencies
on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
epair_clone_create(). It simplifies the clone creation logic.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689
When the tunneled (IPv6) traffic had traffic class bits set (but only >=
16) the packet got lost on the receive side.
This happened because the address family check in ovpn_get_af() failed
to mask correctly, so the version check didn't match, causing us to drop
the packet.
While here also extend the existing 6-in-6 test case to trigger this
issue.
PR: 266598
Sponsored by: Rubicon Communications, LLC ("Netgate")
Add methods for setting and removing the description from the interface,
so the external users can manage it without using ioctl API.
MFC after: 2 weeks
Factor cloner ifp addition/deletion into separate functions and
make them public. This change simlifies the current cloner code
and paves the way to the other upcoming cloner / epair changes.
MFC after: 2 weeks
Convert most of the cloner customers who require custom params
to the new if_clone KPI.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36636
MFC after: 2 weeks
The current cloning KPI does not provide a way of creating interfaces
with parameres from within kernel. The reason is that those parameters
are passed as an opaque pointer and it is not possible to specify whether
this pointer references kernel-space or user-space.
Instead of just adding a flag, generalise the KPI to simplify the
extension process. Unify current notion of `SIMPLE` and `ADVANCED` users
by leveraging newly-added IFC_C_AUTOUNIT flag to automatically pick
unit number, which is a primary feature of the "SIMPLE" KPI.
Use extendable structures everywhere instead of passing function
pointers or parameters.
Isolate all parts of the oldKPI under `CLONE_COMPAT_13` so it can be safely
merged back to 13. Old KPI will be removed after the merge.
Differential Revision: https://reviews.freebsd.org/D36632
MFC after: 2 weeks
Simplify epair_clone_create() and epair_clone_destroy() by
factoring out epair softc allocation / desctruction and
interface setup/teardown into separate functions.
Reviewed By: kp, zlei.huang_gmail.com
Differential Revision: https://reviews.freebsd.org/D36614
MFC after: 2 weeks
When we're unloading the if_ovpn module we sometimes end up only freeing
the softc after the module is unloaded and the M_OVPN malloc type no
longer exists.
Don't return from ovpn_clone_destroy() until the epoch callbacks have
been called, which ensures that we've freed the softc before we destroy
M_OVPN.
Sponsored by: Rubicon Communications, LLC ("Netgate")
The ciphers used by OpenVPN (DCO) do not require data to be block-sized.
Do not round up to AES_BLOCK_LEN, as this can lead to issues with
fragmented packets.
Reported by: Gert Doering <gert@greenie.muc.de>
Sponsored by: Rubicon Communications, LLC ("Netgate")
This shaves a lot of branching due to MEMPTR flag.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36454
Routing daemons such as bird need to know if they install certain route
so they can clean it up on startup, as a form of achieving consistent
state during the crash recovery.
Currently they use combination of routing flags (RTF_PROTO1) to detect
these routes when interacting via route(4) rtsock protocol.
Netlink protocol has a special "rtm_protocol" field that is filled and
checked by the route originator. To prepare for the upcoming netlink
introduction, add ability to record origing to both nexthops and
nexthop groups via <nhop|nhgrp>_<get|set>_origin() KPI. The actual
calls will be used in the followup commits.
MFC after: 1 month
It is now unused and not having it allows further clean ups.
Reviewed by: cy, glebius, kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D36452
Add IF_DEBUG_LEVEL() macro to ensure all debug output preparation
is run only if the current debug level is sufficient. Consistently
use it within routing subsystem.
MFC after: 2 weeks
* add nhop_get_unlinked() used to prepare referenced but not
linked nexthop, that can later be used as a clone source.
* add nhop_check_gateway() to check for allowed address family
combinations between the rib family and neighbor family (useful
for 4o6 or direct routes)
* add nhop_set_upper_family() to allow copying IPv6 nexthops to
IPv4 rib.
* add rt_get_rnd() wrapper, returning both nexthop/group and its
weight attached to the rtentry.
* Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during
iteration.
MFC after: 2 weeks
Multiple consumers in the kernel space want to install IPv4 or IPv6
default route. Provide convenient wrapper to simplify the code
inside the customers.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36167
Construct the desired hexthops directly instead of using the
"translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
rib_copy_route() to propagate the routes to the non-primary address
fibs.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36166
Get rid of struct pfsync_pkt. It was used to store data on the stack to
pass to all the submessage handlers, but only the flags part of it was
ever used. Just pass the flags directly instead.
Reviewed by: kp
Obtained from: OpenBSD
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D36294
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36268
o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d1), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232
They were useful many years ago, when the callwheel was not efficient,
and the kernel tried to have as little callout entries scheduled as
possible.
Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36163
This function was added in pre-epoch era ( 9a1b64d5a0 ) to
provide public rtentry access interface & hide rtentry internals.
The implementation is based on the large on-stack copying and
refcounting of the referenced objects (ifa/ifp).
It has become obsolete after epoch & nexthop introduction. Convert
the last remaining user and remove the function itself.
Differential Revision: https://reviews.freebsd.org/D36197
Finish 02e05b8fae:
* add gateway matcher function that can be used in rib_del_route_px()
or any rib_walk-family functions. It will be used in the upcoming
migration to the new KPI
* rename gw_fulter_func to match_gw_one() to better signal the
function purpose / semantic.
MFC after: 1 month
This makes routing socket implementation self contained and removes one
of the last dependencies on the raw socket code and pr_output method.
There are very subtle API visible changes:
- now routing socket would return EOPNOTSUPP instead of EINVAL on
syscalls that are not supposed to be called on a routing socket.
- routing socket buffer sizes are now controlled by net.rtsock
sysctls instead of net.raw. The latter were not documented
anywhere, and even Internet search doesn't find any references
or discussions related to these sysctls.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36122
The old mechanism of getting them via domains/protocols control input
is a relict from the previous century, when nothing like EVENTHANDLER(9)
existed yet. Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one
to use them.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36116
We must ensure that the fd provided by userspace is really for a UDP
socket. If it's not we'll panic in udp_set_kernel_tunneling().
Reported by: Gert Doering <gert@greenie.muc.de>
Sponsored by: Rubicon Communications, LLC ("Netgate")
route_ctl.c size has grown considerably since initial introduction.
Factor out non-relevant parts:
* all rtentry logic, such as creation/destruction and accessors
goes to net/route/route_rtentry.c
* all rtable subscription logic goes to net/route/route_subscription.c
Differential Revision: https://reviews.freebsd.org/D36074
MFC after: 1 month
This change adds public KPI to work with routes using pre-created
nexthops, instead of using data from addrinfo structures. These
functions will be later used for adding/deleting kernel-originated
routes and upcoming netlink protocol.
As a part of providing this KPI, low-level route addition code has been
reworked to provide more control over route creation or change.
Specifically, a number of operation flags
(RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the
desired behaviour the the route already exists (or not exists). This
change required some changes in the multipath addition code, resulting
in moving this code to route_ctl.c, rendering mpath_ctl.c empty.
Differential Revision: https://reviews.freebsd.org/D36073
MFC after: 1 month
This change is required for the upcoming introduction of the next
nexhop-based operations KPI, as it will create rtentry and nexthops
at different stages of route table modification.
Differential Revision: https://reviews.freebsd.org/D36072
MFC after: 2 weeks
* Use same filter func (rib_filter_f_t) for nexhtop groups to
simplify callbacks.
* simplify conditional route deletion & remove the need to pass
rt_addrinfo to the low-level deletion functions
* speedup rib_walk_del() by removing an additional per-prefix lookup
Differential Revision: https://reviews.freebsd.org/D36071
MFC after: 1 month
This and the follow-up routing-related changes target to remove or
reduce `struct rt_addrinfo` usage and use recently-landed nhop(9)
KPI instead.
Traditionally `rt_addrinfo` structure has been used to propagate all necessary
information between the protocol/rtsock and a routing layer. Many
functions inside routing subsystem uses it internally. However, using
this structure became somewhat complicated, as there are too many ways
of specifying a single state and verifying data consistency is hard.
For example, arerouting flgs consistent with mask/gateway sockaddr pointers?
Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked,
have proper length)? Are they mutable? Is the suggested interface specified
by the interface index embedded into the sockadd_dl gateway, or passed
as RTAX_IFP parameter, or directly provided by rti_ifp or it needs to
be derived from the ifa?
These (and other similar) questions have to be considered every time when
a function has `rt_addrinfo` pointer as an argument.
The new approach is to bring more control back to the protocols and
construct the desired routing objects themselves - in the end, it's the
protocol/subsystem who knows the desired outcome.
This specific diff changes the following:
* add explicit basic low-level radix operations:
add_route() (renamed from add_route_nhop())
delete_route() (factored from change_route_nhop())
change_route() (renamed from change_route_nhop)
* remove "info" parameter from change_route_conditional() as a part
of reducing rt_addrinfo usage in the internal KPIs
* add lookup_prefix_rt() wrapper for doing re-lookups after
RIB lock/unlock
Differential Revision: https://reviews.freebsd.org/D36070
MFC after: 2 weeks
BPF might put an interface in promiscuous mode when handling the
BIOCSDLT ioctl. When this happens, a flag is set in the BPF descriptor
so that the old interface can be restored when the BPF descriptor is
destroyed.
The BIOCPROMISC ioctl can also be used to put a BPF descriptor's
interface into promiscuous mode, but there was nothing synchronizing the
flag. Fix this by modifying the ioctl handler to acquire the global BPF
mutex, which is used to synchronize ifpromisc() calls elsewhere in BPF.
Reviewed by: kp, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36045
ovpn_find_peer_by_ip() is not used if INET is not defined. Do not
define the function in that case. Same for ovpn_find_peer_by_ip6().
Fix these warnings:
/usr/src/sys/net/if_ovpn.c:1580:1: warning: unused function 'ovpn_find_peer_by_ip' [-Wunused-function]
ovpn_find_peer_by_ip(struct ovpn_softc *sc, const struct in_addr addr)
^
/usr/src/sys/net/if_ovpn.c:1599:1: warning: unused function 'ovpn_find_peer_by_ip6' [-Wunused-function]
ovpn_find_peer_by_ip6(struct ovpn_softc *sc, const struct in6_addr *addr)
^
Reported by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
* Make nhgrp_get_nhops() return const struct weightened_nhop to
indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
group+weight.
MFC after: 2 weeks
Convert the last remaining pieces of old-style debug messages
to the new debugging framework.
Differential Revision: https://reviews.freebsd.org/D35994
MFC after: 2 weeks
Currently, rt_addrinfo(info) serves as a main "transport" moving
state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
is problematic to maintain consistency, resulting in repeated checks
inside many functions. Additionally, there are multiple ways of
specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
required state in the nexthops in the consistent fashion, reducing the
need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
format, which is different from other kernel routing users or netlink.
This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.
Reviewed by: zlei.huang@gmail.com
Differential Revision: https://reviews.freebsd.org/D35972
MFC after: 2 weeks
Mark dst/mask public API functions fields as const to clearly
indicate that these parameters are not modified or stored in
the datastructure.
Differential Revision: https://reviews.freebsd.org/D35971
MFC after: 2 weeks
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
and netlink introduction.
Differential Revision: https://reviews.freebsd.org/D35970
MFC after: 2 weeks