We must ensure that the fd provided by userspace is really for a UDP
socket. If it's not we'll panic in udp_set_kernel_tunneling().
Reported by: Gert Doering <gert@greenie.muc.de>
Sponsored by: Rubicon Communications, LLC ("Netgate")
route_ctl.c size has grown considerably since initial introduction.
Factor out non-relevant parts:
* all rtentry logic, such as creation/destruction and accessors
goes to net/route/route_rtentry.c
* all rtable subscription logic goes to net/route/route_subscription.c
Differential Revision: https://reviews.freebsd.org/D36074
MFC after: 1 month
This change adds public KPI to work with routes using pre-created
nexthops, instead of using data from addrinfo structures. These
functions will be later used for adding/deleting kernel-originated
routes and upcoming netlink protocol.
As a part of providing this KPI, low-level route addition code has been
reworked to provide more control over route creation or change.
Specifically, a number of operation flags
(RTM_F_<CREATE|EXCL|REPLACE|APPEND>) have been added, defining the
desired behaviour the the route already exists (or not exists). This
change required some changes in the multipath addition code, resulting
in moving this code to route_ctl.c, rendering mpath_ctl.c empty.
Differential Revision: https://reviews.freebsd.org/D36073
MFC after: 1 month
This change is required for the upcoming introduction of the next
nexhop-based operations KPI, as it will create rtentry and nexthops
at different stages of route table modification.
Differential Revision: https://reviews.freebsd.org/D36072
MFC after: 2 weeks
* Use same filter func (rib_filter_f_t) for nexhtop groups to
simplify callbacks.
* simplify conditional route deletion & remove the need to pass
rt_addrinfo to the low-level deletion functions
* speedup rib_walk_del() by removing an additional per-prefix lookup
Differential Revision: https://reviews.freebsd.org/D36071
MFC after: 1 month
This and the follow-up routing-related changes target to remove or
reduce `struct rt_addrinfo` usage and use recently-landed nhop(9)
KPI instead.
Traditionally `rt_addrinfo` structure has been used to propagate all necessary
information between the protocol/rtsock and a routing layer. Many
functions inside routing subsystem uses it internally. However, using
this structure became somewhat complicated, as there are too many ways
of specifying a single state and verifying data consistency is hard.
For example, arerouting flgs consistent with mask/gateway sockaddr pointers?
Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked,
have proper length)? Are they mutable? Is the suggested interface specified
by the interface index embedded into the sockadd_dl gateway, or passed
as RTAX_IFP parameter, or directly provided by rti_ifp or it needs to
be derived from the ifa?
These (and other similar) questions have to be considered every time when
a function has `rt_addrinfo` pointer as an argument.
The new approach is to bring more control back to the protocols and
construct the desired routing objects themselves - in the end, it's the
protocol/subsystem who knows the desired outcome.
This specific diff changes the following:
* add explicit basic low-level radix operations:
add_route() (renamed from add_route_nhop())
delete_route() (factored from change_route_nhop())
change_route() (renamed from change_route_nhop)
* remove "info" parameter from change_route_conditional() as a part
of reducing rt_addrinfo usage in the internal KPIs
* add lookup_prefix_rt() wrapper for doing re-lookups after
RIB lock/unlock
Differential Revision: https://reviews.freebsd.org/D36070
MFC after: 2 weeks
BPF might put an interface in promiscuous mode when handling the
BIOCSDLT ioctl. When this happens, a flag is set in the BPF descriptor
so that the old interface can be restored when the BPF descriptor is
destroyed.
The BIOCPROMISC ioctl can also be used to put a BPF descriptor's
interface into promiscuous mode, but there was nothing synchronizing the
flag. Fix this by modifying the ioctl handler to acquire the global BPF
mutex, which is used to synchronize ifpromisc() calls elsewhere in BPF.
Reviewed by: kp, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36045
ovpn_find_peer_by_ip() is not used if INET is not defined. Do not
define the function in that case. Same for ovpn_find_peer_by_ip6().
Fix these warnings:
/usr/src/sys/net/if_ovpn.c:1580:1: warning: unused function 'ovpn_find_peer_by_ip' [-Wunused-function]
ovpn_find_peer_by_ip(struct ovpn_softc *sc, const struct in_addr addr)
^
/usr/src/sys/net/if_ovpn.c:1599:1: warning: unused function 'ovpn_find_peer_by_ip6' [-Wunused-function]
ovpn_find_peer_by_ip6(struct ovpn_softc *sc, const struct in6_addr *addr)
^
Reported by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
* Make nhgrp_get_nhops() return const struct weightened_nhop to
indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
group+weight.
MFC after: 2 weeks
Convert the last remaining pieces of old-style debug messages
to the new debugging framework.
Differential Revision: https://reviews.freebsd.org/D35994
MFC after: 2 weeks
Currently, rt_addrinfo(info) serves as a main "transport" moving
state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
is problematic to maintain consistency, resulting in repeated checks
inside many functions. Additionally, there are multiple ways of
specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
required state in the nexthops in the consistent fashion, reducing the
need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
format, which is different from other kernel routing users or netlink.
This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.
Reviewed by: zlei.huang@gmail.com
Differential Revision: https://reviews.freebsd.org/D35972
MFC after: 2 weeks
Mark dst/mask public API functions fields as const to clearly
indicate that these parameters are not modified or stored in
the datastructure.
Differential Revision: https://reviews.freebsd.org/D35971
MFC after: 2 weeks
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
and netlink introduction.
Differential Revision: https://reviews.freebsd.org/D35970
MFC after: 2 weeks
In the current implementation of altq_hfsc.c, whne new queues are being
added (by pfctl), each queue is added to the tail of the siblings linked
list under the parent queue.
On a system with many queues (50,000+) this leads to very long load
times at the insertion process must scan the entire list for every new
queue,
Since this list is unordered, this changes merely adds the new queue to
the head of the list rather than the tail.
Reviewed by: kp
MFC after: 3 weeks
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D35964
Lagg was broken by SIOCSIFCAPNV when all underlying devices
support SIOCSIFCAPNV. This change updates lagg to work with
SIOCSIFCAPNV and if_capabilities2.
Reviewed by: kib, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35865
With clang 15, the following -Werror warnings are produced:
sys/net/route/route_ctl.c:130:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
vnet_rtzone_init()
^
void
sys/net/route/route_ctl.c:139:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
vnet_rtzone_destroy()
^
void
This is because vnet_rtzone_init() and vnet_rtzone_destroy() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/net/route/nhop_ctl.c:508:21: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
alloc_nhop_structure()
^
void
This is alloc_nhop_structure() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
vlan_remhash() uses incorrect value for b.
When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table
expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth
becomes 5. Say a few more entries are added and there are now 135 entries.
trunk-hwidth will still be 5. If an entry is removed, vlan_remhash() will
calculate a value of 32 for b. refcnt will be decremented to 134. The if
comparison at line 473 will return true and vlan_growhash() will be called. The
VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide.
hwidth will become 4. This is an error, and it can be seen when a new VLAN is
added. The table will again be expanded. If an entry is then removed, again
the table is contracted.
If the number of VLANS stays in the range of 128-512, each time an insert
follows a remove, the table will expand. Each time a remove follows an
insert, the table will be contracted.
The fix is simple. The line 473 should test that the number of entries has
decreased such that the table should be contracted using what would be the new
value of hwidth. line 467 should be:
b = 1 << (trunk->hwidth - 1);
PR: 265382
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
With clang 15, the following -Werror warning is produced:
sys/net/iflib.c:993:8: error: variable 'n' set but not used [-Werror,-Wunused-but-set-variable]
u_int n;
^
The 'n' variable appears to have been a debugging aid that has never
been used for anything, so remove it.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/net/if_lagg.c:2413:6: error: variable 'active_ports' set but not used [-Werror,-Wunused-but-set-variable]
int active_ports = 0;
^
The 'active_ports' variable appears to have been a debugging aid that
has never been used for anything (ref https://reviews.freebsd.org/D549),
so remove it.
MFC after: 3 days
It's currently not possible to change the vlan ID or vlan protocol (i.e.
802.1q vs. 802.1ad) without de-configuring the interface (i.e. ifconfig
vlanX -vlandev).
Add a specific flow for this, allowing both the protocol and id (but not
parent interface) to be changed without going through the '-vlandev'
step.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35846
This is not completely exhaustive, but covers a large majority of
commands in the tree.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16). All changes are disabled by default, and can be
enabled by the following sysctls:
net.inet.ip.allow_net0=1
net.inet.ip.allow_net240=1
net.inet.ip.loopback_prefixlen=16
When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.
Add descriptions of the new sysctls to inet.4.
Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.
The proposals motivating this experimentation can be found in
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127
Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
If the link is down or we can't find a peer we do not transmit the
packet, but also don't fee it.
Remember to m_freem() mbufs we can't transmit.
Sponsored by: Rubicon Communications, LLC ("Netgate")
VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's
not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.
Change VNET_FOREACH() to be a loop in all cases (although one that is
fixed to one iteration if VIMAGE is not set).
Reviewed by: karels, melifaro, glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35739
If we receive a UDP packet (directed towards an active OpenVPN socket)
which is too short to contain an OpenVPN header ('struct
ovpn_wire_header') we wound up making m_copydata() read outside the
mbuf, and panicking the machine.
Explicitly check that the packet is long enough to copy the data we're
interested in. If it's not we will pass the packet to userspace, just
like we'd do for an unknown peer.
Extend a test case to provoke this situation.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.
Reviewed by: markj, jhb
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35582
Openvpn defaults to binding to IPv6 sockets (with
setsockopt(IPV6_V6ONLY=0)), which we didn't deal with.
That resulted in us trying to in6_selectsrc_addr() on a v4 mapped v6
address, which does not work.
Instead we translate the mapped address to v4 and treat it as an IPv4
address.
Sponsored by: Rubicon Communications, LLC ("Netgate")
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.
In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34340
The function's goal is to compare old/new nhop/nexthop group for the route
and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
are sorted nhop-# ascending.
Tested by: Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35598
MFC after: 2 weeks
Nexthops in the nexthop groups needs to be deterministically sorted
by some their property to simplify reporting cost when changing
large nexthop groups.
Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthop
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.
Reported by: thj
Tested by: Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35599
MFC after: 2 weeks
Rather than reject new bridge members because they have the wrong MTU
change it to match the bridge. If that fails, reject the new interface.
PR: 264883
Different Revision: https://reviews.freebsd.org/D35597