Commit Graph

4944 Commits

Author SHA1 Message Date
Alexander V. Chernikov
ae6bfd12c8 routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
 indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
 group+weight.

MFC after:	2 weeks
2022-08-01 10:02:12 +00:00
Alexander V. Chernikov
5c23343b8c routing: convert remnants of DPRINTF to FIB_CTL_LOG().
Convert the last remaining pieces of old-style debug messages
 to the new debugging framework.

Differential Revision: https://reviews.freebsd.org/D35994
MFC after:	2 weeks
2022-08-01 08:55:07 +00:00
Alexander V. Chernikov
800c68469b routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985
MFC after:	1 month
2022-08-01 08:52:26 +00:00
Alexander V. Chernikov
29029b06a6 routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving
 state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
 is problematic to maintain consistency, resulting in repeated checks
 inside many functions. Additionally, there are multiple ways of
 specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
 required state in the nexthops in the consistent fashion, reducing the
 need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
 format, which is different from other kernel routing users or netlink.

This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.

Reviewed by:	zlei.huang@gmail.com
Differential Revision: https://reviews.freebsd.org/D35972
MFC after:	2 weeks
2022-08-01 07:41:07 +00:00
Alexander V. Chernikov
97ffaff859 net: constantify radix.c functions
Mark dst/mask public API functions fields as const to clearly
 indicate that these parameters are not modified or stored in
 the datastructure.

Differential Revision: https://reviews.freebsd.org/D35971
MFC after:	2 weeks
2022-08-01 07:32:40 +00:00
Alexander V. Chernikov
2717e958df routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
 and netlink introduction.

Differential Revision: https://reviews.freebsd.org/D35970
MFC after:	2 weeks
2022-08-01 07:26:53 +00:00
Alexander V. Chernikov
27f107e1b4 routing: add debug printing helpers for rtentry and RTM* cmds.
MFC after:	2 weeks
2022-07-31 09:01:42 +00:00
Zhenlei Huang
150486f6a9 Introduce and use the NET_EPOCH_DRAIN_CALLBACKS() macro
Reviewed by:	melifao, kp
Differential Revision:	https://reviews.freebsd.org/D35968
2022-07-29 21:21:10 +02:00
James Skon
13890d30f8 altq: improve pfctl config time for large numbers of queues
In the current implementation of altq_hfsc.c, whne new queues are being
added (by pfctl), each queue is added to the tail of the siblings linked
list under the parent queue.

On a system with many queues (50,000+) this leads to very long load
times at the insertion process must scan the entire list for every new
queue,

Since this list is unordered, this changes merely adds the new queue to
the head of the list rather than the tail.

Reviewed by:	kp
MFC after:	3 weeks
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D35964
2022-07-28 22:00:07 +02:00
Andrew Gallatin
713ceb99b6 lagg: fix lagg ifioctl after SIOCSIFCAPNV
Lagg was broken by SIOCSIFCAPNV when all underlying devices
support SIOCSIFCAPNV.  This change updates lagg to work with
SIOCSIFCAPNV and if_capabilities2.

Reviewed by: kib, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35865
2022-07-28 10:39:00 -04:00
Dimitry Andric
5e1097f83c Adjust function definitions in route_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warnings are produced:

    sys/net/route/route_ctl.c:130:17: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_init()
                    ^
                     void
    sys/net/route/route_ctl.c:139:20: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    vnet_rtzone_destroy()
                       ^
                        void

This is because vnet_rtzone_init() and vnet_rtzone_destroy() are
declared with (void) argument lists, but defined with empty argument
lists. Make the definitions match the declarations.

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Dimitry Andric
a8adf13a63 Adjust function definition in nhop_ctl.c to avoid clang 15 warnings
With clang 15, the following -Werror warning is produced:

    sys/net/route/nhop_ctl.c:508:21: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    alloc_nhop_structure()
                        ^
                         void

This is alloc_nhop_structure() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Kristof Provost
151abc80cd if_vlan: avoid hash table thrashing when adding and removing entries
vlan_remhash() uses incorrect value for b.

When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table
expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth
becomes 5. Say a few more entries are added and there are now 135 entries.
trunk-hwidth will still be 5. If an entry is removed, vlan_remhash() will
calculate a value of 32 for b. refcnt will be decremented to 134. The if
comparison at line 473 will return true and vlan_growhash() will be called. The
VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide.
hwidth will become 4. This is an error, and it can be seen when a new VLAN is
added. The table will again be expanded. If an entry is then removed, again
the table is contracted.

If the number of VLANS stays in the range of 128-512, each time an insert
follows a remove, the table will expand. Each time a remove follows an
insert, the table will be contracted.

The fix is simple. The line 473 should test that the number of entries has
decreased such that the table should be contracted using what would be the new
value of hwidth. line 467 should be:

	b = 1 << (trunk->hwidth - 1);

PR:		265382
Reviewed by:	kp
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
2022-07-22 19:18:41 +02:00
Dimitry Andric
0294e95da4 Fix unused variable warning in iflib.c
With clang 15, the following -Werror warning is produced:

    sys/net/iflib.c:993:8: error: variable 'n' set but not used [-Werror,-Wunused-but-set-variable]
            u_int n;
                  ^

The 'n' variable appears to have been a debugging aid that has never
been used for anything, so remove it.

MFC after:	3 days
2022-07-21 21:19:39 +02:00
Dimitry Andric
fa267a329f Fix unused variable warning in if_lagg.c
With clang 15, the following -Werror warning is produced:

    sys/net/if_lagg.c:2413:6: error: variable 'active_ports' set but not used [-Werror,-Wunused-but-set-variable]
            int active_ports = 0;
                ^

The 'active_ports' variable appears to have been a debugging aid that
has never been used for anything (ref https://reviews.freebsd.org/D549),
so remove it.

MFC after:	3 days
2022-07-21 21:05:51 +02:00
Kristof Provost
663f556b03 if_vlan: allow vlan and vlanproto to be changed
It's currently not possible to change the vlan ID or vlan protocol (i.e.
802.1q vs. 802.1ad) without de-configuring the interface (i.e. ifconfig
vlanX -vlandev).
Add a specific flow for this, allowing both the protocol and id (but not
parent interface) to be changed without going through the '-vlandev'
step.

Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35846
2022-07-21 18:36:01 +02:00
Mitchell Horne
c84c5e00ac ddb: annotate some commands with DB_CMD_MEMSAFE
This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35583
2022-07-18 22:06:09 +00:00
Mike Karels
efe58855f3 IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16).  All changes are disabled by default, and can be
enabled by the following sysctls:

    net.inet.ip.allow_net0=1
    net.inet.ip.allow_net240=1
    net.inet.ip.loopback_prefixlen=16

When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.

Add descriptions of the new sysctls to inet.4.

Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.

The proposals motivating this experimentation can be found in

    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127

Reviewed by:	rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
2022-07-13 09:46:05 -05:00
Kristof Provost
59219dde9a if_ovpn: fix mbuf leak
If the link is down or we can't find a peer we do not transmit the
packet, but also don't fee it.

Remember to m_freem() mbufs we can't transmit.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-12 14:19:25 +02:00
Zhenlei Huang
7f7a804ae0 vxlan: Add support for socket ioctls SIOC[SG]TUNFIB
Submitted by: Luiz Amaral <email@luiz.eng.br>
PR: 244004
Differential Revision:	https://reviews.freebsd.org/D32820
MFC after:	2 weeks
2022-07-08 18:14:19 +00:00
Kristof Provost
37f604b49d vnet: make VNET_FOREACH() always be a loop
VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's
not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.

Change VNET_FOREACH() to be a loop in all cases (although one that is
fixed to one iteration if VIMAGE is not set).

Reviewed by:	karels, melifaro, glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35739
2022-07-07 09:52:21 +02:00
Kristof Provost
6ba6c05cb2 if_ovpn: deal with short packets
If we receive a UDP packet (directed towards an active OpenVPN socket)
which is too short to contain an OpenVPN header ('struct
ovpn_wire_header') we wound up making m_copydata() read outside the
mbuf, and panicking the machine.

Explicitly check that the packet is long enough to copy the data we're
interested in. If it's not we will pass the packet to userspace, just
like we'd do for an unknown peer.

Extend a test case to provoke this situation.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-05 19:27:00 +02:00
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Mateusz Guzik
db4b40213a routing: hide notify_add and notify_del behind ROUTE_MPATH
Fixes a warn about unused routines without the option.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-04 08:38:13 +00:00
Gordon Bergling
e8b7972cfe if_clone: Fix a typo in a source code comment
- s/fucntions/functions/

MFC ater:	3 days
2022-07-03 15:13:32 +02:00
Kristof Provost
6c77f8f0e0 if_ovpn: handle m_pullup() failure
Ensure we correctly handle m_pullup() failing in ovpn_finish_rx().

Reported by:	Coverity (CID 1490340)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
Kristof Provost
9f7c81eb33 if_ovpn: deal with v4 mapped IPv6 addresses
Openvpn defaults to binding to IPv6 sockets (with
setsockopt(IPV6_V6ONLY=0)), which we didn't deal with.
That resulted in us trying to in6_selectsrc_addr() on a v4 mapped v6
address, which does not work.

Instead we translate the mapped address to v4 and treat it as an IPv4
address.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
Kristof Provost
b33308db39 if_ovpn: static probe points
Sprinkle a few SDTs around if_ovpn to ease debugging.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-06-28 13:50:54 +02:00
Kristof Provost
ab91feabcc ovpn: Introduce OpenVPN DCO support
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.

In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34340
2022-06-28 11:33:10 +02:00
Alexander V. Chernikov
8010b7a78a routing: simplify decompose_change_notification().
The function's goal is to compare old/new nhop/nexthop group for the route
 and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
 events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
 are sorted nhop-# ascending.

Tested by:	Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35598
MFC after: 2 weeks
2022-06-27 17:30:52 +00:00
Alexander V. Chernikov
76f1ab8eff routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups needs to be deterministically sorted
 by some their property to simplify reporting cost when changing
 large nexthop groups.

Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthop
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.

Reported by:	thj
Tested by:	Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35599
MFC after:	2 weeks
2022-06-27 17:30:52 +00:00
Kristof Provost
1865ebfb12 if_bridge: change MTU for new members
Rather than reject new bridge members because they have the wrong MTU
change it to match the bridge. If that fails, reject the new interface.

PR:	264883
Different Revision:	https://reviews.freebsd.org/D35597
2022-06-27 08:27:27 +02:00
Alexander V. Chernikov
33a0803f00 routing: fix debug headers added in 6fa8ed43ee #2.
Move debug declaration out of COMPAT_FREEBSD32 in rtsock.c

MFC after: 2 weeks
2022-06-26 07:28:15 +00:00
Alexander V. Chernikov
0e87bab6b4 routing: fix debug headers added in 6fa8ed43ee.
- move debug headers out of COMPAT_FREEBSD32 in rtsock.c
- remove accidentally-added LOG_ defines from syslog.h

MFC after:	2 weeks
2022-06-25 23:05:25 +00:00
Alexander V. Chernikov
76179e400a routing: fix syslog include for rtsock.c
MFC after:	2 weeks
2022-06-25 22:08:10 +00:00
Alexander V. Chernikov
6fa8ed43ee routing: improve debugging.
Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.

MFC after:	2 weeks
2022-06-25 19:53:31 +00:00
Alexander V. Chernikov
c260d5cd8e routing: fix crash when RTM_CHANGE results in no-op for the multipath
route.

Reporting logic assumed there is always some nhop change for every
 successful modification operation. Explicitly check that the changed
 nexthop indeed exists when reporting back to userland.

MFC after:	2 weeks
Reported by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
2022-06-25 19:35:09 +00:00
Alexander V. Chernikov
c38da70c28 routing: fix RTM_CHANGE nhgroup updates.
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
Search of this nexthop is peformed by iterating over each component from multipath (nexthop)
 group, using check_info_match_nhop. The problem with the current code that it incorrectly
 assumes that `check_info_match_nhop()` returns true value on match, while in reality it
 returns an error code on failure). Fix this by properly comparing the result with 0.
Additionally, the followup code modified original necthop group instead of a new one.
Fix this by targetting new nexthop group instead.

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35526
MFC after: 2 weeks
2022-06-25 18:54:57 +00:00
Alexander V. Chernikov
5d6894bd66 routing: improve debug logging
Use standard logging (FIB_XX_LOG) across nhg code instead of using
 old-style DPRINTFs.
 Add debug object printer for nhgs (`nhgrp_print_buf`).

Example:

```
Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done
Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2

Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3
Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1]
```

Differential Revision: https://reviews.freebsd.org/D35525
MFC after: 2 weeks
2022-06-22 15:59:21 +00:00
Mark Johnston
60b4ad4b6b bpf: Zero pad bytes preceding BPF headers
BPF headers are word-aligned when copied into the store buffer.  Ensure
that pad bytes following the preceding packet are cleared.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston
c88f6908b4 bpf: Correct a comment
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Kristof Provost
1f61367f8d pf: support matching on tags for Ethernet rules
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35362
2022-06-20 10:16:20 +02:00
Mark Johnston
c262d5e877 debugnet: Fix an error handling bug in the DDB command tokenizer
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:05:10 -04:00
Mark Johnston
8414331481 debugnet: Handle batches of packets from if_input
Some drivers will collect multiple mbuf chains, linked by m_nextpkt,
before passing them to upper layers.  debugnet_pkt_in() didn't handle
this and would process only the first packet, typically leading to
retransmits.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:02:00 -04:00
Andrew Gallatin
43c72c45a1 lacp: Remove racy kassert
In lacp_select_tx_port_by_hash(), we assert that the selected port is
DISTRIBUTING. However, the port state is protected by the LACP_LOCK(),
which is not held around lacp_select_tx_port_by_hash().  So this
assertion is racy, and can result in a spurious panic when links
are flapping.

It is certainly possible to fix it by acquiring LACP_LOCK(),
but this seems like an early development assert, and it seems best
to just remove it, rather than add complexity inside an ifdef
INVARIANTS.

Sponsored by: Netflix
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D35396
2022-06-13 11:32:10 -04:00
Hans Petter Selasky
892eded5b8 vlan(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Hans Petter Selasky
1967e31379 lagg(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Gordon Bergling
4f493559b0 if_llatbl: Fix a typo in a debug statement
- s/droped/dropped/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 15:22:09 +02:00
Gordon Bergling
f7faa4ad48 if_bridge(4): Fix a typo in a source code comment
- s/accross/across/

MFC after:	3 days
2022-06-04 11:26:01 +02:00
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00