Commit Graph

4926 Commits

Author SHA1 Message Date
Kristof Provost
59219dde9a if_ovpn: fix mbuf leak
If the link is down or we can't find a peer we do not transmit the
packet, but also don't fee it.

Remember to m_freem() mbufs we can't transmit.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-12 14:19:25 +02:00
Zhenlei Huang
7f7a804ae0 vxlan: Add support for socket ioctls SIOC[SG]TUNFIB
Submitted by: Luiz Amaral <email@luiz.eng.br>
PR: 244004
Differential Revision:	https://reviews.freebsd.org/D32820
MFC after:	2 weeks
2022-07-08 18:14:19 +00:00
Kristof Provost
37f604b49d vnet: make VNET_FOREACH() always be a loop
VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's
not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.

Change VNET_FOREACH() to be a loop in all cases (although one that is
fixed to one iteration if VIMAGE is not set).

Reviewed by:	karels, melifaro, glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35739
2022-07-07 09:52:21 +02:00
Kristof Provost
6ba6c05cb2 if_ovpn: deal with short packets
If we receive a UDP packet (directed towards an active OpenVPN socket)
which is too short to contain an OpenVPN header ('struct
ovpn_wire_header') we wound up making m_copydata() read outside the
mbuf, and panicking the machine.

Explicitly check that the packet is long enough to copy the data we're
interested in. If it's not we will pass the packet to userspace, just
like we'd do for an unknown peer.

Extend a test case to provoke this situation.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-05 19:27:00 +02:00
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Mateusz Guzik
db4b40213a routing: hide notify_add and notify_del behind ROUTE_MPATH
Fixes a warn about unused routines without the option.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-04 08:38:13 +00:00
Gordon Bergling
e8b7972cfe if_clone: Fix a typo in a source code comment
- s/fucntions/functions/

MFC ater:	3 days
2022-07-03 15:13:32 +02:00
Kristof Provost
6c77f8f0e0 if_ovpn: handle m_pullup() failure
Ensure we correctly handle m_pullup() failing in ovpn_finish_rx().

Reported by:	Coverity (CID 1490340)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
Kristof Provost
9f7c81eb33 if_ovpn: deal with v4 mapped IPv6 addresses
Openvpn defaults to binding to IPv6 sockets (with
setsockopt(IPV6_V6ONLY=0)), which we didn't deal with.
That resulted in us trying to in6_selectsrc_addr() on a v4 mapped v6
address, which does not work.

Instead we translate the mapped address to v4 and treat it as an IPv4
address.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-07-01 10:02:32 +02:00
Kristof Provost
b33308db39 if_ovpn: static probe points
Sprinkle a few SDTs around if_ovpn to ease debugging.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-06-28 13:50:54 +02:00
Kristof Provost
ab91feabcc ovpn: Introduce OpenVPN DCO support
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.

In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34340
2022-06-28 11:33:10 +02:00
Alexander V. Chernikov
8010b7a78a routing: simplify decompose_change_notification().
The function's goal is to compare old/new nhop/nexthop group for the route
 and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
 events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
 are sorted nhop-# ascending.

Tested by:	Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35598
MFC after: 2 weeks
2022-06-27 17:30:52 +00:00
Alexander V. Chernikov
76f1ab8eff routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups needs to be deterministically sorted
 by some their property to simplify reporting cost when changing
 large nexthop groups.

Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthop
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.

Reported by:	thj
Tested by:	Claudio Jeker<claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35599
MFC after:	2 weeks
2022-06-27 17:30:52 +00:00
Kristof Provost
1865ebfb12 if_bridge: change MTU for new members
Rather than reject new bridge members because they have the wrong MTU
change it to match the bridge. If that fails, reject the new interface.

PR:	264883
Different Revision:	https://reviews.freebsd.org/D35597
2022-06-27 08:27:27 +02:00
Alexander V. Chernikov
33a0803f00 routing: fix debug headers added in 6fa8ed43ee #2.
Move debug declaration out of COMPAT_FREEBSD32 in rtsock.c

MFC after: 2 weeks
2022-06-26 07:28:15 +00:00
Alexander V. Chernikov
0e87bab6b4 routing: fix debug headers added in 6fa8ed43ee.
- move debug headers out of COMPAT_FREEBSD32 in rtsock.c
- remove accidentally-added LOG_ defines from syslog.h

MFC after:	2 weeks
2022-06-25 23:05:25 +00:00
Alexander V. Chernikov
76179e400a routing: fix syslog include for rtsock.c
MFC after:	2 weeks
2022-06-25 22:08:10 +00:00
Alexander V. Chernikov
6fa8ed43ee routing: improve debugging.
Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.

MFC after:	2 weeks
2022-06-25 19:53:31 +00:00
Alexander V. Chernikov
c260d5cd8e routing: fix crash when RTM_CHANGE results in no-op for the multipath
route.

Reporting logic assumed there is always some nhop change for every
 successful modification operation. Explicitly check that the changed
 nexthop indeed exists when reporting back to userland.

MFC after:	2 weeks
Reported by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
2022-06-25 19:35:09 +00:00
Alexander V. Chernikov
c38da70c28 routing: fix RTM_CHANGE nhgroup updates.
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
Search of this nexthop is peformed by iterating over each component from multipath (nexthop)
 group, using check_info_match_nhop. The problem with the current code that it incorrectly
 assumes that `check_info_match_nhop()` returns true value on match, while in reality it
 returns an error code on failure). Fix this by properly comparing the result with 0.
Additionally, the followup code modified original necthop group instead of a new one.
Fix this by targetting new nexthop group instead.

Reported by:	thj
Tested by:	Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35526
MFC after: 2 weeks
2022-06-25 18:54:57 +00:00
Alexander V. Chernikov
5d6894bd66 routing: improve debug logging
Use standard logging (FIB_XX_LOG) across nhg code instead of using
 old-style DPRINTFs.
 Add debug object printer for nhgs (`nhgrp_print_buf`).

Example:

```
Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done
Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2

Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3
Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1]
```

Differential Revision: https://reviews.freebsd.org/D35525
MFC after: 2 weeks
2022-06-22 15:59:21 +00:00
Mark Johnston
60b4ad4b6b bpf: Zero pad bytes preceding BPF headers
BPF headers are word-aligned when copied into the store buffer.  Ensure
that pad bytes following the preceding packet are cleared.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Mark Johnston
c88f6908b4 bpf: Correct a comment
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2022-06-20 12:48:13 -04:00
Kristof Provost
1f61367f8d pf: support matching on tags for Ethernet rules
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D35362
2022-06-20 10:16:20 +02:00
Mark Johnston
c262d5e877 debugnet: Fix an error handling bug in the DDB command tokenizer
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:05:10 -04:00
Mark Johnston
8414331481 debugnet: Handle batches of packets from if_input
Some drivers will collect multiple mbuf chains, linked by m_nextpkt,
before passing them to upper layers.  debugnet_pkt_in() didn't handle
this and would process only the first packet, typically leading to
retransmits.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2022-06-16 10:02:00 -04:00
Andrew Gallatin
43c72c45a1 lacp: Remove racy kassert
In lacp_select_tx_port_by_hash(), we assert that the selected port is
DISTRIBUTING. However, the port state is protected by the LACP_LOCK(),
which is not held around lacp_select_tx_port_by_hash().  So this
assertion is racy, and can result in a spurious panic when links
are flapping.

It is certainly possible to fix it by acquiring LACP_LOCK(),
but this seems like an early development assert, and it seems best
to just remove it, rather than add complexity inside an ifdef
INVARIANTS.

Sponsored by: Netflix
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D35396
2022-06-13 11:32:10 -04:00
Hans Petter Selasky
892eded5b8 vlan(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Hans Petter Selasky
1967e31379 lagg(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Gordon Bergling
4f493559b0 if_llatbl: Fix a typo in a debug statement
- s/droped/dropped/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 15:22:09 +02:00
Gordon Bergling
f7faa4ad48 if_bridge(4): Fix a typo in a source code comment
- s/accross/across/

MFC after:	3 days
2022-06-04 11:26:01 +02:00
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro
d6cd20cc5c netinet6: fix ndp proxying
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
  in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
  link-level address table when receiving Neighbor Solicitations.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
77001f9b6d lltable: introduce the llt_post_resolved callback
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
3719dedb91 lltable: use sa_family_t instead of int for lltable.llt_af
Reviewed By: melifaro, #network
Differential Revision: https://reviews.freebsd.org/D35323
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
Konrad Sewiłło-Jopek
c9a5c48ae8 arp: Implement sticky ARP mode for interfaces.
Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.

The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).

The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.

Sponsored by:		Conclusive Engineering sp. z o.o.
Reviewed By:		melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after:		2 weeks
2022-05-27 12:41:30 +00:00
Konstantin Belousov
6a311e6fa5 Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities
and corresponding nvlist capabilities name strings.

Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
051e7d78b0 Kernel-side infrastructure to implement nvlist-based set/get ifcaps
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
b96549f057 struct ifnet: add if_capabilities2 and if_capenable2 bitmasks
We are running out of bits in if_capabilities.

Suggested by:	jhb
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Andrey V. Elsukov
f2ab916084 [vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event
use it to update if_baudrate for vlan interfaces created on the LACP lagg.

Differential revision:	https://reviews.freebsd.org/D33405
2022-05-20 06:38:43 +02:00
Mitchell Horne
a84bf5eaa1 debugnet: fix an errant assertion
We may call debugnet_free() before g_debugnet_pcb_inuse is true,
specifically in the cases where the interface is down or does not
support debugnet. pcb->dp_drv_input is used to hold the real driver
if_input callback while debugnet is in use, so we can check the status
of this field in the assertion.

This can be triggered trivially by trying to configure netdump on an
unsupported interface at the ddb prompt.

Initializing the dp_drv_input field to NULL explicitly is not necessary
but helps display the intent.

PR:		263929
Reported by:	Martin Filla <freebsd@sysctl.cz>
Reviewed by:	cem, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35179
2022-05-14 10:27:53 -03:00
Kurosawa Takahiro
9573cc3555 rtsock: fix a stack overflow
struct sockaddr is not sufficient for buffer that can hold any
sockaddr_* structure. struct sockaddr_storage should be used.

Test:
ifconfig epair create
ifconfig epair0a inet6 add 2001:db8::1 up
ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow

Reviewed by:	markj, kp
Differential Revision:	https://reviews.freebsd.org/D35188
2022-05-13 20:05:36 +02:00
Kristof Provost
cbbce42345 epair: unbind prior to returning to userspace
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:33 +02:00
Kristof Provost
a6b0c8d04d epair: fix set but not used warning
If 'options RSS' is set.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:32 +02:00
Kristof Provost
868bf82153 if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also covert
if_vmove() (and not just detach).

PR:		262829
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D34704
2022-05-06 13:55:08 +02:00
Gleb Smirnoff
51f798e761 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268

(cherry picked from commit 6871de9363)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
4d7a1361ef ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918

(cherry picked from commit e1882428dc)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
80e60e236d ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For lifetime
of an ifnet its index never changes.  To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet.  Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1.  The holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672

(cherry picked from commit 91f44749c6)
2022-05-05 14:38:07 -04:00
Marko Zec
d461deeaa4 VNET: Revert "ifnet: make if_index global"
This reverts commit 91f44749c6.

Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:

1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.

2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.

3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.

4) The change effectively led to nonuniform ifnet index allocation
among vnets.

5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one.  The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.

6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.

Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act.  Therefore, I decided to resolve the
status-quo by backing this out myself.  This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:27:57 +02:00
Marko Zec
6c741ffbfa Revert "mbuf: do not restore dying interfaces"
This reverts commit 703e533da5.

Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"

This reverts commit e1882428dc.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:40 +02:00