Commit Graph

4898 Commits

Author SHA1 Message Date
Hans Petter Selasky
1967e31379 lagg(4): Add support for allocating TLS receive tags.
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.

Differential revision:	https://reviews.freebsd.org/D32356
Sponsored by:	NVIDIA Networking
2022-06-07 12:54:42 +02:00
Gordon Bergling
4f493559b0 if_llatbl: Fix a typo in a debug statement
- s/droped/dropped/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 15:22:09 +02:00
Gordon Bergling
f7faa4ad48 if_bridge(4): Fix a typo in a source code comment
- s/accross/across/

MFC after:	3 days
2022-06-04 11:26:01 +02:00
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro
d6cd20cc5c netinet6: fix ndp proxying
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
  in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
  link-level address table when receiving Neighbor Solicitations.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
77001f9b6d lltable: introduce the llt_post_resolved callback
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro
3719dedb91 lltable: use sa_family_t instead of int for lltable.llt_af
Reviewed By: melifaro, #network
Differential Revision: https://reviews.freebsd.org/D35323
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
Konrad Sewiłło-Jopek
c9a5c48ae8 arp: Implement sticky ARP mode for interfaces.
Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.

The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).

The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.

Sponsored by:		Conclusive Engineering sp. z o.o.
Reviewed By:		melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after:		2 weeks
2022-05-27 12:41:30 +00:00
Konstantin Belousov
6a311e6fa5 Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities
and corresponding nvlist capabilities name strings.

Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
051e7d78b0 Kernel-side infrastructure to implement nvlist-based set/get ifcaps
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Konstantin Belousov
b96549f057 struct ifnet: add if_capabilities2 and if_capenable2 bitmasks
We are running out of bits in if_capabilities.

Suggested by:	jhb
Reviewed by:	hselasky, jhb, kp (previous version)
Sponsored by:	NVIDIA Networking
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D32551
2022-05-24 23:59:32 +03:00
Andrey V. Elsukov
f2ab916084 [vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event
use it to update if_baudrate for vlan interfaces created on the LACP lagg.

Differential revision:	https://reviews.freebsd.org/D33405
2022-05-20 06:38:43 +02:00
Mitchell Horne
a84bf5eaa1 debugnet: fix an errant assertion
We may call debugnet_free() before g_debugnet_pcb_inuse is true,
specifically in the cases where the interface is down or does not
support debugnet. pcb->dp_drv_input is used to hold the real driver
if_input callback while debugnet is in use, so we can check the status
of this field in the assertion.

This can be triggered trivially by trying to configure netdump on an
unsupported interface at the ddb prompt.

Initializing the dp_drv_input field to NULL explicitly is not necessary
but helps display the intent.

PR:		263929
Reported by:	Martin Filla <freebsd@sysctl.cz>
Reviewed by:	cem, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35179
2022-05-14 10:27:53 -03:00
Kurosawa Takahiro
9573cc3555 rtsock: fix a stack overflow
struct sockaddr is not sufficient for buffer that can hold any
sockaddr_* structure. struct sockaddr_storage should be used.

Test:
ifconfig epair create
ifconfig epair0a inet6 add 2001:db8::1 up
ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow

Reviewed by:	markj, kp
Differential Revision:	https://reviews.freebsd.org/D35188
2022-05-13 20:05:36 +02:00
Kristof Provost
cbbce42345 epair: unbind prior to returning to userspace
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:33 +02:00
Kristof Provost
a6b0c8d04d epair: fix set but not used warning
If 'options RSS' is set.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:32 +02:00
Kristof Provost
868bf82153 if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also covert
if_vmove() (and not just detach).

PR:		262829
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D34704
2022-05-06 13:55:08 +02:00
Gleb Smirnoff
51f798e761 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268

(cherry picked from commit 6871de9363)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
4d7a1361ef ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918

(cherry picked from commit e1882428dc)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
80e60e236d ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For lifetime
of an ifnet its index never changes.  To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet.  Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1.  The holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672

(cherry picked from commit 91f44749c6)
2022-05-05 14:38:07 -04:00
Marko Zec
d461deeaa4 VNET: Revert "ifnet: make if_index global"
This reverts commit 91f44749c6.

Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:

1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.

2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.

3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.

4) The change effectively led to nonuniform ifnet index allocation
among vnets.

5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one.  The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.

6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.

Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act.  Therefore, I decided to resolve the
status-quo by backing this out myself.  This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:27:57 +02:00
Marko Zec
6c741ffbfa Revert "mbuf: do not restore dying interfaces"
This reverts commit 703e533da5.

Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"

This reverts commit e1882428dc.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:40 +02:00
Marko Zec
0fa5636966 Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs"
This reverts commit 6871de9363.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:39 +02:00
Greg Foster
00a80538b4 lacp: short timeout erroneously declares link-flapping
Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.

On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.

On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.

The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.

Reported by:	Greg Foster <gfoster@panasas.com>
Reviewed by:	gallatin
Tested by:	Greg Foster <gfoster@panasas.com>
MFC after:	3 days
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D35070
2022-04-27 12:41:30 -07:00
Reid Linnemann
0abcc1d2d3 pf: Add per-rule timestamps for rule and eth_rule
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34970
2022-04-22 19:53:20 +02:00
Kristof Provost
812839e5aa pf: allow the use of tables in ethernet rules
Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34917
2022-04-20 13:01:12 +02:00
John Baldwin
ac3e46fa3e infiniband_resolve_addr: ih is only used for INET or INET6. 2022-04-13 16:08:21 -07:00
John Baldwin
d98981585c ether_resolve_addr: eh is only used for INET or INET6. 2022-04-13 16:08:21 -07:00
John Baldwin
2884a93651 vlan: ifa is only used under #ifdef INET. 2022-04-13 16:08:21 -07:00
John Baldwin
2174f0f2f2 net/route: Use __diagused for variables only used in KASSERT(). 2022-04-13 16:08:19 -07:00
Kristof Provost
742e7210d0 udp: allow udp_tun_func_t() to indicate it did not eat the packet
Allow udp tunnel functions to indicate they have not taken ownership of
the packet, and that normal UDP processing should continue.

This is especially useful for scenarios where the kernel has taken
ownership of a socket that was originally created by userspace. It
allows the tunnel function to pass through certain packets for userspace
processing.

The primary user of this is if_ovpn, when it receives messages from
unknown peers (which might be a new client).

Reviewed by:	tuexen
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34883
2022-04-12 10:04:59 +02:00
Gordon Bergling
1a15a383a6 net: Fix a typo in a source code comment
- s/peform/perform/

MFC after:	3 days
2022-04-09 11:37:57 +02:00
John Baldwin
d08cb45362 iflib: Use empty inline functions for prefetch*() on non-x86.
This avoids warnings about unused variables in expressions passed to
prefetch*().
2022-04-08 17:25:14 -07:00
Mark Johnston
990a6d18b0 net: Fix memory leaks in lltable_calc_llheader() error paths
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended.  There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.

Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.

Reviewed by:	bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34832
2022-04-08 11:47:25 -04:00
John Baldwin
f7236dd068 change_mpath_route: Remove write-only nh variable.
While here, cleanup the style of the function prologue by moving an
assignment out of the middle of two variable declaration blocks.
2022-04-06 16:45:28 -07:00
John Baldwin
371c917b0b unlink_nhgrp: Remove write-only variable.
Possibly one could assert that ret should always be 0 here (that is,
that there was always an index found in the bitmask).  That should be
true since a bitmask index is allocated before the nhgrp is inserted
in the ctl->gr_head list in link_nhgrp.
2022-04-06 16:45:27 -07:00
Warner Losh
e606e5d157 sysctl_dumpentry: move error to inner scope
Sponsored by:		Netflix
2022-04-04 22:30:50 -06:00
Warner Losh
5de5b5a34d route_ctl: eliminate write only variables ifa and nh
Sponsored by:		Netflix
2022-04-04 22:30:48 -06:00
Warner Losh
7f9c3339a4 get_nhop: eliminate write only variable gateway
Sponsored by:		Netflix
2022-04-04 22:30:47 -06:00
Gordon Bergling
d792dc7ebb net(4): Fix a typo in a source code comment
- s/accomodate/accommodate/

MFC after:	3 days
2022-04-02 14:57:06 +02:00
Gordon Bergling
cba46da538 net(3): Fix a typo in a source code comment
- s/verion/version/

MFC after:	3 days
2022-04-02 10:53:40 +02:00
Gordon Bergling
f8d292b665 net(3): Fix a typo in a source code comment
- s/Multilik/Multilink/

Obtained from:	NetBSD
MFC after:	3 days
2022-04-02 09:41:10 +02:00
Gordon Bergling
23677398ca net(3): Fix a typo in a source code comment
- s/paramenters/parameters/

MFC after:	3 days
2022-04-02 09:24:48 +02:00
Kristof Provost
9bb06778f8 pf: support listing ethernet anchors
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-30 10:28:19 +02:00
Gordon Bergling
bef80a7285 vxlan(4): Fix two typos in sysctl descriptions
- s/fowarding/forwarding/

MFC after:	3 days
2022-03-28 19:35:34 +02:00
Mateusz Guzik
bd7762c869 pf: add a rule rb tree
with md5 sum used as key.

This gets rid of the quadratic rule traversal when "keep_counters" is
set.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:45:03 +00:00
Mateusz Guzik
1a3e98a5b8 pf: pre-compute rule hash
Makes it cheaper to compare rules when "keep_counters" is set.
This also sets up keeping them in a RB tree.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:44:52 +00:00
Mateusz Guzik
93f8c38c03 pf: add pf_config_lock
For now only protects rule creation/destruction, but will allow
gradually reducing the scope of rules lock when changing the
rules.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:44:46 +00:00
Alexander V. Chernikov
1b8b69508b routing: copy nexthop fib when changing existing nexthop
MFC after:	1 day
2022-03-28 11:32:30 +00:00
Gordon Bergling
ef88adc527 pf(4): Fix a typo in a source code comment
- s/seaching/searching/

MFC after:	3 days
2022-03-27 19:57:49 +02:00