Commit Graph

4882 Commits

Author SHA1 Message Date
Kristof Provost
868bf82153 if: avoid interface destroy race
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.

Protect against this by using the ifnet_detach_sxlock to also covert
if_vmove() (and not just detach).

PR:		262829
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D34704
2022-05-06 13:55:08 +02:00
Gleb Smirnoff
51f798e761 netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs
Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33268

(cherry picked from commit 6871de9363)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
4d7a1361ef ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33266
Fun note:		git show e6abef0918

(cherry picked from commit e1882428dc)
2022-05-05 14:38:07 -04:00
Gleb Smirnoff
80e60e236d ifnet: make if_index global
Now that ifindex is static to if.c we can unvirtualize it.  For lifetime
of an ifnet its index never changes.  To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet.  Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.

API wise the only change is that now minimum interface index can be
greater than 1.  The holes in interface indexes were always allowed.

Reviewed by:		kp
Differential revision:	https://reviews.freebsd.org/D33672

(cherry picked from commit 91f44749c6)
2022-05-05 14:38:07 -04:00
Marko Zec
d461deeaa4 VNET: Revert "ifnet: make if_index global"
This reverts commit 91f44749c6.

Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:

1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.

2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.

3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.

4) The change effectively led to nonuniform ifnet index allocation
among vnets.

5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one.  The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.

6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.

Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act.  Therefore, I decided to resolve the
status-quo by backing this out myself.  This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:27:57 +02:00
Marko Zec
6c741ffbfa Revert "mbuf: do not restore dying interfaces"
This reverts commit 703e533da5.

Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"

This reverts commit e1882428dc.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:40 +02:00
Marko Zec
0fa5636966 Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs"
This reverts commit 6871de9363.

Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
2022-05-03 19:11:39 +02:00
Greg Foster
00a80538b4 lacp: short timeout erroneously declares link-flapping
Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.

On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.

On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.

The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.

Reported by:	Greg Foster <gfoster@panasas.com>
Reviewed by:	gallatin
Tested by:	Greg Foster <gfoster@panasas.com>
MFC after:	3 days
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D35070
2022-04-27 12:41:30 -07:00
Reid Linnemann
0abcc1d2d3 pf: Add per-rule timestamps for rule and eth_rule
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34970
2022-04-22 19:53:20 +02:00
Kristof Provost
812839e5aa pf: allow the use of tables in ethernet rules
Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34917
2022-04-20 13:01:12 +02:00
John Baldwin
ac3e46fa3e infiniband_resolve_addr: ih is only used for INET or INET6. 2022-04-13 16:08:21 -07:00
John Baldwin
d98981585c ether_resolve_addr: eh is only used for INET or INET6. 2022-04-13 16:08:21 -07:00
John Baldwin
2884a93651 vlan: ifa is only used under #ifdef INET. 2022-04-13 16:08:21 -07:00
John Baldwin
2174f0f2f2 net/route: Use __diagused for variables only used in KASSERT(). 2022-04-13 16:08:19 -07:00
Kristof Provost
742e7210d0 udp: allow udp_tun_func_t() to indicate it did not eat the packet
Allow udp tunnel functions to indicate they have not taken ownership of
the packet, and that normal UDP processing should continue.

This is especially useful for scenarios where the kernel has taken
ownership of a socket that was originally created by userspace. It
allows the tunnel function to pass through certain packets for userspace
processing.

The primary user of this is if_ovpn, when it receives messages from
unknown peers (which might be a new client).

Reviewed by:	tuexen
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34883
2022-04-12 10:04:59 +02:00
Gordon Bergling
1a15a383a6 net: Fix a typo in a source code comment
- s/peform/perform/

MFC after:	3 days
2022-04-09 11:37:57 +02:00
John Baldwin
d08cb45362 iflib: Use empty inline functions for prefetch*() on non-x86.
This avoids warnings about unused variables in expressions passed to
prefetch*().
2022-04-08 17:25:14 -07:00
Mark Johnston
990a6d18b0 net: Fix memory leaks in lltable_calc_llheader() error paths
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended.  There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.

Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.

Reviewed by:	bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34832
2022-04-08 11:47:25 -04:00
John Baldwin
f7236dd068 change_mpath_route: Remove write-only nh variable.
While here, cleanup the style of the function prologue by moving an
assignment out of the middle of two variable declaration blocks.
2022-04-06 16:45:28 -07:00
John Baldwin
371c917b0b unlink_nhgrp: Remove write-only variable.
Possibly one could assert that ret should always be 0 here (that is,
that there was always an index found in the bitmask).  That should be
true since a bitmask index is allocated before the nhgrp is inserted
in the ctl->gr_head list in link_nhgrp.
2022-04-06 16:45:27 -07:00
Warner Losh
e606e5d157 sysctl_dumpentry: move error to inner scope
Sponsored by:		Netflix
2022-04-04 22:30:50 -06:00
Warner Losh
5de5b5a34d route_ctl: eliminate write only variables ifa and nh
Sponsored by:		Netflix
2022-04-04 22:30:48 -06:00
Warner Losh
7f9c3339a4 get_nhop: eliminate write only variable gateway
Sponsored by:		Netflix
2022-04-04 22:30:47 -06:00
Gordon Bergling
d792dc7ebb net(4): Fix a typo in a source code comment
- s/accomodate/accommodate/

MFC after:	3 days
2022-04-02 14:57:06 +02:00
Gordon Bergling
cba46da538 net(3): Fix a typo in a source code comment
- s/verion/version/

MFC after:	3 days
2022-04-02 10:53:40 +02:00
Gordon Bergling
f8d292b665 net(3): Fix a typo in a source code comment
- s/Multilik/Multilink/

Obtained from:	NetBSD
MFC after:	3 days
2022-04-02 09:41:10 +02:00
Gordon Bergling
23677398ca net(3): Fix a typo in a source code comment
- s/paramenters/parameters/

MFC after:	3 days
2022-04-02 09:24:48 +02:00
Kristof Provost
9bb06778f8 pf: support listing ethernet anchors
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-30 10:28:19 +02:00
Gordon Bergling
bef80a7285 vxlan(4): Fix two typos in sysctl descriptions
- s/fowarding/forwarding/

MFC after:	3 days
2022-03-28 19:35:34 +02:00
Mateusz Guzik
bd7762c869 pf: add a rule rb tree
with md5 sum used as key.

This gets rid of the quadratic rule traversal when "keep_counters" is
set.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:45:03 +00:00
Mateusz Guzik
1a3e98a5b8 pf: pre-compute rule hash
Makes it cheaper to compare rules when "keep_counters" is set.
This also sets up keeping them in a RB tree.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:44:52 +00:00
Mateusz Guzik
93f8c38c03 pf: add pf_config_lock
For now only protects rule creation/destruction, but will allow
gradually reducing the scope of rules lock when changing the
rules.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-28 11:44:46 +00:00
Alexander V. Chernikov
1b8b69508b routing: copy nexthop fib when changing existing nexthop
MFC after:	1 day
2022-03-28 11:32:30 +00:00
Gordon Bergling
ef88adc527 pf(4): Fix a typo in a source code comment
- s/seaching/searching/

MFC after:	3 days
2022-03-27 19:57:49 +02:00
Kristof Provost
0bf7acd6b7 if_epair: build fix
66acf7685b failed to build on riscv (and mips). This is because the
atomic_testandset_int() (and friends) functions do not exist there.
Happily those platforms do have the long variant, so switch to that.

PR:		262571
MFC after:	3 days
2022-03-17 06:43:47 +01:00
Michael Gmelin
66acf7685b if_epair: fix race condition on multi-core systems
As an unwanted side effect of the performance improvements in
24f0bfbad5, epair interfaces stop forwarding traffic on higher
load levels when running on multi-core systems.

This happens due to a race condition in the logic that decides when to
place work in the task queue(s) responsible for processing the content
of ring buffers.

In order to fix this, a field named state is added to the epair_queue
structure. This field is used by the affected functions to signal each
other that something happened in the underlying ring buffers that might
require work to be scheduled in task queue(s), replacing the existing
logic, which relied on checking if ring buffers are empty or not.

epair_menq() does:
  - set BIT_MBUF_QUEUED
  - queue mbuf
  - if testandset BIT_QUEUE_TASK:
      enqueue task

epair_tx_start_deferred() does:
  - swap ring buffers
  - process mbufs
  - clear BIT_QUEUE_TASK
  - if testandclear BIT_MBUF_QUEUED
      enqueue task

PR:		262571
Reported by:	Johan Hendriks <joh.hendriks@gmail.com>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34569
2022-03-16 23:08:55 +01:00
Kristof Provost
8a42005d1e pf: support basic L3 filtering in the Ethernet rules
Allow filtering based on the source or destination IP/IPv6 address in
the Ethernet layer rules.

Reviewed by:	pauamma_gundo.com (man), debdrup (man)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34482
2022-03-14 22:42:37 +01:00
Mateusz Guzik
f11b6505f1 pf: add PF_UNLNKDRULES_ASSERT
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-10 17:20:41 +00:00
Vincenzo Maffione
09a1893398 netmap: fix refcount bug in netmap allocator
Symptom: when a single extmem memory region is provided to netmap
multiple times, for multiple interfaces, the memory region is
never released by netmap once all the existing file descriptors
are closed.

Fix the relevant condition in netmap_mem_drop(): release the memory
when the last user of netmap_adapter is gone, rather then when
the last user of netmap_mem_d is gone.

MFC after:	2 weeks
2022-03-06 16:39:16 +00:00
Santiago Martinez
52bcdc5b80 if_epair: fix build with RSS and INET or INET6 disabled
Reviewed by:	kp
MFC after:	1 week
2022-03-03 18:31:26 +01:00
Kristof Provost
b590f17a11 pf: support masking mac addresses
When filtering Ethernet packets allow rules to specify a mac address
with a mask. This indicates which bits of the specified address are
significant. This allows users to do things like filter based on device
manufacturer.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-03-02 17:00:08 +01:00
Kristof Provost
c5131afee3 pf: add anchor support for ether rules
Support anchors in ether rules.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32482
2022-03-02 17:00:07 +01:00
Kristof Provost
fb330f3931 pf: support dummynet on L2 rules
Allow packets to be tagged with dummynet information. Note that we do
not apply dummynet shaping on the L2 traffic, but instead mark it for
dummynet processing in the L3 code. This is the same approach as we take
for ALTQ.

Sponsored by:   Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D32222
2022-03-02 17:00:06 +01:00
Kristof Provost
20c4899a8e pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules
Avoid the overhead of acquiring a (read) RULES lock when processing the
Ethernet rules.
We can get away with that because when rules are modified they're staged
in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is
atomic, so that pf_test_eth_rule() always sees either the old rules, or
the new ruleset.

We need to take care not to delete the old ruleset until we're sure no
pf_test_eth_rule() is still running with those. We accomplish that by
using NET_EPOCH_CALL() to actually free the old rules.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31739
2022-03-02 17:00:03 +01:00
Kristof Provost
e732e742b3 pf: Initial Ethernet level filtering code
This is the kernel side of stateless Ethernel level filtering for pf.

The primary use case for this is to enable captive portal functionality
to allow/deny access by MAC address, rather than per IP address.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31737
2022-03-02 17:00:03 +01:00
Kristof Provost
36637dd19d bridge: Don't share broadcast packets
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.

Use m_dup() to create independent mbufs instead.

Reported by:	Richard Russo <toast@ruka.org>
Reviewed by:	donner, afedorov
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D34319
2022-02-21 19:03:44 +01:00
Mateusz Guzik
430e0e409c vnet: add CURVNET_ASSERT_SET for !VIMAGE
Reported by:	ler
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-02-19 21:00:00 +00:00
Mateusz Guzik
75cde1f872 vnet: add CURVNET_ASSERT_SET
Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D34312
2022-02-19 13:10:01 +00:00
Li-Wen Hsu
7442b63231
if_epair: Use ANSI C definition
This fixes -Werror=strict-prototypes from gcc9

Sponsored by:	The FreeBSD Foundation
2022-02-15 21:45:22 +08:00
Kristof Provost
24f0bfbad5 if_epair: implement fanout
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.

We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.

Benchmark results:
With net.isr.maxthreads=-1

Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

Before          627 Kpps
After (no RSS)  1.198 Mpps
After (RSS)     3.148 Mpps

Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

Before          7.705 Kpps
After (no RSS)  1.017 Mpps
After (RSS)     2.083 Mpps

MFC after:	3 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D33731
2022-02-15 09:03:24 +01:00