The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
We already remove mbuf tags from packets transitting an if_epair, but we
didn't remove vlan metadata.
In certain configurations this could lead to unexpected vlan tags
turning up on the rx side.
PR: 270736
Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39482
epairs currently shuttle all transmitted packets through a single global
taskqueue thread. To hand packets over to the taskqueue thread, each
epair maintains a pair of ring buffers and a lockless scheme for
notifying the thread of pending work. The implementation can lead to
lost wakeups, causing to-be-transmitted packets to end up stuck in the
queue.
Rather than extending the existing scheme, simply replace it with a
linked list protected by a mutex, and use the mutex to synchronize
wakeups of the taskqueue thread. This appears to give equivalent or
better throughput with >= 16 producer threads and eliminates the lost
wakeups.
Reviewed by: kp
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38843
The m_flags field of struct mbuf is 24 bits wide and so gets truncated
in a couple of places in the epair code. Instead of preserving the
entire flag set, just remember whether M_BCAST or M_MCAST is set.
MFC after: 1 week
Sponsored by: Klara, Inc.
Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046
* Factor out queue selection (epair_select_queue()) and mbuf
preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
epair_menq() implementation and reduces the amount of dependencies
on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
epair_clone_create(). It simplifies the clone creation logic.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689
Convert most of the cloner customers who require custom params
to the new if_clone KPI.
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36636
MFC after: 2 weeks
Simplify epair_clone_create() and epair_clone_destroy() by
factoring out epair softc allocation / desctruction and
interface setup/teardown into separate functions.
Reviewed By: kp, zlei.huang_gmail.com
Differential Revision: https://reviews.freebsd.org/D36614
MFC after: 2 weeks
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.
MFC after: 1 week
Sponsored by: Orange Business Services
66acf7685b failed to build on riscv (and mips). This is because the
atomic_testandset_int() (and friends) functions do not exist there.
Happily those platforms do have the long variant, so switch to that.
PR: 262571
MFC after: 3 days
As an unwanted side effect of the performance improvements in
24f0bfbad5, epair interfaces stop forwarding traffic on higher
load levels when running on multi-core systems.
This happens due to a race condition in the logic that decides when to
place work in the task queue(s) responsible for processing the content
of ring buffers.
In order to fix this, a field named state is added to the epair_queue
structure. This field is used by the affected functions to signal each
other that something happened in the underlying ring buffers that might
require work to be scheduled in task queue(s), replacing the existing
logic, which relied on checking if ring buffers are empty or not.
epair_menq() does:
- set BIT_MBUF_QUEUED
- queue mbuf
- if testandset BIT_QUEUE_TASK:
enqueue task
epair_tx_start_deferred() does:
- swap ring buffers
- process mbufs
- clear BIT_QUEUE_TASK
- if testandclear BIT_MBUF_QUEUED
enqueue task
PR: 262571
Reported by: Johan Hendriks <joh.hendriks@gmail.com>
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34569
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.
We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.
Benchmark results:
With net.isr.maxthreads=-1
Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)
Before 627 Kpps
After (no RSS) 1.198 Mpps
After (RSS) 3.148 Mpps
Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)
Before 7.705 Kpps
After (no RSS) 1.017 Mpps
After (RSS) 2.083 Mpps
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D33731
Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.
Work largely by bz@, with minor tweaks by kp@.
Reviewed by: bz, kp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D31077
Remove all (non-persistent) tags when we transmit a packet. Real network
interfaces do not carry any tags either, and leaving tags attached can
produce unexpected results.
Reviewed by: bz, glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32663
Revert the mitigation code for the vnet/epair cleanup race (done in r365457).
r368237 introduced a more reliable fix.
MFC after: 2 weeks
Sponsored by: Modirum MDPay
There's a race where dying vnets move their interfaces back to their original
vnet, and if_epair cleanup (where deleting one interface also deletes the other
end of the epair). This is commonly triggered by the pf tests, but also by
cleanup of vnet jails.
As we've not yet been able to fix the root cause of the issue work around the
panic by not dereferencing a NULL softc in epair_qflush() and by not
re-attaching DYING interfaces.
This isn't a full fix, but makes a very common panic far less likely.
PR: 244703, 238870
Reviewed by: lutz_donnerhacke.de
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D26324
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
if_epair used the 'params' argument to pass a pointer to the b interface
through if_clone_create().
This pointer can be controlled by userspace, which means it could be abused to
trigger a panic. While this requires PRIV_NET_IFCREATE
privileges those are assigned to vnet jails, which means that vnet jails
could panic the system.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 3 days
As reported in PR184149, it can happen that epair devices can have the same
MAC address.
This solution is based on a 32-bit hash, obtained combining the if_index of
the a interface and the hostid.
If the hostid is zero, a random number is used.
PR: 184149
Reviewed by: wollman, eugen
Approved by: cognet
Differential Revision: https://reviews.freebsd.org/D15329
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
No functional change intended.
The VNET_SYSUNINIT() callback is executed after the MOD_UNLOAD. That means
that netisr_unregister() has already been called when
netisr_unregister_vnet() gets calls, leading to an assertion failure.
Restore the expected order of operations by performing everything that
was done in MOD_UNLOAD to a SYSUNINIT() (that will be called after the
VNET_SYSUNINIT()).
Differential Revision: https://reviews.freebsd.org/D12771
that this shouldn't go in. I was unaware when I merged the pull
request. I don't wish to upset the status quo, so backout per
project practice.
Pull Request: https://github.com/freebsd/freebsd/pull/92
Noted by: hrs@
Use netisr_get_cpuid() in netisr_select_cpuid() to limit cpuid value
returned by protocol to be sure that it is not greather than nws_count.
PR: 211836
Reviewed by: adrian
MFC after: 3 days
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.
Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.
The change should be transparent for non-VIMAGE kernels.
Reviewed by: gnn (, hiren)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6691
is added to return EEXIST when only "b" interface exists---this can happen
when epair<N>b is moved to a vnet jail and then "ifconfig epair<N> create"
is invoked there.
struct ifnet if_oqdrops.
Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.