Commit Graph

147132 Commits

Author SHA1 Message Date
Julien Grall
ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seems pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell
af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell
2794893ebf xen/intr: do full xenisrc initialization during binding
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state.  This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.

Reviewed by: royger
2023-04-14 15:58:51 +02:00
Elliott Mitchell
ff73b1d69b xen/intr: split xen_intr_isrc_lock uses
There are actually several distinct locking domains in xen_intr.c, but
all were sharing the same lock.  Both xen_intr_port_to_isrc[] and the
x86 interrupt structures needed protection.  Split these two apart as a
precursor to splitting the architecture portions off the file.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:51 +02:00
Elliott Mitchell
834013dea2 xen/intr: rework xen_intr_alloc_isrc() locking
Locking for allocation was being done in xen_intr_bind_isrc(), but the
unlock was inside xen_intr_alloc_isrc().  While the lock acquisition at
the end of xen_intr_alloc_isrc() was to modify xen_intr_port_to_isrc[],
NOT allocation.  Fix this garbled (though working) locking scheme.

Now locking for allocation is strictly in xen_intr_alloc_isrc(), while
locking to modify xen_intr_port_to_isrc[] is in xen_intr_bind_isrc().

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:50 +02:00
Elliott Mitchell
09bd542d17 xen/intr: rework xen_intr_alloc_isrc() call structure
The call structure around xen_intr_alloc_isrc() was rather awful.
Notably finding a structure for reuse is part of allocation, but this
was done outside xen_intr_alloc_isrc().  Move this into
xen_intr_alloc_isrc() so the function handles all allocation steps.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
2023-04-14 15:58:49 +02:00
Elliott Mitchell
149c581018 xen/intr: adjust xenisrc types, adjust format strings to match
As "CPUs", IRQs (vector) and virtual IRQs are always positive integers,
adjust the Xen code to use unsigned integers.  Several format strings
need adjustment to match.  Additionally single-bit bitfields are
boolean.
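
As a rough illustration of the kind of adjustment described here (this is not the real xenisrc definition; the struct, field, and function names are hypothetical), unsigned fields pair with `%u` format specifiers and a single-bit flag can be declared as a boolean:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an interrupt-source record; not the real xenisrc. */
struct example_isrc {
    unsigned int cpu;      /* CPU index: never negative */
    unsigned int virq;     /* virtual IRQ number: never negative */
    bool         masked:1; /* single-bit flag stored as a boolean */
};

/* Format the record; %u matches the unsigned fields, so no casts are needed. */
static int
example_isrc_describe(const struct example_isrc *isrc, char *buf, size_t len)
{
    assert(isrc != NULL && buf != NULL);
    return (snprintf(buf, len, "cpu %u virq %u%s", isrc->cpu, isrc->virq,
        isrc->masked ? " (masked)" : ""));
}
```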

No functional change expected.

Reviewed by: royger
2023-04-14 15:58:49 +02:00
Elliott Mitchell
ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell
61ccede8cf xen: purge no longer used hypervisor functions
HYPERVISOR_poll(), HYPERVISOR_block(), and HYPERVISOR_crash() appear no
longer used.  Further get_system_time() appears to have disappeared at
some point in the past, so HYPERVISOR_poll() was broken anyway.

No functional change intended.

Reviewed by: royger
2023-04-14 15:58:47 +02:00
Elliott Mitchell
b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Julien Grall
28a78d860e xen: introduce XEN_CPUID_TO_VCPUID()/XEN_VCPUID()
Part of the series for allowing FreeBSD/ARM to run on Xen.  On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.

While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region.  As such make
this a distinct macro.  Only callers in machine independent code have
been switched.

Add a wrapper for the x86 PIC interface to use matching the old
prototype.

Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
2023-04-14 15:58:46 +02:00
Elliott Mitchell
054073c283 xen/intr: xen_intr_bind_isrc() always set handle
Previously the upper layer handle was being set before the last
potential error condition.  The reasoning appears to have been it was
assumed invalid in case of an error being returned.  Now ensure it is
invalid until just before a successful return.
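
The pattern described above can be sketched in isolation. This is a minimal illustration under assumptions, not the code in xen_intr_bind_isrc(): the handle type, function name, and error conditions are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque handle type; stands in for the upper-layer handle. */
typedef void *example_handle_t;

/*
 * Bind a resource and publish its handle only on success.  The caller's
 * handle is cleared first and stays NULL on every error path.
 */
static int
example_bind(int port, void *resource, example_handle_t *handlep)
{
    assert(handlep != NULL);
    *handlep = NULL;        /* invalid until we know we succeeded */

    if (port < 0)
        return (EINVAL);    /* early failure: handle stays NULL */
    if (resource == NULL)
        return (ENOMEM);    /* late failure: handle is still NULL */

    *handlep = resource;    /* last step before a successful return */
    return (0);
}
```

With this shape a caller can rely on the handle being either NULL or fully valid, regardless of which error path was taken.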

Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 6d54cab1fe ("xen: allow to register event channels without handlers")
Reviewed by: royger
2023-04-14 15:58:45 +02:00
Randall Stewart
9903bf34f0 tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
In further non-LRO testing I found a case where rack is supposed to be waking up but
currently is not. In this special case it sets the flag rc_ack_can_sendout_data. When that is
set we should not prohibit output.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39565
2023-04-14 09:33:36 -04:00
Kristof Provost
b0e38a1373 bridge: distinguish no vlan and vlan 1
The bridge treated no vlan tag as being equivalent to vlan ID 1, which
causes confusion if the bridge sees both untagged and vlan 1 tagged
traffic.

Use DOT1Q_VID_NULL when there's no tag, and fix up the lookup code by
using 'DOT1Q_VID_RSVD_IMPL' to mean 'any vlan', rather than vlan 0. Note
that we have to account for userspace expecting to use 0 as meaning 'any
vlan'.
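
A minimal sketch of the lookup convention described above. The `EX_` names and helper functions are hypothetical, and the numeric values are assumed to mirror the 802.1Q constants named in the message; this is not the if_bridge code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror the 802.1Q constants named in the message. */
#define EX_VID_NULL      0x000  /* no tag present */
#define EX_VID_RSVD_IMPL 0xfff  /* reserved ID, used here as "any vlan" */

/* Userspace passes 0 for "any vlan"; translate that to the internal wildcard. */
static uint16_t
ex_vid_from_user(uint16_t user_vid)
{
    assert(user_vid <= EX_VID_RSVD_IMPL);   /* VLAN IDs are 12 bits wide */
    return (user_vid == 0 ? EX_VID_RSVD_IMPL : user_vid);
}

/* An entry matches when the query is the wildcard or the IDs are equal. */
static bool
ex_vid_match(uint16_t entry_vid, uint16_t query_vid)
{
    return (query_vid == EX_VID_RSVD_IMPL || entry_vid == query_vid);
}
```

Keeping the wildcard distinct from `EX_VID_NULL` is what lets an untagged entry and a vlan 1 entry coexist without being confused with each other.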

PR:		270559
Suggested by:	Zhenlei Huang <zlei@FreeBSD.org>
Reviewed by:	philip, zlei
Differential Revision:  https://reviews.freebsd.org/D39478
2023-04-14 13:17:02 +02:00
Zhenlei Huang
9af6f4268a bridge: Use the %D identifier to format MAC address
It is shorter and more readable.

No functional change intended.

Reviewed by:	kp
Fixes:		2d3614fb13 bridge: Log MAC address port flapping
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39542
2023-04-14 18:08:56 +08:00
Kajetan Staszkiewicz
39282ef356 pf: backport OpenBSD syntax of "scrub" option for "match" and "pass" rules
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.

Obtained from:	OpenBSD
MFC after:	never
Sponsored by:	InnoGames GmbH
Differential Revision:	https://reviews.freebsd.org/D38025
2023-04-14 09:04:06 +02:00
Gordon Bergling
26713ad9cf arm: Remove a double word in a comment in setjmp
- s/number number/number/

MFC after:	5 days
2023-04-13 20:37:25 +02:00
Gordon Bergling
c159f76713 kern: remove a double word in a KASSERT in subr_trap
- s/with with/with/

MFC after:	5 days
2023-04-13 20:03:37 +02:00
Henri Hennebert
71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Randall Stewart
25685b7537 TCP: Misc cleanups of tcp_subr.c
In going through all the TCP stacks I have found we have a few little bugs and niggles that need to
be cleaned up in tcp_subr.c including the following:

a) Set tcp_restoral_thresh to 450 (45%) not 550. This is a better proven value in my testing.
b) Let's track when we try to do pacing but fail via a counter for connections that do pace.
c) If a switch away from the default stack occurs and it fails we need to make sure the time
   scale is in the right mode (just in case the other stack changed it but then failed).
d) Use the TP_RXTCUR() macro when starting the TT_REXMT timer.
e) When we end a default flow let's log that in BBlogs as well as clean up any t_acktime (disable).
f) When we respond with a RST let's make sure to update the log_end_status properly.
g) When starting a new pcb let's ensure that all LRO features are off.
h) When discarding a connection let's make sure that any t_in_pkt's that might be there are freed properly.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39501
2023-04-13 09:29:05 -04:00
Ed Maste
2ef2c26f3f link_elf: fix SysV hash function overflow
Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function:

The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH
tag is produced to hold the address of .hash.

The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long consists of more than
32 bits, the return value may be larger than UINT32_MAX. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns
0x100000002, which is clearly unintended, as the function should behave
the same way regardless of whether long represents a 32-bit integer or
a 64-bit integer.

Reviewed by:	kib, Fangrui Song
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39517
2023-04-12 15:33:55 -04:00
John Baldwin
1ca12bd927 Remove the riscv64sf architecture.
Reviewed by:	jrtc27, arichardson, br, kp, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D39496
2023-04-12 11:09:27 -07:00
Michael Tuexen
2ba2849c82 tcp: fix typo in comment
Reported by:	cc
MFC after:	1 week
Sponsored by:	Netflix, Inc.
2023-04-12 18:08:21 +02:00
Michael Tuexen
c687f21add tcp: make net.inet.tcp.functions_default vnet specific
Reviewed by:		cc, rrs
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39516
2023-04-12 18:04:27 +02:00
Randall Stewart
1073f41657 tcp_lro: When processing compressed acks lets support the new early wake feature for rack.
During compressed ack and mbuf queuing we determine if we need to wake up. A
new function was added that is optional to the tfb so that the stack itself can also
be asked if a wakeup should happen. This helps compensate for late hpts calls.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39502
2023-04-12 11:35:14 -04:00
Andrew Turner
421516f25e Create pmap_mask_set_locked on arm64
Create a locked version of pmap_mask_set. We will need this for BTI
support.

Sponsored by:	Arm Ltd
2023-04-12 13:10:13 +01:00
Michael Tuexen
73c48d9d8f tcp: fix deregistering stacks when vnets are used
This fixes a bug where stacks could not be deregistered when
end points in the non-default vnet are using it.

Reviewed by:		glebius, zlei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D39514
2023-04-12 10:52:53 +02:00
Zhenlei Huang
c3c5e6c3e6 tarfs: Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
deac4c7f07 iicbus(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
8bd9afe9e1 bxe(4): Use CTLFLAG_RDTUN flag definition
sysctl variables rx_budget and max_aggregation_size are read-only loader
tunables. Mark them with the CTLFLAG_RDTUN flag.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
5ff8018108 ice(4): Use the existing CTLFLAG_RWTUN flag definition
Use it when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
69cb72b872 cam iosched: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
Zhenlei Huang
dc1c5138c3 powerpc: Use the existing CTLFLAG_RDTUN and CTLFLAG_RWTUN flag definitions
Use them when possible, instead of separated flags.

No functional change intended.

Reviewed by:	hselasky, erj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D39466
2023-04-12 12:20:38 +08:00
John Baldwin
cd800d3c96 Enable -Warray-parameter for clang.
I fixed many of these previously for GCC 12 and make tinderbox passes
with this enabled.

Differential Revision:	https://reviews.freebsd.org/D39378
2023-04-11 13:47:59 -07:00
Richard Scheffenegger
2169f71277 tcp: use IPV6_FLOWLABEL_LEN
Avoid magic numbers when handling the IPv6 flow ID for
DSCP and ECN fields and use the named variable instead.

Reviewed By:		tuexen, #transport
Sponsored by:		NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D39503
2023-04-11 18:53:51 +02:00
Konstantin Belousov
c53e990b8d DEBUG_VFS_LOCKS: restore diagnostic for the witness use case
Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:55 +03:00
Konstantin Belousov
75fc6f86c3 Add witness_is_owned(9)
which returns an indicator if the current thread owns the specified
lock.

Reviewed by:	jah, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39477
2023-04-11 15:59:49 +03:00
Konstantin Belousov
afa8f8971b vn_start_write(): consistently set *mpp to NULL on error or after failed sleep
This ensures that *mpp != NULL iff vn_finished_write() should be
called, regardless of the returned error, except for V_NOWAIT.
The only exception that must be maintained is the case where
vn_start_write(V_NOWAIT) is called with the intent of later dropping
other locks and then doing vn_start_write(V_XSLEEP), which needs the mp
value calculated from the non-waitable call above it.

Also note that V_XSLEEP is not supported by vn_start_secondary_write().

Reviewed by:	markj, mjg (previous version), rmacklem (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:46 +03:00
Konstantin Belousov
b2f3288747 vn_start_write(): minor style
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39441
2023-04-11 15:59:39 +03:00
Eugene Grosbein
37f4cb29bd imgact_binmisc: unbreak module build outside of kernel build environment
MFC after:	3 days
2023-04-11 17:32:29 +07:00
domienschepers
61605e0ae5 net80211: fail for unicast traffic without unicast key
Falling back to the multicast key may cause unicast traffic to leak.
Instead fail when no key is found.

For more information see the 'Framing Frames: Bypassing Wi-Fi Encryption
by Manipulating Transmit Queues' paper.

[ I updated the commit message to reference the paper and the code
comment to record historic behaviour as discussed in private email. ]

Security:	CVE-2022-47522
2023-04-10 23:38:57 +00:00
Randall Stewart
a2b33c9a7a tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
process the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.

This set of changes will make it so pacing works properly in the absence of LRO.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39494
2023-04-10 16:33:56 -04:00
John Baldwin
e222461790 rack: mask and tclass are only used for INET6.
This fixes the LINT-NOINET6 build.
2023-04-10 12:21:03 -07:00
Joseph Koshy
0e9e9048ae procfs: Sync a documentation comment with the code.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39488
2023-04-10 17:58:46 +00:00
John Baldwin
3b3762c34e sys: Enable -Wunused-but-set-variable for GCC.
It has been enabled for clang for a while now.

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D39358
2023-04-10 10:36:33 -07:00
John Baldwin
8e9db62e74 zfs: Appease set but unused warnings for spl_fstrans_*mark stubs.
Use a void cast to mark the cookie value as used in spl_fstrans_unmark.
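
A minimal sketch of the void-cast idiom, assuming stub macros that do no real work. The `ex_` names are hypothetical and do not reproduce the actual spl_fstrans_*mark definitions.

```c
#include <assert.h>

typedef int ex_cookie_t;    /* hypothetical cookie type */

/* Stub "mark": nothing to save on this platform, so yield a dummy cookie. */
#define ex_mark()       (0)
/* Stub "unmark": the cast counts as a use of the cookie and has no effect. */
#define ex_unmark(c)    ((void)(c))

static int
ex_do_work(int value)
{
    ex_cookie_t cookie;

    cookie = ex_mark();
    value *= 2;
    /* If ex_unmark() expanded to nothing, cookie would be set but never used. */
    ex_unmark(cookie);
    return (value);
}
```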

Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39357
2023-04-10 10:36:14 -07:00
John Baldwin
5328efb3d0 if_mos: Remove set but unused variable.
Reviewed by:	hselasky
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39356
2023-04-10 10:35:48 -07:00
John Baldwin
4b6228906f libalias: Mark set but unused variables as unused.
This function is clearly a stub, but it seems better to leave the stub
bits in place than to remove the function entirely.

Differential Revision:	https://reviews.freebsd.org/D39355
2023-04-10 10:35:29 -07:00
John Baldwin
16df72a9a2 udf: Remove set but unused variable from udf_getattr.
Reviewed by:	emaste
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39354
2023-04-10 10:31:45 -07:00
John Baldwin
3a9e6624eb rtw88: Silence unused but set warnings from GCC for debug.c.
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39353
2023-04-10 10:31:26 -07:00
John Baldwin
0b672df914 iwlwifi: Silence unused but set warnings from GCC for iwl-debug.c.
Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D39352
2023-04-10 10:31:07 -07:00
John Baldwin
677e70e0c4 ipmi: Remove some dead code for unsupported BMCs.
Reviewed by:	emaste
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39351
2023-04-10 10:30:54 -07:00
Alan Somers
9309a460b2 Implement GEOM::rotation_rate for gmirror
If all of the mirror's children have the same rotation rate, report
that.  But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
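
The reporting rule above can be sketched as follows. The function name and the `EX_RR_UNKNOWN` sentinel are assumptions made for illustration, not the gmirror implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_RR_UNKNOWN 0     /* hypothetical "unknown rotation rate" sentinel */

/*
 * Return the rate shared by all children, or EX_RR_UNKNOWN when the rates
 * differ.  An unknown child either disagrees with a known sibling (mixed)
 * or every child is unknown; both cases report unknown.
 */
static uint16_t
ex_mirror_rotation_rate(const uint16_t *child_rates, size_t nchildren)
{
    uint16_t rate;
    size_t i;

    assert(child_rates != NULL || nchildren == 0);
    if (nchildren == 0)
        return (EX_RR_UNKNOWN);
    rate = child_rates[0];
    for (i = 1; i < nchildren; i++) {
        if (child_rates[i] != rate)
            return (EX_RR_UNKNOWN);     /* mixed rates */
    }
    return (rate);
}
```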

MFC after:	2 weeks
Sponsored by:	Axcient
Reviewed by:	imp
Differential Revision: https://reviews.freebsd.org/D39458
2023-04-10 10:27:10 -06:00
Christos Margiolis
38594ff9c0 ofw: fix memory leak in ofwbus_attach()
PR:		269509
Reported by:	Jaroslaw Pelczar <jarek@jpelczar.com>
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D38903
2023-04-10 12:14:12 -04:00
Christos Margiolis
0388a0887a dtrace: handle NOP instructions in the riscv invop handler
This will be used by a forthcoming port of the kinst provider.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39481
2023-04-10 12:14:11 -04:00
Mark Johnston
d862b165a6 bridge: Add support for emulated netmap mode
if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input.  Thus, netmap's usual hooking of ifnet routines
does not work as expected.  Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled.  This applies to both
locally delivered packets and forwarded packets.

When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input().  However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward.  Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap.  The source MAC address of the packet is used
to determine the original "receiving" interface.

Reviewed by:	vmaffione
MFC after:	2 months
Sponsored by:	Zenarmor
Sponsored by:	OPNsense
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38066
2023-04-10 12:14:11 -04:00
Kristof Provost
c69ae84197 if_epair: also remove vlan metadata from mbufs
We already remove mbuf tags from packets transiting an if_epair, but we
didn't remove vlan metadata.
In certain configurations this could lead to unexpected vlan tags
turning up on the rx side.

PR:		270736
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39482
2023-04-10 15:55:35 +02:00
Konstantin Belousov
7b6fe2428a DEBUG_VFS_LOCKS: use witness if available
The assert_vop_locked messages are ignored, and file/line information
is not too useful. Fixing this without changing both witness and VFS
asserts KPIs is not possible.

Reviewed by:	markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39464
2023-04-10 00:34:12 +03:00
Alexander V. Chernikov
cc3793b1c5 netlink: improve source ifa selection algorithm when adding routes.
Use route destination sockaddr when the gateway is either AF_LINK or
 has a different family (IPv4 over IPv6). This change ensures
 the nexthop IFA has the same family as the destination.

Reported by:	Dmitriy Smirnov <fox@sage.su>
Tested by:	Dmitriy Smirnov <fox@sage.su>
MFC after:	3 days
2023-04-09 13:33:22 +00:00
Alexander V. Chernikov
0d4038e301 netlink: set prefix-related flags to the created nexthop.
This fixes incorrect flag combinations when adding IPv4/IPv6 host
routes.

MFC after:	3 days
2023-04-09 09:26:12 +00:00
Alexander V. Chernikov
75379ea2e4 netlink: do not print "unknown sa family" warnings at the default debug
level.

MFC after:	2 weeks
2023-04-08 19:40:32 +00:00
Alexander V. Chernikov
39c0036d88 netlink: fix !INET6 warning
Reported by:	Gary Jennejohn <garyj@gmx.de>
MFC after:	2 weeks
2023-04-08 19:39:37 +00:00
Mateusz Guzik
c17eb99a66 Bump __FreeBSD_version to 1400086 for vn_lock_pair arg change
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-08 17:31:01 +00:00
Hans Petter Selasky
9b077d72bc usb(4): Separate the fast path and the slow path to avoid races and use-after-free for the USB FS interface.
Badly behaving user-space USB applications may crash the kernel by issuing
USB FS related ioctl(2)'s out of their expected order. By default
the USB FS ioctl(2) interface is only available to the
administrator, root, and driver applications like webcamd(8) need
to be hijacked in order for this to happen.

The issue is the fast-path code does not always see updates made
by the slow-path code, and may then work on freed memory.

This is easily fixed by using an EPOCH(9) type of synchronization
mechanism. A SX(9) lock will be used as a substitute for EPOCH(9),
due to the need for sleepability. In addition most calls going into
the fast-path originate from a single user-space process and the
need for multi-thread performance is not present.

Differential Revision:	https://reviews.freebsd.org/D39373
Reviewed by:	markj@
Reported by:	C Turt <ecturt@gmail.com>
admbugs:	994
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-08 17:11:31 +02:00
Hans Petter Selasky
03a2e432d5 usb(4): Code refactoring as a pre-step for adding missing synchronization mechanism.
Move code in switch cases into own functions to make later changes easier to track.

No functional change, except for removing a superfluous break statement when
range checking USB_FS_MAX_FRAMES, in the USB_FS_OPEN case.
It should not have been there at all.

Suggested by:	emaste@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2023-04-08 16:52:20 +02:00
Konstantin Belousov
182b21d462 openzfs: adapt to the new vn_lock_pair() interface
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-04-08 02:37:56 +03:00
Konstantin Belousov
bb24eaea49 vn_lock_pair(): allow to request shared locking
If either of vnodes is shared locked, lock must not be recursed.

Requested by:	rmacklem
Reviewed by:	markj, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39444
2023-04-08 01:58:26 +03:00
Mateusz Guzik
d6e2490134 zfs: disable kernel fpu usage on arm and aarch64
It is not implemented and causes panics on boot.

This is a temporary measure until someone(tm) sorts it out.

Reported by:	many
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 21:44:49 +00:00
Mateusz Guzik
02e6e8d218 vfs: extend vn_printf with vop vector
2023-04-07 20:39:06 +00:00
Mateusz Guzik
26b9648750 vfs: more informative panic for missing fplookup ops
2023-04-07 20:39:06 +00:00
Mateusz Guzik
4032c38814 ufs: add missing vop_fplookup ops to fifo vectors
Reported-by: syzbot+a324b64ef9a933659c1c@syzkaller.appspotmail.com
2023-04-07 20:39:05 +00:00
Rick Macklem
4adb28c0ab nfscl: Fix support for doing Null RPCs
Although the NFS client does not currently perform Null RPCs,
this fix is needed if/when it might do so.
Found during testing of experimental code that uses Null RPCs
to maintain/monitor TCP connections for "nconnect" mounts.

MFC after:	3 months
2023-04-07 12:57:26 -07:00
Rick Macklem
ff2f1f691c nfsd: Add support for the SP4_MACH_CRED case in ExchangeID
Commit f4179ad46f added support for operation bitmaps for
NFSv4.1/4.2.  This commit uses those to implement the SP4_MACH_CRED
case for the NFSv4.1/4.2 ExchangeID operation since the Linux
NFSv4.1/4.2 client is now using this for Kerberized mounts.
The Linux Kerberized NFSv4.1/4.2 mounts currently work without
support for this because Linux will fall back to SP4_NONE,
but there is no guarantee this fallback will work forever.

This commit only affects Kerberized NFSv4.1/4.2 mounts from
Linux at this time.

MFC after:	3 months
2023-04-07 12:49:23 -07:00
Gleb Smirnoff
66fbc19fbd tcp: pass tcpcb in the tfb_tcp_ctloutput() method instead of inpcb
Just matches rest of the KPI.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39435
2023-04-07 12:18:10 -07:00
Gleb Smirnoff
35bc0bcc51 tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and
only one socket.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39434
2023-04-07 12:18:06 -07:00
Gleb Smirnoff
de4368dd84 tcp: retire tfb_tcp_hpts_do_segment()
Isn't in use anymore.  Correct comments that mention it.

Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D39433
2023-04-07 12:18:02 -07:00
Mateusz Guzik
20be1b4fc4 zfs: try to fallback early if can't do optimized copy
Not complete, but already shaves on some locking.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 15:47:45 +00:00
Mateusz Guzik
d012836fb6 zfs: fix up EXDEV handling for clone_range
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.

While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.

One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.

Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.

[note: technically the review is still in progress, but so is the
active breakage]

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-07 15:46:20 +00:00
Zhenlei Huang
2d3614fb13 bridge: Log MAC address port flapping
MAC flapping occurs when a bridge receives packets with the same source MAC
address on different member interfaces. The common reasons are:
 - a user roams from one bridge port to another
 - the network is set up incorrectly, e.g. with bridge loops
 - someone set a duplicate Ethernet address on their NIC
 - an attacker, virus, or trojan sends spoofed packets

if_bridge currently updates the bridge routing entry silently, so flapping
is hard to diagnose.

Emit logs when MAC address port flapping occurs to make it easier to diagnose.

Reviewed by:	kp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39375
2023-04-07 22:25:41 +08:00
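The flapping check described in 2d3614fb13 above can be sketched in userland C. This is only an illustration under stated assumptions, not the if_bridge code: `struct fdb_entry`, `fdb_learn()`, the fixed-size table, and the message text are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FDB_SIZE 8	/* hypothetical tiny table size */
#define MAC_LEN  6

/* Hypothetical forwarding entry: which port a source MAC was last seen on. */
struct fdb_entry {
	uint8_t mac[MAC_LEN];
	int     port;
	int     used;
};

static struct fdb_entry fdb[FDB_SIZE];

/*
 * Learn (or refresh) the port for a source MAC.
 * Returns 1 if the MAC moved to a different port (a flap, which is logged),
 * 0 if nothing changed or the MAC was newly learned, -1 if the table is full.
 */
static int
fdb_learn(const uint8_t mac[MAC_LEN], int port)
{
	int i, free_slot = -1;

	assert(port >= 0);
	for (i = 0; i < FDB_SIZE; i++) {
		if (!fdb[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (memcmp(fdb[i].mac, mac, MAC_LEN) != 0)
			continue;
		if (fdb[i].port == port)
			return (0);
		/* Same MAC, different port: report instead of updating silently. */
		fprintf(stderr,
		    "mac %02x:%02x:%02x:%02x:%02x:%02x moved from port %d to port %d\n",
		    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
		    fdb[i].port, port);
		fdb[i].port = port;
		return (1);
	}
	if (free_slot < 0)
		return (-1);
	memcpy(fdb[free_slot].mac, mac, MAC_LEN);
	fdb[free_slot].port = port;
	fdb[free_slot].used = 1;
	return (0);
}
```

A caller would invoke `fdb_learn()` once per received frame with the frame's source address and ingress port; a real implementation would also rate-limit the log message.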
Randall Stewart
945f9a7cc9 tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also
there is a t_maxpeak_rate that needs to be removed (it is unused).

Reviewed by: tuexen, cc
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39427
2023-04-07 10:15:29 -04:00
Ganbold Tsagaankhuu
37d97b10ff Fix style. 2023-04-07 03:18:42 +00:00
Ganbold Tsagaankhuu
4720afaffe Improve RK3568 pcie phy handling codes a bit.
Move phy bifurcation code to a separate function
that can be called during the attach phase.
Also initialize both pcie lanes accordingly.
2023-04-07 02:54:13 +00:00
Andrew Turner
04b4655997 Mark EENTRY as .text
To allow it to be used before ENTRY we need to ensure the symbol is
in the .text section. It also needs to be aligned correctly.

While here mark the symbol type as a function as in the ENTRY macro.

Reported by:	jrtc27
Sponsored by:	Arm Ltd
2023-04-06 16:50:54 +01:00
Mateusz Guzik
f87a9f51ef vfs: validate that a mount point with FPLOOKUP has vop_fplookup ops 2023-04-06 15:20:41 +00:00
Mateusz Guzik
e237e2ba5f vfs: only allow doomed vnodes to return EOPNOTSUPP for fplookup vops
This helps assert that filesystems which indicate fplookup support
actually provide these vops.
2023-04-06 15:20:41 +00:00
Mateusz Guzik
5f6df17775 vfs: validate that vop vectors provide all or none fplookup vops
In order to prevent later surprises.
2023-04-06 15:20:41 +00:00
Mateusz Guzik
c7f6c2a50b deadfs: consistently return EOPNOTSUPP for fplookup vops 2023-04-06 15:20:41 +00:00
Mateusz Guzik
0baef43ed0 vfs: add missing vop_fplookup ops to syncer 2023-04-06 15:20:41 +00:00
Mateusz Guzik
8495fa49ea vfs: whack spurious comments from syncer's vop_vector 2023-04-06 15:20:40 +00:00
Mateusz Guzik
f3c81b9738 ufs: add missing vop_fplookup ops 2023-04-06 15:20:40 +00:00
Mateusz Guzik
e2d997d1cb zfs: add missing vop_fplookup_vexec assignments
This happens to be a nop right now.
2023-04-06 15:20:40 +00:00
Mark Johnston
5f6d37787f netmap: Handle packet batches in generic mode
ifnets are allowed to pass batches of multiple packets to if_input,
linked by the m_nextpkt pointer.  iflib_rxeof() sometimes does this, for
example.  Netmap's generic mode did not handle this and would only
deliver the first packet in the batch, leaking the rest.

PR:		270636
Reviewed by:	vmaffione
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39426
2023-04-05 17:07:48 -04:00
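The batch handling described in 5f6d37787f above amounts to walking the chain instead of stopping at its head. A minimal sketch, assuming a hypothetical `struct pkt` that stands in for an mbuf and models only the m_nextpkt link; `deliver_batch()` and `count_pkt()` are illustrative names, not netmap functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only the batch link is modeled. */
struct pkt {
	struct pkt *nextpkt;	/* plays the role of m_nextpkt */
	int         id;
};

/* Example consumer: counts how many packets it was handed. */
static int delivered;

static void
count_pkt(struct pkt *p)
{
	(void)p;
	delivered++;
}

/*
 * Deliver every packet in a batch, not just the first one.
 * Each packet is unlinked before delivery so the consumer never sees
 * a stale chain pointer.  Returns the number of packets delivered.
 */
static int
deliver_batch(struct pkt *head, void (*deliver)(struct pkt *))
{
	struct pkt *p, *next;
	int n = 0;

	assert(deliver != NULL);
	for (p = head; p != NULL; p = next) {
		next = p->nextpkt;	/* save the link before handing p off */
		p->nextpkt = NULL;
		deliver(p);
		n++;
	}
	return (n);
}
```

Delivering only `head` and dropping the pointer would reproduce the leak the commit message describes, since the remaining packets would no longer be reachable.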
Mark Johnston
ce12afaa6f netmap: Fix queue stalls with generic interfaces
In emulated mode, the FreeBSD netmap port attempts to perform zero-copy
transmission.  This works as follows: the kernel ring is populated with
mbuf headers to which netmap buffers are attached.  When transmitting,
the mbuf refcount is initialized to 2, and when the counter value has
been decremented to 1 netmap infers that the driver has freed the mbuf
and thus transmission is complete.

This scheme does not generalize to the situation where netmap is
attaching to a software interface which may transmit packets among
multiple "queues", as is the case with bridge or lagg interfaces.  In
that case, we would be relying on backing hardware drivers to free
transmitted mbufs promptly, but this isn't guaranteed; a driver may
reasonably defer freeing a small number of transmitted buffers
indefinitely.  If such a buffer ends up at the tail of a netmap transmit
ring, further transmits can end up blocked indefinitely.

Fix the problem by removing the zero-copy scheme (which is also not
implemented in the Linux port of netmap).  Instead, the kernel ring is
populated with regular mbuf clusters into which netmap buffers are
copied by nm_os_generic_xmit_frame().  The refcounting scheme is
preserved, and this lets us avoid allocating a fresh cluster per
transmitted packet in the common case.  If the transmit ring is full, a
callout is used to free the "stuck" mbuf, avoiding the queue deadlock
described above.

Furthermore, when recycling mbuf clusters, be sure to fully reinitialize
the mbuf header instead of simply re-setting M_PKTHDR.  Some software
interfaces, like if_vlan, may set fields in the header which should be
reset before the mbuf is reused.

Reviewed by:	vmaffione
MFC after:	1 month
Sponsored by:	Zenarmor
Sponsored by:	OPNsense
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38065
2023-04-05 12:12:30 -04:00
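The refcounting scheme that ce12afaa6f above says is preserved can be sketched as follows. This is a single-threaded illustration only: `struct txbuf` and the three helpers are hypothetical names, and a kernel implementation would use atomic operations on the counter.

```c
#include <assert.h>

/* Hypothetical transmit buffer; only the reference count is modeled. */
struct txbuf {
	int refcnt;
};

/* On transmit the count starts at 2: one reference for the ring, one for the driver. */
static void
tx_submit(struct txbuf *b)
{
	b->refcnt = 2;
}

/* The driver drops its reference once the hardware is done with the buffer. */
static void
driver_free(struct txbuf *b)
{
	assert(b->refcnt >= 2);
	b->refcnt--;
}

/* A count of 1 means only the ring still holds the buffer, so it can be reused. */
static int
tx_complete(const struct txbuf *b)
{
	return (b->refcnt == 1);
}
```

If the driver never calls the equivalent of `driver_free()`, `tx_complete()` stays false indefinitely, which is the stall the commit addresses with a callout.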
Zhenlei Huang
da4068c4e1 mlx5ib(4): Mark driver knows net epoch
This driver has already been EPOCH(9) aware since e48813009c.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39406
2023-04-06 00:08:23 +08:00
Zhenlei Huang
fc6c93b6a5 infiniband: Opt-in for net epoch
This is counterpart to e87c494015, which did the same for ethernet.

Suggested by:	hselasky
Reviewed by:	hselasky, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D39405
2023-04-06 00:08:23 +08:00
Mark Johnston
03276e338a netisr: Remove the now-unused NETISR_EPAIR queue index
No functional change intended.

Fixes:		3dd5760aa5 ("if_epair: rework")
MFC after:	1 week
Sponsored by:	Klara, Inc.
2023-04-05 11:46:42 -04:00
Mark Johnston
82bbdde4eb bridge: Try to make the GRAB_OUR_PACKETS macro a bit more readable
- Let the compiler use constant folding to eliminate conditionals.
- Fix some inconsistent whitespace.

No functional change intended.

Reviewed by:	zlei
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D38410
2023-04-05 10:37:00 -04:00
Mateusz Guzik
c67eb393fa tcp_hpts: plug a compiler warn
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-04-05 14:32:13 +00:00
Martin Matuska
eb1feadc20 zfs: fix ap->a_fsizetd NULL pointer dereference
Submitted by:		rmacklem
Reported by:		cy
Tested by:		cy, mm
Reviewed by:		pjd, mm
Differential revision:	https://reviews.freebsd.org/D39418
2023-04-05 09:33:32 +02:00
Konstantin Belousov
54579376c0 Change kqueue1() to be compatible with NetBSD
by making it accept some open(2) flags.  More precisely, only
O_CLOEXEC is supported, the flag is translated into the KQUEUE_CLOEXEC flag
for kqueuex(2), and O_NONBLOCK is silently ignored.

Reported and tested by:	vishwin
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-05 06:29:49 +03:00
Gleb Smirnoff
84b42df834 rack: fix build on powerpc 2023-04-04 16:35:36 -07:00
Dmitry Chagin
71bc17803e linux(4): Implement close_range over native
Handling of the CLOSE_RANGE_UNSHARE flag is not implemented due to
differences in the fd unsharing mechanisms of Linux and FreeBSD.

Reviewed by:		mjg
Differential revision:	https://reviews.freebsd.org/D39398
MFC after:		2 weeks
2023-04-04 23:24:04 +03:00
Dmitry Chagin
50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin
1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Randall Stewart
030434acaf Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including:
     a) Ability when switching stacks to have one stack query another.
     b) Internal use of micro-second timers instead of ticks.
     c) Many changes to pacing in forms of
        1) Improvements to Dynamic Goodput Pacing (DGP)
        2) Improvements to fixed rate pacing
        3) A new feature called hybrid pacing where the requestor can
           get a combination of DGP and fixed rate pacing with deadlines
           for delivery that can dynamically speed things up.
     d) All kinds of bugs found during extensive testing and use of the
        rack stack for streaming video and in fact all data transferred
        by NF

Reviewed by: glebius, gallatin, tuexen
Sponsored By: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D39402
2023-04-04 16:05:46 -04:00
Gleb Smirnoff
2ff8187efd tcp_hpts: remove dead code tcp_drop_in_pkts()
Should have gone in f971e79139.
2023-04-04 12:55:27 -07:00
Eric van Gyzen
ecaeac805b mlxfw: fix potential NULL pointer dereference
Reported by:	Coverity (an internal run at Dell)
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D39348
2023-04-04 15:27:46 -05:00
Martin Matuska
91ef6f14f2 Revert "zfs: fall back if block_cloning feature is disabled"
This reverts commit 8ee579abe0.
2023-04-04 16:34:34 +02:00
Konstantin Belousov
11cdffc603 Regen 2023-04-04 16:19:08 +03:00
Konstantin Belousov
dac3102488 Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-04 16:19:08 +03:00
Randall Stewart
73ee5756de Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
Stack switching has always been a bit of an issue. We currently use a break-before-make setup, which means that
if something goes wrong you have to try to get back to a stack. This patch, among many other things, changes that to
make-before-break. We also expand some of the function blocks in preparation for new features in rack that will allow
more controlled pacing. We also add other abilities, such as the pathway for a stack to query a previous stack to acquire
critical state information from it, so that things in flight don't get dropped or mishandled when switching stacks. We also
add the concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to
microseconds, and it also gives us a pathway to nanosecond timekeeping if we need it (something for the data center to
consider).

Once all this lands I will then update rack to begin using all these new features.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39210
2023-04-01 01:46:38 -04:00
Emmanuel Vadot
63b113af57 linuxkpi: Add a few more dummy includes
Needed by drm-kmod.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 14:11:32 +02:00
Martin Matuska
8ee579abe0 zfs: fall back if block_cloning feature is disabled
If block_cloning is disabled, or other errors from zfs_clone_range()
return an EXDEV we should fall back to vn_generic_copy_file_range().

This fixes issues when copying files on the same dataset with
block_cloning disabled.

Upstreamed as pull request to OpenZFS.

Reviewed by:	Mateusz Guzik <mjguzik@gmail.com>
OpenZFS pull request:	14713
2023-04-04 13:43:34 +02:00
Emmanuel Vadot
44312c28fe linuxkpi: Add linux/agp_backend.h
It declares the structs needed by drm code for AGP.

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:49:48 +02:00
Emmanuel Vadot
7c7419f60c linuxkpi: Add linux/stddef.h
It simply includes sys/stddef.h.

Obtained from:	drm-kmod
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 11:39:27 +02:00
Emmanuel Vadot
0bf561351b linuxkpi: Include linux/types.h in linux/mod_devicetable.h
It's done like this in linux too.

Sponsored by:   Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:28 +02:00
Emmanuel Vadot
3f39ff2420 linuxkpi: Include linux/math64.h in linux/time.h
It's done like this in linux too.

Sponsored by:	Beckhoff Automation GmbH & Co. KG
2023-04-04 10:48:27 +02:00
Ganbold Tsagaankhuu
396b6914c9 Fix pcie phy enabling codes for RK3568 SoC.
Handle data-lanes property for pcie phy and set it accordingly.
This makes devices attached to pcie3 work properly.
On some RK3568-based boards an RTL8125B-based device is
connected to it, so with this change the realtek-re-kmod driver
attaches and works.

Partially obtained from OpenBSD.
Tested on NanoPI-R5S, FireFly Station P2 boards.
2023-04-04 02:50:29 +00:00
Martin Matuska
2a58b312b6 zfs: merge openzfs/zfs@431083f75
Notable upstream pull request merges:
  #12194 Fix short-lived txg caused by autotrim
  #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
  #13392 Implementation of block cloning for ZFS
  #13741 SHA2 reworking and API for iterating over multiple implementations
  #14282 Sync thread should avoid holding the spa config write lock
         when possible
  #14283 txg_sync should handle write errors in ZIL
  #14359 More adaptive ARC eviction
  #14469 Fix NULL pointer dereference in zio_ready()
  #14479 zfs redact fails when dnodesize=auto
  #14496 improve error message of zfs redact
  #14500 Skip memory allocation when compressing holes
  #14501 FreeBSD: don't verify recycled vnode for zfs control directory
  #14502 partially revert PR 14304 (eee9362a7)
  #14509 Fix per-jail zfs.mount_snapshot setting
  #14514 Fix data race between zil_commit() and zil_suspend()
  #14516 System-wide speculative prefetch limit
  #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
  #14519 Do not hold spa_config in ZIL while blocked on IO
  #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
  #14524 Ignore too large stack in case of dsl_deadlist_merge
  #14526 Use .section .rodata instead of .rodata on FreeBSD
  #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
  #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
  #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
  #14544 icp: Prevent compilers from optimizing away memset()
         in gcm_clear_ctx()
  #14546 Revert zfeature_active() to static
  #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
         patch
  #14563 Optimize the is_l2cacheable functions
  #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
  #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
  #14567 spl: Add cmn_err_once() to log a message only on the first call
  #14568 Fix incremental receive silently failing for recursive sends
  #14569 Restore ASMABI and other Unify work
  #14576 Fix detection of IBM Power8 machines (ISA 2.07)
  #14577 Better handling for future crypto parameters
  #14600 zcommon: Refactor FPU state handling in fletcher4
  #14603 Fix prefetching of indirect blocks while destroying
  #14633 Fixes in persistent error log
  #14639 FreeBSD: Remove extra arc_reduce_target_size() call
  #14641 Additional limits on hole reporting
  #14649 Drop lying to the compiler in the fletcher4 code
  #14652 panic loop when removing slog device
  #14653 Update vdev state for spare vdev
  #14655 Fix cloning into already dirty dbufs
  #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

Obtained from:	OpenZFS
OpenZFS commit:	431083f75b
2023-04-03 16:49:30 +02:00
Ganbold Tsagaankhuu
b98fbf3781 Fix driver name.
Submitted by:	Tyuryukanov S.Y.
2023-04-03 14:20:28 +00:00
Andrew Turner
41236539d8 Add non-posted device memory to the arm64 mem map
Add VM_MEMATTR_DEVICE_NP to the arm64 vm.pmap.kernel_maps sysctl.

Reviewed by:	markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D39371
2023-04-03 12:59:11 +01:00
Dmitry Chagin
7ae0972c7b linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events and VNETs.

Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI, pseudofs
2023-04-03 11:22:16 +03:00
Bjoern A. Zeeb
cfccc7f30a LinuxKPI: 802.11: remove extra spaces
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-02 22:25:28 +00:00
Zhenlei Huang
5f3d0399e9 lagg(4): Tap traffic after protocol processing
Different lagg protocols have different means and policies to process incoming
traffic. For example, for the failover protocol, by default received traffic is
only accepted when it is received through the active port. For the lacp
protocol, LACP control messages are tapped off, and traffic is also dropped if
it is received through a port which is not in the collecting state or is not
joined to the active aggregator. It is confusing if a user dumps and sees
inbound traffic on lagg(4) interfaces that is actually silently dropped and not
passed into the net stack.

Tap traffic after protocol processing so that the user has a consistent view of
the inbound traffic; meanwhile the mbuf is set with the correct receiving
interface and bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
2023-04-03 01:01:51 +08:00
Zhenlei Huang
90820ef121 infiniband: Widen NET_EPOCH coverage
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter the net epoch before lagg_input_infiniband(). If an IPoIB
interface is a member of a lagg(4) interface, then after
lagg_input_infiniband() returns, the receiving interface of the mbuf is set to
the lagg(4) interface. If the lagg(4) interface is destroyed concurrently,
there is a small window in which the interface gets destroyed and becomes
invalid before infiniband_input() re-enters the net epoch, leading to a
use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
2023-04-03 00:51:49 +08:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilities such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it is still possible to have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilities by building the world with the `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when a person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Emmanuel Vadot
4fd9e20671 linuxkpi: hdmi: Remove wrong dependency on wlan
Copy-paste mistake.

Reported by:	Alastair Hogge <agh@riseup.net>
Fixes:	f1d7ae31d4 ("linuxkpi: Add hdmi helpers")
2023-04-02 16:56:23 +02:00
Alexander V. Chernikov
c35a43b261 netlink: allow exact-match route lookups via RTM_GETROUTE.
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
2023-04-02 13:47:10 +00:00
Alexander V. Chernikov
4aeb939ecf netlink: fix NULL check in the default route snl(3) parser.
CID:		1506959
MFC after:	2 weeks
2023-04-02 12:44:20 +00:00
Alexander V. Chernikov
27cbc1a7fe netlink: fix snl_read_reply_multi().
CID:		1506956
MFC after:	2 weeks
2023-04-02 12:41:53 +00:00
Dmitry Chagin
a32ed5ec05 pseudofs: Simplify pfs_visible_proc
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
2023-04-02 11:24:10 +03:00
Dmitry Chagin
405c0c04ed pseudofs: Allow vis callback to be called for a named node
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
2023-04-02 11:21:15 +03:00
Dmitry Chagin
7f72324346 pseudofs: Microoptimize struct pfs_node
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
2023-04-02 11:20:07 +03:00
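The size arithmetic in 7f72324346 above can be checked with a small sketch. The bucket list is an assumption (only the 256- and 384-byte buckets are named in the commit message), and `bucket_for()` and `name_room()` are hypothetical helpers, not kernel allocator functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bucket sizes; only 256 and 384 appear in the commit message. */
static const size_t buckets[] = { 128, 256, 384, 512 };

/* Return the smallest bucket that fits `size`, or 0 if none does. */
static size_t
bucket_for(size_t size)
{
	size_t i;

	for (i = 0; i < sizeof(buckets) / sizeof(buckets[0]); i++) {
		if (size <= buckets[i])
			return (buckets[i]);
	}
	return (0);
}

/* Bytes left for the node name once the fixed part of the struct is placed. */
static size_t
name_room(size_t fixed_size, size_t bucket)
{
	assert(fixed_size <= bucket);
	return (bucket - fixed_size);
}
```

With the numbers from the message, a 280-byte node lands in the 384-byte bucket, while a 152-byte fixed part leaves 256 - 152 = 104 bytes for the name within the 256-byte bucket, so a 16-byte name fits comfortably.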
Navdeep Parhar
9f354cd3d0 cxgbe(4): Allow tracing filters on loopback ports.
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 17:50:46 -07:00
Navdeep Parhar
531ef35241 cxgbe/iw_cxgbe: Always set a vnet around calls to IN_LOOPBACK.
This is catch up with efe58855f3.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 16:19:10 -07:00
Rick Macklem
f4179ad46f nfscommon: Add support for an NFSv4 operation bitmap
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after:	3 months
2023-04-01 14:22:26 -07:00
黃清隆
285d85f4f9 arcmsr(4): Fix reading buffer empty length error.
MFC after:	2 weeks
2023-03-31 22:43:43 -07:00
Bjoern A. Zeeb
8ac540d3b8 LinuxKPI: 802.11: adjust locking
Split up the lhw lock and the scan lock.  The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.

Sponsored by:	The FreeBSD Foundation
MFC after:	7 days
2023-03-31 19:59:50 +00:00
Joseph Mingrone
6f9cba8f8b libpcap: Update to 1.10.3
Local changes:

- In contrib/libpcap/pcap/bpf.h, do not include pcap/dlt.h.  Our system
  net/dlt.h is pulled in from net/bpf.h.
- sys/net/dlt.h: Incorporate changes from libpcap 1.10.3.
- lib/libpcap/Makefile: Update for libpcap 1.10.3.

Changelog:	https://git.tcpdump.org/libpcap/blob/95691ebe7564afa3faa5c6ba0dbd17e351be455a:/CHANGES
Reviewed by:	emaste
Obtained from:	https://www.tcpdump.org/release/libpcap-1.10.3.tar.gz
Sponsored by:	The FreeBSD Foundation
2023-03-31 16:02:22 -03:00
John Baldwin
cb750f7f5a fuse: Remove set but unused cr_gid variable.
Reviewed by:	asomers
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39350
2023-03-31 10:57:13 -07:00
John Baldwin
ad83dd2b2b LinuxKPI: Appease -Wunused-but-set-variable warnings from GCC.
- Mark assert dummy variables as __unused.

- Use a dummy (void) cast of the flags argument passed to
  spin_unlock_irqrestore so it gets treated as used.

Reviewed by:	manu, hselasky
Differential Revision:	https://reviews.freebsd.org/D39349
2023-03-31 10:56:33 -07:00
Zhenlei Huang
5a8abd0a29 lacp: Use C99 bool for boolean return value
This improves readability.

No functional change intended.

MFC after:	1 week
2023-04-01 01:48:36 +08:00
Mitchell Horne
3462c371c2 arm64/gicv3: correct the size of the distributor resource
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).

In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.

PR:		270415
Reported by:	whu
Tested by:	whu
Differential Revision:	https://reviews.freebsd.org/D39260
2023-03-31 13:26:22 -03:00
Mark Johnston
1a3cb489e5 arm64: Move the initial kernel stack out of the init_pagetables section
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore.  It is used for bootstrap page table pages.

It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there.  No functional change intended.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39367
2023-03-31 11:57:26 -04:00
Andrew Turner
47ff149afa Move arm64 EENTRY uses before ENTRY
The ENTRY macro adds instructions to the start of a function, but
EENTRY does not. To use these instructions in both functions, move the
EENTRY use before the ENTRY use.

Sponsored by:	Arm Ltd
2023-03-31 16:45:31 +01:00
Mark Johnston
a54370f4ab arm64: Ensure that thread0's PCB flags are initialized
On arm64, the PCB is stored at the top of the thread stack.  For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore.  (See the comment in ldscript.arm64.)  It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.

Fix this by simply initializing the field.  A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway.  In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39343
2023-03-31 09:50:34 -04:00
Dmitry Chagin
960562652c linux(4): Fix opt_netlink.h inclusion
Add opt_netlink.h to the linux_common module. On i386, where we don't
use the linux_common module, move the opt_netlink.h inclusion under the
i386 condition.

MFC after:		2 weeks
2023-03-31 14:56:59 +03:00
Dmitry Chagin
126df352f5 linux(4): Move inclusion of i386-specific files under common condition 2023-03-31 14:56:29 +03:00
Dmitry Chagin
f94b5734bc Revert "linsysfs(4): Reimplement listnics() using ifAPI"
This reverts commit 0b56641cfc.

As it poorly interacts with vnet subsystem
2023-03-31 14:54:33 +03:00
Kristof Provost
28921c4f7d carp: allow commands to use interface name rather than index
Get/set commands can now choose to provide the interface name rather
than the interface index. This allows userspace to avoid a call to
if_nametoindex().

Suggested by:	melifaro
Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39359
2023-03-31 11:29:58 +02:00