147250 Commits

Author SHA1 Message Date
Oscar Zhao
0799db0cd6 merge from main 2023-04-03 04:17:38 -04:00
Bjoern A. Zeeb
5a249f91a3 LinuxKPI: 802.11: remove extra spaces
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-03 04:15:41 -04:00
Zhenlei Huang
e04d79ac41 lagg(4): Tap traffic after protocol processing
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.

Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
2023-04-03 04:15:41 -04:00
Zhenlei Huang
4f98db4019 infiniband: Widen NET_EPOCH coverage
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
2023-04-03 04:15:41 -04:00
Alexander V. Chernikov
3ed80ea71c netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-03 04:15:40 -04:00
Emmanuel Vadot
8b4d56823b linuxkpi: hdmi: Remove wrong dependency on wlan
Copy-paste mistake.

Reported by:	Alastair Hogge <agh@riseup.net>
Fixes:	f1d7ae31d4aa ("linuxkpi: Add hdmi helpers")
2023-04-03 04:15:40 -04:00
Alexander V. Chernikov
80616931d6 netlink: allow exact-match route lookups via RTM_GETROUTE.
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
2023-04-03 04:15:40 -04:00
Alexander V. Chernikov
d53a61632d netlink: fix NULL check in the default route snl(3) parser.
CID:		1506959
MFC after:	2 weeks
2023-04-03 04:15:40 -04:00
Alexander V. Chernikov
2fb696c195 netlink: fix snl_read_reply_multi().
CID:		1506956
MFC after:	2 weeks
2023-04-03 04:15:40 -04:00
Dmitry Chagin
09b326eed9 pseudofs: Simplify pfs_visible_proc
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
2023-04-03 04:15:39 -04:00
Dmitry Chagin
c1827839fe pseudofs: Allow vis callback to be called for a named node
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
2023-04-03 04:15:39 -04:00
Dmitry Chagin
6bb559f644 pseudofs: Microoptimize struct pfs_node
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
2023-04-03 04:15:39 -04:00
Navdeep Parhar
a3c4ff6d6c cxgbe(4): Allow tracing filters on loopback ports.
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-03 04:15:39 -04:00
Navdeep Parhar
a3d7032106 cxgbe/iw_cxgbe: Always set a vnet around calls to IN_LOOPBACK.
This is catch up with efe58855f3ea.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-03 04:15:39 -04:00
Rick Macklem
5bf33cc297 nfscommon: Add support for an NFSv4 operation bitmap
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after:	3 months
2023-04-03 04:15:38 -04:00
黃清隆
f2645ab309 arcmsr(4): Fix reading buffer empty length error.
MFC after:	2 weeks
2023-04-03 04:15:38 -04:00
Bjoern A. Zeeb
1b9dc4f25a LinuxKPI: 802.11: adjust locking
Split up the lhw lock and the scan lock.  The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.

Sponsored by:	The FreeBSD Foundation
MFC after:	7 days
2023-04-03 04:15:37 -04:00
John Baldwin
b0a514e5f7 fuse: Remove set but unused cr_gid variable.
Reviewed by:	asomers
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39350
2023-04-03 04:14:08 -04:00
John Baldwin
955bff0f64 LinuxKPI: Appease -Wunused-but-set-variable warnings from GCC.
- Mark assert dummy variables as __unused.

- Use a dummy (void) cast of the flags argument passed to
  spin_unlock_irqrestore so it gets treated as used.

Reviewed by:	manu, hselasky
Differential Revision:	https://reviews.freebsd.org/D39349
2023-04-03 04:14:08 -04:00
Zhenlei Huang
4fe1b1de9c lacp: Use C99 bool for boolean return value
This improves readability.

No functional change intended.

MFC after:	1 week
2023-04-03 04:14:07 -04:00
Mitchell Horne
e5a8fc508e arm64/gicv3: correct the size of the distributor resource
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).

In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.

PR:		270415
Reported by:	whu
Tested by:	whu
Differential Revision:	https://reviews.freebsd.org/D39260
2023-04-03 04:14:07 -04:00
Mark Johnston
6b33e87f62 arm64: Move the initial kernel stack out of the init_pagetables section
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore.  It is used for bootstrap page table pages.

It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there.  No functional change intended.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39367
2023-04-03 04:14:07 -04:00
Andrew Turner
68b7e8fa3a Move arm64 EENTRY uses before ENTRY
The ENTRY macro adds instructions to the start of a function but not
EENTRY. To use these instructions in both functions move the EENTRY
use before the ENTRY use.

Sponsored by:	Arm Ltd
2023-04-03 04:14:07 -04:00
Mark Johnston
b8f88877a5 arm64: Ensure that thread0's PCB flags are initialized
On arm64, the PCB is stored at the top of the thread stack.  For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore.  (See the comment in ldscript.arm64.)  It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.

Fix this by simply initializing the field.  A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway.  In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39343
2023-04-03 04:14:06 -04:00
Dmitry Chagin
e0cac84ae0 linux(4): Fix opt_netlink.h inclusion
Add opt_netlink.h to the linux_common module, on i386, where we don't
uses linux_common module, move opt_netlink.h inclusion under
i386 condition.

MFC after:		2 weeks
2023-04-03 04:14:06 -04:00
Dmitry Chagin
a3e9cabc27 linux(4): Move inclusion of i386-specific files under common condition 2023-04-03 04:14:06 -04:00
Dmitry Chagin
901c1b5dd3 Revert "linsysfs(4): Reimplement listnics() using ifAPI"
This reverts commit 0b56641cfcda30d06243223f37781ccc18455bef.

As it poorly interacts with vnet subsystem
2023-04-03 04:14:06 -04:00
Kristof Provost
22a4aadf8c carp: allow commands to use interface name rather than index
Get/set commands can now choose to provide the interface name rather
than the interface index. This allows userspace to avoid a call to
if_nametoindex().

Suggested by:	melifaro
Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39359
2023-04-03 04:14:06 -04:00
Andrew Turner
9185dee208 Handle the arm64 unknown exception separately
Rather than falling through to the default case handle the unknown
exception with its own panic message. As ESR_EL1 is zero for this
exception stop printing it.

Sponsored by:	Arm Ltd
2023-04-03 04:14:06 -04:00
Bjoern A. Zeeb
afa54ae2d9 LinuxKPI: 802.11: use ic_printf more consistently
Rather than printing ic_name ourselves (or not at all) use ic_printf()
as a common function from net80211 where possible.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-03 04:14:06 -04:00
Ed Maste
d79e4ea0bc fs/cd9660: add header include guards
Diff reduction against NetBSD files in sys/fs/cd9660/ and OpenBSD files
in usr.sbin/makefs/cd9660/.

Sponsored by:	The FreeBSD Foundation
2023-04-03 04:14:04 -04:00
Konstantin Belousov
36212092dc ifcapnv: cap_bit in ifcap2_nv_bit_names[] is bit, not index
Sponsored by:	Nvidia networking
2023-04-03 04:14:04 -04:00
John Baldwin
4331016c4f sys: Disable errors for -Wunused-function on GCC.
This matches the handling of this warning on clang.
2023-04-03 04:14:04 -04:00
Navdeep Parhar
93b2c0220a cxgbe(4): Remove dead code.
Fixes:	e7e084442227 cxgbe(4): Replace T4_PKT_TIMESTAMP with something slightly less hackish.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-03 04:14:03 -04:00
Mark Johnston
25f6e9a70a graid3: Pre-allocate the timeout event structure
As in commit 2f1cfb7f63ca ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.

Reported by:	graid3 regression tests
MFC after	2 weeks
2023-04-03 04:14:03 -04:00
Mark Johnston
f5d6f7cb47 kdb: Modify securelevel policy
Currently, sysctls which enable KDB in some way are flagged with
CTLFLAG_SECURE, meaning that you can't modify them if securelevel > 0.
This is so that KDB cannot be used to lower a running system's
securelevel, see commit 3d7618d8bf0b7.  However, the newer mac_ddb(4)
restricts DDB operations which could be abused to lower securelevel
while retaining some ability to gather useful debugging information.

To enable the use of KDB (specifically, DDB) on systems with a raised
securelevel, change the KDB sysctl policy: rather than relying on
CTLFLAG_SECURE, add a check of the current securelevel to kdb_trap().
If the securelevel is raised, only pass control to the backend if MAC
specifically grants access; otherwise simply check to see if mac_ddb
vetoes the request, as before.

Add a new secure sysctl, debug.kdb.enter_securelevel, to override this
behaviour.  That is, the sysctl lets one enter a KDB backend even with a
raised securelevel, so long as it is set before the securelevel is
raised.

Reviewed by:	mhorne, stevek
MFC after:	1 month
Sponsored by:	Juniper Networks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37122
2023-04-03 04:14:02 -04:00
Alexander V. Chernikov
73559a7cbf netlink: Fix adding routes with nexthops on p2p interfaces.
Use full-featured ifa_ifwithroute() to guess route ifa/ifp
 instead of ifa_ifwithnet(). This change makes the route addition
 logic closer to the rt_getifa_fib() used by rtsock.

Reported by:	glebius
Tested by:	glebius
Differential Revision: https://reviews.freebsd.org/D39335
MFC after:	2 weeks
2023-04-03 04:14:01 -04:00
Mateusz Guzik
ffd2a1ec23 inet6: protect address manipulation with a lock
This is a total hack/bare minimum which follows inet4.

Otherwise 2 threads removing the same address can easily crash.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39317
2023-04-03 04:14:01 -04:00
Kirk McKusick
0b37f7c2e8 Improvement in UFS/FFS directory placement when doing mkdir(2).
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.

The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.

This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.

We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.

It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.

Reported by:  Timo Voelker
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
2023-04-03 04:14:00 -04:00
Alexander V. Chernikov
c2d046fd2b routing: fix panic when adding an interface route to the p2p interface
without and inet/inet6 addresses attached.

MFC after:      3 days
2023-04-03 04:13:59 -04:00
Konstantin Belousov
ab065524f1 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-04-03 04:13:59 -04:00
Zhenlei Huang
bcc05e5853 lagg(4): Do not enter net epoch recursively
This saves a little resources.

No functional change intended.

Reviewed by:	kp
Fixes:		b8a6e03fac92 Widen NET_EPOCH coverage
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39267
2023-04-03 04:13:58 -04:00
Zhenlei Huang
a1e9b6ff94 lagg(4): Refactor out some lagg protocol input routines into a default one
Those input routines are identical.

Also inline two fast paths.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39251
2023-04-03 04:13:57 -04:00
Zhenlei Huang
aa909dfbb9 lagg(4): Make lagg_list and lagg_detach_cookie static
They are used internally only.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39250
2023-04-03 04:13:57 -04:00
Mateusz Guzik
0aa219a1f9 proc: shave a lock trip on exit if possible
... which happens to be vast majority of the time
2023-04-03 04:13:57 -04:00
Mateusz Guzik
34ed719492 ufs: stop doing refcount_init on made up creds
creds are not using the refcount API for a long time now, but this
previously failed to fail to compile because the type remained int.

Now it broke due to conversion to long.
2023-04-03 04:13:57 -04:00
Joseph Koshy
8b7b71874e pmc: Add a reminder to maintain documentation.
Approved by:	gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D39298
2023-04-03 04:13:57 -04:00
Elliott Mitchell
0fb5d59eda xen/intr: rework xen_intr_resume() for in-place remapping
The prior implementation of xen_intr_resume() was wiping
xen_intr_port_to_isrc[] and then rebuilding from the x86 interrupt
table.  Rework to instead wipe the channel numbers (->xi_port) and then
scan the table for sources with invalid channels.

This will be slower due to scanning the whole table, but this removes
the dependency on the x86 interrupt code.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30599
[royger]
Split line over 80 characters.
2023-04-03 04:13:56 -04:00
Elliott Mitchell
7895021336 xen/intr: merge parts of resume functionality into new function
The portions of xen_rebind_ipi() and xen_rebind_virq() were already
near-identical.  While xen_rebind_ipi() should panic() on
single-processor, still having the functionality to invoke seems
harmless.

Meanwhile much of the loop from xen_intr_resume() seemed to want to be
closer to this same code.  This pushes related bits closer together.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30598
2023-04-03 04:13:56 -04:00
Julien Grall
a5f9811b1b xen/intr: remove x86 APIC headers from xen_intr.c
Remove these no longer needed headers.  Key for making xen_intr.c
machine-independent as they don't exist on other architectures.

Originally this was part of a much larger commit, but was broken off
for submission to the FreeBSD project.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
MFC after: 1 week
2023-04-03 04:13:56 -04:00