Commit Graph

146914 Commits

Author SHA1 Message Date
Martin Matuska
2a58b312b6 zfs: merge openzfs/zfs@431083f75
Notable upstream pull request merges:
  #12194 Fix short-lived txg caused by autotrim
  #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
  #13392 Implementation of block cloning for ZFS
  #13741 SHA2 reworking and API for iterating over multiple implementations
  #14282 Sync thread should avoid holding the spa config write lock
         when possible
  #14283 txg_sync should handle write errors in ZIL
  #14359 More adaptive ARC eviction
  #14469 Fix NULL pointer dereference in zio_ready()
  #14479 zfs redact fails when dnodesize=auto
  #14496 improve error message of zfs redact
  #14500 Skip memory allocation when compressing holes
  #14501 FreeBSD: don't verify recycled vnode for zfs control directory
  #14502 partially revert PR 14304 (eee9362a7)
  #14509 Fix per-jail zfs.mount_snapshot setting
  #14514 Fix data race between zil_commit() and zil_suspend()
  #14516 System-wide speculative prefetch limit
  #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
  #14519 Do not hold spa_config in ZIL while blocked on IO
  #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
  #14524 Ignore too large stack in case of dsl_deadlist_merge
  #14526 Use .section .rodata instead of .rodata on FreeBSD
  #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
  #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
  #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
  #14544 icp: Prevent compilers from optimizing away memset()
         in gcm_clear_ctx()
  #14546 Revert zfeature_active() to static
  #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
         patch
  #14563 Optimize the is_l2cacheable functions
  #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
  #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
  #14567 spl: Add cmn_err_once() to log a message only on the first call
  #14568 Fix incremental receive silently failing for recursive sends
  #14569 Restore ASMABI and other Unify work
  #14576 Fix detection of IBM Power8 machines (ISA 2.07)
  #14577 Better handling for future crypto parameters
  #14600 zcommon: Refactor FPU state handling in fletcher4
  #14603 Fix prefetching of indirect blocks while destroying
  #14633 Fixes in persistent error log
  #14639 FreeBSD: Remove extra arc_reduce_target_size() call
  #14641 Additional limits on hole reporting
  #14649 Drop lying to the compiler in the fletcher4 code
  #14652 panic loop when removing slog device
  #14653 Update vdev state for spare vdev
  #14655 Fix cloning into already dirty dbufs
  #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

Obtained from:	OpenZFS
OpenZFS commit:	431083f75b
2023-04-03 16:49:30 +02:00
Ganbold Tsagaankhuu
b98fbf3781 Fix driver name.
Submitted by:	Tyuryukanov S.Y.
2023-04-03 14:20:28 +00:00
Andrew Turner
41236539d8 Add non-posted device memory to the arm64 mem map
Add VM_MEMATTR_DEVICE_NP to the arm64 vm.pmap.kernel_maps sysctl.

Reviewed by:	markj
Sponsored by:	Arm Ltd
 Differential Revision:	https://reviews.freebsd.org/D39371
2023-04-03 12:59:11 +01:00
Dmitry Chagin
7ae0972c7b linsysfs(4): Reimplement listnics() using ifAPI
Handle if arrival/departure events and VNETs.

Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI, pseudofs
2023-04-03 11:22:16 +03:00
Bjoern A. Zeeb
cfccc7f30a LinuxKPI: 802.11: remove extra spaces
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-04-02 22:25:28 +00:00
Zhenlei Huang
5f3d0399e9 lagg(4): Tap traffic after protocol processing
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.

Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
2023-04-03 01:01:51 +08:00
Zhenlei Huang
90820ef121 infiniband: Widen NET_EPOCH coverage
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
2023-04-03 00:51:49 +08:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Emmanuel Vadot
4fd9e20671 linuxkpi: hdmi: Remove wrong dependency on wlan
Copy-paste mistake.

Reported by:	Alastair Hogge <agh@riseup.net>
Fixes:	f1d7ae31d4 ("linuxkpi: Add hdmi helpers")
2023-04-02 16:56:23 +02:00
Alexander V. Chernikov
c35a43b261 netlink: allow exact-match route lookups via RTM_GETROUTE.
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
2023-04-02 13:47:10 +00:00
Alexander V. Chernikov
4aeb939ecf netlink: fix NULL check in the default route snl(3) parser.
CID:		1506959
MFC after:	2 weeks
2023-04-02 12:44:20 +00:00
Alexander V. Chernikov
27cbc1a7fe netlink: fix snl_read_reply_multi().
CID:		1506956
MFC after:	2 weeks
2023-04-02 12:41:53 +00:00
Dmitry Chagin
a32ed5ec05 pseudofs: Simplify pfs_visible_proc
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
2023-04-02 11:24:10 +03:00
Dmitry Chagin
405c0c04ed pseudofs: Allow vis callback to be called for a named node
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
2023-04-02 11:21:15 +03:00
Dmitry Chagin
7f72324346 pseudofs: Microoptimize struct pfs_node
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
2023-04-02 11:20:07 +03:00
Navdeep Parhar
9f354cd3d0 cxgbe(4): Allow tracing filters on loopback ports.
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 17:50:46 -07:00
Navdeep Parhar
531ef35241 cxgbe/iw_cxgbe: Always set a vnet around calls to IN_LOOPBACK.
This is catch up with efe58855f3.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-04-01 16:19:10 -07:00
Rick Macklem
f4179ad46f nfscommon: Add support for an NFSv4 operation bitmap
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after:	3 months
2023-04-01 14:22:26 -07:00
黃清隆
285d85f4f9 arcmsr(4): Fix reading buffer empty length error.
MFC after:	2 weeks
2023-03-31 22:43:43 -07:00
Bjoern A. Zeeb
8ac540d3b8 LinuxKPI: 802.11: adjust locking
Split up the lhw lock and the scan lock.  The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.

Sponsored by:	The FreeBSD Foundation
MFC after:	7 days
2023-03-31 19:59:50 +00:00
Joseph Mingrone
6f9cba8f8b
libpcap: Update to 1.10.3
Local changes:

- In contrib/libpcap/pcap/bpf.h, do not include pcap/dlt.h.  Our system
  net/dlt.h is pulled in from net/bpf.h.
- sys/net/dlt.h: Incorporate changes from libpcap 1.10.3.
- lib/libpcap/Makefile: Update for libpcap 1.10.3.

Changelog:	https://git.tcpdump.org/libpcap/blob/95691ebe7564afa3faa5c6ba0dbd17e351be455a:/CHANGES
Reviewed by:	emaste
Obtained from:	https://www.tcpdump.org/release/libpcap-1.10.3.tar.gz
Sponsored by:	The FreeBSD Foundation
2023-03-31 16:02:22 -03:00
John Baldwin
cb750f7f5a fuse: Remove set but unused cr_gid variable.
Reviewed by:	asomers
Reported by:	GCC
Differential Revision:	https://reviews.freebsd.org/D39350
2023-03-31 10:57:13 -07:00
John Baldwin
ad83dd2b2b LinuxKPI: Appease -Wunused-but-set-variable warnings from GCC.
- Mark assert dummy variables as __unused.

- Use a dummy (void) cast of the flags argument passed to
  spin_unlock_irqrestore so it gets treated as used.

Reviewed by:	manu, hselasky
Differential Revision:	https://reviews.freebsd.org/D39349
2023-03-31 10:56:33 -07:00
Zhenlei Huang
5a8abd0a29 lacp: Use C99 bool for boolean return value
This improves readability.

No functional change intended.

MFC after:	1 week
2023-04-01 01:48:36 +08:00
Mitchell Horne
3462c371c2 arm64/gicv3: correct the size of the distributor resource
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).

In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.

PR:		270415
Reported by:	whu
Tested by:	whu
Differential Revision:	https://reviews.freebsd.org/D39260
2023-03-31 13:26:22 -03:00
Mark Johnston
1a3cb489e5 arm64: Move the initial kernel stack out of the init_pagetables section
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore.  It is used for bootstrap page table pages.

It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there.  No functional change intended.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39367
2023-03-31 11:57:26 -04:00
Andrew Turner
47ff149afa Move arm64 EENTRY uses before ENTRY
The ENTRY macro adds instructions to the start of a function but not
EENTRY. To use these instructions in both functions move the EENTRY
use before the ENTRY use.

Sponsored by:	Arm Ltd
2023-03-31 16:45:31 +01:00
Mark Johnston
a54370f4ab arm64: Ensure that thread0's PCB flags are initialized
On arm64, the PCB is stored at the top of the thread stack.  For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore.  (See the comment in ldscript.arm64.)  It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.

Fix this by simply initializing the field.  A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway.  In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D39343
2023-03-31 09:50:34 -04:00
Dmitry Chagin
960562652c linux(4): Fix opt_netlink.h inclusion
Add opt_netlink.h to the linux_common module, on i386, where we don't
uses linux_common module, move opt_netlink.h inclusion under
i386 condition.

MFC after:		2 weeks
2023-03-31 14:56:59 +03:00
Dmitry Chagin
126df352f5 linux(4): Move inclusion of i386-specific files under common condition 2023-03-31 14:56:29 +03:00
Dmitry Chagin
f94b5734bc Revert "linsysfs(4): Reimplement listnics() using ifAPI"
This reverts commit 0b56641cfc.

As it poorly interacts with vnet subsystem
2023-03-31 14:54:33 +03:00
Kristof Provost
28921c4f7d carp: allow commands to use interface name rather than index
Get/set commands can now choose to provide the interface name rather
than the interface index. This allows userspace to avoid a call to
if_nametoindex().

Suggested by:	melifaro
Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39359
2023-03-31 11:29:58 +02:00
Andrew Turner
3a0cc6fe61 Handle the arm64 unknown exception separately
Rather than falling through to the default case handle the unknown
exception with its own panic message. As ESR_EL1 is zero for this
exception stop printing it.

Sponsored by:	Arm Ltd
2023-03-31 10:15:45 +01:00
Bjoern A. Zeeb
3f0083c4e3 LinuxKPI: 802.11: use ic_printf more consistently
Rather than printing ic_name ourselves (or not at all) use ic_printf()
as a common function from net80211 where possible.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-31 07:14:32 +00:00
Ed Maste
d33cdf16df fs/cd9660: add header include guards
Diff reduction against NetBSD files in sys/fs/cd9660/ and OpenBSD files
in usr.sbin/makefs/cd9660/.

Sponsored by:	The FreeBSD Foundation
2023-03-30 19:20:54 -04:00
Konstantin Belousov
7170774e2a ifcapnv: cap_bit in ifcap2_nv_bit_names[] is bit, not index
Sponsored by:	Nvidia networking
2023-03-31 02:08:15 +03:00
John Baldwin
47d1e67874 sys: Disable errors for -Wunused-function on GCC.
This matches the handling of this warning on clang.
2023-03-30 15:43:48 -07:00
Navdeep Parhar
21b778fbeb cxgbe(4): Remove dead code.
Fixes:	e7e0844422 cxgbe(4): Replace T4_PKT_TIMESTAMP with something slightly less hackish.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2023-03-30 14:13:07 -07:00
Mark Johnston
fd02d0bc14 graid3: Pre-allocate the timeout event structure
As in commit 2f1cfb7f63 ("gmirror: Pre-allocate the timeout event
structure"), graid3 must avoid M_WAITOK allocations in callout handlers.

Reported by:	graid3 regression tests
MFC after	2 weeks
2023-03-30 13:38:15 -04:00
Mark Johnston
cab1056105 kdb: Modify securelevel policy
Currently, sysctls which enable KDB in some way are flagged with
CTLFLAG_SECURE, meaning that you can't modify them if securelevel > 0.
This is so that KDB cannot be used to lower a running system's
securelevel, see commit 3d7618d8bf.  However, the newer mac_ddb(4)
restricts DDB operations which could be abused to lower securelevel
while retaining some ability to gather useful debugging information.

To enable the use of KDB (specifically, DDB) on systems with a raised
securelevel, change the KDB sysctl policy: rather than relying on
CTLFLAG_SECURE, add a check of the current securelevel to kdb_trap().
If the securelevel is raised, only pass control to the backend if MAC
specifically grants access; otherwise simply check to see if mac_ddb
vetoes the request, as before.

Add a new secure sysctl, debug.kdb.enter_securelevel, to override this
behaviour.  That is, the sysctl lets one enter a KDB backend even with a
raised securelevel, so long as it is set before the securelevel is
raised.

Reviewed by:	mhorne, stevek
MFC after:	1 month
Sponsored by:	Juniper Networks
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D37122
2023-03-30 10:45:00 -04:00
Alexander V. Chernikov
b755f1a009 netlink: Fix adding routes with nexthops on p2p interfaces.
Use full-featured ifa_ifwithroute() to guess route ifa/ifp
 instead of ifa_ifwithnet(). This change makes the route addition
 logic closer to the rt_getifa_fib() used by rtsock.

Reported by:	glebius
Tested by:	glebius
Differential Revision: https://reviews.freebsd.org/D39335
MFC after:	2 weeks
2023-03-30 09:53:50 +00:00
Mateusz Guzik
f5a365e51f inet6: protect address manipulation with a lock
This is a total hack/bare minimum which follows inet4.

Otherwise 2 threads removing the same address can easily crash.

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39317
2023-03-30 08:46:38 +00:00
Kirk McKusick
fe5e6e2cc5 Improvement in UFS/FFS directory placement when doing mkdir(2).
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.

The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.

This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.

We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.

It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.

Reported by:  Timo Voelker
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
2023-03-29 21:13:27 -07:00
Alexander V. Chernikov
badcb3fd57 routing: fix panic when adding an interface route to the p2p interface
without and inet/inet6 addresses attached.

MFC after:      3 days
2023-03-29 20:28:24 +00:00
Konstantin Belousov
cd137909c3 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-03-29 21:45:20 +03:00
Zhenlei Huang
d4a80d21b3 lagg(4): Do not enter net epoch recursively
This saves a little resources.

No functional change intended.

Reviewed by:	kp
Fixes:		b8a6e03fac Widen NET_EPOCH coverage
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39267
2023-03-30 00:29:51 +08:00
Zhenlei Huang
dbe86dd5de lagg(4): Refactor out some lagg protocol input routines into a default one
Those input routines are identical.

Also inline two fast paths.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39251
2023-03-30 00:22:13 +08:00
Zhenlei Huang
fcac5719a1 lagg(4): Make lagg_list and lagg_detach_cookie static
They are used internally only.

No functional change intended.

MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39250
2023-03-30 00:14:44 +08:00
Mateusz Guzik
80cf427b8d proc: shave a lock trip on exit if possible
... which happens to be vast majority of the time
2023-03-29 09:19:03 +00:00
Mateusz Guzik
7c31de1a3c ufs: stop doing refcount_init on made up creds
creds are not using the refcount API for a long time now, but this
previously failed to fail to compile because the type remained int.

Now it broke due to conversion to long.
2023-03-29 09:19:03 +00:00