Notable upstream pull request merges:
#12194 Fix short-lived txg caused by autotrim
#13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
#13392 Implementation of block cloning for ZFS
#13741 SHA2 reworking and API for iterating over multiple implementations
#14282 Sync thread should avoid holding the spa config write lock
when possible
#14283 txg_sync should handle write errors in ZIL
#14359 More adaptive ARC eviction
#14469 Fix NULL pointer dereference in zio_ready()
#14479 zfs redact fails when dnodesize=auto
#14496 improve error message of zfs redact
#14500 Skip memory allocation when compressing holes
#14501 FreeBSD: don't verify recycled vnode for zfs control directory
#14502 partially revert PR 14304 (eee9362a7)
#14509 Fix per-jail zfs.mount_snapshot setting
#14514 Fix data race between zil_commit() and zil_suspend()
#14516 System-wide speculative prefetch limit
#14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
#14519 Do not hold spa_config in ZIL while blocked on IO
#14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
#14524 Ignore too large stack in case of dsl_deadlist_merge
#14526 Use .section .rodata instead of .rodata on FreeBSD
#14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
#14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
#14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
#14544 icp: Prevent compilers from optimizing away memset()
in gcm_clear_ctx()
#14546 Revert zfeature_active() to static
#14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
patch
#14563 Optimize the is_l2cacheable functions
#14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
#14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
#14567 spl: Add cmn_err_once() to log a message only on the first call
#14568 Fix incremental receive silently failing for recursive sends
#14569 Restore ASMABI and other Unify work
#14576 Fix detection of IBM Power8 machines (ISA 2.07)
#14577 Better handling for future crypto parameters
#14600 zcommon: Refactor FPU state handling in fletcher4
#14603 Fix prefetching of indirect blocks while destroying
#14633 Fixes in persistent error log
#14639 FreeBSD: Remove extra arc_reduce_target_size() call
#14641 Additional limits on hole reporting
#14649 Drop lying to the compiler in the fletcher4 code
#14652 panic loop when removing slog device
#14653 Update vdev state for spare vdev
#14655 Fix cloning into already dirty dbufs
#14678 Revert "Do not hold spa_config in ZIL while blocked on IO"
Obtained from: OpenZFS
OpenZFS commit: 431083f75b
VLAN identifier 0xFFF is reserved. It must not be configured or
transmitted.
Also validate during parsing to prevent potential integer overflow.
Reviewed by: #network, melifaro
Fixes: c7cffd65c5 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39282
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.
Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.
PR: 270417
Reviewed by: melifaro (previous version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39225
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.
Widen NET_EPOCH coverage to prevent use-after-free.
Thanks hselasky@ for testing with mlx5 devices.
Reviewed by: hselasky
Tested by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39275
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
are being converted to use netlink sockets as a transport between
kernel and userland.
In the current configuration, it still possible have the kernel
without NETLINK (`nooptions NETLINK`) and use the aforementioned
utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
builds a custom kernel without netlink and tries to use the standard userland.
This change adds `option NETLINK` to the default options for each
architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
instead of DEFAULTS.
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
request assumes exact-prefix lookup instead of the
longest-prefix-match.
MFC after: 2 weeks
This will be used later in the linsysfs module to filter out VNETs.
Reviewed by: des
Differential revision: https://reviews.freebsd.org/D39382
MFC after: 1 month
Since 81167243b the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.
Reviewed by: des
Differential revision: https://reviews.freebsd.org/D39381
MFC after: 1 month
Heimdal's lib/hdb/db3.c is only built if DB3 is enabled, i.e. #if HAVE_DB3.
FreeBSD's bdb is DB1. Therefore the entire db3.c file is #ifdef'd out.
Let's avoid building a file that results in a useless object file.
MFC after: 1 week
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port. This change allows
tracing filters to be installed in this loopback path.
MFC after: 1 week
Sponsored by: Chelsio Communications
NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.
This commit should not change any NFSv4.1/4.2 semantics.
MFC after: 3 months
* Move more logic from conftest.py to the actual atf_pytest handler
* Move nodeid_to_method_name() to the utils.py so it can be shared
MFC after: 2 weeks
This diff does not contain any functional changes.
Its sole purpose is splitting netlink.py into smaller chunks.
The new code simplifies the upcoming generic netlink support
introduction.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39365
Split up the lhw lock and the scan lock. The latter is a mtx
while the former changes from mtx to sx as mac80211 downcalls may
sleep (and the ic lock is not usable in that case either and a larger
project to fix).
This will also enforce some lookups under lock (mostly scan) as well
as general protection for more compat code and avoid a possible
deadlock with one of the upcoming callbacks from driver into the
compat code.
Sponsored by: The FreeBSD Foundation
MFC after: 7 days
The hang basically bricks a physical box and it can be recovered
only if you are able to boot from alternate media. This isn't a
perfect fix, but throw it in before loader experts decide on
proper one.
Submitted by: whu
Fixes: 927358dd98
- Mark assert dummy variables as __unused.
- Use a dummy (void) cast of the flags argument passed to
spin_unlock_irqrestore so it gets treated as used.
Reviewed by: manu, hselasky
Differential Revision: https://reviews.freebsd.org/D39349
Use the GICD_SIZE macro (0x10000), which is half the size of the current
fixed-sized mapping (128 * 1024 == 0x20000).
In ARM64 Hyper-V instances, it seems the Distributor's registers are
located immediately preceding a range of physical memory in the bus
address space. Thus, when ram0 is attaching and attempts to reserve
SYS_RES_MEMORY resources corresponding to its physmem ranges, it fails,
because the first 0x10000 bytes of this range are already owned by gic0.
PR: 270415
Reported by: whu
Tested by: whu
Differential Revision: https://reviews.freebsd.org/D39260
init_pagetables is mapped into the segment containing the BSS, but does
not get zeroed by locore. It is used for bootstrap page table pages.
It happens that the bootstrap kernel stack is also placed in that
section, but there's no reason it shouldn't live in the BSS, so move it
there. No functional change intended.
Reviewed by: andrew
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org/D39367
The ENTRY macro adds instructions to the start of a function but not
EENTRY. To use these instructions in both functions move the EENTRY
use before the ENTRY use.
Sponsored by: Arm Ltd
When recovering a system that is unbootable due to some
problem with the active BE, it is likely you'll be booted
from a rescue image running UFS. In this case, bectl
needs help finding the zpool root that you want to operate
on. In this case, improve the error message to suggest
specifying a root, rather than just emitting a generic
error message that might imply, to the naive user, that
there is a ZFS compatibility issue between the rescue
image and the on-disk ZFS pool.
Reviewed by: imp, kevans
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D39346
On arm64, the PCB is stored at the top of the thread stack. For thread0
this comes from the static "initstack" region, which is placed in the
.init_pagetable section, which is not part of the BSS and thus doesn't
get zeroed by locore. (See the comment in ldscript.arm64.) It is thus
possible for the pcb_flags field to be uninitialized, which can result
in PCB_SINGLE_STEP being set.
Fix this by simply initializing the field. A separate commit will move
initstack out of the .init_pagetable section, since it has no reason to
be there, but it is preferable to explicitly initialize PCB fields
anyway. In particular, regular kernel stacks are not zeroed upon
allocation, so we should be consistent here.
Reviewed by: andrew
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org/D39343
Add opt_netlink.h to the linux_common module, on i386, where we don't
uses linux_common module, move opt_netlink.h inclusion under
i386 condition.
MFC after: 2 weeks