I find it very annoying that there is no FreeBSD infrastructure to
determine failures across architectures other than to check in
changes and then have Jenkins find them.
Suggested by: Jessica Clarke
MFC after: 1 week
A check in the superblock validity code verifies that the computed
size of the filesystem cylinder groups (CGSIZE macro) does not
exceed the filesystem block size (fs_bsize).
A report was received that a filesystem had been flagged as failing
this check. We were unable to determine how the reported filesystem
could have been created. This commit adds a check at the end of the
newfs(8) command to verify that the the cylinder group size is valid.
If an oversize cylinder group is found newfs(8) prints a diagnostic
output and rebuilds the filesystem to make it compiliant.
MFC after: 1 week
Provide an additional line of output for the superblock giving the
computed size of the cylinder group (CGSIZE macro) along with the
details needed to calculate it.
MFC after: 1 week
Replacing rtsock with netlink also means providing similar tracing facilities,
rtsock provides `route -n monitor` interface, where each message can be traced
to the originating PID.
This diff closes the feature gap between rtsock and netlink in that regard.
Netlink works slightly differently from rtsock, as it is a generic message
"broker". It calls some kernel KPIs and returns the result to the caller.
Other Netlink consumers gets notified on the changed kernel state using the
relevant subsystem callbacks. Typically, it is close to impossible to pass
some data through these KPIs to enhance the notification.
This diff approaches the problem by using osd(9) to assign the relevant
socket pointer (`'nlp`) to the per-socket taskqueue execution thread.
This change allows to recover the pointer in the aforementioned notification
callbacks and extract some additional data.
Using `osd(9)` (and adding additional metadata) to the notification receiver
comes with some additional cost attached, so this interface needs to be
enabled explicitly by using a newly-created `NETLINK_MSG_INFO` `SOL_NETLINK`
socket option.
The actual medatadata (which includes the originator PID) is provided via
control messages. To enable extensibility, the control message data is
encoded in the standard netlink(TLV-based) fashion. The list of the
currently-provided properties can be found in `nlmsginfo_attrs`.
snl(3) is extended to enable decoding of netlink messages with metadata
(`snl_read_message_dbg()` stores the parsed structure in the provided buffer).
Differential Revision: https://reviews.freebsd.org/D39391
Make Ethernet rules more similar to the usual layer 3 rules by also
allowing ridentifier and labels to be set on them.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
In the early days of gbde, it linked against libmd. Shortly after
conception, phk replaced ARC4 with SHA-512, but libmd did not have SHA2
at the time thus he built a copy of sha2.c for gbde.
Fast forward 3 years, cperciva adds SHA2 to libmd -- this makes gbde's
build of sha2.c redundant, but it's (understandably) overlooked. Let's
simplify the gbde build now and just assume that libmd includes the most
optimal implementation.
Reported by: koobs (weird lto errors?)
Differential Revision: https://reviews.freebsd.org/D34668
Independent of all of the commands, bectl itself takes an `-r` flag that
specifies the BE root to use. This was originally added to facilitate
testing, but it was later discovered to be incredibly useful in other
scenarios; e.g., trying to recover some boot environments in rescue
media.
The "BE root" described here is the parent dataset that holds boot
environments, but I've no idea if that's an accepted definition for that
dataset.
Reviewed by: gallatin, imp, Pau Amma
MFC after: 1 week
Differential Review: https://reviews.freebsd.org/D39710
Packet Mark is an analogue to ipfw tags with O(1) lookup from mbuf while
regular tags require a single-linked list traversal.
Mark is a 32-bit number that can be looked up in a table
[with 'number' table-type], matched or compared with a number with optional
mask applied before comparison.
Having generic nature, Mark can be used in a variety of needs.
For example, it could be used as a security group: mark will hold a security
group id and represent a group of packet flows that shares same access
control policy.
Reviewed By: pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D39555
MFC after: 1 month
The manual describes "if*" form only while kernel uses fnmatch(3)
and allows use for more versatile shell-like patterns.
Note that explicitly and provide an example.
MFC after: 3 days
Pass 1 of fsck_ffs checks the integrity of all the cylinder groups.
If any are found to have been corrupted it offers to rebuild them.
Pass 5 then makes a second pass over the cylinder groups to validate
their block and inode maps. Pass 5 assumes that the cylinder groups
are not corrupted and can segment fault if they are corrupted. Rather
than rerunning the corruption checks a second time in pass 5, this
fix keeps track whether any corrupt cylinder groups were found but not
fixed in pass 1 either due to running with the -n flag or by explicitly
answering `no' when asked whether to fix a corrupted cylinder group.
If any corrupted cylinder groups remain after pass 1, fsck_ffs will
decline to run pass 5. Instead it marks the filesystem as unclean
so that fsck_ffs will need to be run again before the filesystem can
be mounted.
This patch cleans up and documents the return value from check_cgmagic().
It also renames the variable / parameter "rebuildcg" to "rebuiltcg".
This parameter describes whether the cylinder group has been rebuilt
rather than whether it should be rebuilt.
Reported by: Chuck Silvers
Reviewed by: Chuck Silvers
MFC after: 1 week
Increment a reference count when returning a zero'ed out buffer
after a failed read.
Zero out a structure before using it.
Only dirty a buffer that has been modified.
Submitted by: Chuck Silvers
Sponsored by: Netflix
MFC after: 1 week
Add the ability to set direct blocks numbers in inodes so that manual
corrections can be made. No checking of the values is attempted so
accidental or deliberate bad values can be set.
Submitted by: Chuck Silvers
MFC after: 1 week
getopt(3) returns int type not char. Using char triggers the
-Wtautological-constant-out-of-range-compare warning with clang.
Change the type of the variable used for holding the return value
of getopt(3) to int to match the prototype and eliminate the warning.
MFC after: 1 week
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.
Obtained from: OpenBSD
MFC after: never
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38025
Commit 896516e54a added a new NFS mount option
used for Kerberized NFSv4.1/4.2 mounts. It specifies that
AUTH_SYS be used for state maintenance (also called system)
operations. This allows the mount to be done without the
"gssname" option or a valid Kerberos TGT being held by the
user doing the mount (so it can be specified in fstab(5) for
example).
Reviewed by: gbe (manpages), karels
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D39469
A machine might exist on multiple networks, all of which offer, say, default
routes or name servers. There's no easy way to indicate in the config
that those options are only valid for a single interface.
Now, we can write:
interface "lan0" {
request routers;
require routers;
}
interface "lan1" {
ignore routers;
}
And only take action on default routes offered on lan0.
Tested by: Jose Luis Duran <jlduran at gmail dot com>
MFC after: 2 months
Reviewed by: allanjude, imp
Sponsored by: Zenith Electronics LLC
Sponsored by: Klara, Inc.
Pull Request: #693
For clone create and rename operations, the interface name get back can be
different from the one passed to ioctl(). Use the interface name we get back
so that ifconfig will not return unexpected ENXIO.
PR: 270618
Reviewed by: kp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39396
route.c uses newroute() to handle the "route get" command. The logic
inside newroute() adds RTF_GATEWAY flag if "-interface" flag is not
specified. That results in the inconsistent RTM_GET message with
RTF_GATEWAY set but no RTAX_GATEWAY provided. Address this in the
translation code by checking if the gateway is actually provided.
Notable upstream pull request merges:
#12194 Fix short-lived txg caused by autotrim
#13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
#13392 Implementation of block cloning for ZFS
#13741 SHA2 reworking and API for iterating over multiple implementations
#14282 Sync thread should avoid holding the spa config write lock
when possible
#14283 txg_sync should handle write errors in ZIL
#14359 More adaptive ARC eviction
#14469 Fix NULL pointer dereference in zio_ready()
#14479 zfs redact fails when dnodesize=auto
#14496 improve error message of zfs redact
#14500 Skip memory allocation when compressing holes
#14501 FreeBSD: don't verify recycled vnode for zfs control directory
#14502 partially revert PR 14304 (eee9362a7)
#14509 Fix per-jail zfs.mount_snapshot setting
#14514 Fix data race between zil_commit() and zil_suspend()
#14516 System-wide speculative prefetch limit
#14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
#14519 Do not hold spa_config in ZIL while blocked on IO
#14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
#14524 Ignore too large stack in case of dsl_deadlist_merge
#14526 Use .section .rodata instead of .rodata on FreeBSD
#14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
#14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
#14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
#14544 icp: Prevent compilers from optimizing away memset()
in gcm_clear_ctx()
#14546 Revert zfeature_active() to static
#14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
patch
#14563 Optimize the is_l2cacheable functions
#14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
#14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
#14567 spl: Add cmn_err_once() to log a message only on the first call
#14568 Fix incremental receive silently failing for recursive sends
#14569 Restore ASMABI and other Unify work
#14576 Fix detection of IBM Power8 machines (ISA 2.07)
#14577 Better handling for future crypto parameters
#14600 zcommon: Refactor FPU state handling in fletcher4
#14603 Fix prefetching of indirect blocks while destroying
#14633 Fixes in persistent error log
#14639 FreeBSD: Remove extra arc_reduce_target_size() call
#14641 Additional limits on hole reporting
#14649 Drop lying to the compiler in the fletcher4 code
#14652 panic loop when removing slog device
#14653 Update vdev state for spare vdev
#14655 Fix cloning into already dirty dbufs
#14678 Revert "Do not hold spa_config in ZIL while blocked on IO"
Obtained from: OpenZFS
OpenZFS commit: 431083f75b
VLAN identifier 0xFFF is reserved. It must not be configured or
transmitted.
Also validate during parsing to prevent potential integer overflow.
Reviewed by: #network, melifaro
Fixes: c7cffd65c5 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39282
When recovering a system that is unbootable due to some
problem with the active BE, it is likely you'll be booted
from a rescue image running UFS. In this case, bectl
needs help finding the zpool root that you want to operate
on. In this case, improve the error message to suggest
specifying a root, rather than just emitting a generic
error message that might imply, to the naive user, that
there is a ZFS compatibility issue between the rescue
image and the on-disk ZFS pool.
Reviewed by: imp, kevans
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D39346
The algorithm for laying out new directories was devised in the 1980s
and markedly improved the performance of the filesystem. In those days
large disks had at most 100 cylinder groups and often as few as 10-20.
Modern multi-terrabyte disks have thousands of cylinder groups. The
original algorithm does not handle these large sizes well. This change
attempts to expand the scope of the original algorithm to work well
with these much larger disks while still retaining the properties
of the original algorithm for small disks.
The filesystem implementation is divided into policy routines and
implementation routines. The policy routines can be changed in any
way desired without risk of corrupting the filesystem. The policy
requests are handled by the implementation layer. If the policy
asks for an available resource, it is granted. But if it asks for
an already in-use resource, then the implementation will provide
an available one nearby the request. Thus it is impossible for a
policy to double allocate. This change is limited to the policy
implementation.
This change updates the ffs_dirpref() routine which is responsible
for selecting the cylinder group into which a new directory should
be placed. If we are near the root of the filesystem we aim to
spread them out as much as possible. As we descend deeper from the
root we cluster them closer together around their parent as we
expect them to be more closely interactive. Higher-level directories
like usr/src/sys and usr/src/bin should be separated while the
directories in these areas are more likely to be accessed together
so should be closer. And directories within commands or kernel
subsystems should be closer still.
We pick a range of cylinder groups around the cylinder group of the
directory in which we are being created. The size of the range for
our search is based on our depth from the root of our filesystem.
We then probe that range based on how many directories are already
present. The first new directory is at 1/2 (middle) of the range;
the second is in the first 1/4 of the range, then at 3/4, 1/8, 3/8,
5/8, 7/8, 1/16, 3/16, 5/16, etc.
It is desirable to store the depth of a directory in its on-disk
inode so that it is available when we need it. We add a new field
di_dirdepth to track the depth of each directory. Because there are
few spare fields left in the inode, we choose to share an existing
field in the inode rather than having one of our own. Specifically
we create a union with the di_freelink field. The di_freelink field
is used to track inodes that have been unlinked but remain referenced.
It is not needed until a rmdir(2) operation has been done on a
directory. At that point, the directory has no contents and even
if it is kept active as a current directory is no longer able to
have any new directories or files created in it. Thus the use of
di_dirdepth and di_freelink will never coincide.
Reported by: Timo Voelker
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39246
- passing I/O commands through nda requires nsid field to be set (it was
unused when going through nvme_ns_ioctl())
- ccb's status can be OR'ed with the flags, use CAM_STATUS_MASK
Reviewed by: imp (cam)
Differential Revision: https://reviews.freebsd.org/D37696
Test regression fixed in 4630a3252a. Add two tests that do not
use the verbose flag, so the code path in question can be reached:
1. Respond with a proper ICMP destination host unreachable packet.
2. Respond with a doctored ICMP destination host unreachable packet,
that has the ICMP Identifier field modified (+1 bit).
Reviewed by: cy
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39244