Currently shutdown_reset() is registered as the final entry of the
shutdown_final event handler. However, if a panic occurs early in boot
before the event is registered (SI_SUB_INTRINSIC), we may end up
spinning in the subsequent infinite for loop and failing to reset
altogether. Instead we can simply call this function unconditionally.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37981
This will allow user-space programs (e.g. lsof) to locate deleted files
whose nlink equals zero. Prior to this commit, programs has to use
stat(kf_path) to get nlink, but that will fail if the file is deleted.
[mjg: s/fail/file in the commit message]
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D38169
In vn_getsize_locked(), when storing vattr.va_size of type u_quad_t into
off_t size, we must avoid overflow.
Then, the check for fsize < 0, introduced in the commit
f45feecfb2 'vfs: add vn_getsize', is nop [1].
Reported and reviewed by: jhb
Coverity CID: 1502346
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38133
The assertion is equivalent to kstack_contains() so use that rather
than spelling it out.
Suggested by: jhb
Reviewed by: jhb
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D38107
Currently function prison_ip_restrict() returns true if the replacement
buffer was used, or no buffer provided and allocation fails and should
redo. The logic is confusing and cause possibly infinite loop from
eb8dcdeac2 .
Reviewed by: jamie, glebius
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D37918
And possibly infinite loop calling prison_ip_restrict() in
kern_jail_set() [2].
[1] It is possible that prisons do not have any IPv4 or IPv6 addresses.
[2] If prison_ip_restrict() is not provided with prison_ip, when it
allocates prison_ip successfully, then it should return false to
indicate not redo prison_ip_restrict() later.
Reviewed by: glebius
Approved by: kp (mentor)
Fixes: eb8dcdeac2 jail: network epoch protection for IP address lists
Differential Revision: https://reviews.freebsd.org/D37906
... which verifies that given file table does not have file descriptors
referencing vnodes on the specified mount point. It is up to the caller
to ensure that the check is not racy.
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37896
getattr is very expensive and in important cases only gets called to get
the size. This can be optimized with a dedicated routine which obtains
that statistic.
As a step towards that goal make size-only consumers use a dedicated
routine.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D37885
If an error such as an invalid record or one whose decryption fails is
detected on a socket that has received a RST then ktls_drop() could
ignore the error since INP_DROPPED could already be set. In this case
soreceive_generic hangs since it does not return from a KTLS socket
with pending encrypted data unless there is an error (so_error) (this
behavior is to ensure that soreceive_generic doesn't return a
premature EOF when there is pending data still being decrypted).
Note that this was a bug prior to
69542f2682 as tcp_usr_abort would also
have ignored the error in this case.
Reviewed by: gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37775
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>
As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).
Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).
The new field lands in a 1-byte hole, thus it does not grow the struct.
Bump __FreeBSD_version to 1400077
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759
While here prefix with v for better consistency with the vnode stuff.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D37759
Re-assign the sc local (syscall number) before moving args for SYS_syscall.
Correct the audit and kdtrace hooks invocations.
Fixes: 140ceb5d95
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It indicates to a debugger that the thread is stopped at the
kernel->user exit path.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590
and TDB_COREDUMPRQ to TDB_COREDUMPREQ
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37590
Mount on a LE machine a filesystem formatted for BE is not supported
currently. This adds a check for the superblock magic number using
swapped bytes to guess and warn the user that it may be a valid
superblock but endian is incompatible.
MFC after: 2 weeks
Reviewed by: mckusick
Obtained from: mckusick, alfredo
Differential Revision: https://reviews.freebsd.org/D37675
For file mounts, the directory vnode is not available from namei and this
prevents the use of vn_fullpath_hardlink. In this case, we can use the
vnode which was covered by the file mount with vn_fullpath.
This also disallows file mounts over files with link counts greater than
one to ensure a deterministic path to the mount point.
Reviewed by: mjg, kib
Tested by: pho
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.
This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.
Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo
Reviewed by: mjg, kib
Tested by: pho
instead of looking at SAVESTART
This is a step towards removing the flag.
Reviewed by: mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D34468
This patch adds "allow.nfsd" to the jail code based on a
new kernel build option VNET_NFSD. This will not work
until future patches fix nmount(2) to allow mountd to
run in a vnet prison and the NFS server code is patched
so that global variables are in a vnet.
The jail(8) man page will be patched in a future commit.
Reviewed by: jamie
MFC after: 4 months
Differential Revision: https://reviews.freebsd.org/D37637
It appears that, prior to r158857 vfs_domount() checked
suser() when MNT_EXPORTED was specified.
r158857 appears to have broken this, since MNT_EXPORTED
was no longer set when mountd.c was converted to use nmount(2).
r164033 replaced the suser() check with
priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the
same thing (ie. checks for effective uid == 0 assuming suses_enabled
is set).
This patch restores this check by setting MNT_EXPORTED when the
"export" mount option is specified to nmount().
I think this is reasonable since only mountd(8) should be setting
exports and I doubt any non-root mounted file system would
be setting its own exports.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37718
pr_abort calls tcp_usr_abort which calls tcp_drop with ECONNABORTED.
After pr_abort returns, the so_error is then set to a more specific
error. However, a reader can observe and return the ECONNABORTED
error before so_error is set to the desired error value. This is
resulting in spurious test failures of recently added tests for
invalid conditions such as invalid headers.
To fix, refactor the code to abort a connection to call tcp_drop
directly with the desired error value. ktls_reset_send_tag already
calls tcp_drop directly when it aborts a connection due to an error.
Reviewed by: gallatin
Reported by: CI (jenkins), gallatin, olivier
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37692
Sockets have special handling for EPIPE on a write, that was spread out
into several places. Treating transient errors is also special - if
protocol is atomic, than we should ignore any changes to uio_resid, a
transient error means the write had completely failed (see d2b3a0ed31).
- Provide sousrsend() that expects a valid uio, and leave sosend() for
kernel consumers only. Do all special error handling right here.
- In dofilewrite() don't do special handling of error for DTYPE_SOCKET.
- For send(2), write(2) and aio_write(2) call into sousrsend() and remove
error handling for kern_sendit(), soo_write() and soaio_process_job().
PR: 265087
Reported by: rz-rpi03 at h-ka.de
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35863
When VV_CROSSLOCK is present, the lock for the vnode at the current
stage of lookup must be held across the VFS_ROOT() call for the
filesystem mounted at the vnode. Since VV_CROSSLOCK implies that
the root vnode reuses the already-held lock, the possibility for
recursion should be made clear in the flags passed to VFS_ROOT().
For cases in which the lock is held exclusive, this means passing
LK_CANRECURSE. For cases in which the lock is held shared, it
means clearing LK_NODDLKTREAT to allow VFS_ROOT() to potentially
recurse on the shared lock even in the presence of an exclusive
waiter.
That the existing code works for unionfs is due to a coincidence
of the current unionfs implementation.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37458
Return the value as stat(2) st_blocks.
Suggested and reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097
This makes tmpfs size accounting correct for the sparce files. Also
correct report st_blocks/va_bytes. Previously the reported value did not
accounted for the swapped out pages.
PR: 223015
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097
This allows the use of chroot and/or jail environments which depend on
interpreters registed with imgact_binmisc to use emulator binaries from
the host to emulate programs inside the chroot.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37432
Early versions of this code had a free, but this one doesn't need
it. Remove the forgotten free(vv); from earlier versions.
Fixes: ed56dcfc6b
Noticed by: Michael Butler
Sponsored by: Netflix
Copy the arg that sets a variable to maximize the reuse of this
routine. There are places we call it from that are const char * and it
might not be safe to cast that away.
Sponsored by: Netflix
This reverts commit 68c3f03021. There are
some weird crashes when KVMs switch caused by this, so revert this
commit until they are sorted out.
Reported by: cy@
Sponsored by: Netflix
In the rare case that we succeed in probing, but fail to attach, flip
the default to be to disable the
device. hw.bus.disable_failed_devices=false is no required to restore
the old behavior. The old behavior dates form a time when dynamic
control of devices wasn't yet present (devctl didn't exist). Now that
one can retry probe/attach the device with devctl, the default doesn't
make sense: The more desirable behaivor is to have stable device numbers
when one has several instances of the same device in a system (common
for NICs or HBAs).
Reviewed by: jhb (verbal)
Sponsored by: Netflix
Normally, when a device fails to attach, we tear down the newbus state
for that device so that future driver loads can try again (maybe with a
different driver, or maybe with a re-loaded and fixed kld).
Sometimes, however, it is desirable to have the device fail
permanantly. We do this by calling device_disable() on a failed
attached, as well as keeping the device in DS_ATTACHING forever. This
prevents retries on that device. This is enabled via
hw.bus.disable_failed_devices=1 in either a hint via the loader, or at
runtime with a sysctl setting. Setting from 1 -> 0 at runtime will not
affect previously disabled devices, however: they remain disabled.
They can be re-enabled manually with devctl enable, however.
Sponsored by: Netflix
Reviewed by: gallatin, hselasky, jhb
Differential Revision: https://reviews.freebsd.org/D37517
One year ago, I deprecated 'kern' in favor of 'kernel' for the system
name for some power events. I'm about to remove it from the kernel, but
realized there's been no warning generated for users. Preserve POLA by
converting on the fly here and issuing a warning for 14.x, and an fatal
error after we branch 15. Make compiling it an error on 16 to remove
the gross hack after we branch.
Sponsored by: Netflix
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D37584
The new "kernel" system name is the one that's documented and has
been generated for a year now. Remove the old one now that 14.0
is getting close.
Sponsored by: Netflix
Reviewed by: bapt
Differential Revision: https://reviews.freebsd.org/D37582