Commit Graph

396 Commits

Author SHA1 Message Date
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Stefan Eßer
88a795e80c sys/fs: do not report blocks allocated for synthetic file systems
The pseudo file systems (devfs, fdescfs, procfs, etc.) report total
and available blocks and inodes despite being synthetic with no
underlying storage device to which those values could be applied.

The current code of these file systems tends to report a fixed number
of total blocks but no free blocks, and in the case of procfs,
libprocfs, linsysfs also no free inodes.

This can be irritating in e.g. the "df" output, since 100% of the
resources seem to be in use, but it can also create warnings in
monitoring tools used for capacity management.

This patch makes these file systems return the same value for the
total and free parameters, leading to 0% in use being displayed by
"df". Since there is no resource that can be exhausted, this appears
to be a sensible result.

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39442
2023-04-25 09:59:15 +02:00
Mateusz Guzik
829f0bcb5f vfs: add the concept of vnode state transitions
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D37759
2022-12-26 17:35:12 +00:00
Mateusz Guzik
5b5b7e2ca2 vfs: always retain path buffer after lookup
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D36542
2022-09-17 09:10:38 +00:00
Mateusz Guzik
ad5e1f9c2d devfs: stop taking the interlock in devfs_delete
It buys nothing now that vhold does not require it.
2022-09-14 22:51:42 +00:00
Mateusz Guzik
a1c555f48b devfs: retire the unused DEVFS_DEL_VNLOCKED flag 2022-09-14 22:47:53 +00:00
Mateusz Guzik
497240def8 Retire clone_drain_lock
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36268
2022-08-20 09:44:05 +00:00
Gordon Bergling
8ea3ceda76 fs: fix a few common typos in source code comments
- s/quadradically/quadratically/
- s/persistant/persistent/

Obtained from:	NetBSD
MFC after:	3 days
2022-02-06 13:48:31 +01:00
Konstantin Belousov
66c5fbca77 insmntque1(): remove useless arguments
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D34071
2022-01-31 16:49:08 +02:00
Mateusz Guzik
2a7e4cf843 Revert b58ca5df0b ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by:	kib
Reported by:	pho
2022-01-27 16:32:22 +00:00
Mateusz Guzik
3af3e99ce4 devfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:30 +01:00
Mateusz Guzik
3ffcfa599e vfs: add vop_stdadd_writecount_nomsync
This avoids needing to inspect the mount point every time.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D33125
2021-11-26 12:06:08 +00:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Mateusz Guzik
b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Mark Johnston
243b324f96 devfs: Avoid comparison with an uninitialized var in devfs_fp_check()
devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data.  Other devvn_refthread()
callers are careful to do this.

Reported by:	KMSAN
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30068
2021-05-03 13:24:30 -04:00
Konstantin Belousov
d485c77f20 Remove #define _KERNEL hacks from libprocstat
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition.  Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Mateusz Guzik
3bc17248d3 devfs: fix use count leak when using TIOCSCTTY
by matching devfs_ctty_ref

Fixes: 3b44443626 ("devfs: rework si_usecount to track opens")
2021-02-09 01:54:21 +00:00
Edward Tomasz Napierala
4ddb3cc597 devfs(4): defer freeing until we drop devmtx ("cdev")
Before r332974 the old code would sometimes cause a rare lock order
reversal against pagequeue, which looked roughly like this:

witness_checkorder()
__mtx_lock-flags()
vm_page_alloc()
uma_small_alloc()
keg_alloc_slab()
keg_fetch-slab()
zone_fetch-slab()
zone_import()
zone_alloc_bucket()
uma_zalloc_arg()
bucket_alloc()
uma_zfree_arg()
free()
devfs_metoo()
devfs_populate_loop()
devfs_populate()
devfs_rioctl()
VOP_IOCTL_APV()
VOP_IOCTL()
vn_ioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()

Since r332974 the original problem no longer exists, but it still
makes sense to move things out of the - often congested - lock.

Reviewed By:	kib, markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27334
2020-12-29 13:47:36 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
ca423b858b devfs: bool -> int
Fixes buildworld after r364069
2020-08-10 11:46:39 +00:00
Mateusz Guzik
7b19bddac8 devfs: save on spurious relocking for devfs_populate
Tested by:	pho
2020-08-10 10:36:43 +00:00
Mateusz Guzik
f8935a96d1 devfs: use cheaper lockmgr entry points
Tested by:	pho
2020-08-10 10:36:10 +00:00
Mateusz Guzik
f9c13ab856 devfs: use vget_prep/vget_finish
Tested by:	pho
2020-08-10 10:35:47 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
11c345b18f devfs: fix a vnode use-after-free in devfs_ioctl
The vnode to be replaced was read with a shared lock, meaning 2 racing threads
can find the same one.

While here clean it up a little bit.
2020-07-04 06:27:28 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
f1fa1ba3d0 Fix up various vnode-related asserts which did not dump the used vnode 2020-02-03 14:25:32 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
45757984f8 vfs: consistently use size_t for buflen around VOP_VPTOCNP 2020-02-01 20:34:43 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Kyle Evans
1b50b999f9 tty: implement TIOCNOTTY
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.

I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.

Reviewed by:	bcr (manpages), kib
Differential Revision:	https://reviews.freebsd.org/D22572
2019-11-30 20:10:50 +00:00
Mateusz Guzik
a02cab334c devfs: introduce a per-dev lock to protect ->si_devsw
This allows bumping threadcount without taking the global devmtx lock.

In particular this eliminates contention on said lock while using bhyve
with multiple vms.

Reviewed by:	kib
Tested by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22548
2019-11-30 16:46:19 +00:00
Mateusz Guzik
1fccb43c39 vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by:	kib, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22202
2019-11-20 12:05:59 +00:00
Mateusz Guzik
4c9ba39aea devfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:41:47 +00:00
Konstantin Belousov
e3fdd051f9 devfs_vptocnp(): correct the component name when node is not at top.
Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested.  Use the dirent entry instead.

With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.

Reported by:	Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-11 18:41:24 +00:00
Mateusz Guzik
559ac49d41 devfs: add root vnode caching
See r353150.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:16:55 +00:00
Mateusz Guzik
dfa8dae493 devfs: plug redundant bwillwrite avoidance
vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21905
2019-10-05 17:44:33 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Mark Johnston
6d2e2df764 Ensure that directory entry padding bytes are zeroed.
Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace.  With the ino64 work there
are also some explicit pad fields in struct dirent.  Add a subroutine
to clear these bytes and use it in the in-tree filesystems.  The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-23 22:24:59 +00:00
Konstantin Belousov
1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Brooks Davis
9bc603bd20 Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.
A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by:	re (gjb)
2018-10-04 23:55:03 +00:00