Commit Graph

289 Commits

Author SHA1 Message Date
Konstantin Belousov
76b1b5ce6d nullfs: protect against user creating inconsistent state
The VFS conventions is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all).  Nullfs
bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions,
it is possible to get into situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
User just needs to nullfs-mount non-root of some filesystem, and then move
some directory under mount, out of mount, using lower filesystem.

In this case, nullfs cannot do much, but we still should and can ensure
internal kernel structures are consistent.  Avoid ISDOTDOT lookup forwarding
when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.

PR:	253593
Reported by:	Gregor Koscak <elogin41@gmail.com>
Test by:	Patrick Sullivan <sulli00777@gmail.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-04-02 15:40:25 +03:00
Konstantin Belousov
16dea83410 null_vput_pair(): release use reference on dvp earlier
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.

Handle it by holding the vnode and dropping use ref after lowerfs
VOP_VPUT_PAIR() ended.  This effectivaly requires unlock of the vp vnode
after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true
unconditionally.  This opens more opportunities for vp to be reclaimed,
if lvp is still alive we reinstantiate vp with null_nodeget().

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29178
2021-03-12 13:31:08 +02:00
Konstantin Belousov
e4aaf35ab5 nullfs: provide special bypass for VOP_VPUT_PAIR
Generic bypass cannot understand the rules of liveness for the VOP.

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Konstantin Belousov
f7af6e5e54 nullfs: provide custom bypass for VOP_READ_PGCACHE().
Normal bypass expects locked vnode, which is not true for
VOP_READ_PGCACHE().  Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taked by null_reclaim() when
setting v_data to NULL.

Reported and tested by:	pho
Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27327
2020-11-26 18:16:32 +00:00
Edward Tomasz Napierala
e3c51151a0 Make it possible to mount nullfs(5) using plain mount(8)
instead of mount_nullfs(8).

Obviously you'd need to force mount(8) to not call
mount_nullfs(8) to make use of it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26934
2020-10-29 15:28:15 +00:00
Mateusz Guzik
8ecd87a3e7 vfs: drop spurious cred argument from VOP_VPTOCNP 2020-10-20 07:18:27 +00:00
Konstantin Belousov
6b56b0ca93 nullfs: ensure correct lock is taken after bypass.
If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime.  In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.

Check for the condition and acquire nullfs vnode lock if detected.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-19 19:23:22 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Konstantin Belousov
685cb01a18 VMIO reads: enable for nullfs upper vnode if the lower vnode supports it.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 21:05:56 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Mateusz Guzik
fc9fcee01a nullfs: add missing VOP_STAT handling
Tested by:	pho
2020-08-10 10:31:17 +00:00
Mateusz Guzik
625adeaccd nullfs: don't pre lock exclusive in nullfs_root
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23955
2020-03-04 19:52:00 +00:00
Mateusz Guzik
f1fa1ba3d0 Fix up various vnode-related asserts which did not dump the used vnode 2020-02-03 14:25:32 +00:00
Mateusz Guzik
10a15df653 vfs: remove the never set VDESC_VPP_WILLRELE flag 2020-02-02 09:35:48 +00:00
Konstantin Belousov
dc1d2cc648 Fix a bug in r357199.
Around a generic call to null_nodeget(), there is nothing that would
prevent the unmount of the nullfs mp until we process to the
insmntque1() point.  Calculate the VV_ROOT flag after insmntque1() to
not access mp->mnt_data before we have an exclusively locked vnode
from this mount point on the mp vnode list.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-01-30 19:34:37 +00:00
Mateusz Guzik
3cfabd81a1 vfs: remove the never set VDESC_NOMAP_VPP flag 2020-01-30 08:56:22 +00:00
Konstantin Belousov
5fc9e11c42 Save lower root vnode in nullfs mnt data instead of upper.
Nullfs needs to know the root vnode of the lower fs during the
operation.  Currently it caches the upper vnode of it, which is also
the root of the nullfs mount.  On unmount, nullfs calls vflush() with
rootrefs == 1, and aborts non-forced unmount if there are any more
vnodes instantiated during vflush().  This means that the reference to
the root vnode after failed non-forced unmount could be lost and
nullm_rootvp points to the freed memory.

Fix it by storing the reference for lower vnode instead, which is kept
intact during vflush().  nullfs_root() now instantiates the upper
vnode of lower root.  Care about VV_ROOT flag in null_nodeget().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-01-28 11:29:06 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
1e0006e49c nullfs: locklessly check for entries in null_hashget
During random sampling over poudriere -j 104 over 10% of calls returned NULL.
2019-12-05 13:41:22 +00:00
Mateusz Guzik
be4cd6912f nullfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:42:04 +00:00
Mateusz Guzik
e0f4540a2a nullfs: reduce areas protected by vnode interlock in null_lock
Similarly to the other routine stop taking the interlock for the lower
vnode. The interlock for nullfs vnode is still taken to ensure
stability of ->v_data.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21480
2019-09-01 02:52:00 +00:00
Mateusz Guzik
13c73428dc nullfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
2019-08-30 00:30:03 +00:00
Mateusz Guzik
1e2f0ceb2f vfs: add VOP_NEED_INACTIVE
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after:   46309603301 (lockmgr:tmpfs)

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:34:24 +00:00
Mateusz Guzik
33d46a3cef nullfs: reduce areas protected by vnode interlock
Some places only take the interlock to hold the vnode, which was a requiremnt
before they started being manipulated with atomics. Use the newly introduced
vholdnz to bump the count.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21358
2019-08-25 05:13:15 +00:00
Mateusz Guzik
81f666e79d nullfs: lock the vnode with LK_SHARED in null_vptocnp
null_nodeget which follows almost always finds the target vnode in the hash,
avoiding insmntque1 altogether. Should it be needed, it already checks if the
lock needs to be upgraded.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20244
2019-08-21 23:24:40 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Konstantin Belousov
7ae3486e6d nullfs: fix unmounts when filesystem is active.
If vflush() did not completely flushed the mount vnodes queue, either
retry for forced unmounts, or give up for non-forced.  This situation
can occur when new vnodes are instantiated while vflush() worked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-21 13:30:48 +00:00
Konstantin Belousov
b9662886ef Un null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call.
The vp vnode is unlocked during the execution of the VOP method and
can be reclaimed, zeroing vp->v_data.  Caching allows to use the
correct mount point.

Reported and tested by:	pho
PR: 235549
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:20:18 +00:00
Konstantin Belousov
25728e8411 Before using VTONULL(), check that the covered vnode belongs to nullfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:17:31 +00:00
Konstantin Belousov
930cc2dbef Some style for nullfs_mount(). Also use bool type for isvnunlocked.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:15:29 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Edward Tomasz Napierala
ed5cdcb6c3 Make nullfs properly report MNT_AUTOMOUNTED set on the nullfs mount itself,
instead of copying from the underlying filesystem.

PR:		224851
Reported by:	Jamie Landeg-Jones <jamie at dyslexicfish.net>
Tested by:	Jamie Landeg-Jones <jamie at dyslexicfish.net>
MFC after:	2 weeks
2018-01-10 17:51:02 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Konstantin Belousov
2f304845e2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
Konstantin Belousov
abc1515601 NFSv4 client tracks opens, and the track records are only dropped when
the vnode is inactivated.  This contradicts with the nullfs caching
which keeps upper vnode around, as consequence keeping the use
reference to lower vnode.

Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.

Reported by:	asomers
Reviewed by:	rmacklem
Tested by:	asomers, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-27 09:20:58 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Edward Tomasz Napierala
e583d99909 Change the getnewvnode(9) tag for nullfs from "null" to "nullfs".
It's more consistent, and besides, the "null" alone looks weird.

MFC after:	1 month
2016-09-15 13:57:37 +00:00
Mateusz Guzik
6a3e46059a nullfs: plug vnode ref leak in null_vptocnp
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.

Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
2016-09-09 10:40:55 +00:00
Mateusz Guzik
2740551545 nullfs: stop special-casing directories in null_vptocnp
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.

No objections:	kib
MFC after:	2 weeks
2016-09-06 21:22:03 +00:00
Pedro F. Giffuni
b3a15ddd5b sys/fs: spelling fixes in comments.
No functional change.
2016-04-29 20:51:24 +00:00
Konstantin Belousov
f36aa2b792 Pass MNTK_NO_IOPF and MNTK_UNMAPPED_BUFS flags from the lower
filesystem to the nullfs mount.

MNTK_NO_IOPF must be present on the nullfs struct mount so that struct
file fo_read and fo_write fops operate in the mode requested by the
lower mount.

MNTK_UNMAPPED_BUFS allows VOP_GETPAGES() to use unmapped buffers.  It
does not matter for VOP_GETPAGES() calls from vm_fault() since handle
of the vm_object always points to the lower vnode.  But it may be
useful for other situations where VOP_GETPAGES() is used.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-03-04 17:24:28 +00:00
Konstantin Belousov
830cd4b810 After nullfs rmdir operation, reclaim the directory vnode which was
unlinked.  Otherwise the vnode stays cached, causing leak.  This is
similar to r292961 for regular files.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:43:03 +00:00
Konstantin Belousov
6f73b583d9 Force nullfs vnode reclaim after unlinking, to potentially unlink
lower vnode.  Otherwise, reference to the lower vnode from the upper
one prevents final unlink.

PR:	178238
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-30 19:49:22 +00:00
Mark Johnston
5f34e93c58 Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D2937
2015-07-05 22:37:33 +00:00
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00