Commit Graph

153 Commits

Author SHA1 Message Date
Mateusz Guzik
10a15df653 vfs: remove the never set VDESC_VPP_WILLRELE flag 2020-02-02 09:35:48 +00:00
Mateusz Guzik
3cfabd81a1 vfs: remove the never set VDESC_NOMAP_VPP flag 2020-01-30 08:56:22 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
e0f4540a2a nullfs: reduce areas protected by vnode interlock in null_lock
Similarly to the other routine stop taking the interlock for the lower
vnode. The interlock for nullfs vnode is still taken to ensure
stability of ->v_data.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21480
2019-09-01 02:52:00 +00:00
Mateusz Guzik
13c73428dc nullfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
2019-08-30 00:30:03 +00:00
Mateusz Guzik
1e2f0ceb2f vfs: add VOP_NEED_INACTIVE
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after:   46309603301 (lockmgr:tmpfs)

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:34:24 +00:00
Mateusz Guzik
33d46a3cef nullfs: reduce areas protected by vnode interlock
Some places only take the interlock to hold the vnode, which was a requiremnt
before they started being manipulated with atomics. Use the newly introduced
vholdnz to bump the count.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21358
2019-08-25 05:13:15 +00:00
Mateusz Guzik
81f666e79d nullfs: lock the vnode with LK_SHARED in null_vptocnp
null_nodeget which follows almost always finds the target vnode in the hash,
avoiding insmntque1 altogether. Should it be needed, it already checks if the
lock needs to be upgraded.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20244
2019-08-21 23:24:40 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Konstantin Belousov
b9662886ef Un null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call.
The vp vnode is unlocked during the execution of the VOP method and
can be reclaimed, zeroing vp->v_data.  Caching allows to use the
correct mount point.

Reported and tested by:	pho
PR: 235549
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:20:18 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Bryan Drewery
28323add09 Fix improper use of "its".
Sponsored by:	Dell EMC Isilon
2016-11-08 23:59:41 +00:00
Mateusz Guzik
6a3e46059a nullfs: plug vnode ref leak in null_vptocnp
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.

Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
2016-09-09 10:40:55 +00:00
Mateusz Guzik
2740551545 nullfs: stop special-casing directories in null_vptocnp
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.

No objections:	kib
MFC after:	2 weeks
2016-09-06 21:22:03 +00:00
Pedro F. Giffuni
b3a15ddd5b sys/fs: spelling fixes in comments.
No functional change.
2016-04-29 20:51:24 +00:00
Konstantin Belousov
830cd4b810 After nullfs rmdir operation, reclaim the directory vnode which was
unlinked.  Otherwise the vnode stays cached, causing leak.  This is
similar to r292961 for regular files.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:43:03 +00:00
Konstantin Belousov
6f73b583d9 Force nullfs vnode reclaim after unlinking, to potentially unlink
lower vnode.  Otherwise, reference to the lower vnode from the upper
one prevents final unlink.

PR:	178238
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-12-30 19:49:22 +00:00
Konstantin Belousov
effc6a3593 VOP_LOOKUP() may relock the directory vnode for some reasons. Since
nullfs vnode shares vnode lock with lower vnode, this allows the
reclamation of nullfs directory vnode in null_lookup().  In this
situation, VOP must return ENOENT.

More, since after the reclamation, the locks of nullfs directory vnode
and lower vnode are no longer shared, the relock of the ldvp does not
restore the correct locking state of dvp, and leaks ldvp lock.
Correct this by unlocking ldvp and locking dvp.

Use cached value of dvp->v_mount.

Reported by:	bdrewery
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-08 11:39:05 +00:00
Konstantin Belousov
0ebe0000b6 Assert that nullfs vnode has VV_ROOT set whenever lower vnode has.
Assert that dotdot lookup on the root vnode is not performed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:20:31 +00:00
Konstantin Belousov
289dd6dd7c Fix typo.
MFC after:	3 days
2014-07-24 23:14:03 +00:00
Konstantin Belousov
65589a29f4 Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code.  In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:04:46 +00:00
Dag-Erling Smørgrav
1a05c762b9 Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory.  [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks.  [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem.  [SA-13:13]

Security:	CVE-2013-5666
Security:	FreeBSD-SA-13:11.sendfile
Security:	CVE-2013-5691
Security:	FreeBSD-SA-13:12.ifioctl
Security:	CVE-2013-5710
Security:	FreeBSD-SA-13:13.nullfs
Approved by:	re
2013-09-10 10:05:59 +00:00
Konstantin Belousov
18a8d3d7f8 The tvp vnode on rename is usually unlinked. Drop the cached null
vnode for tvp to allow the free of the lower vnode, if needed.

PR:	kern/180236
Tested by:	smh
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-04 19:01:18 +00:00
Konstantin Belousov
0fc6daa72d - Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
  be dropped.

- Fix a wart which existed from the introduction of the nullfs
  caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
  It should be innocent, but now it is also formally safe.  Inform the
  nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
  nullfs inode.

- Add a callback to the upper filesystems for the lower vnode
  unlinking. When inactivating a nullfs vnode, check if the lower
  vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
  on the lower vnode, and reclaim upper vnode if so.  This allows
  nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
  excessive caching.

Reported by:	G??ran L??wkrantz <goran.lowkrantz@ismobile.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-05-11 11:17:44 +00:00
Konstantin Belousov
6b17595133 When nullfs mount is forcibly unmounted and nullfs vnode is reclaimed,
get back the leased write reference from the lower vnode.  There is no
other path which can correct v_writecount on the lowervp.

Reported by:	flo
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-01-10 18:24:48 +00:00
Konstantin Belousov
9cf4c952ca Add the "nocache" nullfs mount option, which disables the caching of
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.

Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarious.  The default is "cache".

Tested and benchmarked by:	pho (previous version)
MFC after:	2 weeks
2013-01-03 19:17:57 +00:00
Konstantin Belousov
140dedb81c The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem.  There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
Konstantin Belousov
82ed933c6f Grammar fixes.
Submitted by:	bf
MFC after:	1 week
2012-10-14 18:13:33 +00:00
Konstantin Belousov
806efacae0 Replace the XXX comment with the proper description.
MFC after:	1 week
2012-10-14 17:07:34 +00:00
Konstantin Belousov
d9e9650a36 Allow shared lookups for nullfs mounts, if lower filesystem supports
it.  There are two problems which shall be addressed for shared
lookups use to have measurable effect on nullfs scalability:

1. When vfs_lookup() calls VOP_LOOKUP() for nullfs, which passes lookup
operation to lower fs, resulting vnode is often only shared-locked. Then
null_nodeget() cannot instantiate covering vnode for lower vnode, since
insmntque1() and null_hashins() require exclusive lock on the lower.

Change the assert that lower vnode is exclusively locked to only
require any lock.  If null hash failed to find pre-existing nullfs
vnode for lower vnode and the vnode is shared-locked, the lower vnode
lock is upgraded.

2. Nullfs reclaims its vnodes on deactivation. This is due to nullfs
inability to detect reclamation of the lower vnode.  Reclamation of a
nullfs vnode at deactivation time prevents a reference to the lower
vnode to become stale.

Change nullfs VOP_INACTIVE to not reclaim the vnode, instead use the
VFS_RECLAIM_LOWERVP to get notification and reclaim upper vnode
together with the reclamation of the lower vnode.

Note that nullfs reclamation procedure calls vput() on the lowervp
vnode, temporary unlocking the vnode being reclaimed. This seems to be
fine for MPSAFE filesystems, but not-MPSAFE code often put partially
initialized vnode on some globally visible list, and later can decide
that half-constructed vnode is not needed.  If nullfs mount is created
above such filesystem, then other threads might catch such not
properly initialized vnode. Instead of trying to overcome this case,
e.g. by recursing the lower vnode lock in null_reclaim_lowervp(), I
decided to rely on nearby removal of the support for non-MPSAFE
filesystems.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:20:23 +00:00
Edward Tomasz Napierala
af6e6b87ad Remove unused thread argument to vrecycle().
Reviewed by:	kib
2012-04-23 14:10:34 +00:00
Konstantin Belousov
409b12c08a In null_reclaim(), assert that reclaimed vnode is fully constructed,
instead of accepting half-constructed vnode. Previous code cannot decide
what to do with such vnode anyway, and although processing it for hash
removal, paniced later when getting rid of nullfs reference on lowervp.

While there, remove initializations from the declaration block.

Tested by:	pho
MFC after:	1 week
2012-02-29 15:15:36 +00:00
Konstantin Belousov
dd0f9532f3 Do the vput() for the lowervp in the null_nodeget() for error case too.
Several callers of null_nodeget() did the cleanup itself, but several
missed it, most prominent being null_bypass(). Remove the cleanup from
the callers, now null_nodeget() handles lowervp free itself.

Reported and tested by:	pho
MFC after:	1 week
2012-01-03 21:09:07 +00:00
Konstantin Belousov
f82360acf2 Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject of re approval)
2011-11-19 07:50:49 +00:00
Konstantin Belousov
f82ee01c1c Do not use NULLVPTOLOWERVP() in the null_print(). If diagnostic is compiled
in, and show vnode is used from ddb on the faulty nullfs vnode, we get
panic instead of vnode dump.

MFC after:	1 week
2011-11-19 07:41:37 +00:00
Rebecca Cran
974206cf70 Fix typos - remove duplicate "is".
PR:		docs/154934
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after:	3 days
2011-02-23 09:22:33 +00:00
Rick Macklem
2d0c83b139 Add a null_remove() function to nullfs, so that the v_usecount
of the lower level vnode is incremented to greater than 1 when
the upper level vnode's v_usecount is greater than one. This
is necessary for the NFS clients, so that they will do a silly
rename of the file instead of actually removing it when the
file is still in use. It is "racy", since the v_usecount is
incremented in many places in the kernel with
minimal synchronization, but an extraneous silly rename is
preferred to not doing a silly rename when it is required.
The only other file systems that currently check the value
of v_usecount in their VOP_REMOVE() functions are nwfs and
smbfs. These file systems choose to fail a remove when the
v_usecount is greater than 1 and I believe will function
more correctly with this patch, as well.

Tested by:	to.my.trociny at gmail.com
Submitted by:	to.my.trociny at gmail.com (earlier version)
Reviewed by:	kib
MFC after:	2 weeks
2010-08-31 01:16:45 +00:00
Konstantin Belousov
de082cd17a Disable bypass for the vop_advlockpurge(). The vop is called after
vop_revoke(), the v_data is already destroyed.

Reported and tested by:	ed
2010-05-16 05:00:29 +00:00
Konstantin Belousov
c808c9632d Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by:	rwatson
2009-06-21 19:21:01 +00:00
Konstantin Belousov
b0f34bb643 Implement the bypass routine for VOP_VPTOCNP in nullfs.
Among other things, this makes procfs <pid>/file working for executables
started from nullfs mount.

Tested by:	pho
PR:	94269, 104938
2009-05-31 14:58:43 +00:00
Konstantin Belousov
cec9ed6d7f Lock the real null vnode lock before substitution of vp->v_vnlock.
This should not really matter for correctness, since vp->v_lock is
not locked before the call, and null_lock() holds the interlock,
but makes the control flow for reclaim more clear.

Tested by:	pho
2009-05-31 14:52:45 +00:00
Edward Tomasz Napierala
c97fcdba57 Add VOP_ACCESSX, which can be used to query for newly added V*
permissions, such as VWRITE_ACL.  For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.

Reviewed by:	rwatson@
2009-05-30 13:59:05 +00:00
Peter Holm
aa73f8c7a2 Do not use null_bypass for VOP_ISLOCKED, directly call default
implementation. null_bypass cannot work for the !nullfs-vnodes, in
particular, for VBAD vnodes.

In collaboration with:	kib
2009-03-18 13:54:35 +00:00
Attilio Rao
b13ec5e016 Remove the null_islocked() overloaded vop because the standard one does
the same.
2009-03-13 07:09:20 +00:00
Konstantin Belousov
062ef8a5f8 Do not use bypass for vop_vptocnp() from nullfs, call standard
implementation instead. The bypass does not assume that returned vnode
is only held.

Reported by:	Paul B. Mahol <onemda gmail com>, pluknet <pluknet gmail com>
Reviewed by:	jhb
Tested by:	pho, pluknet <pluknet gmail com>
2009-03-10 14:35:21 +00:00
Bjoern A. Zeeb
7956d34b95 Remove unused local variables.
Submitted by:	Christoph Mallon christoph.mallon@gmx.de
Reviewed by:	kib
MFC after:	2 weeks
2009-01-31 17:36:22 +00:00