Commit Graph

456 Commits

Author SHA1 Message Date
Mateusz Guzik
bb92cd7bcd vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) 2022-03-24 10:20:51 +00:00
Konstantin Belousov
4a4b059a97 Add vfs_remount_ro()
a helper to remount filesystem from rw to ro.

Tested by:	pho
Reviewed by:	markj, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33721
2022-01-08 05:41:44 +02:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Robert Wing
8981a100e6 mount: retire kernel_vmount()
The last usage of this function was removed in e3b1c847a4.

There are no in-tree consumers of kernel_vmount().

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32607
2021-11-20 10:22:28 -09:00
Kirk McKusick
f10a8d0971 Allow the MNT_FORCE flag to be passed through to an initial mount.
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.

Sanity check: kib
Sponsored by: Netflix
2021-11-15 15:45:56 -08:00
Mark Johnston
03d5820f73 mount: Check for !VDIR mount points before handling -o emptydir
To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty.  This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by:	syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by:	kib, imp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32475
2021-10-13 09:33:35 -04:00
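The ordering described in this commit can be illustrated with a userspace analogue. This is only a sketch under stated assumptions: `check_emptydir_target()` is a hypothetical helper, it uses `stat(2)` and `readdir(3)` rather than the kernel's vnode interfaces, and the returned errno values are illustrative rather than taken from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical userspace analogue: returns 0 if path is an empty
 * directory, otherwise an errno value.  The type check comes first so
 * that directory enumeration is never attempted on a non-directory.
 */
static int
check_emptydir_target(const char *path)
{
	struct stat sb;
	struct dirent *de;
	DIR *d;
	int error = 0;

	assert(path != NULL);
	if (stat(path, &sb) != 0)
		return (errno);
	/* Step 1: reject anything that is not a directory. */
	if (!S_ISDIR(sb.st_mode))
		return (ENOTDIR);
	/* Step 2: only now enumerate entries, skipping "." and "..". */
	if ((d = opendir(path)) == NULL)
		return (errno);
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") != 0 &&
		    strcmp(de->d_name, "..") != 0) {
			error = ENOTEMPTY;
			break;
		}
	}
	closedir(d);
	return (error);
}
```

Swapping the two steps would reproduce the class of bug the commit fixes: the emptiness scan would run against an object that is not a directory.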
Piotr Pawel Stefaniak
6e8272f317 mount: improve error message for invalid filesystem names
For an invalid filesystem name used like this:
mount -t asdfs /dev/ada1p5 /usr/obj

emit an error message like this:
mount: /dev/ada1p5: Invalid fstype: Invalid argument

instead of:
mount: /dev/ada1p5: Operation not supported by device

Differential Revision:	https://reviews.freebsd.org/D31540
2021-09-15 16:25:31 +02:00
Mateusz Guzik
f1e2cc1c66 vfs: drop dedicated sysinit for mountlist_mtx
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 20:52:03 +02:00
Mateusz Guzik
0d28d014c8 vfs: refactor kern_unmount
Split unmounting by path and id in preparation for other changes.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 13:58:28 +02:00
Mateusz Guzik
7b2561b46b vfs: stop open-coding vfs_getvfs in kern_unmount
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-26 11:38:31 +00:00
Mateusz Guzik
614faa3269 vfs: fix cache-related LOR introduced in the previous change
Reported by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-08-22 16:20:07 +00:00
Jason A. Harmening
e81e71b0e9 Use interruptible wait for blocking recursive unmounts
Now that we allow recursive unmount attempts to be abandoned upon
exceeding the retry limit, we should avoid leaving an unkillable
thread when a synchronous unmount request was issued against the
base filesystem.

Reviewed by:	kib (earlier revision), mckusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:21:56 -07:00
Jason A. Harmening
a8c732f4e5 VFS: add retry limit and delay for failed recursive unmounts
A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount.  In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up.  Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.

Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.

Reviewed by:	kib (earlier revision), mckusick
Differential Revision:  https://reviews.freebsd.org/D31450
2021-08-20 13:20:50 -07:00
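The retry limit and grace period described above can be modelled in a few lines. This is a minimal sketch, not the kernel implementation: every identifier is hypothetical, time is passed in as plain seconds, and it assumes that "retry limit" means the number of failed attempts after which the facility gives up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Defaults quoted from the commit message; all names here are hypothetical. */
#define SKETCH_RETRY_LIMIT	10	/* failed attempts before giving up */
#define SKETCH_RETRY_DELAY_SEC	1	/* grace period between attempts */

struct deferred_unmount {
	int	retries;	/* failed attempts so far */
	long	next_try_sec;	/* earliest time of the next attempt */
	bool	abandoned;	/* set once the retry limit is reached */
};

/* Total failed attempts, analogous to the counter the sysctl node exports. */
static unsigned long failed_attempts_total;

/* Stand-in for a filesystem whose unmount can never succeed. */
static int
always_fail(void *arg)
{
	(void)arg;
	return (1);
}

/*
 * Run one pass of the retry logic at time now_sec.  try_unmount returns 0
 * on success.  Returns true only when the unmount succeeded on this pass.
 */
static bool
deferred_unmount_step(struct deferred_unmount *du, long now_sec,
    int (*try_unmount)(void *), void *arg)
{
	assert(du != NULL && try_unmount != NULL);
	/* Skip abandoned entries and entries still inside the grace period. */
	if (du->abandoned || now_sec < du->next_try_sec)
		return (false);
	if (try_unmount(arg) == 0)
		return (true);
	failed_attempts_total++;
	if (++du->retries >= SKETCH_RETRY_LIMIT) {
		/* Log once and stop retrying instead of spinning forever. */
		fprintf(stderr, "deferred unmount: giving up after %d attempts\n",
		    du->retries);
		du->abandoned = true;
	} else {
		du->next_try_sec = now_sec + SKETCH_RETRY_DELAY_SEC;
	}
	return (false);
}
```

Driving `deferred_unmount_step()` once per simulated second with `always_fail` marks the entry abandoned after ten failed attempts, after which further calls do no work.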
Mateusz Guzik
dbc689cdef vfs: use vn_lock_pair to avoid establishing an ordering on mount
This fixes some of the LORs seen on mount/unmount.

Complete fix will require taking care of unmount as well.

Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31611
2021-08-20 17:52:24 +00:00
Jason A. Harmening
c746ed724d Allow stacked filesystems to be recursively unmounted
In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem.  The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.

This change addresses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount().  This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.

To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount.  Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.

Reviewed by:	kib, mckusick
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D31016
2021-07-24 12:52:00 -07:00
Warner Losh
6475667f7b devctl: don't publish the mount options
Mount options aren't solely ASCII strings. In addition, experience to
date suggests that the mount options are much less useful than was
originally supposed and the mount flags suffice to make decisions. Drop
the reporting of options for the mount/remount/unmount events.

Reviewed by:		markj
Reported by:		KASAN
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D31287
2021-07-24 09:03:53 -06:00
Jason A. Harmening
59409cb90f Add a generic mechanism for preventing forced unmount
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount.  Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
2021-06-05 18:20:36 -07:00
Mark Johnston
2425f5e912 mount: Disallow mounting over a jail root
Discussed with:	jamie
Approved by:	so
Security:	CVE-2020-25584
Security:	FreeBSD-SA-21:10.jail_mount
2021-04-06 14:49:36 -04:00
Mateusz Guzik
a15f787adb vfs: add vfs_ref_from_vp
This generalizes what vop_stdgetwritemount used to be doing.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28695
2021-02-21 00:43:05 +00:00
Mateusz Guzik
82397d7919 vfs: denote vnode being a mount point with VIRF_MOUNTPOINT
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27794
2021-01-03 06:50:06 +00:00
Konstantin Belousov
164438a7b9 More careful handling of the mount failure.
- VFS_UNMOUNT() requires vn_start_write() around it [*].
- call VFS_PURGE() before unmount.
- do not destroy mp if cleanup unmount did not succeed.
- set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF
  for VFS_UNMOUNT() in cleanup.

PR:	251320 [*]
Reported by:	Tong Zhang <ztong0001@gmail.com>
Reviewed by:	markj, mjg
Discussed with:	rmacklem
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27327
2020-11-26 18:08:42 +00:00
Mateusz Guzik
f6dd1aefb7 vfs: group mount per-cpu vars into one struct
While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by:	pho
2020-11-09 23:02:13 +00:00
Konstantin Belousov
f10845877e Suspend all writeable local filesystems on power suspend.
This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.

Only filesystems declared local are suspended.

Note that this does not guarantee absence of the metadata errors or
leaks if resume is not done: for instance, on UFS unlinked but opened
inodes are leaked and require fsck to gc.

Reviewed by:	markj
Discussed with:	imp
Tested by:	imp (previous version), pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27054
2020-11-05 20:52:49 +00:00
Mateusz Guzik
2dee296a3d Rationalize per-cpu zones.
The 2 provided zones had inconsistent naming between each other
("int" and "64") and other allocator zones (which use bytes).

Follow malloc by naming them "pcpu-" + size in bytes.

This is a step towards replacing ad-hoc per-cpu zones with
general slabs.
2020-11-05 15:08:56 +00:00
Mateusz Guzik
ad89066af4 vfs: annotate mountlist_mtx with __exclusive_cache_line 2020-10-17 08:47:08 +00:00
Mateusz Guzik
a3d9bf49b5 cache: drop the force flag from purgevfs
The optional scan is wasteful, thus it is removed altogether from unmount.

Callers which always want it anyway remain unaffected.
2020-09-23 10:46:07 +00:00
Rick Macklem
df665abd34 Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount.
r363210 introduced v_seqc_users to the vnodes.  This change requires
a vn_seqc_write_end() to match the vn_seqc_write_begin() in
vfs_cache_root_clear().
mjg@ provided this patch which seems to fix the panic.

Tested for an NFS mount where the VFS_STATFS() call will fail.

Submitted by:	mjg
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D26160
2020-08-26 21:49:43 +00:00
Warner Losh
773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Warner Losh
0f2c2c1c58 Use names suggested by kib@ in review D25969, move call for unmount to not call
with vnode locked, use NOWAIT alloc and only report when we don't overflow.

These changes were accidentally omitted from r364402, except for the not
reporting on overflow. They were lumped in with a debugging commit in my tree
that I omitted w/o realizing this.

Other issues from the review are pending some other changes I need to do first.
2020-08-20 16:52:48 +00:00
Warner Losh
8ef773d1b4 Add VFS FS events for mount and unmount to devctl/devd
Report when a filesystem is mounted, remounted or unmounted via devd, along with
details about the mount point and mount options.

Discussed with:	kib@
Reviewed by: kirk@ (prior version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25969
2020-08-19 17:10:04 +00:00
Mateusz Guzik
4b3208a97b vfs: sanity check mount counters in vfs_op_enter 2020-08-19 02:50:09 +00:00
Mateusz Guzik
0379ff6ae3 vfs: introduce vnode sequence counters
Modified on each permission change and link/unlink.

Reviewed by:	kib
Tested by:	pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25573
2020-07-25 10:31:52 +00:00
Mateusz Guzik
8c1f410c19 vfs: avoid spurious memcpy in vfs_statfs
It is quite often called for the very same buffer.
2020-07-10 06:46:42 +00:00
Ryan Moeller
33b39b6615 Apply default security flavor in vfs_export
There may be some version of mountd out there that does not supply a default
security flavor when none is given for an export.

Set the default security flavor in vfs_export if none is given, and remove the
workaround for oexport compat.

Reported by:	npn
Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25300
2020-06-16 21:30:30 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it holds a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size).
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00
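The width problem this commit describes can be shown with a small model. The sketch below is an illustration only: the two structs and the flag values are hypothetical stand-ins, not the real `struct export_args` or `MNT_*` constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not real MNT_* constants. */
#define SKETCH_FLAG_LOW		(UINT64_C(1) << 7)	/* fits in 32 bits */
#define SKETCH_FLAG_HIGH	(UINT64_C(1) << 40)	/* needs 64 bits */

struct old_export_model { int ex_flags; };	/* 32-bit copy of the flags */
struct new_export_model { uint64_t ex_flags; };	/* same width as the source */

/* Does a single flag survive being copied into the 32-bit field? */
static bool
old_keeps_flag(uint64_t flag)
{
	struct old_export_model ea;

	assert(flag != 0);
	/* Only the low-order 32 bits of the 64-bit value are retained. */
	ea.ex_flags = (int)(uint32_t)flag;
	return (((uint64_t)(uint32_t)ea.ex_flags & flag) != 0);
}

/* Does a single flag survive being copied into the 64-bit field? */
static bool
new_keeps_flag(uint64_t flag)
{
	struct new_export_model ea;

	assert(flag != 0);
	ea.ex_flags = flag;
	return ((ea.ex_flags & flag) != 0);
}
```

A flag in the low 32 bits survives either layout, which is why the old field "happens to currently work"; a flag above bit 31 is silently dropped by the 32-bit field and preserved by the 64-bit one.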
Rick Macklem
c13e414dc2 Fix build issue introduced by r361699.
Reported by:	cy (and others)
2020-06-02 00:03:26 +00:00
Ryan Moeller
1cfffed85d Assign default security flavor when converting old export args
vfs_export requires security flavors be explicitly listed when
exporting as of r360900.

Use the default AUTH_SYS flavor when converting old export args to
ensure compatibility with the legacy mount syscall.

Reported by:	rmacklem
Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25045
2020-06-01 18:43:51 +00:00
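The defaulting rule can be sketched as follows. This is a simplified model, not the kernel code: the struct, its field names, and the array capacity are hypothetical, and the only external fact relied on is that AUTH_SYS is flavor number 1 in ONC RPC.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AUTH_SYS		1	/* AUTH_SYS flavor number in ONC RPC */
#define SKETCH_MAXSECFLAVORS	5	/* hypothetical capacity */

/* Hypothetical, simplified stand-in for the security part of export args. */
struct export_model {
	int	numsecflavors;
	int	secflavors[SKETCH_MAXSECFLAVORS];
};

/*
 * If the caller listed no security flavors (as old-style export
 * arguments cannot), fall back to a single AUTH_SYS entry.  An explicit
 * list is left untouched.
 */
static void
apply_default_flavor(struct export_model *ex)
{
	assert(ex != NULL);
	if (ex->numsecflavors == 0) {
		ex->numsecflavors = 1;
		ex->secflavors[0] = SKETCH_AUTH_SYS;
	}
}
```

Applying the default at conversion time means later code that requires an explicit flavor list sees one entry instead of rejecting the export.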
Rick Macklem
f9122b6488 Fix an NFS mount attempt where VFS_STATFS() fails.
r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the
VFS_STATFS() called just after VFS_MOUNT() returns an error.
Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY.
Then the thread gets stuck sleeping on "mntref" in vfs_mount_destroy().
This patch fixes this problem.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D24022
2020-03-22 18:18:30 +00:00
Mateusz Guzik
ed67a63c39 vfs: drop remaining zpcpu casts 2020-02-12 11:18:12 +00:00
Mateusz Guzik
123c519731 vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit
In particular on amd64 this eliminates an atomic op in the common case,
trading it for IPIs in the uncommon case of catching CPUs executing the
code while the filesystem is getting suspended or unmounted.
2020-02-12 11:17:45 +00:00
Mateusz Guzik
3eb6b656c2 vfs: remove now useless ENODEV handling from vn_fullpath consumers
Noted by:	ngie
2020-02-08 15:51:08 +00:00
Edward Tomasz Napierala
b3fb13eb55 Add kern_unmount() and use in Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
Kirk McKusick
bbb1e07d65 Peter Holm reports that his test that does an umount(8) on an active
mount point while numerous tests are running that are writing to
files on that mount point causes the umount(8) to hang forever.

The unmount(2) system call is handled in the kernel by the dounmount()
function. The cause of the hang is that prior to dounmount() calling
VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT
flag indicates that VFS_SYNC() should not return until all the dirty
buffers associated with the mount point have been written to disk.
Because user processes are allowed to continue writing and can do
so faster than the data can be written to disk, the call to VFS_SYNC()
can never finish.

Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes
when they request to do a write thus having a finite number of dirty
buffers to write that cannot be expanded. There is no need to call
VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs
to flush everything again anyway after suspending writes, to catch
anything that was dirtied between the VFS_SYNC() and writes being
suspended.

The fix is to simply remove the unnecessary call to VFS_SYNC() from
dounmount().

Reported by:  Peter Holm
Analysis by:  Chuck Silvers
Tested by:    Peter Holm
MFC after:    7 days
Sponsored by: Netflix
2020-01-15 18:53:32 +00:00
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
57083d2576 vfs: add per-mount vnode lazy list and use it for deferred inactive + msync
This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22995
2020-01-13 02:34:02 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
dc20b834ca vfs: add optional root vnode caching
Root vnodes are looked up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by:	jeff (previous version), kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:14:32 +00:00
Andrew Turner
50bb04b750 Check the vfs option length is valid before accessing through
When a VFS option passed to nmount is present but NULL the kernel will
place an empty option in its internal list. This will have a NULL
pointer and a length of 0. When we come to read one of these the kernel
will try to load from the last address of virtual memory. This is
normally invalid so will fault resulting in a kernel panic.

Fix this by checking if the length is valid before dereferencing.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2019-09-27 16:22:28 +00:00
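The failure mode and the fix can be sketched with a simplified option record. The struct and function below are hypothetical and far smaller than the kernel's option handling; they only show why a zero length must be rejected before indexing with `len - 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified option record; the kernel structure differs. */
struct opt_model {
	const char	*value;	/* NULL for an option passed without a value */
	size_t		 len;	/* length in bytes, including the terminator */
};

/*
 * Returns true if the option carries a NUL-terminated value.  With
 * value == NULL and len == 0, value[len - 1] would wrap around and
 * address the very last byte of the address space, so the length is
 * validated before any dereference.
 */
static bool
opt_is_terminated(const struct opt_model *o)
{
	assert(o != NULL);
	if (o->value == NULL || o->len == 0)
		return (false);
	return (o->value[o->len - 1] == '\0');
}
```

An option that is present but empty is reported as unterminated rather than faulting, and the caller can then decide how to treat it.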
Sean Eric Fagan
ba7a55d934 Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover:  Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir:  Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by:	allanjude, kib
Differential Revision:	https://reviews.freebsd.org/D21458
2019-09-23 04:28:07 +00:00