freebsd-dev

Author	SHA1	Message	Date
Jason A. Harmening	c746ed724d	Allow stacked filesystems to be recursively unmounted In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding. This change addreses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed unless the caller specifies the MNT_DEFERRED flag to indicate the base filesystem should also be unmounted from the taskqueue. To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016	2021-07-24 12:52:00 -07:00
Warner Losh	6475667f7b	devctl: don't publish the mount options Mount options aren't solely ASCII strings. In addition, experience to date suggests that the mount options are much less useful than was originally supposed and the mount flags suffice to make decisions. Drop the reporting of options for the mount/remount/unmount events. Reviewed by: markj Reported by: KASAN Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31287	2021-07-24 09:03:53 -06:00
Jason A. Harmening	59409cb90f	Add a generic mechanism for preventing forced unmount This is aimed at preventing stacked filesystems like nullfs and unionfs from "losing" their lower mounts due to forced unmount. Otherwise, VFS operations that are passed through to the lower filesystem(s) may crash or otherwise cause unpredictable behavior. Introduce two new functions: vfs_pin_from_vp() and vfs_unpin(). which are intended to be called on the lower mount(s) when the stacked filesystem is mounted and unmounted, respectively. Much as registration in the mnt_uppers list previously did, pinning will prevent even forced unmount of the lower FS and will allow the stacked FS to freely operate on the lower mount either by direct use of the struct mount* or indirect use through a properly-referenced vnode's v_mount field. vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses the mount interlock coupled with re-checking vp->v_mount to ensure that it will fail in the face of a pending unmount request, even if the concurrent unmount fully completes. Adopt these new functions in both nullfs and unionfs. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30401	2021-06-05 18:20:36 -07:00
Mark Johnston	2425f5e912	mount: Disallow mounting over a jail root Discussed with: jamie Approved by: so Security: CVE-2020-25584 Security: FreeBSD-SA-21:10.jail_mount	2021-04-06 14:49:36 -04:00
Mateusz Guzik	a15f787adb	vfs: add vfs_ref_from_vp This generalizes what vop_stdgetwritemount used to be doing. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28695	2021-02-21 00:43:05 +00:00
Mateusz Guzik	82397d7919	vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794	2021-01-03 06:50:06 +00:00
Konstantin Belousov	164438a7b9	More careful handling of the mount failure. - VFS_UNMOUNT() requires vn_start_write() around it []. - call VFS_PURGE() before unmount. - do not destroy mp if cleanup unmount did not succeed. - set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF for VFS_UNMOUNT() in cleanup. PR: 251320 [] Reported by: Tong Zhang <ztong0001@gmail.com> Reviewed by: markj, mjg Discussed with: rmacklem Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27327	2020-11-26 18:08:42 +00:00
Mateusz Guzik	f6dd1aefb7	vfs: group mount per-cpu vars into one struct While here move frequently read stuff into the same cacheline. This shrinks struct mount by 64 bytes. Tested by: pho	2020-11-09 23:02:13 +00:00
Konstantin Belousov	f10845877e	Suspend all writeable local filesystems on power suspend. This ensures that no writes are pending in memory, either metadata or user data, but not including dirty pages not yet converted to fs writes. Only filesystems declared local are suspended. Note that this does not guarantee absence of the metadata errors or leaks if resume is not done: for instance, on UFS unlinked but opened inodes are leaked and require fsck to gc. Reviewed by: markj Discussed with: imp Tested by: imp (previous version), pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D27054	2020-11-05 20:52:49 +00:00
Mateusz Guzik	2dee296a3d	Rationalize per-cpu zones. The 2 provided zones had inconsistent naming between each other ("int" and "64") and other allocator zones (which use bytes). Follow malloc by naming them "pcpu-" + size in bytes. This is a step towards replacing ad-hoc per-cpu zones with general slabs.	2020-11-05 15:08:56 +00:00
Mateusz Guzik	ad89066af4	vfs: annotate mountlist_mtx with __exclusive_cache_line	2020-10-17 08:47:08 +00:00
Mateusz Guzik	a3d9bf49b5	cache: drop the force flag from purgevfs The optional scan is wasteful, thus it is removed altogether from unmount. Callers which always want it anyway remain unaffected.	2020-09-23 10:46:07 +00:00
Rick Macklem	df665abd34	Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount. r363210 introduced v_seqc_users to the vnodes. This change requires a vn_seqc_write_end() to match the vn_seqc_write_begin() in vfs_cache_root_clear(). mjg@ provided this patch which seems to fix the panic. Tested for an NFS mount where the VFS_STATFS() call will fail. Submitted by: mjg Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D26160	2020-08-26 21:49:43 +00:00
Warner Losh	773e541e8d	Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@	2020-08-21 00:03:24 +00:00
Warner Losh	0f2c2c1c58	Use names suggested by kib@ in review D25969, move call for unmount to not call with vnode locked, use NOWAIT alloc and only report when we don't overflow. These changes were accidentally omitted from r364402, except for the not reporting on overflow. They were lumped in with a debugging commit in my tree that I omitted w/o realizing this. Other issues from the review are pending some other changes I need to do first.	2020-08-20 16:52:48 +00:00
Warner Losh	8ef773d1b4	Add VFS FS events for mount and unmount to devctl/devd Report when a filesystem is mounted, remounted or unmounted via devd, along with details about the mount point and mount options. Discussed with: kib@ Reviewed by: kirk@ (prior version) Sponsored by: Netflix Diffential Revision: https://reviews.freebsd.org/D25969	2020-08-19 17:10:04 +00:00
Mateusz Guzik	4b3208a97b	vfs: sanity check mount counters in vfs_op_enter	2020-08-19 02:50:09 +00:00
Mateusz Guzik	0379ff6ae3	vfs: introduce vnode sequence counters Modified on each permission change and link/unlink. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25573	2020-07-25 10:31:52 +00:00
Mateusz Guzik	8c1f410c19	vfs: avoid spurious memcpy in vfs_statfs It is quite often called for the very same buffer.	2020-07-10 06:46:42 +00:00
Ryan Moeller	33b39b6615	Apply default security flavor in vfs_export There may be some version of mountd out there that does not supply a default security flavor when none is given for an export. Set the default security flavor in vfs_export if none is given, and remove the workaround for oexport compat. Reported by: npn Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 3 days Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25300	2020-06-16 21:30:30 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it hold a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size. This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Rick Macklem	c13e414dc2	Fix build issue introduced by r361699. Reported by: cy (and others)	2020-06-02 00:03:26 +00:00
Ryan Moeller	1cfffed85d	Assign default security flavor when converting old export args vfs_export requires security flavors be explicitly listed when exporting as of r360900. Use the default AUTH_SYS flavor when converting old export args to ensure compatibility with the legacy mount syscall. Reported by: rmacklem Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 3 days Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25045	2020-06-01 18:43:51 +00:00
Rick Macklem	f9122b6488	Fix an NFS mount attempt where VFS_STATFS() fails. r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the VFS_STATFS() called just after VFS_MOUNT() returns an error. Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY. Then the thread get stuck sleeping on "mntref" in vfs_mount_destroy(). This patch fixes this problem. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D24022	2020-03-22 18:18:30 +00:00
Mateusz Guzik	ed67a63c39	vfs: drop remaining zpcpu casts	2020-02-12 11:18:12 +00:00
Mateusz Guzik	123c519731	vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit In particular on amd64 this eliminates an atomic op in the common case, trading it for IPIs in the uncommon case of catching CPUs executing the code while the filesystem is getting suspended or unmounted.	2020-02-12 11:17:45 +00:00
Mateusz Guzik	3eb6b656c2	vfs: remove now useless ENODEV handling from vn_fullpath consumers Noted by: ngie	2020-02-08 15:51:08 +00:00
Edward Tomasz Napierala	b3fb13eb55	Add kern_unmount() and use in Linuxulator. No functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22646	2020-01-24 11:57:55 +00:00
Kirk McKusick	bbb1e07d65	Peter Holm reports that his test that does an umount(8) on an active mount point while numerous tests are running that are writing to files on that mount point cause the unmount(8) to hang forever. The unmount(8) system call is handled in the kernel by the dounmount() function. The cause of the hang is that prior to dounmount() calling VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT flag indicates that VFS_SYNC() should not return until all the dirty buffers associated with the mount point have been written to disk. Because user processes are allowed to continue writing and can do so faster than the data can be written to disk, the call to VFS_SYNC() can never finish. Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes when they request to do a write thus having a finite number of dirty buffers to write that cannot be expanded. There is no need to call VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs to flush everything again anyway after suspending writes, to catch anything that was dirtied between the VFS_SYNC() and writes being suspended. The fix is to simply remove the unnecessary call to VFS_SYNC() from dounmount(). Reported by: Peter Holm Analysis by: Chuck Silvers Tested by: Peter Holm MFC after: 7 days Sponsored by: Netflix	2020-01-15 18:53:32 +00:00
Mateusz Guzik	cc3593fbd9	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997	2020-01-13 02:37:25 +00:00
Mateusz Guzik	57083d2576	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995	2020-01-13 02:34:02 +00:00
Mateusz Guzik	c8b3463dd0	vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036	2020-01-07 15:56:24 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Mateusz Guzik	dc20b834ca	vfs: add optional root vnode caching Root vnodes looekd up all the time, e.g. when crossing a mount point. Currently used routines always perform a costly lookup which can be trivially avoided. Reviewed by: jeff (previous version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:14:32 +00:00
Andrew Turner	50bb04b750	Check the vfs option length is valid before accessing through When a VFS option passed to nmount is present but NULL the kernel will place an empty option in its internal list. This will have a NULL pointer and a length of 0. When we come to read one of these the kernel will try to load from the last address of virtual memory. This is normally invalid so will fault resulting in a kernel panic. Fix this by checking if the length is valid before dereferencing. MFC after: 3 days Sponsored by: DARPA, AFRL	2019-09-27 16:22:28 +00:00
Sean Eric Fagan	ba7a55d934	Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458	2019-09-23 04:28:07 +00:00
Mateusz Guzik	b488246b45	vfs: group fields used for per-cpu ops in one cacheline Sponsored by: The FreeBSD Foundation	2019-09-19 21:23:14 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Mateusz Guzik	a8c8e44bf0	vfs: manage mnt_ref with atomics New primitive is introduced to denote sections can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspendion or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425	2019-09-16 21:31:02 +00:00
Konstantin Belousov	e671edac06	De-commision the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Mateusz Guzik	4b3f767340	vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to acoomodate it and this patch does the same. 'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble. Reported by: cy Fixes: r351193 Sponsored by: The FreeBSD Foundation	2019-08-19 14:11:54 +00:00
Mateusz Guzik	e7c1709aaf	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded While here deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Kirk McKusick	13c31c29ca	Some filesystems (like cd9660 and ext3) require that VFS_STATFS() be called before VFS_ROOT() is called. Move the call for VFS_STATFS() so that it is done after VFS_MOUNT(), but before VFS_ROOT(). This change actually improves the robustness of the mount system call because it returns an error rather than failing silently when VFS_STATFS() returns failure. Reported by: Rebecca Cran <rebecca@bluestop.org> Sponsored by: Netflix	2018-12-21 01:09:25 +00:00
Kirk McKusick	e04d2a3c5a	Under UFS/FFS the VFS_ROOT() function will return an error if the inode check-hash fails. Panic'ing is not an appropriate response. So, check for an error return from VFS_ROOT() and when an error is reported, unwind and return the error. Reported by: Gary Jennejohn (gj) Sponsored by: Netflix	2018-12-15 19:04:50 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mark Johnston	970a174f3b	Add FALLTHROUGH comments to appease Coverity. CID: 1017862-1017864, 1017866-1017868 MFC after: 2 weeks	2018-10-25 15:43:21 +00:00
Konstantin Belousov	4fceda6206	Correct condition to detect mount(2) support by a filesystem. Reported and tested by: cy Sponsored by: The FreeBSD Foundation Approved by: re (rgrimes)	2018-10-24 19:40:09 +00:00
Konstantin Belousov	8ff7fad1d7	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not wanted to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658	2018-10-23 21:43:41 +00:00
Jamie Gritton	0e5c6bd436	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681	2018-05-04 20:54:27 +00:00

1 2 3 4 5 ...

442 Commits