freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	b488246b45	vfs: group fields used for per-cpu ops in one cacheline Sponsored by: The FreeBSD Foundation	2019-09-19 21:23:14 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Mateusz Guzik	a8c8e44bf0	vfs: manage mnt_ref with atomics New primitive is introduced to denote sections can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspendion or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425	2019-09-16 21:31:02 +00:00
Konstantin Belousov	e671edac06	De-commision the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Mateusz Guzik	4b3f767340	vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to acoomodate it and this patch does the same. 'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble. Reported by: cy Fixes: r351193 Sponsored by: The FreeBSD Foundation	2019-08-19 14:11:54 +00:00
Mateusz Guzik	e7c1709aaf	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded While here deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Kirk McKusick	13c31c29ca	Some filesystems (like cd9660 and ext3) require that VFS_STATFS() be called before VFS_ROOT() is called. Move the call for VFS_STATFS() so that it is done after VFS_MOUNT(), but before VFS_ROOT(). This change actually improves the robustness of the mount system call because it returns an error rather than failing silently when VFS_STATFS() returns failure. Reported by: Rebecca Cran <rebecca@bluestop.org> Sponsored by: Netflix	2018-12-21 01:09:25 +00:00
Kirk McKusick	e04d2a3c5a	Under UFS/FFS the VFS_ROOT() function will return an error if the inode check-hash fails. Panic'ing is not an appropriate response. So, check for an error return from VFS_ROOT() and when an error is reported, unwind and return the error. Reported by: Gary Jennejohn (gj) Sponsored by: Netflix	2018-12-15 19:04:50 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Mark Johnston	970a174f3b	Add FALLTHROUGH comments to appease Coverity. CID: 1017862-1017864, 1017866-1017868 MFC after: 2 weeks	2018-10-25 15:43:21 +00:00
Konstantin Belousov	4fceda6206	Correct condition to detect mount(2) support by a filesystem. Reported and tested by: cy Sponsored by: The FreeBSD Foundation Approved by: re (rgrimes)	2018-10-24 19:40:09 +00:00
Konstantin Belousov	8ff7fad1d7	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not wanted to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658	2018-10-23 21:43:41 +00:00
Jamie Gritton	0e5c6bd436	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681	2018-05-04 20:54:27 +00:00
Andriy Gapon	31260bf042	vfs_donmount: in certain cases try r/o mount if r/w mount fails If the operation is not an update, if neither r/w nor r/o mode is explicitly requested, if the error code hints at the possibility of the media being read-only, and if the fallback is allowed, then we can try to automatically downgrade to the readonly mode. This is especially useful for auto-mounting of removable media that sometimes can happen to be write-protected. The fallback to r/o is not enabled by default. It can be requested on a per-mount basis with a new mount option, 'autoro'. Or it can be globally allowed by setting vfs.default_autoro. Reviewed by: cem, kib MFC after: 3 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D13361	2018-03-27 14:31:42 +00:00
Ian Lepore	ac579135b0	Use EVENTHANDLER_DIRECT_INVOKE for [un]mount events, for better performance.	2018-01-07 18:07:22 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Andriy Gapon	f92e3400bc	remove process and jail directory machinations from dounmount The manipulations done by mountcheckdirs() are not that useful during the unmount, they can bring about unexpected security consequences. Thic change effectively reverts the change in r73241. The change also allows to simplify the handling of rootvnode global variable. Discussed with: mckusick, mjg, kib Reviewed by: trasz MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12366	2017-10-13 09:42:05 +00:00
Konstantin Belousov	9770475ce7	Do not vrele() covered vnode under the mp mutex. If vrele() changes the hold count to zero, it needs to acquire the vnode lock. Sponsored by: The FreeBSD Foundation Discussed with: avg X-MFC with: r323578	2017-09-19 16:49:45 +00:00
Andriy Gapon	cbc785c293	dounmount: do not release the mount point's reference on the covered vnode As long as mnt_ref is not zero there can be a consumer that might try to access mnt_vnodecovered. For this reason the covered vnode must not be freed until mnt_ref goes to zero. So, move the release of the covered vnode to vfs_mount_destroy. Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D12329	2017-09-14 08:47:06 +00:00
Ed Maste	3e85b721d6	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
Konstantin Belousov	2f304845e2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are indesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
Konstantin Belousov	714b7df502	Provide simple mutual exclusion between mount point update and unmount. Currently mount update keeps vfs_busy(9) reference on the mount point during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the exclusion, but is problematic for filesystems which need to perform namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh mnt_from path, because namei(9) must not be called while the vfs_busy(9) reference is owned. Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing syscalls with EBUSY if conflict is detected. Keep vfs_busy(9) reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS KPI. In the update path in ffs_mount(), drop vfs_busy() reference around namei(), which is now safe due to unmount never executing in parallel with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-11-13 21:49:51 +00:00
Konstantin Belousov	9eb8f495b8	Move common cleanup code into helper. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-13 21:39:55 +00:00
Mateusz Guzik	45571f8886	vfs: assert empty tmp free list on unmount	2016-10-08 13:38:05 +00:00
Konstantin Belousov	f71d08566c	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation	2016-10-07 11:38:28 +00:00
Mateusz Guzik	5bb81f9b2d	vfs: batch free vnodes in per-mnt lists Previously free vnodes would always by directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho	2016-09-30 17:27:17 +00:00
Edward Tomasz Napierala	e313b4dd95	Fix bug introduced with r302388, which could cause processes accessing automounted shares to hang with "vfs_busy" wchan. (As a workaround one can run 'automount -u' from cron.) Reviewed by: kib@ MFC after: 1 month	2016-09-21 05:44:13 +00:00
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Edward Tomasz Napierala	411455a8fb	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
Edward Tomasz Napierala	debc480e03	Add new unmount(2) flag, MNT_NONBUSY, to check whether there are any open vnodes before proceeding. Make autounmound(8) use this flag. Without it, even an unsuccessfull unmount causes filesystem flush, which interferes with normal operation. Reviewed by: kib@ Approved by: re (gjb@) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7047	2016-07-07 09:03:57 +00:00
Konstantin Belousov	9fdbfd3b6c	Do not assume that we own the use reference on the covered vnode until we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which wins race with us could dereference the covered vnode, and we are left with the locked freed memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week	2016-06-15 15:56:03 +00:00
Andriy Gapon	8614f45b2d	dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE This is a bit hackish, but the flag is currently set only for ZFS snapshots mounted under .zfs. mountcheckdirs() can change cdir/rdir references to a covered vnode. But for the said snapshots the covered vnode is really ephemeral and it must never be accessed (except for a few specific cases). To do: consider removing mountcheckdirs() entirely MFC after: 5 days	2016-05-16 07:23:24 +00:00
Konstantin Belousov	76c404fce5	Do not copy by field when converting struct oexport_args to struct export_args on mount update, bzero() is consistent with vfs_oexport_conv(). Make the code structure more explicit by using switch. Return EINVAL if export option layout (deduced from size) is unknown. Based on the submission by: bde Sponsored by: The FreeBSD Foundation	2016-02-04 16:32:21 +00:00
Edward Tomasz Napierala	c9ba65040f	Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467	2015-08-24 13:18:13 +00:00
Konstantin Belousov	1965f86c72	Vnode is not referenced by the vfs_domount() at the point where asserts are made. Remove them, since we might dereference freed memory. Leaked locks are asserted by the syscall return code anyway. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-02 14:31:47 +00:00
Konstantin Belousov	780dca1b1e	Right now, dounmount() is called with unreferenced mount point. Nothing stops a parallel unmount to suceed before the given call to dounmount() checks and locks the covered vnode. Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced. dounmount() consumes the reference on return, regardless of the sucessfull or erronous result. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:22:50 +00:00
Konstantin Belousov	5d6f5b24ca	Mountd iterating over the mount points may race with the parallel unmount, which causes error from nmount(2) call when performing MNT_DELEXPORT over the directory which ceased to be a mount point. The race is legitimate and innocent, but results in the chatty mountd. Silence it by providing an distinguished error code for the situation, and ignoring the error in mountd loop. Based on the patch by: Andreas Longwitz <longwitz@incore.de> Prodded and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-02-10 18:00:32 +00:00
Konstantin Belousov	b2344ab5ff	Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount. Since VFS does not/cannot stop writes, sync might run indefinitely, or be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for forced unmounts, since non-responding server does not allow sync to finish. On the other hand, filesystems can and do stop writes using fs-specific facilities, and should already fully flush caches in VFS_UNMOUNT() due to the race. Adjust msdosfs tp sync in unmount for forced call, to accomodate the new behaviour. Note that it is still racy, since writes are not stopped. Discussed with: avg, bjk, mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-09 10:00:47 +00:00
Edward Tomasz Napierala	3914ddf8a7	Bring in the new automounter, similar to what's provided in most other UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format, has proper kernel support, and LDAP integration. There are still a few outstanding problems; they will be fixed shortly. Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions) Phabric: D523 MFC after: 2 weeks Relnotes: yes Sponsored by: The FreeBSD Foundation	2014-08-17 09:44:42 +00:00
Konstantin Belousov	168f4ee0a8	Remove Giant acquisition from the mount and unmount pathes. It could be claimed that two things were reasonable protected by Giant. One is vfsconf list links, which is converted to the new dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is now updated with atomics. Note that vfc_refcount still has the same races now as it has under the Giant, the unload of filesystem modules can happen while the module is still in use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-08-03 03:27:54 +00:00
Bryan Drewery	97c0df733f	Use proper MFSNAMELEN for fs type. MFC after: 2 weeks Reviewed by: rodrigc Also spotted by:ambrisko	2014-04-12 21:39:17 +00:00
Sean Bruno	d3baefa809	Change len checks for fstypelen and fspathlen to be against absolute len not strlen as they are not strings. Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his fuse.glusterfs port to FreeBSD. Final patch from mckusick@ Submitted by: mckusick@ Approved by: re (hrs) MFC after: 2 weeks	2013-10-03 22:52:03 +00:00
Rick Macklem	8fe6bddff7	Forced dismounts of NFS mounts can fail when thread(s) are stuck waiting for an RPC reply from the server while holding the mount point busy (mnt_lockref incremented). This happens because dounmount() msleep()s waiting for mnt_lockref to become 0, before calling VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(), which the NFS client implements as purging RPCs in progress. Making this call before checking mnt_lockref fixes the problem, by ensuring that the VOP_xxx() calls will fail and unbusy the mount point. Reported by: sbruno Reviewed by: kib MFC after: 2 weeks	2013-09-01 23:02:59 +00:00
Marcel Moolenaar	8939c0693c	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. This change differs from r253224 in the following way: o The vfs_mounted handler is called before mountcheckdirs() and with newdp locked. vp is unlocked. o The event handlers are declared in <sys/eventhandler.h> and not in <sys/mount.h>. The <sys/mount.h> header is used in user land code that pretends to be kernel code and as such creates a very convoluted environment. It's hard to untangle. Submitted by: stevek@juniper.net Discussed with: pjd@ Obtained from: Juniper Networks, Inc.	2013-07-10 15:35:25 +00:00
Marcel Moolenaar	4612275fdb	Revert r251590. It unexpectedly broke the build and there were some questions on locking. As part of commit-bit grooming, I'd like Steve to handle this, but can't leave things broken in the mean time.	2013-06-10 15:22:27 +00:00
Marcel Moolenaar	8c7ca16f63	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. Submitted by: stevek@juniper.net Obtained from: Juniper Networks, Inc	2013-06-09 23:51:26 +00:00
Gabor Kovesdan	ab3f6b347e	- Correct mispellings of the word occurrence Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:40:10 +00:00
Konstantin Belousov	8d6884ce9c	When the journaled FFS volume is suspended due to the journal space becoming too low, the softdep flush thread processes the workitems, which frees the space in journal, and then unsuspends the fs. The softdep_flush() and other workitem processing functions busy the filesystem before iterating over the worklist, to prevent the parallel unmount from freeing the mount data. The vfs_busy() is called with MBF_NOWAIT flag. Now, if the unmount is already started and the filesystem is suspended due to low journal space, the journal is never flushed and filesystem is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed for the unmounting fs, and softdep_flush() does not process the workitems. Unmount needs to write metadata, where it hangs in the "suspfs" state. Move the vn_start_write() call in the dounmount() before setting the MNTK_UNMOUNT flag. This practically ensures that softdep_flush() processed the pending journal writes by making dounmount() wait for the lift of the suspension. Sponsored by: The FreeBSD Foundation Reported and tested by: pho MFC after: 2 weeks	2013-03-20 21:07:49 +00:00
David Xu	eea8d86d4d	Revert revision 244760 because strncpy pads trailing space with zero, this prevents kernel data from being leaked. Noticed by: Joerg Sonnenberger < joerg at britannica dot bec dot de >	2013-01-04 11:11:12 +00:00
Konstantin Belousov	d1c5e3f8b0	Remove the deprecated MNT_VNODE_FOREACH interface. Use the MNT_VNODE_FOREACH_ALL instead.	2013-01-03 19:02:52 +00:00
David Xu	9d4bf0db7c	Use strlcpy to NULL-terminate error message even if user provided a short buffer.	2012-12-28 02:43:33 +00:00
Attilio Rao	b1308d72c2	Fixup r218424: uio_yield() was scaling directly to userland priority. When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There are no evidences that consumers could bear with such change in semantic and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week	2012-12-21 13:14:12 +00:00
Konstantin Belousov	796fa4fb86	Fix typo. MFC after: 3 days	2012-12-09 20:26:51 +00:00
Pawel Jakub Dawidek	e1216d1335	IFp4 @208450: Remove redundant call to AUDIT_ARG_UPATH1(). Path will be remembered by the following NDINIT(AUDITVNODE1) call. Sponsored by: FreeBSD Foundation (auditdistd) MFC after: 2 weeks	2012-11-30 22:49:28 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Konstantin Belousov	bcd5bb8e57	Add a facility for vgone() to inform the set of subscribed mounts about vnode reclamation. Typical use is for the bypass mounts like nullfs to get a notification about lower vnode going away. Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument lowervp which is reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For the filesystem not having nullfs mounts over it, the overhead added is a single mount interlock lock/unlock in the vnode reclamation path. In collaboration with: pho MFC after: 3 weeks	2012-09-09 19:17:15 +00:00
Kirk McKusick	f257ebbb2e	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-20 06:50:44 +00:00
Kirk McKusick	71469bb38f	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks	2012-04-17 16:28:22 +00:00
Gleb Kurtsou	0ff93c48da	Add vfs_getopt_size. Support human readable file system options in tmpfs. Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs. Discussed with: delphij MFC after: 2 weeks	2012-04-07 15:27:34 +00:00
Konstantin Belousov	38ddb5725b	Decomission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week	2012-03-09 00:12:05 +00:00
Martin Matuska	a91d2201f9	Analogous to r230407 a separate path buffer in vfs_mount.c is required for r230129. Fixes a out of bounds write to fspath. MFC after: 10 days	2012-02-05 10:59:50 +00:00
Kirk McKusick	cc672d3599	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks	2012-01-17 01:08:01 +00:00
Martin Matuska	f6e633a9e1	Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month	2012-01-15 12:08:20 +00:00
Attilio Rao	ed1f6dc235	Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on all the architectures. The option allows to mount non-MPSAFE filesystem. Without it, the kernel will refuse to mount a non-MPSAFE filesytem. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch. Tested by: gianni Reviewed by: kib	2011-11-08 10:18:07 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Kirk McKusick	cd795a6e1f	When unmounting a filesystem always wait for the vfs_busy lock to clear so that if no vnodes in the filesystem are actively in use the unmount will succeed rather than failing with EBUSY. Reported by: Garrett Cooper Reviewed by: Attilio Rao and Kostik Belousov Tested by: Garrett Cooper PR: kern/161016 MFC after: 3 weeks	2011-10-11 18:46:41 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Martin Matuska	d16eac5c57	Revert r224655 and r224614 because vn_fullpath* does not always work on nullfs mounts. Change shall be reconsidered after 9.0 is released. Requested by: re (kib) Approved by: re (kib)	2011-08-08 14:02:08 +00:00
Martin Matuska	5388625fff	The change in r224615 didn't take into account that vn_fullpath_global() doesn't operate on locked vnode. This could cause a panic. Fix by unlocking vnode, re-locking afterwards and verifying that it wasn't renamed or deleted. To improve readability and reduce code size, move code to a new static function vfs_verify_global_path(). In addition, fix missing giant unlock in unmount(). Reported by: David Wolfskill <david@catwhisker.org> Reviewed by: kib Approved by: re (bz) MFC after: 2 weeks	2011-08-05 11:12:50 +00:00
Martin Matuska	f6c1d63e47	For mount, discover f_mntonname from supplied path argument using vn_fullpath_global(). This fixes f_mntonname if mounting inside chroot, jail or with relative path as argument. For unmount in jail, use vn_fullpath_global() to discover global path from supplied path argument. This fixes unmount in jail. Reviewed by: pjd, kib Approved by: re (kib) MFC after: 2 weeks	2011-08-02 19:13:56 +00:00
Kirk McKusick	6beb3bb4eb	This update changes the mnt_flag field in the mount structure from 32 bits to 64 bits and eliminates the unused mnt_xflag field. The existing mnt_flag field is completely out of bits, so this update gives us room to expand. Note that the f_flags field in the statfs structure is already 64 bits, so the expanded mnt_flag field can be exported without having to make any changes in the statfs structure. Approved by: re (bz)	2011-07-24 17:43:09 +00:00
Andrey V. Elsukov	fef7c585de	Include sys/sbuf.h directly.	2011-07-11 05:17:46 +00:00
Matthew D Fleming	3d08a76bbc	Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >	2011-05-13 05:27:58 +00:00
Jaakko Heinonen	1b0fe69dc9	Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is no more need to add "noro" in vfs_donmount() to cancel "ro". This also fixes a problem of canceling options beginning with "no". For example, "noatime" didn't cancel "nonoatime". Thus it was possible that both "noatime" and "nonoatime" were active at the same time. Reviewed by: bde	2011-04-22 07:26:09 +00:00
Jaakko Heinonen	9dc6abbd8a	Fix some style issues in r219925. Reported by: bde MFC after: 1 month	2011-03-26 17:17:24 +00:00
Jaakko Heinonen	3fd8fe5b54	Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant occurrences of these options. It was possible that for example both "ro" and "rw" options became active concurrently. PR: kern/133614 Discussed on: freebsd-hackers MFC after: 1 month	2011-03-23 17:56:38 +00:00
Jaakko Heinonen	da2e368f70	Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but vfs_export() fails. Restoring old options and flags after successful VFS_MOUNT(9) call may cause the file system internal state to become inconsistent with mount options and flags. Specifically the FFS super block fs_ronly field and the MNT_RDONLY flag may get out of sync. PR: kern/133614 Discussed on: freebsd-hackers	2011-02-19 14:27:14 +00:00
Matthew D Fleming	e7ceb1e99b	Based on discussions on the svn-src mailing list, rework r218195: - entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.	2011-02-08 00:16:36 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Jaakko Heinonen	1a4fbae871	Replace spaces with tabs.	2011-01-24 17:08:26 +00:00
Sergey Kandaurov	f03749ca2d	Update MNT_ROOTFS comments after changes in the root mount logic. Reported by: arundel Suggested by: marcel (phrasing) Approved by: kib (mentor)	2010-11-23 13:49:15 +00:00
Marcel Moolenaar	c1f0aabb9f	In vfs_filteropt(), only print the errmsg when there's no errmsg mount option. Otherwise errors tend to get printed multiple times.	2010-10-18 04:34:42 +00:00
Konstantin Belousov	d0cc54f3b4	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks	2010-10-10 07:05:47 +00:00
Marcel Moolenaar	24e01f5998	Split the root mount logic from the (generic) mount code and move it (the root mount code) into a new file called vfs_mountroot.c The split is almost trivial, as the code is almost perfectly non-intertwined. The only adjustment needed was to move the UMA zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and into vfs_mount.c, where it had to be done as a SYSINIT [see vfs_mount_init()]. There are no functional changes with this commit.	2010-10-02 19:44:13 +00:00
Konstantin Belousov	9a24dc0760	Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak when mount and update are executed in parallel. Encapsulate syncer vnode deallocation into the helper function vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c. Found and reviewed by: jh (previous version of the patch) Tested by: pho MFC after: 3 weeks	2010-09-11 13:06:06 +00:00
Pawel Jakub Dawidek	4946fa6791	Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.	2010-09-09 07:55:13 +00:00
Pawel Jakub Dawidek	7443b79b81	Doing first mount and updating mount points are both handled by the same syscall and the same function, but are very different and share almost no code. To make it easier to read and analyze, split vfs_domount() into vfs_domount_first() and vfs_domount_update(). Reviewed by: kib	2010-09-08 21:00:53 +00:00
Pawel Jakub Dawidek	a34512e3f0	- Log all the problems in devfs_fixup(). - Correct error paths. The system will be useless on devfs_fixup() failure, so why bother? Maybe for the same reason why a dead body is washed and dressed in a nice suit before it is put into a coffin? Maybe system's last will is to panic without any locks held? Reviewed by: kib	2010-09-08 20:56:18 +00:00
Pawel Jakub Dawidek	c87f1ad43c	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn free us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month	2010-08-28 08:57:15 +00:00
Pawel Jakub Dawidek	d779d44313	- Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be locked. - Remove code duplication.	2010-02-18 22:22:45 +00:00
Antoine Brodin	13e403fdea	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
Attilio Rao	d113304956	Add the possibility for vfs.root.mountfrom tunable to accept a list of items rather than a single one. The list is a space separated collection of items defined as the current one accepted. While there fix also a nit in a comment. Obtained from: Sandvine Incorporated Reviewed by: emaste Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Sponsored by: Sandvine Incorporated MFC: 2 weeks	2009-11-12 15:59:05 +00:00
Edward Tomasz Napierala	7ba889b8f4	Add suggestion for zfs root.	2009-11-08 09:54:25 +00:00
John Baldwin	87eca70e0c	Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week	2009-07-31 13:40:06 +00:00
Robert Watson	791b0ad2bf	Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month	2009-07-29 07:44:43 +00:00
Robert Watson	6d5a61563a	When auditing unmount(2), capture FSID arguments as regular text strings rather than as paths, which would lead to them being treated as relative pathnames and hence confusingly converted into absolute pathnames. Capture flags to unmount(2) via an argument token. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 16:56:56 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Craig Rodrigues	0c349f0856	sys/boot/common.c ================= Extend the loader to parse the root file system mount options in /etc/fstab, and set a new loader variable vfs.root.mountfrom.options with these options. The root mount options must be a comma-delimited string, as specified in /etc/fstab. Only set the vfs.root.mountfrom.options variable if it has not been set in the environment. sys/kern/vfs_mount.c ==================== When mounting the root file system, pass the mount options specified in vfs.root.mountfrom.options, but filter out "rw" and "noro", since the initial mount of the root file system must be done as "ro". While we are here, try to add a few hints to the mountroot prompt to give users and idea what might of gone wrong during mounting of the root file system. Reviewed by: jhb (an earlier patch)	2009-06-01 01:02:30 +00:00

1 2 3 4 5 ...

456 Commits