Commit Graph

456 Commits

Author SHA1 Message Date
Mateusz Guzik
b488246b45 vfs: group fields used for per-cpu ops in one cacheline
Sponsored by:	The FreeBSD Foundation
2019-09-19 21:23:14 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Mateusz Guzik
a8c8e44bf0 vfs: manage mnt_ref with atomics
New primitive is introduced to denote sections can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspendion or other events.

mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.

Reviewed by:	kib, jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21425
2019-09-16 21:31:02 +00:00
Konstantin Belousov
e671edac06 De-commision the MNTK_NOINSMNTQ kernel mount flag.
After all the changes, its dynamic scope is same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-23 19:40:10 +00:00
Mateusz Guzik
4b3f767340 vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS")
fs-specific part of vfs_statfs routines only fill in small portion of the
structure. Previous code was always copying everything at a higher layer to
acoomodate it and this patch does the same.

'df' (no arguments) worked fine because the caller uses mnt_stat itself as the
target buffer, making all the copying a no-op for its own case.
'df /' and similar use a different consumer which passes its own buffer and
this is where you can run into trouble.

Reported by:	cy
Fixes: r351193
Sponsored by:	The FreeBSD Foundation
2019-08-19 14:11:54 +00:00
Mateusz Guzik
e7c1709aaf vfs: stop always overwriting ->mnt_stat in VFS_STATFS
The struct is already populated on each mount (and remount). Fields are either
constant or not used by filesystem in the first place.

Some infrequently used functions use it to avoid having to allocate a new buffer
and are left alone.

The current code results in an avoidable copying single-threaded and significant
cache line bouncing multithreaded

While here deduplicate initial filling of the struct.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21317
2019-08-18 18:40:12 +00:00
Conrad Meyer
daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Kirk McKusick
13c31c29ca Some filesystems (like cd9660 and ext3) require that VFS_STATFS()
be called before VFS_ROOT() is called. Move the call for VFS_STATFS()
so that it is done after VFS_MOUNT(), but before VFS_ROOT().
This change actually improves the robustness of the mount system
call because it returns an error rather than failing silently
when VFS_STATFS() returns failure.

Reported by:  Rebecca Cran <rebecca@bluestop.org>
Sponsored by: Netflix
2018-12-21 01:09:25 +00:00
Kirk McKusick
e04d2a3c5a Under UFS/FFS the VFS_ROOT() function will return an error if the inode
check-hash fails. Panic'ing is not an appropriate response. So, check
for an error return from VFS_ROOT() and when an error is reported,
unwind and return the error.

Reported by:  Gary Jennejohn (gj)
Sponsored by: Netflix
2018-12-15 19:04:50 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Mark Johnston
970a174f3b Add FALLTHROUGH comments to appease Coverity.
CID:		1017862-1017864, 1017866-1017868
MFC after:	2 weeks
2018-10-25 15:43:21 +00:00
Konstantin Belousov
4fceda6206 Correct condition to detect mount(2) support by a filesystem.
Reported and tested by:	cy
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
2018-10-24 19:40:09 +00:00
Konstantin Belousov
8ff7fad1d7 Only call sigdeferstop() for NFS.
Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17658
2018-10-23 21:43:41 +00:00
Jamie Gritton
0e5c6bd436 Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of.  This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by:	kib
Differential Revision:	D14681
2018-05-04 20:54:27 +00:00
Andriy Gapon
31260bf042 vfs_donmount: in certain cases try r/o mount if r/w mount fails
If the operation is not an update, if neither r/w nor r/o mode is
explicitly requested, if the error code hints at the possibility of the
media being read-only, and if the fallback is allowed, then we can try
to automatically downgrade to the readonly mode.

This is especially useful for auto-mounting of removable media that
sometimes can happen to be write-protected.

The fallback to r/o is not enabled by default.  It can be requested on a
per-mount basis with a new mount option, 'autoro'.  Or it can be
globally allowed by setting vfs.default_autoro.

Reviewed by:	cem, kib
MFC after:	3 weeks
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D13361
2018-03-27 14:31:42 +00:00
Ian Lepore
ac579135b0 Use EVENTHANDLER_DIRECT_INVOKE for [un]mount events, for better performance. 2018-01-07 18:07:22 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Andriy Gapon
f92e3400bc remove process and jail directory machinations from dounmount
The manipulations done by mountcheckdirs() are not that useful during
the unmount, they can bring about unexpected security consequences.

Thic change effectively reverts the change in r73241.

The change also allows to simplify the handling of rootvnode global
variable.

Discussed with:	mckusick, mjg, kib
Reviewed by:	trasz
MFC after:	1 month
Differential Revision: https://reviews.freebsd.org/D12366
2017-10-13 09:42:05 +00:00
Konstantin Belousov
9770475ce7 Do not vrele() covered vnode under the mp mutex.
If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.

Sponsored by:	The FreeBSD Foundation
Discussed with:	avg
X-MFC with:	r323578
2017-09-19 16:49:45 +00:00
Andriy Gapon
cbc785c293 dounmount: do not release the mount point's reference on the covered vnode
As long as mnt_ref is not zero there can be a consumer that might try
to access mnt_vnodecovered.  For this reason the covered vnode must not
be freed until mnt_ref goes to zero.
So, move the release of the covered vnode to vfs_mount_destroy.

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D12329
2017-09-14 08:47:06 +00:00
Ed Maste
3e85b721d6 Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by:	cem, jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10193
2017-05-17 00:34:34 +00:00
Konstantin Belousov
2f304845e2 Do not allocate struct statfs on kernel stack.
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable.  With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from:	ino64 work by gleb
Discussed with:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-05 17:19:26 +00:00
Konstantin Belousov
714b7df502 Provide simple mutual exclusion between mount point update and unmount.
Currently mount update keeps vfs_busy(9) reference on the mount point
during MNT_UPDATE VFS_MOUNT() vfsops call.  This already provides the
exclusion, but is problematic for filesystems which need to perform
namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh
mnt_from path, because namei(9) must not be called while the
vfs_busy(9) reference is owned.

Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for
MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing
syscalls with EBUSY if conflict is detected.  Keep vfs_busy(9)
reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS
KPI.

In the update path in ffs_mount(), drop vfs_busy() reference around
namei(), which is now safe due to unmount never executing in parallel
with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-11-13 21:49:51 +00:00
Konstantin Belousov
9eb8f495b8 Move common cleanup code into helper.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-13 21:39:55 +00:00
Mateusz Guzik
45571f8886 vfs: assert empty tmp free list on unmount 2016-10-08 13:38:05 +00:00
Konstantin Belousov
f71d08566c Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

Reported and tested by:	jkim
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2016-10-07 11:38:28 +00:00
Mateusz Guzik
5bb81f9b2d vfs: batch free vnodes in per-mnt lists
Previously free vnodes would always by directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.

syncer runs always return the batch regardless of its size.

While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.

Reviewed by:	kib
Tested by:	pho
2016-09-30 17:27:17 +00:00
Edward Tomasz Napierala
e313b4dd95 Fix bug introduced with r302388, which could cause processes accessing
automounted shares to hang with "vfs_busy" wchan.

(As a workaround one can run 'automount -u' from cron.)

Reviewed by:	kib@
MFC after:	1 month
2016-09-21 05:44:13 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Edward Tomasz Napierala
411455a8fb Replace all remaining calls to vprint(9) with vn_printf(9), and remove
the old macro.

MFC after:	1 month
2016-08-10 16:12:31 +00:00
Edward Tomasz Napierala
debc480e03 Add new unmount(2) flag, MNT_NONBUSY, to check whether there are
any open vnodes before proceeding. Make autounmound(8) use this flag.
Without it, even an unsuccessfull unmount causes filesystem flush,
which interferes with normal operation.

Reviewed by:	kib@
Approved by:	re (gjb@)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7047
2016-07-07 09:03:57 +00:00
Konstantin Belousov
9fdbfd3b6c Do not assume that we own the use reference on the covered vnode until
we set MNTK_UNMOUNT flag on the mp.  Otherwise parallel unmount which
wins race with us could dereference the covered vnode, and we are
left with the locked freed memory.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
2016-06-15 15:56:03 +00:00
Andriy Gapon
8614f45b2d dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE
This is a bit hackish, but the flag is currently set only for ZFS
snapshots mounted under .zfs.  mountcheckdirs() can change cdir/rdir
references to a covered vnode.  But for the said snapshots the covered
vnode is really ephemeral and it must never be accessed (except
for a few specific cases).

To do:	consider removing mountcheckdirs() entirely

MFC after:	5 days
2016-05-16 07:23:24 +00:00
Konstantin Belousov
76c404fce5 Do not copy by field when converting struct oexport_args to struct
export_args on mount update, bzero() is consistent with
vfs_oexport_conv().
Make the code structure more explicit by using switch.
Return EINVAL if export option layout (deduced from size) is unknown.

Based on the submission by:	bde
Sponsored by:	The FreeBSD Foundation
2016-02-04 16:32:21 +00:00
Edward Tomasz Napierala
c9ba65040f Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
Konstantin Belousov
1965f86c72 Vnode is not referenced by the vfs_domount() at the point where
asserts are made.  Remove them, since we might dereference freed
memory.  Leaked locks are asserted by the syscall return code anyway.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-02 14:31:47 +00:00
Konstantin Belousov
780dca1b1e Right now, dounmount() is called with unreferenced mount point.
Nothing stops a parallel unmount to suceed before the given call to
dounmount() checks and locks the covered vnode.  Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced.  dounmount()
consumes the reference on return, regardless of the sucessfull or
erronous result.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:22:50 +00:00
Konstantin Belousov
5d6f5b24ca Mountd iterating over the mount points may race with the parallel
unmount, which causes error from nmount(2) call when performing
MNT_DELEXPORT over the directory which ceased to be a mount point.

The race is legitimate and innocent, but results in the chatty mountd.
Silence it by providing an distinguished error code for the situation,
and ignoring the error in mountd loop.

Based on the patch by:	Andreas Longwitz <longwitz@incore.de>
Prodded and tested by:	bdrewery
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-02-10 18:00:32 +00:00
Konstantin Belousov
b2344ab5ff Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.
Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all.  E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish.  On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour.  Note that it is still racy, since writes are not
stopped.

Discussed with:	avg, bjk, mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-09 10:00:47 +00:00
Edward Tomasz Napierala
3914ddf8a7 Bring in the new automounter, similar to what's provided in most other
UNIX systems, eg. MacOS X and Solaris.  It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by:	allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric:	D523
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2014-08-17 09:44:42 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount pathes.
It could be claimed that two things were reasonable protected by
Giant.  One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
Bryan Drewery
97c0df733f Use proper MFSNAMELEN for fs type.
MFC after:	2 weeks
Reviewed by:	rodrigc
Also spotted by:ambrisko
2014-04-12 21:39:17 +00:00
Sean Bruno
d3baefa809 Change len checks for fstypelen and fspathlen to be against absolute len
not strlen as they are *not* strings.

Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his
fuse.glusterfs port to FreeBSD.

Final patch from mckusick@

Submitted by:	mckusick@
Approved by:	re (hrs)
MFC after:	2 weeks
2013-10-03 22:52:03 +00:00
Rick Macklem
8fe6bddff7 Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by:	sbruno
Reviewed by:	kib
MFC after:	2 weeks
2013-09-01 23:02:59 +00:00
Marcel Moolenaar
8939c0693c Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

This change differs from r253224 in the following way:
o   The vfs_mounted handler is called before mountcheckdirs() and with
    newdp locked. vp is unlocked.
o   The event handlers are declared in <sys/eventhandler.h> and not in
    <sys/mount.h>. The <sys/mount.h> header is used in user land code
    that pretends to be kernel code and as such creates a very convoluted
    environment. It's hard to untangle.

Submitted by:	stevek@juniper.net
Discussed with:	pjd@
Obtained from:	Juniper Networks, Inc.
2013-07-10 15:35:25 +00:00
Marcel Moolenaar
4612275fdb Revert r251590. It unexpectedly broke the build and there were some
questions on locking. As part of commit-bit grooming, I'd like Steve
to handle this, but can't leave things broken in the mean time.
2013-06-10 15:22:27 +00:00
Marcel Moolenaar
8c7ca16f63 Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

Submitted by:	stevek@juniper.net
Obtained from:	Juniper Networks, Inc
2013-06-09 23:51:26 +00:00
Gabor Kovesdan
ab3f6b347e - Correct mispellings of the word occurrence
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:40:10 +00:00
Konstantin Belousov
8d6884ce9c When the journaled FFS volume is suspended due to the journal space
becoming too low, the softdep flush thread processes the workitems,
which frees the space in journal, and then unsuspends the fs.  The
softdep_flush() and other workitem processing functions busy the
filesystem before iterating over the worklist, to prevent the parallel
unmount from freeing the mount data. The vfs_busy() is called with
MBF_NOWAIT flag.

Now, if the unmount is already started and the filesystem is suspended
due to low journal space, the journal is never flushed and filesystem
is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed
for the unmounting fs, and softdep_flush() does not process the
workitems. Unmount needs to write metadata, where it hangs in the
"suspfs" state.

Move the vn_start_write() call in the dounmount() before setting the
MNTK_UNMOUNT flag. This practically ensures that softdep_flush()
processed the pending journal writes by making dounmount() wait for
the lift of the suspension.

Sponsored by:	The FreeBSD Foundation
Reported and tested by:	pho
MFC after:	2 weeks
2013-03-20 21:07:49 +00:00
David Xu
eea8d86d4d Revert revision 244760 because strncpy pads trailing space with zero,
this prevents kernel data from being leaked.

Noticed by: Joerg Sonnenberger &lt; joerg at britannica dot bec dot de &gt;
2013-01-04 11:11:12 +00:00
Konstantin Belousov
d1c5e3f8b0 Remove the deprecated MNT_VNODE_FOREACH interface. Use the
MNT_VNODE_FOREACH_ALL instead.
2013-01-03 19:02:52 +00:00
David Xu
9d4bf0db7c Use strlcpy to NULL-terminate error message even if user provided a short
buffer.
2012-12-28 02:43:33 +00:00
Attilio Rao
b1308d72c2 Fixup r218424: uio_yield() was scaling directly to userland priority.
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There are no evidences that consumers could bear with such change in
semantic and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.

Tested by:	pho
Reviewed by:	kib, mdf
MFC after:	1 week
2012-12-21 13:14:12 +00:00
Konstantin Belousov
796fa4fb86 Fix typo.
MFC after:	3 days
2012-12-09 20:26:51 +00:00
Pawel Jakub Dawidek
e1216d1335 IFp4 @208450:
Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.

Sponsored by:	FreeBSD Foundation (auditdistd)
MFC after:	2 weeks
2012-11-30 22:49:28 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Konstantin Belousov
bcd5bb8e57 Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:17:15 +00:00
Kirk McKusick
f257ebbb2e This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 06:50:44 +00:00
Kirk McKusick
71469bb38f Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-17 16:28:22 +00:00
Gleb Kurtsou
0ff93c48da Add vfs_getopt_size. Support human readable file system options in tmpfs.
Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with:	delphij
MFC after:	2 weeks
2012-04-07 15:27:34 +00:00
Konstantin Belousov
38ddb5725b Decomission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which
allows a filesystem to request VFS to not allow MNTK_ASYNC.

MFC after:	1 week
2012-03-09 00:12:05 +00:00
Martin Matuska
a91d2201f9 Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes a out of bounds write to fspath.

MFC after:	10 days
2012-02-05 10:59:50 +00:00
Kirk McKusick
cc672d3599 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
Martin Matuska
f6e633a9e1 Introduce vn_path_to_global_path()
This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.

In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.

Unbreaks jailed zfs(8) with enforce_statfs set to 1.

Reviewed by:	kib
MFC after:	1 month
2012-01-15 12:08:20 +00:00
Attilio Rao
ed1f6dc235 Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by:	gianni
Reviewed by:	kib
2011-11-08 10:18:07 +00:00
Ed Schouten
d745c852be Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
Kirk McKusick
cd795a6e1f When unmounting a filesystem always wait for the vfs_busy lock to clear
so that if no vnodes in the filesystem are actively in use the unmount
will succeed rather than failing with EBUSY.

Reported by: Garrett Cooper
Reviewed by: Attilio Rao and Kostik Belousov
Tested by:   Garrett Cooper
PR:          kern/161016
MFC after:   3 weeks
2011-10-11 18:46:41 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Martin Matuska
d16eac5c57 Revert r224655 and r224614 because vn_fullpath* does not always work
on nullfs mounts.

Change shall be reconsidered after 9.0 is released.

Requested by:	re (kib)
Approved by:	re (kib)
2011-08-08 14:02:08 +00:00
Martin Matuska
5388625fff The change in r224615 didn't take into account that vn_fullpath_global()
doesn't operate on locked vnode. This could cause a panic.

Fix by unlocking vnode, re-locking afterwards and verifying that it wasn't
renamed or deleted. To improve readability and reduce code size, move code
to a new static function vfs_verify_global_path().

In addition, fix missing giant unlock in unmount().

Reported by:	David Wolfskill <david@catwhisker.org>
Reviewed by:	kib
Approved by:	re (bz)
MFC after:	2 weeks
2011-08-05 11:12:50 +00:00
Martin Matuska
f6c1d63e47 For mount, discover f_mntonname from supplied path argument
using vn_fullpath_global(). This fixes f_mntonname if mounting
inside chroot, jail or with relative path as argument.

For unmount in jail, use vn_fullpath_global() to discover
global path from supplied path argument. This fixes unmount in jail.

Reviewed by:	pjd, kib
Approved by:	re (kib)
MFC after:	2 weeks
2011-08-02 19:13:56 +00:00
Kirk McKusick
6beb3bb4eb This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field.  The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)
2011-07-24 17:43:09 +00:00
Andrey V. Elsukov
fef7c585de Include sys/sbuf.h directly. 2011-07-11 05:17:46 +00:00
Matthew D Fleming
3d08a76bbc Use a name instead of a magic number for kern_yield(9) when the priority
should not change.  Fetch the td_user_pri under the thread lock.  This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by:	Jason Behmer < jason DOT behmer AT isilon DOT com >
2011-05-13 05:27:58 +00:00
Jaakko Heinonen
1b0fe69dc9 Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".

This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.

Reviewed by:	bde
2011-04-22 07:26:09 +00:00
Jaakko Heinonen
9dc6abbd8a Fix some style issues in r219925.
Reported by:	bde
MFC after:	1 month
2011-03-26 17:17:24 +00:00
Jaakko Heinonen
3fd8fe5b54 Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in
vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant
occurrences of these options. It was possible that for example both "ro"
and "rw" options became active concurrently.

PR:		kern/133614
Discussed on:	freebsd-hackers
MFC after:	1 month
2011-03-23 17:56:38 +00:00
Jaakko Heinonen
da2e368f70 Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but
vfs_export() fails. Restoring old options and flags after successful
VFS_MOUNT(9) call may cause the file system internal state to become
inconsistent with mount options and flags. Specifically the FFS super
block fs_ronly field and the MNT_RDONLY flag may get out of sync.

PR:		kern/133614
Discussed on:	freebsd-hackers
2011-02-19 14:27:14 +00:00
Matthew D Fleming
e7ceb1e99b Based on discussions on the svn-src mailing list, rework r218195:
- entirely eliminate some calls to uio_yeild() as being unnecessary,
   such as in a sysctl handler.

 - move should_yield() and maybe_yield() to kern_synch.c and move the
   prototypes from sys/uio.h to sys/proc.h

 - add a slightly more generic kern_yield() that can replace the
   functionality of uio_yield().

 - replace source uses of uio_yield() with the functional equivalent,
   or in some cases do not change the thread priority when switching.

 - fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

 - instead of using the per-cpu last switched ticks, use a per thread
   variable for should_yield().  With PREEMPTION, the only reasonable
   use of this is to determine if a lock has been held a long time and
   relinquish it.  Without PREEMPTION, this is essentially the same as
   the per-cpu variable.
2011-02-08 00:16:36 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Jaakko Heinonen
1a4fbae871 Replace spaces with tabs. 2011-01-24 17:08:26 +00:00
Sergey Kandaurov
f03749ca2d Update MNT_ROOTFS comments after changes in the root mount logic.
Reported by:	arundel
Suggested by:	marcel (phrasing)
Approved by:	kib (mentor)
2010-11-23 13:49:15 +00:00
Marcel Moolenaar
c1f0aabb9f In vfs_filteropt(), only print the errmsg when there's no errmsg
mount option. Otherwise errors tend to get printed multiple times.
2010-10-18 04:34:42 +00:00
Konstantin Belousov
d0cc54f3b4 The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by:	bde
MFC after:    2 weeks
2010-10-10 07:05:47 +00:00
Marcel Moolenaar
24e01f5998 Split the root mount logic from the (generic) mount code and move
it (the root mount code) into a new file called vfs_mountroot.c

The split is almost trivial, as the code is almost perfectly
non-intertwined. The only adjustment needed was to move the UMA
zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and
into vfs_mount.c, where it had to be done as a SYSINIT [see
vfs_mount_init()].

There are no functional changes with this commit.
2010-10-02 19:44:13 +00:00
Konstantin Belousov
9a24dc0760 Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by:	jh (previous version of the patch)
Tested by:	pho
MFC after:	3 weeks
2010-09-11 13:06:06 +00:00
Pawel Jakub Dawidek
4946fa6791 Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure. 2010-09-09 07:55:13 +00:00
Pawel Jakub Dawidek
7443b79b81 Doing first mount and updating mount points are both handled by the same
syscall and the same function, but are very different and share almost no code.
To make it easier to read and analyze, split vfs_domount() into
vfs_domount_first() and vfs_domount_update().

Reviewed by:	kib
2010-09-08 21:00:53 +00:00
Pawel Jakub Dawidek
a34512e3f0 - Log all the problems in devfs_fixup().
- Correct error paths. The system will be useless on devfs_fixup() failure, so
  why bother?  Maybe for the same reason why a dead body is washed and dressed
  in a nice suit before it is put into a coffin? Maybe system's last will is to
  panic without any locks held?

Reviewed by:	kib
2010-09-08 20:56:18 +00:00
Pawel Jakub Dawidek
c87f1ad43c There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn free us from handling its failures
in the mount code.

Reviewed by:	kib
MFC after:	1 month
2010-08-28 08:57:15 +00:00
Pawel Jakub Dawidek
d779d44313 - Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be
locked.
- Remove code duplication.
2010-02-18 22:22:45 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Attilio Rao
d113304956 Add the possibility for vfs.root.mountfrom tunable to accept a list of
items rather than a single one. The list is a space separated collection
of items defined as the current one accepted.

While there fix also a nit in a comment.

Obtained from:	Sandvine Incorporated
Reviewed by:	emaste
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
Sponsored by:	Sandvine Incorporated
MFC:		2 weeks
2009-11-12 15:59:05 +00:00
Edward Tomasz Napierala
7ba889b8f4 Add suggestion for zfs root. 2009-11-08 09:54:25 +00:00
John Baldwin
87eca70e0c Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
  sanitizing stdin/out/err file descriptors during execve().

Submitted by:	kib
Approved by:	re (rwatson)
MFC after:	1 week
2009-07-31 13:40:06 +00:00
Robert Watson
791b0ad2bf Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records.  This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-29 07:44:43 +00:00
Robert Watson
6d5a61563a When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 16:56:56 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Craig Rodrigues
0c349f0856 sys/boot/common.c
=================
Extend the loader to parse the root file system mount options in /etc/fstab,
and set a new loader variable vfs.root.mountfrom.options with these options.
The root mount options must be a comma-delimited string, as specified in
/etc/fstab.
Only set the vfs.root.mountfrom.options variable if it has not been
set in the environment.

sys/kern/vfs_mount.c
====================
When mounting the root file system, pass the mount options
specified in vfs.root.mountfrom.options, but filter out "rw" and "noro",
since the initial mount of the root file system must be done as "ro".
While we are here, try to add a few hints to the mountroot prompt
to give users and idea what might of gone wrong during mounting
of the root file system.

Reviewed by:	jhb (an earlier patch)
2009-06-01 01:02:30 +00:00