Commit Graph

442 Commits

Author SHA1 Message Date
David E. O'Brien
6e6049e9df Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)
2008-09-19 15:17:32 +00:00
Simon L. B. Nielsen
59ca51adba - Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by:	kib [08:07], csjp [08:08], bz [08:09]
Reviewed by:	peter [08:07], jhb [08:07]
Reviewed by:	jinmei [08:09], rwatson [08:09]
Approved by:	re (SA blanket)
Approved by:	so (simon)
Security:	FreeBSD-SA-08:07.amd64
Security:	FreeBSD-SA-08:08.nmount
Security:	FreeBSD-SA-08:09.icmp6
2008-09-03 19:09:47 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Craig Rodrigues
d5bdb2f68d In nmount(), when we see the "force" option,
set the MNT_FORCE flag, but do not persist "force"
in the options list, since it is a command, not a persistent property
of a mount.

Similarly, when we see "reload", set MNT_RELOAD,
but delete "reload" from the options list.

MFC after:	1 week
2008-08-23 01:16:09 +00:00
Konstantin Belousov
e792b09be2 Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with:	rodrigc
MFC after:	3 days
2008-08-10 12:15:36 +00:00
Dag-Erling Smørgrav
2616144e43 Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
Craig Rodrigues
1aad294b4e In nmount(), if we see "update" in the mount options,
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.

MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.

We need to do similar things for MNT_FORCE and MNT_RELOAD.

All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.

Requested by:	yar
MFC after:	2 weeks
2008-07-12 20:12:40 +00:00
Konstantin Belousov
a70537835f Provide the mutual exclusion between the nfs export list modifications
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.

Reported by:	kris, pho, many
Based on the submission by:	mohan
Tested by:	pho
MFC after:	2 weeks
2008-06-09 10:31:38 +00:00
Wojciech A. Koszek
2e75877f12 Remove checks against DDB, which isn't used in this file.
My intention is to bring no functional change.

Discussion on:	IRC
Reviewed by:	ed, kan, rink,
2008-06-08 20:43:27 +00:00
Craig Rodrigues
a9722ace80 Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since
we do it further down in ffs_vfsops.c

MFC after:	1 month
2008-05-23 23:33:07 +00:00
Roman Divacky
d6891277a4 Lock filedesc exclusively when modifying fd_[cr]dir.
This is probably harmless but it's better to lock it
correctly.

Approved by:	kib (mentor)
2008-04-29 21:40:11 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Sam Leffler
00c71fb7c3 o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after:	2 weeks
2008-04-08 17:53:33 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Konstantin Belousov
1be222e9df Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed:	on stable@, with bde
Reviewed by:	tegge, jeff [1]
MFC after:	2 weeks
2008-03-23 13:45:24 +00:00
Yaroslav Tykhiy
c6446de05d Undo the damage I did in sys/kern/vfs_mount.c #1.274 and
sbin/mount_nfs/mount_nfs.c #1.76.  Let the dragons sleep.

Requested by:	rodrigc, des
PR:		kern/120319 (welcome the bug back)
2008-02-18 20:58:57 +00:00
Yaroslav Tykhiy
37ed722f78 Add a remark on a questionable property of vfs_mergeopts(). 2008-02-18 10:10:42 +00:00
Yaroslav Tykhiy
38a7fd05f7 In the new order of things dictated by nmount(2), a read-only mount
is to be requested via a "ro" option.  At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag.  Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches.  (See the
PRs for examples.)

Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag.  (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)

To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.

Also correct the comment explaining how mount_arg() handles length
of -1.

PR:		bin/106636 kern/120319
Submitted by:	Jaakko Heinonen <see PR kern/120319 for email> (originally)
2008-02-14 17:04:31 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Craig Rodrigues
450ea867c5 In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR:		118531
Submitted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	3 days
2007-12-31 23:44:53 +00:00
Warner Losh
b27aa20e8d A partial solution to some of the 'pull the umass device with a
mounted FS' problems.  These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices.  We
now close the barn door after the horse has gotten lose and has been
hit by a truck, as it were.  The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO.  In that case, the buffer
is treated like any other clean buffer that's being retured.  ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system.  If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author.  The
other patches will be forth coming.  I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
2007-12-27 16:38:28 +00:00
Craig Rodrigues
62bdb328bb In nmount(), internally convert the mount option: "rdonly" to "ro".
This makes updates mounts such as:
 "mount -u -o rdonly" work more like, "mount -u -o ro".

References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.

Requested by:	rwatson
MFC after:	1 week
2007-12-05 03:26:14 +00:00
Craig Rodrigues
b4b5bf359b In nmount(), if MNT_ROOT is in the mount flags, filter it
out instead of returning an error.
(1)  This makes the behavior consistent with mount(2).
(2)  This makes update mounts on the root file system work properly.
(3)  The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
     and src/usr.sbin/mountd/mountd.c which were put in to
     eliminate errors during update mounts on the root file system
     can be removed.

The only place were MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().

Reviewed by:	phk
MFC after:	3 days
2007-10-27 15:59:18 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Konstantin Belousov
245b204491 When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing reallocated syncer vnode on mount
list, that causes panic in vfs_allocate_syncvnode().

Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is
not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on umount error
path, where it is cleaned just before the syncer vnode is going to be
allocated.

Reported by:	Peter Jeremy <peterjeremy optushome com au>
Suggested by:	tegge
Approved by:	re (rwatson)
2007-09-12 16:31:32 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Pawel Jakub Dawidek
68c1a246ae The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by:	re (rwatson)
2007-07-26 16:52:57 +00:00
Craig Rodrigues
d7f81adbd4 Revert previous commits which I committed by mistake.
Approved by:	re (implicit)
Pointy hat to:	me
2007-07-14 21:23:31 +00:00
Craig Rodrigues
d678780e60 The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Konstantin Belousov
e5ea32c290 Allow the dounmount() to proceed even for doomed coveredvp.
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
  for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.

Submitted by:	sobomax
MFC after:	2 weeks
2007-04-26 08:56:56 +00:00
Pawel Jakub Dawidek
7760d8409f Export vfs_mount_alloc() as it is used in ZFS. 2007-04-17 21:14:06 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Nate Lawson
a363f67a81 Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost.  We're still single-threaded at this point, but just
be safe for the future.
2007-04-09 21:10:04 +00:00
Nate Lawson
6b1e469ea5 Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.
2007-04-09 19:23:52 +00:00
Pawel Jakub Dawidek
2eb68d493f Add root_mounted() function that returns true if the root file system is
already mounted.
2007-04-08 23:54:01 +00:00
Pawel Jakub Dawidek
f3a8d2f93c Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Pawel Jakub Dawidek
afd894bb12 Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.
2007-04-03 11:45:28 +00:00
Pawel Jakub Dawidek
5c1c2e82e2 I think the code I'm removing here is completely bogus.
vfs_flags field is used for VFCF_* flags which are given at file system
driver creation time (via VFS_SET(9)) macro.

What this code did was bascially this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter will be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after:	1 month
2007-04-01 13:08:05 +00:00
Pawel Jakub Dawidek
695919ad9a Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them. 2007-03-31 22:44:45 +00:00
Pawel Jakub Dawidek
9a2fd584b4 Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

This change is simlar to the change of PRIV_VFS_MOUNT in previous revision.
2007-03-18 02:39:19 +00:00
Pawel Jakub Dawidek
7533652025 Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

Reviewed by:	rwatson
2007-03-14 13:09:59 +00:00
Pawel Jakub Dawidek
f7d4e990c7 White space nits. 2007-03-14 12:54:10 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Olivier Houchard
38cc2a5caa Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check for both error and the return value (some of
them actually don't do it).

MFC After:	1 week
2007-02-13 01:28:48 +00:00
Craig Rodrigues
2892f3bbfa Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.
2006-12-16 15:44:03 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Konstantin Belousov
30af71199e Fix the remaining race in the revs. 1.232, 1,233 that could occur during
unmount when mp structure is reused while waiting for coveredvp lock.
Introduce struct mount generation count, increment it on each reuse and
compare the generations before and after obtaining the coveredvp lock.

Reviewed by:	tegge, pjd
Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-10-03 10:47:04 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
e60c361218 Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.
2006-09-26 04:20:09 +00:00
Tor Egge
fba924ce9b Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with
MNT_UPDATE flag, closing a race between nmount() and quotactl().
2006-09-26 04:18:36 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
cea9d840d8 Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race
with dounmount(), causing loss of MNTK_UNMOUNT flag.
2006-09-26 04:15:04 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Konstantin Belousov
f37e633887 Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be
unlocked only if it really exists.

Found with:	Coverity Prevent(tm)
CID:	1535
Approved by:	pjd (mentor)
2006-09-19 14:04:12 +00:00
Konstantin Belousov
4dec8579bd Fix the race while waiting for coveredvp lock during unmount. The vnode may
be recycled during the sleep, wrap the vn_lock with vhold/vdrop.
Check that coveredvp still points to the same mp after sleep (needed
because sleep dropped Giant).
Move check for user rights for unmount after coveredvp lock is obtained.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	kan (mentor)
MFC after:	2 weeks
2006-09-18 15:35:22 +00:00
Marius Strobl
aed760ef8a Fix another bug introduced with rev. 1.204; in vfs_donmount() if
the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)'
call fails, 'errmsg' is left uninitialized, making the later tests
against NULL meaningless, and the uses bogus. Thus initialize
'errmsg' to NULL beforehand. [1]
While at it, remove the superfluous assignment of 0 to 'errmsg_len'
if the above mentioned call fails as it's already initialized to 0.

Submitted by:	Michael Plass [1]
2006-08-26 16:28:19 +00:00
Pawel Jakub Dawidek
bebabf24bb Fix comment. 2006-08-25 15:13:49 +00:00
Marius Strobl
3a30d178fe Fix a bug introduced with rev. 1.204; in vfs_donmount() use
copyout(9) instead of copystr(9) for copying the errmsg from
kernel- to user-space. This fixes a panic on sparc64 when
using the nmount(2)-converted mountd(8).
While at it, use bcopy(3) instead of strncpy(3) in the kernel-
to kernel-space case for consistency with vfs_buildopts() and
between kernel- to user-space and kernel- to kernel-space case.
2006-08-24 18:52:28 +00:00
John Baldwin
597d608f86 - Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away.  mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
  (or rather, to handle concurrent unmount races) from going away.
  umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.
2006-06-27 14:46:31 +00:00
Robert Watson
7ebfc8df78 Audit some arguments to nmount(), mount(), umount().
Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 15:32:07 +00:00
Pawel Jakub Dawidek
1f58dd4956 Fix a problem introduced in revision 1.220. On mount(2) failure, don't
forget to unbusy file system before its destruction.

This fixes the following warning on mount failure:

	Mount point <X> had 1 dangling refs

Tested by:	wkoszek
2006-06-02 20:29:02 +00:00
Craig Rodrigues
0c89bb0a02 Add "update" mount option to global_opts array,
for use with vfs_filteropt().
2006-05-26 02:38:48 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Kelly Yancey
c9ad8a67af Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

  * Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
    linprocfs).  This (mostly) restores the ability to mount these
    filesystems using the old mount(2) system call (see below for the
    rest of the fix).

  * Remove not-NULL check for the data argument from the mount(2) entry
    point.  Per the mount(2) man page, it is up to the individual
    filesystem being mounted to verify data.  Or, in the case of procfs,
    etc. the filesystem is free to ignore the data parameter if it does
    not use it.  Enforcing data to be not-NULL in the mount(2) system call
    entry point prevented passing NULL to filesystems which ignored the
    data pointer value.  Apparently, passing NULL was common practice
    in such cases, as even our own mount_std(8) used to do it in the
    pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change.  One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
2006-05-15 19:42:10 +00:00
Craig Rodrigues
5250012a1d For nmount(), if "rw" is specified as a mount option,
add "noro" to the list of mount options.  This allows
a read-only mount to be converted to read-write via:
mount -u -o rw

Requested by:	kris
2006-05-14 01:51:38 +00:00
Jeff Roberson
ba5eb429e3 - When there are dangling vnodes at unmount print them before we panic.
Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:38:15 +00:00
Jeff Roberson
a218edceb2 - Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent
mount memory from being reclaimed.  This resolves a number of race
   conditions described in vfs_default.c and introduced with the
   VFS_LOCK_GIANT macros.
 - Let the mtx and lock remain valid after the mount structure has been
   freed by using init and fini calls.  Technically fini will never be
   called but is included for completeness.
 - Consistently use lockmgr directly rather than lockmgr to lock and
   vfs_unbusy to unlock.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:49:51 +00:00
Ruslan Ermilov
936ddefcd6 The mount(8) manpage says: "In case of conflicting options being
specified, the rightmost option takes effect."  Fix code to obey
this.  This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
2006-03-13 14:58:37 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Jeff Roberson
a4aeaefe5a - We can not hold a vnode lock while we do a lookup. Search for and load
modules prior to looking up the directory which we will cover to avoid
   this problem in mount.
 - We must hold the coveredvp locked before we can busy the mountpoint to
   prevent a lock order reversal with the vfs_busy() in lookup which holds
   the directory lock prior to doing a vfs_busy().  The directory lock is
   required to safely clear the v_mountedhere field on the directory.

MFC After:	1 week
2006-02-22 06:29:55 +00:00
Jeff Roberson
04f6d3effa - Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0.  We don't print an
   error if we are rebooting as the root mount always retains some refernces
   by init proc.
 - Acquire a mnt ref for every vnode allocated to a mount point.  Drop this
   ref only once vdestroy() has been called and the mount has been freed.
 - No longer NULL the v_mount pointer in delmntque() so that we may release
   the ref after vgone() has been called.  This allows us to guarantee
   that the mount point structure will be valid until the last vnode has
   lost its last ref.
 - Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:19:50 +00:00
Suleiman Souhlal
c270875f7c Don't try to load KLDs if we're mounting the root. We'd otherwise panic.
Tested by:	kris
MFC after:	3 days
2006-01-28 22:58:39 +00:00
Christian S.J. Peron
323203d389 vfs_busy can only return something useful if MNTK_UNMOUNT has been set.
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.

Found with:	Coverity Prevent (tm)
MFC after:	2 weeks
2006-01-15 20:14:11 +00:00
Robert Watson
6994eebcab Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the
return value is intentional: this is simply an attempt to pre-cache the
statfs state.

Found with:	Coverity Prevent (tm)
MFC after:	3 days
2006-01-15 20:01:05 +00:00
Ruslan Ermilov
6a61c14ee1 AMD64 also supports disk slices. 2006-01-14 20:47:11 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Pawel Jakub Dawidek
ade9b797a0 vfs_mount_alloc() always returns 0, but what we really want is newly
allocated 'struct mount *' pointer, so simplify code a bit and return
the pointer directly.

Reviewed by:	ssouhlal
2005-12-20 00:43:51 +00:00
Pawel Jakub Dawidek
003ba8a000 Use 'td' instead of 'curthread'. 2005-12-19 16:27:13 +00:00
Craig Rodrigues
d5989f64cf In devfs_first(), set mp->mnt_opt to a valid empty list of mount options
instead of leaving it NULL.  This eliminates a kernel panic
when trying to do a mount -o update of /dev.

Noticed by:	cjsp
Reviewed by:	phk
2005-12-08 04:27:53 +00:00
Craig Rodrigues
8539ca4cde Add "errmsg" to list of global mount options. 2005-12-08 04:09:29 +00:00
Craig Rodrigues
1245b3433e Add "rdonly" to global_opts, and parse it in vfs_donmount().
Requested by:	rwatson
2005-12-03 12:04:20 +00:00
Craig Rodrigues
ec528a3472 - Add "rw" mount option to global_opts.
- In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or
  unset the MNT_RDONLY filesystem flag.
2005-12-03 01:26:27 +00:00
Craig Rodrigues
5e6b93a014 In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly.  We need to copyin() the strings in the iovec before
we can strcmp() them.  Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found.  This allows us to locate an option in
the iovec without actually manipulating the iovec members. directly via
strcmp().

Noticed by:	kris on sparc64
2005-11-23 20:51:15 +00:00
Marcel Moolenaar
60b7823989 Fix bug introduced in revision 1.186:
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all.  A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.

Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week
2005-11-19 21:51:45 +00:00
Craig Rodrigues
425e5b6268 Parse more mount options in vfs_donmount(), before vfs_domount()
is called.  It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on.  The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.

Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.
2005-11-19 21:22:21 +00:00
Craig Rodrigues
8fd860cfa1 In vfs_nmount(), check to see if "update" mount option was passed
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set.  This is very important
when we want to do an update mount of the root filesystem, using nmount().
2005-11-18 01:31:10 +00:00
Craig Rodrigues
d5328381f1 style(9) cleanups.
Spotted by:	njl, bde
2005-11-12 14:41:44 +00:00
Craig Rodrigues
4560dfb5b1 For nmount(), allow a text string error message to be propagated back
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.

Discussed with:		phk, pjd
2005-11-09 02:26:38 +00:00
Craig Rodrigues
84e69560b6 Add utility function to propagate mount errors as text string messages.
Discussed with:		phk
2005-11-08 04:13:39 +00:00
Suleiman Souhlal
2611e5a6a9 Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed
and unbusied in devfs_fixup(), which assumes that the devfs mount is
still locked.

Granced at by:	phk
MFC after:	3 days
2005-09-02 13:37:54 +00:00
Pawel Jakub Dawidek
b578b0bdd3 devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
Pawel Jakub Dawidek
07ebf8c8c3 We don't use 'mp' variable, but we do want to mount devfs, ehh. 2005-05-12 01:49:51 +00:00
Pawel Jakub Dawidek
b8bc5373e1 Remove unised variable introduced by accident in rev 1.168.
Found by:	Coverity Prevent analysis tool
2005-05-11 19:50:34 +00:00
Pawel Jakub Dawidek
f850b2781f Plug memory leaks.
Found by:		Coverity Prevent analysis tool
2005-05-11 19:27:38 +00:00
Jeff Roberson
194dfed917 - Remove an old splcam hack. 2005-05-01 00:59:55 +00:00
Pawel Jakub Dawidek
f163441e7e Call g_waitidle() before every check the list of holds is empty.
Suggested by:	phk
2005-04-19 21:44:44 +00:00
Poul-Henning Kamp
d1c712ede2 Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.
2005-04-19 06:23:59 +00:00
Poul-Henning Kamp
73fbaa74e5 Add a named reference-count KPI to hold off mounting of the root filesystem.
While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
2005-04-18 21:21:26 +00:00
Poul-Henning Kamp
bdb3564638 Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Marcel Moolenaar
379ba85322 Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
2005-03-25 01:56:12 +00:00
Jeff Roberson
d830f82824 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
Poul-Henning Kamp
9068e77689 Fix a memoryleak in case of failed root filesystem mount.
Spotted by:     Coverity via sam
2005-03-16 11:06:49 +00:00
John-Mark Gurney
2a77000b75 MFp4: print a more useful error when we don't have a /dev to mount devfs
on..
2005-03-16 08:04:39 +00:00
Poul-Henning Kamp
78bb3c21ed Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the loosing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.
2005-03-16 07:35:06 +00:00
David Schultz
e8ed933099 Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with:	mckusick
Reviewed by:	phk
2005-02-20 23:02:20 +00:00
Poul-Henning Kamp
ebbfc2f82d Make various mountpoint related functions static. 2005-02-10 12:25:38 +00:00
Pawel Jakub Dawidek
f627315f1e - Move gets() function to libkern (I want to use it outside vfs_mount.c).
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
  future.
- Remove special treatment of '@' and '#', those two are only confusing.

Discussed with:	rwatson
MFC after:	2 weeks
2005-02-03 15:10:58 +00:00
Jeff Roberson
fc48b760ac - Protect mnt_kern_flag with the mountpoint's mutex. This is required
to make the suspend related functions mpsafe.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:28:41 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Alexander Kabaev
aa6f98d12f Do not vput(9) unlocked vnode and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by:	DEBUG_VFS_LOCKS
2004-12-27 05:17:11 +00:00
Poul-Henning Kamp
72e8dfe5a0 Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.
2004-12-20 21:59:25 +00:00
Poul-Henning Kamp
12b18fdab4 Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().
2004-12-14 08:23:18 +00:00
Poul-Henning Kamp
1ab58cc2df Copy the entire stats structure. Let compiler decide how. 2004-12-11 22:13:02 +00:00
Poul-Henning Kamp
e40da1f149 Fix whitespace.
Spotted by:	njl
2004-12-11 20:41:32 +00:00
Poul-Henning Kamp
494ea31a7d Remove the /dev/dev -> / symlink after we are done with it. 2004-12-11 12:48:37 +00:00
Poul-Henning Kamp
20a92a18f1 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
Poul-Henning Kamp
46d2b4184d Instead of complaining about it, just silently filter out MNT_ROOTFS.
This fixes the "fsck /" problem various people have reported overnight.
2004-12-07 06:58:42 +00:00
Poul-Henning Kamp
1e8ca0f0b0 Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem,
this way individual filesystems don't have to do it.
2004-12-06 19:53:32 +00:00
Poul-Henning Kamp
53a05b7c3f Add more functions for handling mount arguments in VFS_MOUNT():
vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.

Also add function for setting the stat.f_mntfromname field.
2004-12-06 18:18:35 +00:00
Poul-Henning Kamp
5ddb073996 Change the first argument of vfs_cmount() to a handy struct mntarg* and
call it accordingly.

(No filesystems implement vfs_cmount() yet, so this is a no-op commit)
2004-12-06 16:39:05 +00:00
Poul-Henning Kamp
49bfeeb848 Add a few convenient functions in the mount_arg() family and collect the
entire family at the end of the source file.
2004-12-06 13:01:41 +00:00
Poul-Henning Kamp
f0df036767 Collapse two almost identical license copies, preserving the rights of
all listed authors, rightholders and contributors.
2004-12-06 12:44:30 +00:00
Poul-Henning Kamp
def7671ad8 Remove the kern.rootdev sysctl.
Root filessytems (like NFS) don't have an associated disk device,
and even if they had, the exact semantics would be filesystem
dependent and should be implemented there.
2004-12-06 12:40:45 +00:00
Poul-Henning Kamp
a804d99c40 Make struct vfsopt{list} private to vfs_mount.c 2004-12-06 12:36:17 +00:00
Poul-Henning Kamp
743312367a VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't.  Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way:  Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
2004-12-05 22:41:02 +00:00
Poul-Henning Kamp
6c12df5a19 Implement a function, mount_arg() for accumulating a list of mount parameters
to nmount.

Make kernel_mount() accept the output from mount_arg() and know how to
free the malloc'ed space.

Make kernel_vmount() use the new function.
2004-12-03 22:38:06 +00:00
Poul-Henning Kamp
b74f4d8bd1 When omount() is called, check if the filesystem have a cmount method
and if so call it.

The cmount method will gather and interpret omount() style arguments,
and issue a kern_[v]mount() call to execute the corresponding nmount
operation.
2004-12-03 21:14:46 +00:00
Poul-Henning Kamp
2a8b79eb6a Add early checks for MNT_ROOTFS since we need to allow it later on in
the code path.
2004-12-03 19:25:44 +00:00
Poul-Henning Kamp
a08805c741 Retire unused vfs_mount() function in the name of nmount migration. 2004-12-03 18:40:58 +00:00
Poul-Henning Kamp
32ba8e9390 Introduce vfs_byname_kld() which will try to load the filesystem
as a module if possible.

Use it so we don't have linker magic in the middle of the already
complex mount code.
2004-12-03 16:11:01 +00:00
Poul-Henning Kamp
a7db6b6ed3 Use FILEDESC_LOCK_FAST in checkdirs() 2004-11-28 11:26:43 +00:00
Poul-Henning Kamp
6518a5aa8e Eliminate MNT_NODEV usage, it doesn't have any meaning any more.
Keep a #define MNT_NODEV 0 around to avoid dealing with contrib
userland like mount_smbfs.
2004-11-26 19:28:39 +00:00
Poul-Henning Kamp
1b52747b5f Allow a filesystem to have both old and new mount methods at the same
time.  This will be necessary for transitioning.
2004-11-25 12:19:24 +00:00
Poul-Henning Kamp
19da2efc3e Assert Giant held in vfs_domount() and vfs_dounmount()
Explicitly grab Giant before calling these.
2004-11-25 12:06:43 +00:00
Poul-Henning Kamp
de4cbbf593 Integrate the relevant bits of vfs_rootmountalloc() where it matters. 2004-11-25 09:47:51 +00:00
Poul-Henning Kamp
7cc9fb79db Pass path to filesystem when mounting root 2004-11-18 14:31:24 +00:00
Poul-Henning Kamp
b6eb669970 remove unused variable 2004-11-10 09:56:28 +00:00
Poul-Henning Kamp
c2597f2ddb Remove hack which mounts the root filesystem R/W if the device is
named 'md<something>'.  While convenient, it does not belong here,
if anywhere at all.
2004-11-10 07:36:09 +00:00
Poul-Henning Kamp
e207b52afa Make getdiskbyname() static to vfs_mount.c.
Eliminate use of vn_todev() while here.
2004-11-09 23:03:34 +00:00
Poul-Henning Kamp
186e51cb30 Drop Giant around the call to g_waitidle().
This is necessary to allow any geom events which need it to pick up Giant.
2004-10-23 20:21:05 +00:00
Pawel Jakub Dawidek
8d02a378aa Back out changes which were introduced to delay mounting root file system.
Those changes were made on gmirror needs, but now gmirror handles this
by itself.
2004-10-05 11:26:43 +00:00
Pawel Jakub Dawidek
d0257d9c10 Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits
a bit better to our current naming scheme.

Discussed with:	ru
2004-09-24 09:19:03 +00:00
Poul-Henning Kamp
fe0b82752b Eliminate devsw() call, we are not dereferencing the pointer. 2004-09-24 07:11:02 +00:00
Pawel Jakub Dawidek
5a19f8b0c4 Introduce new /boot/loader.conf variable: root_mount_delay.
It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).

This will allow to boot from gmirror device in degraded mode.
2004-09-23 10:13:18 +00:00
Alfred Perlstein
4c0bef6230 It's too easy to panic the machine when INVARIANTS are turned on
and you botch a call to nmount(2).

This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL.  The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.

Fix the code to honor the INVARIANT.

Silence on: fs@
2004-09-05 22:24:28 +00:00
Poul-Henning Kamp
5e8c582ac2 Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
Poul-Henning Kamp
3dfe213e61 Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
Poul-Henning Kamp
65a311fcb2 Give kldunload a -f(orce) argument.
Add a MOD_QUIESCE event for modules.  This should return error (EBUSY)
of the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module.  Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.
2004-07-13 19:36:59 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
Poul-Henning Kamp
552afd9c12 Clean up and wash struct iovec and struct uio handling.
Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec.  Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy.  Caller frees.

Use them throughout.
2004-07-10 15:42:16 +00:00
Alfred Perlstein
1ea6061793 Use vfs_suser() where appropriate. 2004-07-06 09:39:32 +00:00
Alfred Perlstein
c713aaaeca NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout.  This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
2004-07-06 09:12:03 +00:00
Alfred Perlstein
94ed9c8af5 Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace.  Currently only mount and unmount
of filesystems are signalled.  Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
2004-07-04 10:52:54 +00:00
Poul-Henning Kamp
e3c5a7a4dd When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint.  If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

		MNT_ILOCK(mp);
	loop:
		for (vp = TAILQ_FIRST(&mp...);
		    (vp = nvp) != NULL;
		    nvp = TAILQ_NEXT(vp,...)) {
			if (vp->v_mount != mp)
				goto loop;
			MNT_IUNLOCK(mp);
			...
			MNT_ILOCK(mp);
		}
		MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

	MNT_ILOCK(vp->v_mount);
	...
	TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
	...
	MNT_IUNLOCK(vp->v_mount);
	...
	vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go.  This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
2004-07-04 08:52:35 +00:00
Thomas Moestl
3971dcfa4b Initialize ni_cnd.cn_cred before calling lookup() (this is normally done
by namei(), which cannot easily be used here however). This fixes boot
time crashes on sparc64 and probably other platforms.

Reviewed by:	phk
2004-06-20 17:31:01 +00:00
Poul-Henning Kamp
b90c855961 Reduce the thaumaturgical level of root filesystem mounts: Instead of using
an otherwise redundant clone routine in geom_disk.c, mount a temporary
DEVFS and do a proper lookup.

Submitted by:	thomas
2004-06-17 21:24:13 +00:00
Poul-Henning Kamp
f3732fd15b Second half of the dev_t cleanup.
The big lines are:
	NODEV -> NULL
	NOUDEV -> NODEV
	udev_t -> dev_t
	udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
2004-06-17 17:16:53 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Pawel Jakub Dawidek
0b68054f9d - Add a description for vfs.usermount sysctl.
- Add the vfs_equalopts() function for mount options comparsion.
  Now it looks much more clear.
- Style fixed.

In co-operation with:	bde
2004-03-27 08:39:28 +00:00
Pawel Jakub Dawidek
6c8cc8ec4b - Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts.
- Style fixed.

Submitted by:	bde
2004-03-27 08:09:00 +00:00
Pawel Jakub Dawidek
2c6040bbb7 We probably shouldn't allow users to mount file systems with MNT_SUIDDIR.
There should be not shell access when SUIDDIR is compiled in, but
better be sure.

Reviewed by:	rwatson
2004-03-26 21:12:14 +00:00
Tim J. Robbins
537370d0a4 Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.
2004-03-16 08:59:37 +00:00
Poul-Henning Kamp
2b348f7429 Remove unused mnt_reservedvnlist field. 2004-03-11 16:59:57 +00:00
Colin Percival
3a1bdbf8d1 Don't ignore errors from vfs_allocate_syncvnode.
PR:		kern/18503
Submitted by:	Anatoly Vorobey <mellon@pobox.com>
Approved by:	rwatson (mentor)
2004-02-18 05:20:54 +00:00
Pawel Jakub Dawidek
3410b19324 Fix many issues related to mount/unmount:
1. Root from inside a jail was able to unmount any file system
   (except /).
2. Unprivileged root was able to unmount file systems mounted by
   privileged root (execpt /).
3. User from inside a jail was able to mount file system when
   sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
   (that's ok) and unmount it even if vfs.usermount was equal to 0
   (that's not correct).

Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>

Only a part of this fix will be MFC'ed (if approved).

PR:		kern/60149
Reviewed by:	rwatson
Approved by:	scottl (mentor)
MFC after:	3 days
2004-02-02 19:02:05 +00:00
Ian Dowse
25cb5d7a6b In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.

The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.

Reported by:	Erez Zadok <ezk@cs.sunysb.edu>
Approved by:	re (scottl)
2003-11-30 23:30:09 +00:00
Alexander Kabaev
97c43a540a Do not attempt to destroy NULL vfs options list.
Approved by: re (scottl)
Reported by: Christian Laursen <xi atborderworlds dot dk>
2003-11-23 17:13:48 +00:00
Alexander Kabaev
3b39740df8 Fix a number of style(9) bugs introduced in r1.113 by me.
Suggested by:	bde
2003-11-14 05:27:41 +00:00
Peter Wemm
cde6302bf0 MNAMELEN is back to an int again after Kirk's statfs commit
kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4)
*** Error code 1
2003-11-12 17:09:12 +00:00
Alexander Kabaev
5c957adbf1 1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by:  (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by:    rwatson
2003-11-12 02:54:47 +00:00
Alexander Kabaev
ca430f2e92 Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff
2003-11-05 04:30:08 +00:00
Poul-Henning Kamp
3d4274a52b Update the list of CDROM device names to try for booting with RB_CDROM
flag set.
2003-09-26 09:07:27 +00:00
Ian Dowse
ffe40c80ea In the !MNT_BYFSID case, return EINVAL from unmount(2) when the
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.

Update the manual page to match the current behaviour.

Suggested by:	tjr
Reviewed by:	tjr
2003-09-08 16:23:21 +00:00
Ian Dowse
318f2fb4bf Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmunted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.

Discussed on:	freebsd-arch
mdoc help from:	ru
2003-07-01 17:40:23 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Poul-Henning Kamp
84c080a85e Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.
2003-06-07 15:46:53 +00:00
Tim J. Robbins
38dd7dee8a Free mount credentials (mnt_cred) when freeing the mount struct
in failure cases to avoid leaking struct ucreds, and ultimately
leaking struct uidinfo references.
2003-04-24 08:16:06 +00:00
David E. O'Brien
2603007ace Add /dev to the Alpha manual mount root example. 2003-04-23 05:02:40 +00:00
Tor Egge
6b08046175 Adjust the number of vnodes scanned by vlrureclaim() according to the
size of the vnode list.
2003-03-26 22:15:58 +00:00
Robert Watson
838a6d03e8 Export the name of the device used to mount the root file system as
kern.rootdev.  If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by:	peter
2003-02-22 05:01:12 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
edf6699ae6 Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures.  Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
2003-02-15 05:52:56 +00:00
Dag-Erling Smørgrav
af2eed6648 Style nit. 2003-02-14 13:30:25 +00:00
Alfred Perlstein
3dc593c895 KASSERT format string does not need newline termination 2003-02-14 13:28:44 +00:00
Alfred Perlstein
0c5f7aaab5 Add kasserts to catch bad API usage.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
2003-02-14 13:18:51 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Alfred Perlstein
13438f6823 When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
2003-01-01 01:56:19 +00:00
Alfred Perlstein
f97182acf8 unwrap lines made short enough by SCARGS removal 2002-12-14 08:18:06 +00:00
Alfred Perlstein
b80521fee5 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
Alfred Perlstein
d1e405c5ce SCARGS removal take II. 2002-12-14 01:56:26 +00:00