Commit Graph

435 Commits

Author SHA1 Message Date
Kirk McKusick
6beb3bb4eb This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field.  The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)
2011-07-24 17:43:09 +00:00
Andrey V. Elsukov
fef7c585de Include sys/sbuf.h directly. 2011-07-11 05:17:46 +00:00
Matthew D Fleming
3d08a76bbc Use a name instead of a magic number for kern_yield(9) when the priority
should not change.  Fetch the td_user_pri under the thread lock.  This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by:	Jason Behmer < jason DOT behmer AT isilon DOT com >
2011-05-13 05:27:58 +00:00
Jaakko Heinonen
1b0fe69dc9 Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".

This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.

Reviewed by:	bde
2011-04-22 07:26:09 +00:00
Jaakko Heinonen
9dc6abbd8a Fix some style issues in r219925.
Reported by:	bde
MFC after:	1 month
2011-03-26 17:17:24 +00:00
Jaakko Heinonen
3fd8fe5b54 Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in
vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant
occurrences of these options. It was possible that for example both "ro"
and "rw" options became active concurrently.

PR:		kern/133614
Discussed on:	freebsd-hackers
MFC after:	1 month
2011-03-23 17:56:38 +00:00
Jaakko Heinonen
da2e368f70 Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but
vfs_export() fails. Restoring old options and flags after successful
VFS_MOUNT(9) call may cause the file system internal state to become
inconsistent with mount options and flags. Specifically the FFS super
block fs_ronly field and the MNT_RDONLY flag may get out of sync.

PR:		kern/133614
Discussed on:	freebsd-hackers
2011-02-19 14:27:14 +00:00
Matthew D Fleming
e7ceb1e99b Based on discussions on the svn-src mailing list, rework r218195:
- entirely eliminate some calls to uio_yeild() as being unnecessary,
   such as in a sysctl handler.

 - move should_yield() and maybe_yield() to kern_synch.c and move the
   prototypes from sys/uio.h to sys/proc.h

 - add a slightly more generic kern_yield() that can replace the
   functionality of uio_yield().

 - replace source uses of uio_yield() with the functional equivalent,
   or in some cases do not change the thread priority when switching.

 - fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

 - instead of using the per-cpu last switched ticks, use a per thread
   variable for should_yield().  With PREEMPTION, the only reasonable
   use of this is to determine if a lock has been held a long time and
   relinquish it.  Without PREEMPTION, this is essentially the same as
   the per-cpu variable.
2011-02-08 00:16:36 +00:00
Matthew D Fleming
08b163fa51 Put the general logic for being a CPU hog into a new function
should_yield().  Use this in various places.  Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after:	1 week
2011-02-02 16:35:10 +00:00
Jaakko Heinonen
1a4fbae871 Replace spaces with tabs. 2011-01-24 17:08:26 +00:00
Sergey Kandaurov
f03749ca2d Update MNT_ROOTFS comments after changes in the root mount logic.
Reported by:	arundel
Suggested by:	marcel (phrasing)
Approved by:	kib (mentor)
2010-11-23 13:49:15 +00:00
Marcel Moolenaar
c1f0aabb9f In vfs_filteropt(), only print the errmsg when there's no errmsg
mount option. Otherwise errors tend to get printed multiple times.
2010-10-18 04:34:42 +00:00
Konstantin Belousov
d0cc54f3b4 The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by:	bde
MFC after:    2 weeks
2010-10-10 07:05:47 +00:00
Marcel Moolenaar
24e01f5998 Split the root mount logic from the (generic) mount code and move
it (the root mount code) into a new file called vfs_mountroot.c

The split is almost trivial, as the code is almost perfectly
non-intertwined. The only adjustment needed was to move the UMA
zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and
into vfs_mount.c, where it had to be done as a SYSINIT [see
vfs_mount_init()].

There are no functional changes with this commit.
2010-10-02 19:44:13 +00:00
Konstantin Belousov
9a24dc0760 Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by:	jh (previous version of the patch)
Tested by:	pho
MFC after:	3 weeks
2010-09-11 13:06:06 +00:00
Pawel Jakub Dawidek
4946fa6791 Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure. 2010-09-09 07:55:13 +00:00
Pawel Jakub Dawidek
7443b79b81 Doing first mount and updating mount points are both handled by the same
syscall and the same function, but are very different and share almost no code.
To make it easier to read and analyze, split vfs_domount() into
vfs_domount_first() and vfs_domount_update().

Reviewed by:	kib
2010-09-08 21:00:53 +00:00
Pawel Jakub Dawidek
a34512e3f0 - Log all the problems in devfs_fixup().
- Correct error paths. The system will be useless on devfs_fixup() failure, so
  why bother?  Maybe for the same reason why a dead body is washed and dressed
  in a nice suit before it is put into a coffin? Maybe system's last will is to
  panic without any locks held?

Reviewed by:	kib
2010-09-08 20:56:18 +00:00
Pawel Jakub Dawidek
c87f1ad43c There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn free us from handling its failures
in the mount code.

Reviewed by:	kib
MFC after:	1 month
2010-08-28 08:57:15 +00:00
Pawel Jakub Dawidek
d779d44313 - Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be
locked.
- Remove code duplication.
2010-02-18 22:22:45 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Attilio Rao
d113304956 Add the possibility for vfs.root.mountfrom tunable to accept a list of
items rather than a single one. The list is a space separated collection
of items defined as the current one accepted.

While there fix also a nit in a comment.

Obtained from:	Sandvine Incorporated
Reviewed by:	emaste
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
Sponsored by:	Sandvine Incorporated
MFC:		2 weeks
2009-11-12 15:59:05 +00:00
Edward Tomasz Napierala
7ba889b8f4 Add suggestion for zfs root. 2009-11-08 09:54:25 +00:00
John Baldwin
87eca70e0c Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
  sanitizing stdin/out/err file descriptors during execve().

Submitted by:	kib
Approved by:	re (rwatson)
MFC after:	1 week
2009-07-31 13:40:06 +00:00
Robert Watson
791b0ad2bf Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records.  This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
MFC after:	1 month
2009-07-29 07:44:43 +00:00
Robert Watson
6d5a61563a When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by:	re (audit argument blanket)
MFC after:	3 days
2009-07-01 16:56:56 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Craig Rodrigues
0c349f0856 sys/boot/common.c
=================
Extend the loader to parse the root file system mount options in /etc/fstab,
and set a new loader variable vfs.root.mountfrom.options with these options.
The root mount options must be a comma-delimited string, as specified in
/etc/fstab.
Only set the vfs.root.mountfrom.options variable if it has not been
set in the environment.

sys/kern/vfs_mount.c
====================
When mounting the root file system, pass the mount options
specified in vfs.root.mountfrom.options, but filter out "rw" and "noro",
since the initial mount of the root file system must be done as "ro".
While we are here, try to add a few hints to the mountroot prompt
to give users and idea what might of gone wrong during mounting
of the root file system.

Reviewed by:	jhb (an earlier patch)
2009-06-01 01:02:30 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Andrew Thompson
853a10a581 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
Andrew Thompson
626fc9fe3d Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.
2009-04-03 19:46:12 +00:00
Andrew Thompson
46b70f07bc Further rate limit the root wait status, it will be printed once per
root_mount_rel() wakeup.
2009-03-30 05:57:55 +00:00
Andrew Thompson
d24d45d9a9 Skip the allocation of the root hold token if the mount already happened. 2009-03-27 03:52:08 +00:00
Jamie Gritton
f86bce5ed0 Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by:	bz (mentor)
2009-03-02 23:26:30 +00:00
Attilio Rao
feabc903d9 Add more KTR_VFS logging point in order to have a more effective tracing.
Reviewed by:	brueffer, kib
Tested by:	Gianni Trematerra <giovanni D trematerra A gmail D com>
2009-02-05 15:03:35 +00:00
Attilio Rao
4a0f807602 1) Fix a deadlock in the VFS:
- threadA runs vfs_rel(mp1)
- threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drop MNT_ILOCK()
- threadA runs vfs_busy(mp1) and, as long as, MNTK_UNMOUNT is set, sleeps
  waiting for threadB to complete the unmount
- threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting
  for the refcount to expire.

Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the
unmounter is waiting for mnt_ref to expire.
The vfs_busy contenders got awake, fails, and if they retry the
MNTK_REFEXPIRE won't allow them to sleep again.

2) Simplify significantly the code of vfs_mount_destroy() trimming
   unnecessary codes:
   - as long as any reference exited, it is no-more possible to have
     write-op (primarty and secondary) in progress.
   - it is no needed to drop and reacquire the mount lock.
   - filling the structures with dummy values is unuseful as long as
     it is going to be freed.

Tested by:	pho, Andrea Barberio <insomniac at slackware dot it>
Discussed with:	kib
2008-12-16 23:16:10 +00:00
Attilio Rao
ccc55b33b7 Fix an inverted check introduced in r184554.
Submitted by:	tegge
Pointy hat to:	me
2008-12-01 03:00:26 +00:00
Attilio Rao
30f60d8c31 Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.
Really, the concept of holdcnt in the struct mount is rappresented by
the mnt_ref (which prevents the type-stable structure from being
"recycled) handled through vfs_ref() and vfs_rel().
On this optic, switch the holdcnt acquisition into an emulated vfs_ref()
(and subsequent release into vfs_rel()).

Discussed with:	kib
Tested by:	pho
2008-11-03 20:00:35 +00:00
Doug Rabson
a9148abd9d Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager.  I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by:	Isilon Systems
MFC after:	1 month
2008-11-03 10:38:00 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
David E. O'Brien
6e6049e9df Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)
2008-09-19 15:17:32 +00:00
Simon L. B. Nielsen
59ca51adba - Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by:	kib [08:07], csjp [08:08], bz [08:09]
Reviewed by:	peter [08:07], jhb [08:07]
Reviewed by:	jinmei [08:09], rwatson [08:09]
Approved by:	re (SA blanket)
Approved by:	so (simon)
Security:	FreeBSD-SA-08:07.amd64
Security:	FreeBSD-SA-08:08.nmount
Security:	FreeBSD-SA-08:09.icmp6
2008-09-03 19:09:47 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Craig Rodrigues
d5bdb2f68d In nmount(), when we see the "force" option,
set the MNT_FORCE flag, but do not persist "force"
in the options list, since it is a command, not a persistent property
of a mount.

Similarly, when we see "reload", set MNT_RELOAD,
but delete "reload" from the options list.

MFC after:	1 week
2008-08-23 01:16:09 +00:00
Konstantin Belousov
e792b09be2 Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with:	rodrigc
MFC after:	3 days
2008-08-10 12:15:36 +00:00
Dag-Erling Smørgrav
2616144e43 Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
Craig Rodrigues
1aad294b4e In nmount(), if we see "update" in the mount options,
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.

MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.

We need to do similar things for MNT_FORCE and MNT_RELOAD.

All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.

Requested by:	yar
MFC after:	2 weeks
2008-07-12 20:12:40 +00:00
Konstantin Belousov
a70537835f Provide the mutual exclusion between the nfs export list modifications
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.

Reported by:	kris, pho, many
Based on the submission by:	mohan
Tested by:	pho
MFC after:	2 weeks
2008-06-09 10:31:38 +00:00
Wojciech A. Koszek
2e75877f12 Remove checks against DDB, which isn't used in this file.
My intention is to bring no functional change.

Discussion on:	IRC
Reviewed by:	ed, kan, rink,
2008-06-08 20:43:27 +00:00
Craig Rodrigues
a9722ace80 Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since
we do it further down in ffs_vfsops.c

MFC after:	1 month
2008-05-23 23:33:07 +00:00
Roman Divacky
d6891277a4 Lock filedesc exclusively when modifying fd_[cr]dir.
This is probably harmless but it's better to lock it
correctly.

Approved by:	kib (mentor)
2008-04-29 21:40:11 +00:00
Poul-Henning Kamp
9b4a8ab7ba Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such.  All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC.  For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on.  These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c.  Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
2008-04-22 19:38:30 +00:00
Sam Leffler
00c71fb7c3 o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after:	2 weeks
2008-04-08 17:53:33 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Konstantin Belousov
1be222e9df Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed:	on stable@, with bde
Reviewed by:	tegge, jeff [1]
MFC after:	2 weeks
2008-03-23 13:45:24 +00:00
Yaroslav Tykhiy
c6446de05d Undo the damage I did in sys/kern/vfs_mount.c #1.274 and
sbin/mount_nfs/mount_nfs.c #1.76.  Let the dragons sleep.

Requested by:	rodrigc, des
PR:		kern/120319 (welcome the bug back)
2008-02-18 20:58:57 +00:00
Yaroslav Tykhiy
37ed722f78 Add a remark on a questionable property of vfs_mergeopts(). 2008-02-18 10:10:42 +00:00
Yaroslav Tykhiy
38a7fd05f7 In the new order of things dictated by nmount(2), a read-only mount
is to be requested via a "ro" option.  At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag.  Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches.  (See the
PRs for examples.)

Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag.  (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)

To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.

Also correct the comment explaining how mount_arg() handles length
of -1.

PR:		bin/106636 kern/120319
Submitted by:	Jaakko Heinonen <see PR kern/120319 for email> (originally)
2008-02-14 17:04:31 +00:00
Attilio Rao
0e9eb108f0 Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
  always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
  Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
  present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
  curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo
2008-01-24 12:34:30 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Craig Rodrigues
450ea867c5 In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR:		118531
Submitted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	3 days
2007-12-31 23:44:53 +00:00
Warner Losh
b27aa20e8d A partial solution to some of the 'pull the umass device with a
mounted FS' problems.  These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices.  We
now close the barn door after the horse has gotten lose and has been
hit by a truck, as it were.  The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO.  In that case, the buffer
is treated like any other clean buffer that's being retured.  ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system.  If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author.  The
other patches will be forth coming.  I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
2007-12-27 16:38:28 +00:00
Craig Rodrigues
62bdb328bb In nmount(), internally convert the mount option: "rdonly" to "ro".
This makes updates mounts such as:
 "mount -u -o rdonly" work more like, "mount -u -o ro".

References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.

Requested by:	rwatson
MFC after:	1 week
2007-12-05 03:26:14 +00:00
Craig Rodrigues
b4b5bf359b In nmount(), if MNT_ROOT is in the mount flags, filter it
out instead of returning an error.
(1)  This makes the behavior consistent with mount(2).
(2)  This makes update mounts on the root file system work properly.
(3)  The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
     and src/usr.sbin/mountd/mountd.c which were put in to
     eliminate errors during update mounts on the root file system
     can be removed.

The only place were MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().

Reviewed by:	phk
MFC after:	3 days
2007-10-27 15:59:18 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Konstantin Belousov
245b204491 When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing reallocated syncer vnode on mount
list, that causes panic in vfs_allocate_syncvnode().

Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is
not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on umount error
path, where it is cleaned just before the syncer vnode is going to be
allocated.

Reported by:	Peter Jeremy <peterjeremy optushome com au>
Suggested by:	tegge
Approved by:	re (rwatson)
2007-09-12 16:31:32 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Pawel Jakub Dawidek
68c1a246ae The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by:	re (rwatson)
2007-07-26 16:52:57 +00:00
Craig Rodrigues
d7f81adbd4 Revert previous commits which I committed by mistake.
Approved by:	re (implicit)
Pointy hat to:	me
2007-07-14 21:23:31 +00:00
Craig Rodrigues
d678780e60 The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Konstantin Belousov
e5ea32c290 Allow the dounmount() to proceed even for doomed coveredvp.
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
  for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.

Submitted by:	sobomax
MFC after:	2 weeks
2007-04-26 08:56:56 +00:00
Pawel Jakub Dawidek
7760d8409f Export vfs_mount_alloc() as it is used in ZFS. 2007-04-17 21:14:06 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Nate Lawson
a363f67a81 Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost.  We're still single-threaded at this point, but just
be safe for the future.
2007-04-09 21:10:04 +00:00
Nate Lawson
6b1e469ea5 Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.
2007-04-09 19:23:52 +00:00
Pawel Jakub Dawidek
2eb68d493f Add root_mounted() function that returns true if the root file system is
already mounted.
2007-04-08 23:54:01 +00:00
Pawel Jakub Dawidek
f3a8d2f93c Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Pawel Jakub Dawidek
afd894bb12 Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.
2007-04-03 11:45:28 +00:00
Pawel Jakub Dawidek
5c1c2e82e2 I think the code I'm removing here is completely bogus.
vfs_flags field is used for VFCF_* flags which are given at file system
driver creation time (via VFS_SET(9)) macro.

What this code did was bascially this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter will be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after:	1 month
2007-04-01 13:08:05 +00:00
Pawel Jakub Dawidek
695919ad9a Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them. 2007-03-31 22:44:45 +00:00
Pawel Jakub Dawidek
9a2fd584b4 Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

This change is simlar to the change of PRIV_VFS_MOUNT in previous revision.
2007-03-18 02:39:19 +00:00
Pawel Jakub Dawidek
7533652025 Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

Reviewed by:	rwatson
2007-03-14 13:09:59 +00:00
Pawel Jakub Dawidek
f7d4e990c7 White space nits. 2007-03-14 12:54:10 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Olivier Houchard
38cc2a5caa Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check for both error and the return value (some of
them actually don't do it).

MFC After:	1 week
2007-02-13 01:28:48 +00:00
Craig Rodrigues
2892f3bbfa Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.
2006-12-16 15:44:03 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Konstantin Belousov
30af71199e Fix the remaining race in the revs. 1.232, 1,233 that could occur during
unmount when mp structure is reused while waiting for coveredvp lock.
Introduce struct mount generation count, increment it on each reuse and
compare the generations before and after obtaining the coveredvp lock.

Reviewed by:	tegge, pjd
Approved by:	pjd (mentor)
MFC after:	2 weeks
2006-10-03 10:47:04 +00:00
Poul-Henning Kamp
f645b0b51c First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.
2006-10-02 12:59:59 +00:00
Tor Egge
e60c361218 Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.
2006-09-26 04:20:09 +00:00
Tor Egge
fba924ce9b Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with
MNT_UPDATE flag, closing a race between nmount() and quotactl().
2006-09-26 04:18:36 +00:00
Tor Egge
a1e363f256 Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC.  Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
2006-09-26 04:15:59 +00:00
Tor Egge
cea9d840d8 Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race
with dounmount(), causing loss of MNTK_UNMOUNT flag.
2006-09-26 04:15:04 +00:00
Tor Egge
5da56ddb21 Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().
2006-09-26 04:12:49 +00:00
Konstantin Belousov
f37e633887 Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be
unlocked only if it really exists.

Found with:	Coverity Prevent(tm)
CID:	1535
Approved by:	pjd (mentor)
2006-09-19 14:04:12 +00:00
Konstantin Belousov
4dec8579bd Fix the race while waiting for coveredvp lock during unmount. The vnode may
be recycled during the sleep, wrap the vn_lock with vhold/vdrop.
Check that coveredvp still points to the same mp after sleep (needed
because sleep dropped Giant).
Move check for user rights for unmount after coveredvp lock is obtained.

Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	kan (mentor)
MFC after:	2 weeks
2006-09-18 15:35:22 +00:00
Marius Strobl
aed760ef8a Fix another bug introduced with rev. 1.204; in vfs_donmount() if
the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)'
call fails, 'errmsg' is left uninitialized, making the later tests
against NULL meaningless, and the uses bogus. Thus initialize
'errmsg' to NULL beforehand. [1]
While at it, remove the superfluous assignment of 0 to 'errmsg_len'
if the above mentioned call fails as it's already initialized to 0.

Submitted by:	Michael Plass [1]
2006-08-26 16:28:19 +00:00
Pawel Jakub Dawidek
bebabf24bb Fix comment. 2006-08-25 15:13:49 +00:00
Marius Strobl
3a30d178fe Fix a bug introduced with rev. 1.204; in vfs_donmount() use
copyout(9) instead of copystr(9) for copying the errmsg from
kernel- to user-space. This fixes a panic on sparc64 when
using the nmount(2)-converted mountd(8).
While at it, use bcopy(3) instead of strncpy(3) in the kernel-
to kernel-space case for consistency with vfs_buildopts() and
between kernel- to user-space and kernel- to kernel-space case.
2006-08-24 18:52:28 +00:00
John Baldwin
597d608f86 - Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away.  mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
  (or rather, to handle concurrent unmount races) from going away.
  umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.
2006-06-27 14:46:31 +00:00
Robert Watson
7ebfc8df78 Audit some arguments to nmount(), mount(), umount().
Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 15:32:07 +00:00
Pawel Jakub Dawidek
1f58dd4956 Fix a problem introduced in revision 1.220. On mount(2) failure, don't
forget to unbusy file system before its destruction.

This fixes the following warning on mount failure:

	Mount point <X> had 1 dangling refs

Tested by:	wkoszek
2006-06-02 20:29:02 +00:00
Craig Rodrigues
0c89bb0a02 Add "update" mount option to global_opts array,
for use with vfs_filteropt().
2006-05-26 02:38:48 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Kelly Yancey
c9ad8a67af Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

  * Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
    linprocfs).  This (mostly) restores the ability to mount these
    filesystems using the old mount(2) system call (see below for the
    rest of the fix).

  * Remove not-NULL check for the data argument from the mount(2) entry
    point.  Per the mount(2) man page, it is up to the individual
    filesystem being mounted to verify data.  Or, in the case of procfs,
    etc. the filesystem is free to ignore the data parameter if it does
    not use it.  Enforcing data to be not-NULL in the mount(2) system call
    entry point prevented passing NULL to filesystems which ignored the
    data pointer value.  Apparently, passing NULL was common practice
    in such cases, as even our own mount_std(8) used to do it in the
    pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change.  One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
2006-05-15 19:42:10 +00:00
Craig Rodrigues
5250012a1d For nmount(), if "rw" is specified as a mount option,
add "noro" to the list of mount options.  This allows
a read-only mount to be converted to read-write via:
mount -u -o rw

Requested by:	kris
2006-05-14 01:51:38 +00:00
Jeff Roberson
ba5eb429e3 - When there are dangling vnodes at unmount print them before we panic.
Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:38:15 +00:00
Jeff Roberson
a218edceb2 - Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent
mount memory from being reclaimed.  This resolves a number of race
   conditions described in vfs_default.c and introduced with the
   VFS_LOCK_GIANT macros.
 - Let the mtx and lock remain valid after the mount structure has been
   freed by using init and fini calls.  Technically fini will never be
   called but is included for completeness.
 - Consistently use lockmgr directly rather than lockmgr to lock and
   vfs_unbusy to unlock.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:49:51 +00:00
Ruslan Ermilov
936ddefcd6 The mount(8) manpage says: "In case of conflicting options being
specified, the rightmost option takes effect."  Fix code to obey
this.  This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
2006-03-13 14:58:37 +00:00
Tor Egge
791dd2fade Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
2006-03-08 23:43:39 +00:00
Jeff Roberson
a4aeaefe5a - We can not hold a vnode lock while we do a lookup. Search for and load
modules prior to looking up the directory which we will cover to avoid
   this problem in mount.
 - We must hold the coveredvp locked before we can busy the mountpoint to
   prevent a lock order reversal with the vfs_busy() in lookup which holds
   the directory lock prior to doing a vfs_busy().  The directory lock is
   required to safely clear the v_mountedhere field on the directory.

MFC After:	1 week
2006-02-22 06:29:55 +00:00
Jeff Roberson
04f6d3effa - Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0.  We don't print an
   error if we are rebooting as the root mount always retains some refernces
   by init proc.
 - Acquire a mnt ref for every vnode allocated to a mount point.  Drop this
   ref only once vdestroy() has been called and the mount has been freed.
 - No longer NULL the v_mount pointer in delmntque() so that we may release
   the ref after vgone() has been called.  This allows us to guarantee
   that the mount point structure will be valid until the last vnode has
   lost its last ref.
 - Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:19:50 +00:00
Suleiman Souhlal
c270875f7c Don't try to load KLDs if we're mounting the root. We'd otherwise panic.
Tested by:	kris
MFC after:	3 days
2006-01-28 22:58:39 +00:00
Christian S.J. Peron
323203d389 vfs_busy can only return something useful if MNTK_UNMOUNT has been set.
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.

Found with:	Coverity Prevent (tm)
MFC after:	2 weeks
2006-01-15 20:14:11 +00:00
Robert Watson
6994eebcab Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the
return value is intentional: this is simply an attempt to pre-cache the
statfs state.

Found with:	Coverity Prevent (tm)
MFC after:	3 days
2006-01-15 20:01:05 +00:00
Ruslan Ermilov
6a61c14ee1 AMD64 also supports disk slices. 2006-01-14 20:47:11 +00:00
Tor Egge
82be0a5a24 Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by:	truckman
2006-01-09 20:42:19 +00:00
Pawel Jakub Dawidek
ade9b797a0 vfs_mount_alloc() always returns 0, but what we really want is newly
allocated 'struct mount *' pointer, so simplify code a bit and return
the pointer directly.

Reviewed by:	ssouhlal
2005-12-20 00:43:51 +00:00
Pawel Jakub Dawidek
003ba8a000 Use 'td' instead of 'curthread'. 2005-12-19 16:27:13 +00:00
Craig Rodrigues
d5989f64cf In devfs_first(), set mp->mnt_opt to a valid empty list of mount options
instead of leaving it NULL.  This eliminates a kernel panic
when trying to do a mount -o update of /dev.

Noticed by:	cjsp
Reviewed by:	phk
2005-12-08 04:27:53 +00:00
Craig Rodrigues
8539ca4cde Add "errmsg" to list of global mount options. 2005-12-08 04:09:29 +00:00
Craig Rodrigues
1245b3433e Add "rdonly" to global_opts, and parse it in vfs_donmount().
Requested by:	rwatson
2005-12-03 12:04:20 +00:00
Craig Rodrigues
ec528a3472 - Add "rw" mount option to global_opts.
- In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or
  unset the MNT_RDONLY filesystem flag.
2005-12-03 01:26:27 +00:00
Craig Rodrigues
5e6b93a014 In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly.  We need to copyin() the strings in the iovec before
we can strcmp() them.  Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found.  This allows us to locate an option in
the iovec without actually manipulating the iovec members. directly via
strcmp().

Noticed by:	kris on sparc64
2005-11-23 20:51:15 +00:00
Marcel Moolenaar
60b7823989 Fix bug introduced in revision 1.186:
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all.  A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.

Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week
2005-11-19 21:51:45 +00:00
Craig Rodrigues
425e5b6268 Parse more mount options in vfs_donmount(), before vfs_domount()
is called.  It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on.  The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.

Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.
2005-11-19 21:22:21 +00:00
Craig Rodrigues
8fd860cfa1 In vfs_nmount(), check to see if "update" mount option was passed
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set.  This is very important
when we want to do an update mount of the root filesystem, using nmount().
2005-11-18 01:31:10 +00:00
Craig Rodrigues
d5328381f1 style(9) cleanups.
Spotted by:	njl, bde
2005-11-12 14:41:44 +00:00
Craig Rodrigues
4560dfb5b1 For nmount(), allow a text string error message to be propagated back
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.

Discussed with:		phk, pjd
2005-11-09 02:26:38 +00:00
Craig Rodrigues
84e69560b6 Add utility function to propagate mount errors as text string messages.
Discussed with:		phk
2005-11-08 04:13:39 +00:00
Suleiman Souhlal
2611e5a6a9 Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed
and unbusied in devfs_fixup(), which assumes that the devfs mount is
still locked.

Granced at by:	phk
MFC after:	3 days
2005-09-02 13:37:54 +00:00
Pawel Jakub Dawidek
b578b0bdd3 devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
Pawel Jakub Dawidek
07ebf8c8c3 We don't use 'mp' variable, but we do want to mount devfs, ehh. 2005-05-12 01:49:51 +00:00
Pawel Jakub Dawidek
b8bc5373e1 Remove unised variable introduced by accident in rev 1.168.
Found by:	Coverity Prevent analysis tool
2005-05-11 19:50:34 +00:00
Pawel Jakub Dawidek
f850b2781f Plug memory leaks.
Found by:		Coverity Prevent analysis tool
2005-05-11 19:27:38 +00:00
Jeff Roberson
194dfed917 - Remove an old splcam hack. 2005-05-01 00:59:55 +00:00
Pawel Jakub Dawidek
f163441e7e Call g_waitidle() before every check the list of holds is empty.
Suggested by:	phk
2005-04-19 21:44:44 +00:00
Poul-Henning Kamp
d1c712ede2 Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.
2005-04-19 06:23:59 +00:00
Poul-Henning Kamp
73fbaa74e5 Add a named reference-count KPI to hold off mounting of the root filesystem.
While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
2005-04-18 21:21:26 +00:00
Poul-Henning Kamp
bdb3564638 Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Marcel Moolenaar
379ba85322 Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
2005-03-25 01:56:12 +00:00