Commit Graph

1684 Commits

Author SHA1 Message Date
Edward Tomasz Napierala
9340fc72e6 Implement NFSv4 ACL support for UFS.
Reviewed by:	rwatson
2009-12-21 19:39:10 +00:00
Konstantin Belousov
49e3050e6c VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object
flag. Besides providing redundant information, the need to update both
vnode and object flags causes more acquisitions of the vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.

Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.

Suggested and reviewed by:	alc
Tested by:	pho
MFC after:	3 weeks
2009-12-21 12:29:38 +00:00
Dag-Erling Smørgrav
0fbc5fbedf Sync with head 2009-09-25 22:45:59 +00:00
Dag-Erling Smørgrav
0bb2e5de60 Further improve comments. 2009-09-25 18:50:33 +00:00
Dag-Erling Smørgrav
91b2a3e204 Improve comments, and remove a bogus 0 id check. 2009-09-25 18:44:34 +00:00
Roman Divacky
e0a770a01d Don't build ufs_gjournal.c at all if UFS_GJOURNAL option is not given
instead of building an almost empty C file.

Approved by:	pjd
Approved by:	ed (mentor, implicit)
2009-09-22 16:22:05 +00:00
Dag-Erling Smørgrav
10b3b54548 Merge from head 2009-09-17 16:16:44 +00:00
Dag-Erling Smørgrav
7d4b968b0f Merge from head up to r188941 (last revision before the USB stack switch) 2009-09-17 13:31:39 +00:00
Brooks Davis
5343b3a28c Allocate space for the group array in a static credential used in
the quota code.  One case was correctly handled in r194498, but
this one was missed.

PR:		kern/138657
Tested by:	PR submitter
MFC after:	3 days
2009-09-17 12:35:13 +00:00
Edward Tomasz Napierala
2ff8f63aa6 Remove useless variable assignment. 2009-09-08 17:23:32 +00:00
Konstantin Belousov
6cc745d2d7 insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.

Noted and reviewed by:	tegge
Approved by:	des (pseudofs part)
MFC after:	3 days
2009-09-07 11:55:34 +00:00
Konstantin Belousov
5c61c646a3 The clear_remove() and clear_inodedeps() call vn_start_write(NULL, &mp,
V_NOWAIT) on the non-busied mount point. Unmount might free ufs-specific
mp data, causing ffs_vgetf() to access freed memory.

Busy mountpoint before dropping softdep lk.

Noted and reviewed by:	tegge
Tested by:	pho
MFC after:	1 week
2009-09-06 11:46:51 +00:00
Konstantin Belousov
165a3b418f When a UFS node is truncated to zero length, e.g. by an explicit
truncate(2) call, or by being removed or truncated on open, either a
new softupdate freeblks structure is allocated to track the freed
blocks of the node, or truncation is done synchronously when too many SU
dependencies have accumulated. The decision does not take into account
the allocated freeblks dependencies, allowing workloads that do a huge
number of truncations to exhaust kernel memory.

Take the number of allocated freeblks into consideration for
softdep_slowdown().

Reported by:	pluknet gmail com
Diagnosed and tested by:	pho
Approved by:	re (rwatson)
MFC after:	1 month
2009-08-14 11:00:38 +00:00
Edward Tomasz Napierala
340263c992 Fix fpathconf(3) on fifos, in effect making ls(1) properly
display '+' on them.  Taken from kern/125613, with cosmetic
changes.

PR:		kern/125613
Submitted by:	Jaakko Heinonen <jh at saunalahti dot fi>
Approved by:	re (kib)
2009-07-02 20:05:21 +00:00
Konstantin Belousov
f1eccd05ec In vn_vget_ino() and its inline equivalents, mnt_ref() the mount point
around the sequence that drops the vnode lock and then busies the mount point.
Not having a locked vnode or a direct reference to the mp allows a
forced unmount to proceed, making mp unmounted or reused.

Tested by:	pho
Reviewed by:	jeff
Approved by:	re (kensmith)
MFC after:	2 weeks
2009-07-02 18:02:55 +00:00
Edward Tomasz Napierala
4bc61fd4ec Don't panic on attempt to set ACL on a block device file.
This is just a part of kern/125613.

PR:		kern/125613
Submitted by:	Jaakko Heinonen <jh at saunalahti dot fi>
Reviewed by:	rwatson
Approved by:	re (kib)
2009-07-01 22:30:36 +00:00
Konstantin Belousov
cfba50c070 For SU mounts, softdep_fsync() might drop vnode lock, allowing other
threads to put dirty buffers on the vnode bufobj list. For regular files
and synchronous fsync requests, check for the condition and restart the
fsync vop if a new dirty buffer arrived.

Tested by:	pho
Approved by:	re (kensmith)
MFC after:	1 month
2009-06-30 10:07:33 +00:00
Konstantin Belousov
a50d1b2a66 Softdep_fsync() may need to lock parent directory of the synced vnode.
Use inlined (due to FFSV_FORCEINSMQ) version of vn_vget_ino() to prevent
mountpoint from being unmounted and freed while no vnodes are locked.

Tested by:	pho
Approved by:	re (kensmith)
MFC after:	1 month
2009-06-30 10:07:00 +00:00
Sean Nicholas Barkas
5a6dafe63e Fix a bug reported by pho@ where one can induce a panic by decreasing
vfs.ufs.dirhash_maxmem below the current amount of memory used by dirhash. When
ufsdirhash_build() is called with the memory in use greater than dirhash_maxmem,
it attempts to free up memory by calling ufsdirhash_recycle(). If successful in
freeing enough memory, ufsdirhash_recycle() leaves the dirhash list locked. But
at this point in ufsdirhash_build(), the list is not explicitly unlocked after
the call(s) to ufsdirhash_recycle(). When we next attempt to lock the dirhash
list, we will get a "panic: _mtx_lock_sleep: recursed on non-recursive mutex
dirhash list".

Tested by:	pho
Approved by:	dwmalone (mentor)
MFC after:	3 weeks
2009-06-25 20:40:13 +00:00
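
The locking contract described in 5a6dafe63e can be sketched in userland C. This is only an illustration under stated assumptions: toy_lock, toy_recycle() and toy_build() are hypothetical stand-ins, not the kernel's mtx(9) or the real ufsdirhash_recycle()/ufsdirhash_build(). The helper returns with the list lock still held on success, so the caller has to release it before anything tries to take the lock again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive mutex; not the kernel's mtx(9). */
struct toy_lock {
	bool held;
};

/* Returns false where the kernel would panic on a recursive acquire. */
bool
toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return (false);
	l->held = true;
	return (true);
}

void
toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/*
 * Hypothetical recycle helper.  On success (0) it leaves the list lock
 * held for the caller; on failure (-1) the lock is not held on return.
 */
int
toy_recycle(struct toy_lock *list_lock, size_t *memuse, size_t wanted)
{
	if (!toy_lock_acquire(list_lock))
		return (-1);
	if (*memuse < wanted) {
		toy_lock_release(list_lock);
		return (-1);
	}
	*memuse -= wanted;
	return (0);		/* lock intentionally still held */
}

/* Caller with the fix applied: unlock after a successful recycle. */
int
toy_build(struct toy_lock *list_lock, size_t *memuse, size_t wanted)
{
	if (toy_recycle(list_lock, memuse, wanted) != 0)
		return (-1);
	toy_lock_release(list_lock);	/* the unlock that was missing */
	return (0);
}
```

Calling toy_recycle() directly and then trying to acquire the lock again fails in this model, which corresponds to the "recursed on non-recursive mutex" panic quoted in the commit message.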
Brooks Davis
838d985825 Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care of the details of allocating
the groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for ensuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
Sean Nicholas Barkas
fe1bd0a483 Keep dirhash tailq locked throughout the entirety of ufsdirhash_destroy() to fix
a potential race pointed out by pjd. Also use TAILQ_FOREACH_SAFE to iterate over
dirhashes in ufsdirhash_lowmem(), so that we can continue iterating even after a
dirhash is destroyed.

Suggested by:	pjd
Tested by:      pho
Approved by:	dwmalone (mentor)
2009-06-17 18:55:29 +00:00
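
The reason fe1bd0a483 switches to TAILQ_FOREACH_SAFE can be shown with a hand-rolled list. This is a minimal sketch, not the dirhash code: struct node and the list_* helpers are hypothetical, and a plain singly linked list is used instead of the queue(3) macros. The point is that the next pointer is saved before the current element may be freed, so iteration can continue after a removal.

```c
#include <stdlib.h>

struct node {
	int		 value;
	struct node	*next;
};

/* Push a node at the head; returns the old head if allocation fails. */
struct node *
list_push(struct node *head, int value)
{
	struct node *n;

	n = malloc(sizeof(*n));
	if (n == NULL)
		return (head);
	n->value = value;
	n->next = head;
	return (n);
}

/*
 * Free every node with an odd value.  "next" is read before the current
 * node can be freed, which is what the _SAFE iteration variants do.
 */
struct node *
list_remove_odd(struct node *head)
{
	struct node **prevp = &head;
	struct node *cur, *next;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;	/* saved before cur may be freed */
		if (cur->value % 2 != 0) {
			*prevp = next;
			free(cur);
		} else {
			prevp = &cur->next;
		}
	}
	return (head);
}

int
list_length(const struct node *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}
```

With a plain loop that advances via cur->next in the loop header, the freed node would be dereferenced after removal.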
Konstantin Belousov
01ed174831 Do not use casts (int *)0 and (struct thread *)0 for the arguments of
vn_rdwr, use NULL.

Reviewed by:	jhb
MFC after:	1 week
2009-06-16 15:13:45 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Sean Nicholas Barkas
af0adb8312 Add vm_lowmem event handler for dirhash. This will cause dirhashes to be
deleted when the system is low on memory. This ought to allow an increase to
vfs.ufs.dirhash_maxmem on machines that have lots of memory, without
degrading performance by having too much memory reserved for dirhash when
other things need it. The default value for dirhash_maxmem is being kept at
2MB for now, though.

This work was mostly done during the 2008 Google Summer of Code.

Approved by:	dwmalone (mentor), re
MFC after:	3 months
2009-06-03 09:44:22 +00:00
Attilio Rao
f083018223 Handle lock recursion differently by always checking against LO_RECURSABLE
instead of the lock's own flag itself.

Tested by:	pho
2009-06-02 13:03:35 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Edward Tomasz Napierala
ae1add4e55 Make 'struct acl' larger, as required to support NFSv4 ACLs. Provide
compatibility interfaces in both kernel and libc.

Reviewed by:	rwatson
2009-05-22 15:56:43 +00:00
Alan Cox
6e5982caf7 Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This
eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().

In collaboration with:	tegge
2009-05-17 20:26:00 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled.  Bump __FreeBSD_version in order to signal this
situation.
2009-05-11 15:33:26 +00:00
Alexander Kabaev
5679fe1957 Do not embed struct ucred into larger netcred parent structures.
Credential might need to hang around longer than its parent and be used
outside of mnt_explock scope controlling netcred lifetime. Use separate
reference-counted ucred allocated separately instead.

While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.

Reported by:	John Hickey
PR:		kern/133439
Reviewed by:	dfr, kib
2009-05-09 18:09:17 +00:00
Rick Macklem
933a30b6b7 Change the semantics of i_modrev/va_filerev to what is required for
the nfsv4 Change attribute. There are 2 changes:
 	1 - The value now changes on metadata changes as well as data
 	    modifications (incremented for IN_CHANGE instead of IN_UPDATE).
 	2 - It is now saved in spare space in the on-disk i-node so that it
 	    survives a crash.
 	Since va_filerev is not passed out into user space, the only current
 	use of va_filerev is in the nfs server, which uses it as the directory
 	cookie verifier. Since this verifier is only passed back to the server
 	by a client verbatim and then the server doesn't check it, changing the
 	semantics should not break anything currently in FreeBSD.

Reviewed by:	bde
Approved by:	kib (mentor)
2009-04-27 16:46:16 +00:00
Konstantin Belousov
f4ffd67c18 In ufs_checkpath(), recheck that '..' still points to the inode with
the same inode number after VFS_VGET() and relock of the vp. If '..'
changed, redo the lookup. To reduce code duplication, move the code to
read '..' dirent into the static helper function ufs_dir_dd_ino().

Supply the source inode number as an argument to ufs_checkpath() instead
of the source inode itself. The inode is unlocked, thus it might be
reclaimed, causing accesses to the freed memory.

Use vn_vget_ino() to get the '..' vnode by its inode number, instead of
directly code VFS_VGET() and relock, to properly busy the mount point
while vp lock is dropped.

Noted and reviewed by:	tegge
Tested by:	pho
MFC after:	1 month
2009-04-20 14:36:01 +00:00
Konstantin Belousov
f2cc3668fc When verifying '..' after VFS_VGET() in ufs_lookup(), do not return
error if '..' is still there but changed between lookup and check.
Start relookup instead. Rename is supposed to change '..' reference
atomically, so transient failures introduced by r191137 are wrong.

While rearranging the code to allow lookup restart in ufs_lookup(),
remove the comment that only distracts the reader.

Noted and reviewed by:	tegge
Also reported by:	pho
MFC after:	1 month
2009-04-19 05:34:07 +00:00
Edward Tomasz Napierala
b998d381f2 Use acl_alloc() and acl_free() instead of using uma(9) directly.
This will make switching to malloc(9) easier; also, it would be
necessary to add these routines if/when we implement variable-size
ACLs.
2009-04-18 16:47:33 +00:00
Konstantin Belousov
71421dc116 Verify that '..' still exists with the same inode number after
VFS_VGET() has returned in ufs_lookup(). If the '..' lookup started
immediately before the parent directory was removed, we might return
either cleared or unrelated inode otherwise.

Ufs_lookup() is split into new function ufs_lookup_() that either does
lookup, or verifies that directory entry exists and references supplied
inode number.

Reviewed by:	tegge
Tested by:	pho,
	Andreas Tobler <andreast-list fgznet ch> (previous version)
MFC after:	1 month
2009-04-16 09:57:08 +00:00
Robert Watson
885868cd8f Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by:    jeff
Reviewed by:    jeff
Discussed with: rmacklem, zach loafman @ isilon
2009-04-10 10:52:19 +00:00
Konstantin Belousov
bc364c4e99 When removing or renaming a snapshot, do not delve into request_cleanup().
The latter may need blocks from the underlying device that belong
to normal files, which should not be locked while the snap lock is held.

Reported and tested by:	pho
MFC after:	1 month
2009-04-04 12:19:52 +00:00
Konstantin Belousov
02e06d99e6 Correct typo.
Noted by:	kensmith
2009-03-27 15:46:02 +00:00
Konstantin Belousov
c1d8b5e82c Fix two issues with bufdaemon, often causing the processes to hang in
the "nbufkv" sleep.

First, ffs background cg group block write requests a new buffer for
the shadow copy. When ffs_bufwrite() is called from the bufdaemon due
to a buffer shortage, requesting the buffer deadlocks bufdaemon.
Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk
to not block while allocating the buffer, and return failure
instead. Add a flag argument to the geteblk to allow to pass the flags
to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer
allocation failed and either GB_NOWAIT_BD is specified, or geteblk()
is called from bufdaemon (or its helper, see below). In
ffs_bufwrite(), fall back to synchronous cg block write if shadow
block allocation failed.

Since r107847, buffer write assumes that vnode owning the buffer is
locked. The second problem is that buffer cache may accumulate many
buffers belonging to limited number of vnodes. With such workload,
quite often threads that own the mentioned vnodes locks are trying to
read another block from the vnodes, and, due to buffer cache
exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make
any substantial progress because the vnodes are locked.

Allow the threads owning vnode locks to help the bufdaemon by doing
the flush pass over the buffer cache before getnewbuf() is going to
uninterruptible sleep. Move the flushing code from buf_daemon() to new
helper function buf_do_flush(), that is called from getnewbuf().  The
number of buffers flushed by single call to buf_do_flush() from
getnewbuf() is limited by new sysctl vfs.flushbufqtarget.  Prevent
recursive calls to buf_do_flush() by marking the bufdaemon and threads
that temporarily help bufdaemon by TDP_BUFNEED flag.

In collaboration with:	pho
Reviewed by:	 tegge (previous version)
Tested by:	 glebius, yandex ...
MFC after:	 3 weeks
2009-03-16 15:39:46 +00:00
Konstantin Belousov
e65f5a4ead The non-modifying EA VOPs are executed with only shared vnode lock taken.
Provide a custom lock around initializing and tearing down EA area,
to prevent both memory leaks and double-free of it. Count the number
of EA area accessors.

Lock protocol requires either holding exclusive vnode lock to modify
i_ea_area, or shared vnode lock and owning IN_EA_LOCKED flag in i_flag.

Noted by:	YAMAMOTO, Taku <taku tackymt homeip net>
Tested by:	pho (previous version)
MFC after:	2 weeks
2009-03-12 12:43:56 +00:00
Konstantin Belousov
a9d9537110 Do not double-free the struct inode when insmntque failed. Default
insmntque destructor reclaims the vnode, and ufs_reclaim frees the memory.

Reviewed by:	tegge
MFC after:	3 days
2009-03-11 19:45:52 +00:00
John Baldwin
33fc362512 Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
  shared vnode lock for the leaf vnode only if the mount point has the
  extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
  not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
  O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
  vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
  FIFO's require exclusive vnode locks for their open() and close()
  routines.  (My recent MPSAFE patches for UDF and cd9660 already included
  this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by:	ups
Reviewed by:	pjd (ZFS bits)
MFC after:	1 month
2009-03-11 14:13:47 +00:00
John Baldwin
5bd65606f4 Adjust some variables (mostly related to the buffer cache) that hold
address space sizes to be longs instead of ints.  Specifically, the following
values are now longs: runningbufspace, bufspace, maxbufspace,
bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace,
hirunningspace, maxswzone, maxbcache, and maxpipekva.  Previously, a
relatively small number (~ 44000) of buffers set in kern.nbuf would result
in integer overflows resulting either in hangs or bogus values of
hidirtybuffers and lodirtybuffers.  Now one has to overflow a long to see
such problems.  There was a check for a nbuf setting that would cause
overflows in the auto-tuning of nbuf.  I've changed it to always check and
cap nbuf but warn if a user-supplied tunable would cause overflow.

Note that this changes the ABI of several sysctls that are used by things
like top(1), etc., so any MFC would probably require some gross shims
to allow for that.

MFC after:	1 month
2009-03-09 19:35:20 +00:00
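
The class of problem fixed by 5bd65606f4 can be sketched as a range check. This is an illustration only: fits_in_int() is a hypothetical helper, and the per-buffer size used in the note below is an assumed value; it does not reproduce the kernel's actual nbuf tuning arithmetic. The product is never computed in int, because signed overflow is undefined behaviour in C.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true if count * size can be represented in an int.
 * Both arguments are assumed to be positive.
 */
bool
fits_in_int(long long count, long long size)
{
	return (count <= INT_MAX / size);
}
```

For example, assuming 16384 bytes per buffer and a 32-bit int, 1000 buffers fit comfortably while 200000 buffers do not, so a byte counter for the latter needs a wider type.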
Edward Tomasz Napierala
4f560d7595 Right now, when trying to unmount a device that's already gone,
msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO.
However, dounmount() treats ENXIO as a success and proceeds with
unmounting.  In effect, the filesystem gets unmounted without closing
GEOM provider etc.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Tested by:	dho
Sponsored by:	FreeBSD Foundation
2009-02-23 21:09:28 +00:00
Edward Tomasz Napierala
3c140b2df4 Refactor, moving error checking outside of the
'if (mp->mnt_flag & MNT_SOFTDEP)' conditional.  No functional
changes.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Tested by:	pho
Sponsored by:	FreeBSD Foundation
2009-02-23 20:56:27 +00:00
John Baldwin
ee445a69c5 - If the g_access() call for the initial root mount fails, then fully
cleanup.  Before the GEOM consumer would not have been closed.
- Bump the reference on the character device being mounted while the
  associated devfs vnode is locked.

Reviewed by:	kib
2009-02-11 22:19:54 +00:00
Edward Tomasz Napierala
8a3f2c376a When a device containing a mounted UFS filesystem disappears, the type
of devvp becomes VBAD, which UFS incorrectly interprets as a snapshot
vnode, which in turn causes a panic.  Fix it by replacing '!= VCHR'
with '== VREG'.

With this fix in place, you should no longer be able to panic the system
by removing a device with a UFS filesystem mounted from it - assuming
you don't use softupdates.

Reviewed by:	kib
Tested by:	pho
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-02-06 17:14:07 +00:00
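
The difference between the two tests in 8a3f2c376a can be shown with a reduced enum. This is a sketch under assumptions: enum toy_vtype and both predicates are hypothetical and model only three vnode types. A negative test ('!= VCHR') treats every other type, including VBAD, as a snapshot, while a positive test ('== VREG') does not.

```c
#include <stdbool.h>

/* Reduced, illustrative set of vnode types; not the kernel's enum vtype. */
enum toy_vtype { TOY_VREG, TOY_VCHR, TOY_VBAD };

/* Old check: anything that is not a character device counts as a snapshot. */
bool
is_snapshot_old(enum toy_vtype t)
{
	return (t != TOY_VCHR);
}

/* New check: only a regular file counts as a snapshot. */
bool
is_snapshot_new(enum toy_vtype t)
{
	return (t == TOY_VREG);
}
```

The two predicates agree for TOY_VREG and TOY_VCHR and differ only for TOY_VBAD, which is the case the commit message describes.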
Dag-Erling Smørgrav
1b3515f39b WIP 2009-01-30 13:54:03 +00:00
Edward Tomasz Napierala
49c4791ccc Make sure the cdev doesn't go away while the filesystem is still mounted.
Otherwise dev2udev() could return garbage.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-29 16:47:15 +00:00
Robert Watson
ec7e66e84c Following a fair amount of real world experience with ACLs and
extended attributes since FreeBSD 5, make the following semantic
changes:

- Don't update the inode modification time (mtime) when extended
  attributes (and hence also ACLs) are added, modified, or removed.
- Don't update the inode access time (atime) when extended attributes
  (and hence also ACLs) are queried.

This means that rsync (and related tools) won't improperly think
that the data in the file has changed when only the ACL has changed.

Note that ffs_reallocblks() has not been changed to not update on an
IO_EXT transaction, but currently EAs don't use the cluster write
routines so this shouldn't be a problem.  If EAs grow support for
clustering, then VOP_REALLOCBLKS() will need to grow a flag argument
to carry down IO_EXT to UFS.

MFC after:	1 week
PR:             ports/125739
Reported by:    Alexander Zagrebin <alexz@visp.ru>
Tested by:      pluknet <pluknet@gmail.com>,
                Greg Byshenk <freebsd@byshenk.net>
Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
2009-01-27 21:48:47 +00:00
John Baldwin
1c570a0c09 Fix a few style bogons.
Submitted by:	bde
2009-01-21 20:08:17 +00:00
Konstantin Belousov
e9aff35739 Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.

Requested by:	jhb
Tested by:	pho
MFC after:	1 month
2009-01-21 14:51:38 +00:00
John Baldwin
beace17649 Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP:
VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME
can be performed while holding a shared vnode lock (the same functionality
is done internally by VOP_READ which can run with a shared vnode lock).
Add missing locking of the vnode interlock to the ufs implementation and
remove a special note and test from the NFS client about not supporting the
feature.

Inspired by:	ups
Tested by:	pho
2009-01-21 14:42:00 +00:00
Konstantin Belousov
b51b07be87 The r187467 should remove all pages for V_NORMAL case too, because
indirect block pages are not removed by the mentioned invocation of
the vnode_pager_setsize().

Put a common code into the helper function ffs_pages_remove().

Reported and tested by:	dchagin
Reviewed by:	ups
MFC after:	3 weeks
2009-01-20 22:00:19 +00:00
John Baldwin
39e4a02c16 Add a comment explaining why the "bufwait" / "dirhash" LOR reported by
WITNESS will not actually result in a deadlock.

Discussed with:	kib
MFC after:	1 week
2009-01-20 16:35:34 +00:00
Konstantin Belousov
b1a4c8e522 When extending inode size, we call vnode_pager_setsize(), to have a
address space where to put vnode pages, and then call UFS_BALLOC(),
to actually allocate new block and map it. When UFS_BALLOC() returns
error, sometimes we forget to revert the vm object size increase,
allowing for the pages that are not backed by the logical disk blocks.

Revert vnode_pager_setsize() back when UFS_BALLOC() failed, for
ffs_truncate() and ffs_write().

PR:	129956
Reviewed by:	ups
MFC after:	3 weeks
2009-01-20 11:30:22 +00:00
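
The rollback described in b1a4c8e522 follows a common pattern: grow first, allocate second, and undo the growth if allocation fails. The following is a minimal userland sketch of that pattern; struct toy_file, toy_balloc() and toy_extend() are hypothetical names standing in for the vm object size, UFS_BALLOC() and the callers, and do not mirror the real ffs_write()/ffs_truncate() code.

```c
#include <errno.h>
#include <stddef.h>

struct toy_file {
	size_t object_size;	/* models the vm object size */
	size_t blocks_free;	/* models free blocks on the filesystem */
};

/* Hypothetical allocator: fails with ENOSPC when blocks run out. */
int
toy_balloc(struct toy_file *f, size_t nblocks)
{
	if (nblocks > f->blocks_free)
		return (ENOSPC);
	f->blocks_free -= nblocks;
	return (0);
}

/* Extend the object, then allocate; revert the size if allocation fails. */
int
toy_extend(struct toy_file *f, size_t newsize, size_t nblocks)
{
	size_t oldsize = f->object_size;
	int error;

	f->object_size = newsize;	/* grow first, like vnode_pager_setsize() */
	error = toy_balloc(f, nblocks);
	if (error != 0)
		f->object_size = oldsize;	/* the revert that was missing */
	return (error);
}
```

Without the revert, the object would keep its larger size even though no blocks back the new range.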
Konstantin Belousov
9316467d05 FFS puts the extended attributes blocks at the negative blocks for the
vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it
incorrectly does vm_object_page_remove(0, 0), removing all pages from
the underlying vm object, not only the pages that back the extended
attributes data.

Change vinvalbuf() to not remove any pages from the object when
V_NORMAL or V_ALT are specified. Instead, the only in-tree caller
in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitly
removes the corresponding page range. The V_NORMAL caller
does vnode_pager_setsize(vp, 0) immediately after the call to
vinvalbuf(V_NORMAL) already.

Reported by:	csjp
Reviewed by:	ups
MFC after:	3 weeks
2009-01-20 11:27:45 +00:00
Konstantin Belousov
e0431d5b1f Lock the uepm_lock around the autostart of extattrs.
Reported and tested by:	pho
Reviewed by:	rwatson
MFC after:	3 weeks
2009-01-08 12:49:55 +00:00
Konstantin Belousov
df86ccf642 If unmount of the ffs mp failed, reinitialize the extended attributes
for the mp, and restart them if autostart is enabled.

Reported and tested by:	pho
Reviewed by:	rwatson
MFC after:	3 weeks
2009-01-08 12:48:27 +00:00
Konstantin Belousov
73491c121c Do not busy twice the mount point where a quota operation is performed.
Tested by:	pho
MFC after:	1 month
2008-12-18 12:01:53 +00:00
Edward Tomasz Napierala
0da50f6ef8 According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0.  Make it so.  This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem.  Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by:	scottl, phk
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 21:13:11 +00:00
Konstantin Belousov
269d02f171 The dqrele() function syncs the dq, then acquires the dqh lock, and then
does final drop of the dq reference to put it onto the free list.
There is a possibility that the dq would be found by another thread
after sync and before the dqh lock is acquired. If that other thread
drops the dq before we have taken the dqh lock, the dirty dq is put on
the free list.

Recheck the DQ_MOD after the dqh lock is relocked. Repeat dqsync() if
the dq is dirty. This ensures that up to date dq is written in the quota
file and fixes assertion in dqget().

Reported and tested by:	Frode Nordahl <frode nordahl net>
MFC after:	3 days
2008-12-08 11:04:17 +00:00
Konstantin Belousov
7b603a4ac7 Improve usefulness of the panic by printing the pointer to the problematic
dquot. In-tree gdb is often unable to get the dq value, so supply it in
panic message.

MFC after:	3 days
2008-12-07 13:25:06 +00:00
Konstantin Belousov
ffdaeffe21 Do not lock vnode interlock around reading of v_iflag to check VI_DOOMED.
Read of the pointer is atomic, and flag cannot be set while vnode lock
is held.

Requested by:	jhb
MFC after:	1 month
2008-12-02 11:12:50 +00:00
Konstantin Belousov
11c68fb23f Busy ufs filesystem around block of code that does ".." lookup. Since
mnt_lock is before lock of any vnode on the mp, it uses LK_NOWAIT. Since
MNTK_UNMOUNT may be transient, pdp lock is dropped when vfs_busy()
failed, and operation is retried after some time. This way, ffs_vget()
is not called on the mp that may be in the process of being destroyed by
unmount.

Check for the VI_DOOMED flag on pdp after its lock is reacquired, to
better detect some situations where directory containing ".."
entry is removed during the lookup.

Reviewed by:	tegge, attilio (previous version)
Tested by:	pho
MFC after:	1 month
2008-11-22 13:11:11 +00:00
John Baldwin
25c398ee9b Fix typo. 2008-11-19 20:06:59 +00:00
Doug Ambrisko
f1c1cdbb9c For now, flush the buffer cache every 10 cylinder groups to free
up space.  If the buffer cache fills up then the disk systems can
grind to a halt.  Better tuning can be figured out later.

Tested by:	Tim, others and work
Reviewed by:	Kostik Belousov
PR:		128832
2008-11-13 17:40:21 +00:00
John Baldwin
2ef42c06d6 Quiet a WITNESS warning with the dirhash sx locks by setting the DUPOK
flag.  Specifically, if two threads race to create a dirhash for a
directory, then one might already have created a private dirhash
structure (and locked it) when it realizes the directory now has a
structure and tries to lock that one.
2008-11-04 18:56:12 +00:00
Edward Tomasz Napierala
4a4f18ed37 In UFS, when reading EA that contains ACL fails for some reason, include
inode number and filesystem name, so the administrator can fix the problem.

Approved by:	rwatson (mentor)
2008-11-04 12:30:31 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no longer linked to a lockmgr.
  Change its KPI accordingly by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT, which basically replaces the LK_NOWAIT of the
  old version (which was already unlinked from the lockmgr), and
  MBF_MNTLSTLOCK, which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (an ability still desired by most consumers).
- The stub used in vfs_mount_destroy(), which allows overriding the
  mnt_ref if running for more than 3 seconds, is now totally useless.
  Remove it, as it was meant to work in older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix it properly rather than trust such a hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Edward Tomasz Napierala
15bc6b2bd8 Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by:	rwatson (mentor)
2008-10-28 13:44:11 +00:00
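
The motivation given in 15bc6b2bd8 can be demonstrated with a small truncation example. This is a sketch under assumptions: toy_mode16_t, toy_accmode_t and the TOY_* constants are hypothetical; the only point is that a flag above bit 15 is lost when stored in a 16-bit type, while a wider type keeps it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t toy_mode16_t;	/* stand-in for a 16-bit mode_t */
typedef int toy_accmode_t;	/* stand-in for a wider access-mode type */

#define TOY_VREAD	0x00000100	/* illustrative existing flag */
#define TOY_VNEWFLAG	0x00010000	/* hypothetical new flag in bit 16 */

/* Does the new flag survive a round trip through the 16-bit type? */
bool
survives_mode16(int flags)
{
	toy_mode16_t m = (toy_mode16_t)flags;	/* high bits are dropped */

	return ((m & TOY_VNEWFLAG) != 0);
}

/* Does the new flag survive a round trip through the wider type? */
bool
survives_accmode(int flags)
{
	toy_accmode_t m = flags;

	return ((m & TOY_VNEWFLAG) != 0);
}
```

Flags that fit in the low 16 bits, such as TOY_VREAD here, are unaffected either way, which is why the narrowing goes unnoticed until new constants are added.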
Konstantin Belousov
5400fa16b4 Provide an explanation for getinoquota() call in the ufs_access vop.
MFC after:	3 days
2008-10-28 12:00:28 +00:00
Dag-Erling Smørgrav
e11e3f187d Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.
2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Konstantin Belousov
016f98f947 Assert that v_holdcnt is non-zero before entering lockmgr in vn_lock
and ffs_lock. This cannot catch situations where holdcnt is incremented
not by curthread, but I think it is useful.

Reviewed by:	tegge, attilio
Tested by:	pho
MFC after:	2 weeks
2008-10-20 10:11:33 +00:00
Konstantin Belousov
4560452f01 Sync up summary information for cylinder groups while data is already
in memory during snapshot creation. This improves the results of the
background fsck.

Submitted by: tegge
MFC after: 1 week
2008-10-13 14:05:01 +00:00
Attilio Rao
0d7935fd01 Remove the useless struct thread argument from the bufobj interface.
In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider the 'curthread' argument passed to
VOP_SYNC() (in bufsync()) as just temporary; it will be axed ASAP.

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
John Baldwin
f888634792 Enable shared lookups on UFS. There are some remaining issues with forced
unmounts, but those are in the VFS lookup code and are not UFS specific.

Tested by:	pho, kris
2008-09-24 18:53:04 +00:00
John Baldwin
10fbe6292c Close a race between concurrent calls to ufsdirhash_recycle() and
ufsdirhash_free() introduced in my last commit by removing the dirhash
about to be free'd in ufsdirhash_free() from the global dirhash list
before dropping the sx lock.

Tested by:	kris
2008-09-22 20:53:22 +00:00
Konstantin Belousov
86dacdfe2b Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:46:45 +00:00
John Baldwin
c481551083 Retire the 'i_reclen' field from the in-memory i-node. Previously,
during a DELETE lookup operation, lookup would cache the length of the
directory entry to be deleted in 'i_reclen'.  Later, the actual VOP to
remove the directory entry (ufs_remove, ufs_rename, etc.) would call
ufs_dirremove() which extended the length of the previous directory
entry to "remove" the deleted entry.

However, we always read the entire block containing the directory
entry when doing the removal, so we always have the directory entry to
be deleted in-memory when doing the update to the directory block.
Also, we already have to figure out where the directory entry that is
being removed is in the block so that we can pass the component name
to the dirhash code to update the dirhash.  So, instead of passing
'i_reclen' from ufs_lookup() to the ufs_dirremove() routine, just read
the 'd_reclen' field directly out of the entry being removed when
updating the length of the previous entry in the block.

This avoids a cosmetic issue of writing to 'i_reclen' while holding a
shared vnode lock.  It also slightly reduces the amount of side-band
data passed from ufs_lookup() to operations updating a directory via
the directory's i-node.

Reviewed by:	jeff
2008-09-16 19:06:44 +00:00
John Baldwin
b2ef6b1833 Fix a race with shared lookups on UFS. If the dirhash code reached the
cap on memory usage, then shared LOOKUP operations could start free'ing
dirhash structures.  Without these fixes, concurrent free's on the same
directory could result in one of the threads blocked on a lock in a dirhash
structure free'd by the other thread.
- Replace the lockmgr lock in the dirhash structure with an sx lock.
- Use a reference count managed with ufsdirhash_hold()/drop() to determine
  when to free the dirhash structures.  The directory i-node holds a
  reference while the dirhash is attached to an i-node.  Code that wishes
  to lock the dirhash while holding a shared vnode lock must first
  acquire a private reference to the dirhash while holding the vnode
  interlock before acquiring the dirhash sx lock.  After acquiring the sx
  lock, it drops the private reference after checking to see if the
  dirhash is still used by the directory i-node.
2008-09-16 16:23:56 +00:00
John Baldwin
5316d529ec - Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.  rename() uses WANTPARENT
  instead of LOCKPARENT when looking up the source pathname.  ufs_rename()
  uses a relookup() to lock the parent directory when it decides to finally
  remove the source path.  Thus, it is ok for a DELETE with WANTPARENT set
  instead of LOCKPARENT to use a shared vnode lock rather than an exclusive
  vnode lock.

Reported by:	kris (2)
Reviewed by:	jeff
2008-09-16 16:18:36 +00:00
John Baldwin
1b7cf11b00 vdropl() drops the vnode interlock. Thus, the code in the QUOTA case that
upgrades the vnode lock if it is share locked was dropping the interlock
before actually checking VI_DOOMED.  Fix this by doing the vdropl() after the
check and relying on it to drop the vnode interlock.

Reported by:	pho
Reviewed by:	kib
MFC after:	1 week
2008-09-16 16:15:38 +00:00
Konstantin Belousov
6fecb4e41e Suspend the write operations on the UFS filesystem being unmounted or
remounted from rw to ro.

Proposed and reviewed by:  tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 11:55:53 +00:00
Konstantin Belousov
2814d5ba5f When an attempt is made to suspend a filesystem that is already suspended,
wait until the current suspension is lifted instead of silently returning
success immediately. The consequences of calling vfs_write_resume() when
not owning the suspension are not well-defined at best.

Add the vfs_susp_clean() mount method to be called from
vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and
stop calling it manually.

Add the thread flag TDP_IGNSUSP that allows bypassing the suspension
point in vn_start_write(). It is intended for use by VFS in
situations where the suspender wants to do some i/o requiring calls to
vn_start_write(), and this i/o cannot be done later.

Reviewed by:	tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 11:51:06 +00:00
Konstantin Belousov
52dfc8d7da Add the ffs structures introspection functions for ddb.
Show the b_dep value for the buffer in the show buffer command.
Add a command to dump the dirty/clean buffer list for vnode.

Reviewed by:	tegge
Tested and used by:   pho
MFC after:   1 month
2008-09-16 11:19:38 +00:00
Konstantin Belousov
90446e360c When downgrading the read-write mount to read-only, do_unmount() sets
MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive()
and ufs_itimes_locked(), UFS verifies whether the fs is read-only by
checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag
for inode on the fs being remounted rw->ro.

Introduce the UFS_RDONLY() struct ufsmount method that reports the value
of fs_ronly. The latter is set to 1 only after the remount is
finished.

Reviewed by:	tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 10:59:35 +00:00
Konstantin Belousov
0411d79138 The struct inode *ip supplied to softdep_freefile is not necessarily the
inode having number ino. In r170991, the ip was marked IN_MODIFIED, which
is not quite correct.

Mark only the right inode modified by checking inode number.

Reviewed by:	tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 10:52:25 +00:00
Edward Tomasz Napierala
86a0c0aa7b When calling extattr_check_cred, use V{READ,WRITE}, not I{READ,WRITE}.
Approved by:	rwatson (mentor)
2008-09-03 12:46:09 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
0359a12ead Decontextualize the pair VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally useless.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Konstantin Belousov
acd05e468a In ffs_valloc(), ffs_vget() may fail because insmntque() refused to
insert new vnode into the mount vnode list. Then, for the SU-enabled
mount, ffs_vfree could create freefile dependency. This dependency can
hang around forever since inode is not marked as IN_MODIFIED and
correspondingly inodeblock may not be marked as dirty.

After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as
modified, and vput() it immediately. Take care of the dup alloc.

Tested by:	pho
Reviewed by:	tegge
MFC after:	1 month
2008-08-28 09:19:50 +00:00
Konstantin Belousov
7b7ed832e4 Softdep code may need to instantiate vnode when processing
dependencies. In particular, it may need this while syncing filesystem
being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set,
that could sometimes disallow insertion of the vnode into the vnode
mount list, softdep code needs to override the MNTK_NOINSMNTQUE flag.

Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for
new vnode and use it consistently from the softdep code instead of
ffs_vget().

Add the retry logic to the softdep_flushfiles() to flush the vnodes
that could be instantiated while flushing softdep dependencies.

Tested by:	pho, kris
Reviewed by:	tegge
MFC after:	1 month
2008-08-28 09:18:20 +00:00
Konstantin Belousov
f2228325de Put the relocked variable from the r182111 into the #ifdef QUOTA braces
to prevent warning about unused var on the !QUOTA kernels.

Reported by:	ed
MFC after:	1 week
2008-08-24 19:06:19 +00:00
Konstantin Belousov
689eae1d90 Revert the r167541: "Remove unneeded getinoquota() call in the
ufs_access()." The call to getinoquota in ufs_access() serves the
purpose of instantiating inode dquot from the vn_open(). Since quotas
are accounted only for the inodes with already attached dquot, removal
of the call prevented opened inodes from participation in the quota
calculations.

Since ufs_access() may be called with the vnode being only shared
locked, upgrade (and then downgrade) vnode lock if calling
getinoquota().

Reported by:	simon at optinet com
In collaboration with:	pho
MFC after:	1 week
2008-08-24 17:24:22 +00:00
Konstantin Belousov
e792b09be2 Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with:	rodrigc
MFC after:	3 days
2008-08-10 12:15:36 +00:00
Konstantin Belousov
a1a917e029 User may do "mount -o snapshot ...", that causes new FFS mount to be
performed with snapshot option, while the mp->mnt_opt is NULL.
Protect against NULL pointer dereference.

Noted by:	Mateusz Guzik <mjguzik gmail com>
MFC after:	3 days
2008-08-06 14:47:19 +00:00
Dag-Erling Smørgrav
20ed1beeb5 ufsmount.h uses "struct\tfoo *bar;", except where it doesn't.
quota.h uses "struct foo\t*bar;", except where it doesn't.
Try to make them both agree with themselves (though not with each other)
2008-08-05 15:24:07 +00:00
Dag-Erling Smørgrav
1ac541a69a Whitespace, prototypes 2008-08-05 10:25:55 +00:00