Commit Graph

2840 Commits

Author SHA1 Message Date
rmacklem
24def143f7 Fix the NFSv4 client for the case where mmap'd files are
written, but not msync'd by a process. A VOP_PUTPAGES()
called when VOP_RECLAIM() happens will usually fail, since
the NFSv4 Open has already been closed by VOP_INACTIVE().
Add a vm_object_page_clean() call to the NFSv4 client's
VOP_INACTIVE(), so that the write happens before the NFSv4
Open is closed. kib@ suggested using vgone() instead and
I will explore this, but this patch fixes things in the
meantime. For some reason, the VOP_PUTPAGES() is still
attaempted in VOP_RECLAIM(), but having this fail doesn't
cause any problems except a "stateid0 in write" being logged.

Reviewed by:	kib
MFC after:	1 week
2012-06-18 22:17:28 +00:00
rmacklem
77d92cc9de Move the nfsrpc_close() call in ncl_reclaim() for the NFSv4 client
to below the vnode_destroy_vobject() call, since that is where
writes are flushed.

Suggested by:	kib
MFC after:	1 week
2012-06-17 18:34:04 +00:00
kib
0f85e0cb46 Improve handling of uiomove(9) errors for the NFS client.
Do not brelse() the buffer unconditionally with BIO_ERROR set if
uiomove() failed. The brelse() treats most buffers with BIO_ERROR as
B_INVAL, dropping their content.  Instead, if the write request
covered the whole buffer, remember the cached state and brelse() with
BIO_ERROR set only if the buffer was not cached previously.

Update the buffer dirtyoff/dirtyend based on the progress recorded by
uiomove() in passed struct uio, even in the presence of
error. Otherwise, usermode could see changed data in the backed pages,
but later the buffer is destroyed without write-back.

If uiomove() failed for IO_UNIT request, try to truncate the vnode
back to the pre-write state, and rewind the progress in passed uio
accordingly, following the FFS behaviour.

Reviewed by:	rmacklem (some time ago)
Tested by:	pho
MFC after:	1 month
2012-06-06 16:30:16 +00:00
kib
b4b050eda2 Capitalize start of sentence.
MFC after:	3 days
2012-05-30 14:00:23 +00:00
marcel
e9bb2ca35e Catch a corner case where ssegs could be 0 and thus i would be 0 and
we index suinfo out of bounds (i.e. -1).

Approved by:	gber
2012-05-28 16:33:58 +00:00
ed
241db0ddf5 Fix style and consistency:
- Use tabs, not spaces.
- Add tab after #define.
- Don't mix the use of BSD and ISO C unsigned integer types. Prefer the
  ISO C ones.
2012-05-27 09:34:47 +00:00
gleb
fe722ad5af Use C99-style initialization for struct dirent in preparation for
changing the structure.

Sponsored by:	Google Summer of Code 2011
2012-05-25 09:16:59 +00:00
mav
08333340b6 Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.
2012-05-24 18:19:23 +00:00
mav
96f3e42ce2 MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.

Sponsored by:   Spectra Logic Corporation
Sponsored by:   iXsystems, Inc.
Submitted by:   gibbs, will, mav
2012-05-24 14:07:44 +00:00
rmacklem
568f302214 A problem with the NFSv4 server was reported by Andrew Leonard
to freebsd-fs@, where the setfacl of an NFSv4 acl would fail.
This was caused by the VOP_ACLCHECK() call for ZFS replying
EOPNOTSUPP. After discussion with rwatson@, it was determined
that a call to VOP_ACLCHECK() before doing VOP_SETACL() is not
required. This patch fixes the problem by deleting the
VOP_ACLCHECK() call.

Tested by:	Andrew Leonard (previous version)
MFC after:	1 week
2012-05-17 21:52:17 +00:00
gber
6f7c735300 Import work done under project/nand (@235533) into head.
The NAND Flash environment consists of several distinct components:
  - NAND framework (drivers harness for NAND controllers and NAND chips)
  - NAND simulator (NANDsim)
  - NAND file system (NAND FS)
  - Companion tools and utilities
  - Documentation (manual pages)

This work is still experimental. Please use with caution.

Obtained from: Semihalf
Supported by:  FreeBSD Foundation, Juniper Networks
2012-05-17 10:11:18 +00:00
pfg
b227d4379e Fix a couple of issues that appear to be inherited from the old
8.x code:
- If the lock cannot be acquired immediately unlocks 'bar' vnode
and then locks both vnodes in order.
- wrong vnode type panics from cache_enter_time after calls by
ext2_lookup.

The fix merges the fixes from ufs/ufs_lookup.c.

Submitted by:	Mateusz Guzik
Approved by:	jhb@ (mentor)
Reviewed by:	kib@
MFC after:	1 week
2012-05-16 15:53:38 +00:00
gleb
3288f283ff Skip directory entries with zero inode number during traversal.
Entries with zero inode number are considered placeholders by libc and
UFS.  Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp,
unionfs.

Sponsored by:	Google Summer of Code 2011
2012-05-16 10:44:09 +00:00
rmacklem
4ff8331c1b Fix two cases in the new NFS server where a tsleep() is
used, when the code should actually protect the tested
variable with a mutex. Since the tsleep()s had a 10sec
timeout, the race would have only delayed the allocation
of a new clientid for a client. The sleeps will also
rarely occur, since having a callback in progress when
a client acquires a new clientid, is unlikely.
in practice, since having a callback in progress when
a fresh clientid is being acquired by a client is unlikely.

MFC after:	1 month
2012-05-12 22:20:55 +00:00
rmacklem
6a6a18bf5c PR# 165923 reported intermittent write failures for dirty
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.

Tested by:	Joel Ray Holveck (joelh at juniper.net)
PR:		kern/165923
Reviewed by:	kib
MFC after:	2 weeks
2012-05-12 12:02:51 +00:00
pluknet
eb9c684005 Fix mount interlock oversights from the previous change in r234386.
Reported by:	dougb
Submitted by:	Mateusz Guzik <mjguzik at gmail com>
Reviewed by:	Kirk McKusick
Tested by:	pho
2012-05-10 20:28:33 +00:00
jwd
f638b8eae1 Use the common api helper routine instead of freeing the namei
buffer directly.

Approved by:	rmacklem (mentor)
MFC after:	1 month
2012-05-08 03:39:44 +00:00
daichi
56e77af2bb fixed a unionfs_readdir math issue
PR:		132987
Submitted by:	Matthew Fleming <mfleming@isilon.com>
2012-05-03 07:22:29 +00:00
daichi
83abcd3986 - fixed a vnode lock hang-up issue.
- fixed an incorrect lock status issue.
- fixed an incorrect lock issue of unionfs root vnode removed.
  (pointed out by keith)
- fixed an infinity loop issue.
  (pointed out by dumbbell)
- changed to do LK_RELEASE expressly when unlocked.

Submitted by:	ozawa@ongs.co.jp
2012-05-01 07:46:30 +00:00
rmacklem
2dcf58ad40 It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.

Tested by:	Steven Haber (steven.haber at isilon.com)
MFC after:	2 weeks
2012-04-27 22:23:06 +00:00
rmacklem
ffdff21e0e Fix a leak of namei lookup path buffers that occurs when a
ZFS volume is exported via the new NFS server. The leak occurred
because the new NFS server code didn't handle the case where
a file system sets the SAVENAME flag in its VOP_LOOKUP() and
ZFS does this for the DELETE case.

Tested by:	Oliver Brandmueller (ob at gruft.de), hrs
PR:		kern/167266
MFC after:	1 month
2012-04-27 20:23:24 +00:00
trasz
023bd7c6bf Remove unused thread argument to vrecycle().
Reviewed by:	kib
2012-04-23 14:10:34 +00:00
trasz
baac623cd9 Remove unused thread argument from vtruncbuf().
Reviewed by:	kib
2012-04-23 13:21:28 +00:00
mckusick
5b7b29e35b This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 06:50:44 +00:00
jh
ce0372c7aa Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because
tmpfs doesn't support snapshots.

Suggested by:	bde
2012-04-18 15:22:08 +00:00
mckusick
ffee40eeff Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-17 16:28:22 +00:00
jh
734fbc687a Sync tmpfs_chflags() with the recent changes to UFS:
- Add a check for unsupported file flags.
- Return EPERM when an user without PRIV_VFS_SYSFLAGS privilege attempts
  to toggle SF_SETTABLE flags.
2012-04-16 18:10:34 +00:00
jh
e8d1b1d0ce tmpfs: Allow update mounts only for certain options.
Since r230208 update mounts were allowed if the list of mount options
contained the "export" option. This is not correct as tmpfs doesn't
really support updating all options.

Reviewed by:	kevlo, trociny
2012-04-16 18:07:42 +00:00
gleb
748c9a0d1c Provide better description for vfs.tmpfs.memory_reserved sysctl.
Suggested by:	Anton Yuzhaninov <citrin@citrin.ru>
2012-04-15 21:59:28 +00:00
jh
2c87a056cc Apply changes from r234103 to ext2fs:
Return EPERM from ext2_setattr() when an user without PRIV_VFS_SYSFLAGS
privilege attempts to toggle SF_SETTABLE flags.

Flags are now stored to ip->i_flags in one place after all checks.

Also, remove SF_NOUNLINK from the checks because ext2fs doesn't support
that flag.

Reviewed by:	bde
2012-04-13 05:48:31 +00:00
jh
cfeeeaffc7 Restore the blank line incorrectly removed in r234104.
Pointed out by:	bde
2012-04-11 15:48:50 +00:00
jh
6734f8805b Apply changes from r233787 to ext2fs:
- Use more natural ip->i_flags instead of vap->va_flags in the final
  flags check.
- Style improvements.

No functional change intended.

MFC after:	2 weeks
2012-04-10 16:05:52 +00:00
attilio
628004ddfb - Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.

Reviewed by:	alc
MFC after:	2 weeks
X-MFC:		r234039
2012-04-09 17:05:18 +00:00
mckusick
4bed5dbd8f Add I/O accounting to msdos filesystem.
Suggested and reviewed by: kib
2012-04-08 06:18:18 +00:00
gleb
75744aafa6 tmpfs supports only INT_MAX nodes due to limitations of unit number
allocator.

Replace UINT32_MAX checks with INT_MAX. Keeping more than 2^31 nodes in
memory is not likely to become possible in foreseeable feature and would
require new unit number allocator.

Discussed with: delphij
MFC after:	2 weeks
2012-04-07 15:30:46 +00:00
gleb
fb452e77b0 Add vfs_getopt_size. Support human readable file system options in tmpfs.
Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with:	delphij
MFC after:	2 weeks
2012-04-07 15:27:34 +00:00
gleb
c4ba84f86e Add reserved memory limit sysctl to tmpfs.
Cleanup availble and used memory functions.
Check if free pages available before allocating new node.

Discussed with:	delphij
2012-04-07 15:23:51 +00:00
kib
22e84b3527 Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client
behaviour on error from write RPC back to behaviour of old nfs client.
When set to not zero, the pages for which write failed are kept dirty.

PR:	kern/165927
Reviewed by:	alc
MFC after:	2 weeks
2012-03-17 23:03:20 +00:00
gleb
cc93f6fa4e Prevent tmpfs_rename() deadlock in a way similar to UFS
Unlock vnodes and try to lock them one by one. Relookup fvp and tvp.

Approved by:	mdf (mentor)
2012-03-14 09:15:50 +00:00
gleb
fa1fc18612 Don't enforce LK_RETRY to get existing vnode in tmpfs_alloc_vp()
Doomed vnode is hardly of any use here, besides all callers handle error
case. vfs_hash_get() does the same.

Don't mess with vnode holdcount, vget() takes care of it already.

Approved by:	mdf (mentor)
2012-03-14 08:29:21 +00:00
kevlo
af6e15978b Use NULL instead of 0 2012-03-13 10:04:13 +00:00
kib
0e86a223a9 Update comment.
Submitted by:	gianni
2012-03-11 15:58:27 +00:00
kib
8adabb0356 Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by:	gianni
2012-03-11 12:19:58 +00:00
pfg
f8fecfb833 Add support for ns timestamps and birthtime to the ext2/3 driver.
When using big inodes there is sufficient space in ext3 to
keep extra resolution and birthtime (creation) timestamps.
The appropriate fields in the on-disk inode have been approved
for a long time but support for this in ext3 has not been
widely  distributed.

In preparation for ext4 most linux distributions have enabled
by default such bigger inodes and some people use nanosecond
timestamps in ext3. We now support those when the inode is big
enough and while we do recognize the EXT4F_ROCOMPAT_EXTRA_ISIZE,
we maintain the extra timestamps even when they are not used.

An additional note by Bruce Evans:
We blindly accept unrepresentable tv_nsec in VOP_SETATTR(), but
all file  systems have always done that.  When POSIX gets around
to  specifying the behaviour, it will probably require certain
rounding to the fs's resolution and not rejecting the request.
This unfortunately means that syscalls that set times can't
really tell if they succeeded without reading back the times
using stat() or similar and checking that they were set close
enough.

Reviewed by:	bde
Approved by:	jhb (mentor)
MFC after:	2 weeks
2012-03-08 21:06:05 +00:00
jhb
19feaba08b Add KTR_VFS traces to track modifications to a vnode's writecount. 2012-03-08 20:27:20 +00:00
kib
9d4d411642 The pipe_poll() performs lockless access to the vnode to test
fifo_iseof() condition, allowing the v_fifoinfo to be reset and freed
by fifo_cleanup().

Precalculate EOF at the places were fo_wgen is changed, and cache the
state in a new pipe state flag PIPE_SAMEWGEN.

Reported and tested by:	bf
Submitted by:	gianni
MFC after:	1 week (a backport)
2012-03-07 07:31:50 +00:00
kib
91747da5db Apply inlined vn_vget_ino() algorithm for ".." lookup in pseudofs.
Reported and tested by:	pho
MFC after:	2 weeks
2012-03-05 11:38:02 +00:00
kib
8428e3fd0f Remove unneeded cast to u_int. The values as small enough to fit into
int, beside the use of MIN macro which performs type promotions.

Submitted by:	bde
MFC after:	3 weeks
2012-03-04 14:51:42 +00:00
kevlo
15e0aa0e65 Remove unnecessary casts 2012-03-04 09:48:58 +00:00
kevlo
65436963c5 Clean up style(9) nits 2012-03-04 09:38:20 +00:00