Commit Graph

3081 Commits

Author SHA1 Message Date
jeff
fa887dba7b Prepare to replace the buf splay with a trie:
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
   No consumers need to find them there and it complicates the tree.
   These flags are all FFS specific and could be moved out of the buf
   cache.
 - Use pbgetvp() and pbrelvp() to associate the background and journal
   bufs with the vp.  Not only is this much cheaper, it also makes more sense
   for these transient bufs.
 - Fix the assertions in pbget* and pbrel*.  It's not safe to check list
   pointers which were never initialized.  Use the BX flags instead.  We
   also check B_PAGING in reassignbuf() so this should cover all cases.

Discussed with:	kib, mckusick, attilio
Sponsored by:	EMC / Isilon Storage Division
2013-04-06 22:21:23 +00:00
kib
7ada0d9324 Strip the unneeded spaces, mostly at the end of lines.
MFC after:	3 days
2013-04-01 09:56:48 +00:00
pjd
91184d303f - Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.
2013-03-22 07:40:34 +00:00
pjd
2a3cf7f364 - Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
  in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
  for lchflags(2)) stated that it was u_long. Now some related functions
  use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on:	arch
Sponsored by:	The FreeBSD Foundation
2013-03-21 22:44:33 +00:00
kib
7225171d66 Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by:	Ivan Klymenko <fidaj@ukr.net>
MFC after:	2 weeks
2013-03-21 12:59:24 +00:00
kib
6620c04e30 Do not call vnode_pager_setsize() while a NFS node mutex is
locked. vnode_pager_setsize() might sleep waiting for the page after
EOF to be unbusied.

Call vnode_pager_setsize() both for the regular and directory vnodes.

Reported by:	mich
Reviewed by:	rmacklem
Discussed with:	avg, jhb
MFC after:	2 weeks
2013-03-21 07:25:08 +00:00
emaste
2ccefecf01 Fix remainder calculation when biosize is not a power of 2
In common configurations biosize is a power of two, but is not required to
be so.  Thanks to markj@ for spotting an additional case beyond my original
patch.

Reviewed by: rmacklem@
2013-03-19 13:06:11 +00:00
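The power-of-two issue behind 2ccefecf01 can be illustrated with a small sketch. The helper names below are hypothetical and are not taken from the NFS client; the sketch only shows why masking with `biosize - 1` agrees with the true remainder solely when `biosize` is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: remainder computed with a bit mask.
 * Only correct when biosize is a power of two.
 */
static uint64_t
rem_mask(uint64_t off, uint64_t biosize)
{
	return (off & (biosize - 1));
}

/*
 * Hypothetical helper: remainder computed with the modulo operator.
 * Correct for any non-zero biosize.
 */
static uint64_t
rem_mod(uint64_t off, uint64_t biosize)
{
	assert(biosize != 0);
	return (off % biosize);
}
```

With `biosize` 4096 both helpers agree for any offset; with a size such as 3000 only `rem_mod()` yields the offset within the block.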
kib
1b20f7cc18 Remove negative name cache entry pointing to the target name, which
could be instantiated while tdvp was unlocked.

Reported by:	Rick Miller <vmiller at hostileadmin com>
Tested by:	pho
MFC after:	1 week
2013-03-17 15:11:37 +00:00
kib
9b0c4b125b Add currently unused flag argument to the cluster_read(),
cluster_write() and cluster_wbuild() functions.  The flags to be
allowed are a subset of the GB_* flags for getblk().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-14 20:28:26 +00:00
jhb
b2e811621c Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.
2013-03-13 21:06:03 +00:00
glebius
ce7d7c6757 Finish r243882: mechanically substitute flags from historic mbuf
allocator with malloc(9) flags within sys.

Sponsored by:	Nginx, Inc.
2013-03-12 08:59:51 +00:00
davide
ce7dfce71d smbfs_lookup() in the DOTDOT case operates on dvp->n_parent without
proper locking, which does not prevent the vnode from being reclaimed.
Avoid this by not going over the wire in this case and relying on the subsequent
smbfs_getattr() call to restore consistency.
While I'm here, change a couple of SMBVDEBUG() into MPASS().
smbfs_smb_lookup() doesn't and shouldn't know about '.' and '..'

Reported by:	pho's stress2 suite
2013-03-09 13:25:45 +00:00
davide
f30e75d436 - Initialize variable in smbfs_rename() to silence a compiler warning
- Fix smbfs_mkdir() return value (in case of error).

Reported by:	pho
2013-03-09 13:05:21 +00:00
attilio
63326e81a3 Garbage collect NWFS and NCP bits which have been completely disconnected
from the tree for a few months.

This patch is not targeted for MFC.
2013-03-09 12:45:36 +00:00
attilio
bf1dc90446 MFC
2013-03-08 00:03:07 +00:00
attilio
5d57dc997e Garbage collect NTFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.
2013-03-02 18:40:04 +00:00
attilio
59a3d435c9 Garbage collect PORTALFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.
2013-03-02 16:43:28 +00:00
attilio
5d33ae7487 Garbage collect CODAFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.
2013-03-02 16:30:18 +00:00
attilio
4b0353fc07 Garbage collect HPFS bits which have been completely disconnected
from the tree for a few months (please note that the userland bits
were disconnected a long time ago, thus there is no need
to update the OLD* entries).

This is not targeted for MFC.
2013-03-02 14:54:33 +00:00
attilio
e98f58faf6 MFC
2013-03-02 14:48:41 +00:00
jilles
869c43b8d9 nullfs: Improve f_flags in statfs().
Include some flags of the nullfs mount itself:
MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW.

This allows userland code calling statfs() or fstatfs() to see these flags.
In particular, this allows opendir() to detect that a -t nullfs -o union
mount needs deduplication (otherwise at least . and .. are returned twice)
and allows rtld to detect a -t nullfs -o noexec mount as noexec.

Turn off the MNT_ROOTFS flag from the underlying filesystem because the
nullfs mount is definitely not the root filesystem.

Reviewed by:	kib
MFC after:	1 week
2013-03-02 12:42:23 +00:00
pjd
f07ebb8888 Merge Capsicum overhaul:
- Capability is no longer a separate descriptor type. Now every descriptor
  has its own set of capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrieved with the cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrieve
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavily modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and even though I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNOD to CAP_MKNODAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convenient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
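The way the composite rights in f07ebb8888 build on CAP_SEEK can be sketched as a plain bit-mask subset check. This is a minimal illustration only: the `EX_` names, the bit values, and `ex_rights_allow()` are hypothetical and are not the definitions from the capability header, and the real kernel check is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ex_rights_t;

/* Illustrative bit values only; not the values used by the kernel. */
#define	EX_CAP_READ	0x01ULL
#define	EX_CAP_WRITE	0x02ULL
#define	EX_CAP_SEEK	0x04ULL
#define	EX_CAP_MMAP	0x10ULL

/* Composite rights, following the shape of the defines in the commit. */
#define	EX_CAP_PREAD	(EX_CAP_SEEK | EX_CAP_READ)
#define	EX_CAP_PWRITE	(EX_CAP_SEEK | EX_CAP_WRITE)
#define	EX_CAP_MMAP_R	(EX_CAP_MMAP | EX_CAP_SEEK | EX_CAP_READ)
#define	EX_CAP_MMAP_W	(EX_CAP_MMAP | EX_CAP_SEEK | EX_CAP_WRITE)
#define	EX_CAP_MMAP_RW	(EX_CAP_MMAP_R | EX_CAP_MMAP_W)

/*
 * Hypothetical check: an operation is allowed only if every bit it
 * needs is present in the rights held by the descriptor.
 */
static bool
ex_rights_allow(ex_rights_t held, ex_rights_t needed)
{
	/* An empty requirement would be a caller bug in this sketch. */
	assert(needed != 0);
	return ((held & needed) == needed);
}
```

Under this model a descriptor holding only the read bit passes a read(2)-style check but fails a pread(2)-style check, which mirrors the new CAP_READ behaviour described in the commit.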
attilio
afe5ce0c13 MFC
2013-02-26 17:33:18 +00:00
alc
8eacd44767 Eliminate a duplicate #include.
Sponsored by:	EMC / Isilon Storage Division
2013-02-26 07:00:24 +00:00
attilio
cb47f0509b Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
attilio
d883da7ba4 MFC
2013-02-21 21:59:35 +00:00
jhb
ca1e2e0739 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
attilio
8746bf6a5f MFC
2013-02-21 15:06:19 +00:00
imp
3bd19e3992 The request queue is already locked, so we don't need the splsoftclock/splx
here to note future work.
2013-02-21 02:43:44 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitive to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
  files require including directly rwlock.h
2013-02-20 10:38:34 +00:00
kib
e5238fcb15 Do not update the fsinfo block on each update of any fat block, this
is excessive. Postpone the flush of the fsinfo to VFS_SYNC(),
remembering the need for update with the flag MSDOSFS_FSIMOD, stored
in pm_flags.

FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as
the advisory hints, not requiring them to be correct.

Based on the patch from bde, modified by me.

Reviewed by: bde
MFC after:   2 weeks
2013-02-17 20:35:54 +00:00
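The deferral described in e5238fcb15 follows a common dirty-flag pattern, sketched below. All names here (`struct ex_mount`, `EX_FSIMOD`, `ex_fat_update()`, `ex_sync()`) are hypothetical stand-ins rather than the msdosfs code; the counter merely simulates writes of the fsinfo block.

```c
#include <assert.h>
#include <stddef.h>

#define	EX_FSIMOD	0x01	/* hypothetical stand-in for MSDOSFS_FSIMOD */

struct ex_mount {
	unsigned int	flags;		/* stand-in for pm_flags */
	long		free_count;	/* in-memory free cluster hint */
	unsigned int	fsinfo_writes;	/* simulated fsinfo block writes */
};

/* A FAT update only adjusts the hint and records that fsinfo is stale. */
static void
ex_fat_update(struct ex_mount *m, long delta)
{
	assert(m != NULL);
	m->free_count += delta;
	m->flags |= EX_FSIMOD;
}

/* The sync path writes fsinfo once, and only if something changed. */
static void
ex_sync(struct ex_mount *m)
{
	assert(m != NULL);
	if ((m->flags & EX_FSIMOD) != 0) {
		m->fsinfo_writes++;
		m->flags &= ~EX_FSIMOD;
	}
}
```

Any number of FAT updates between two syncs costs a single fsinfo write, which is acceptable because the commit notes that the FAT32 fields involved are advisory hints.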
bapt
dafec87d65 Revert r246791 as it needs a security review first
Reported by:	gavin, rwatson
2013-02-14 15:17:53 +00:00
bapt
99980d8453 Allow fdescfs to be mounted from inside a jail
MFC after:	1 week
2013-02-14 13:03:15 +00:00
pfg
1d9f9f37f8 ext2fs: Use prototype declarations for function definitions
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-10 19:49:37 +00:00
attilio
cde657f5b1 Remove racy checks on resident and cached pages for
tmpfs_mapped{read, write}() functions:
- tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(),
  which check before-hand to work only on valid VREG vnodes.  Also the
  vnode is locked for the duration of the work, making vnode reclaiming
  impossible, during the operation. Hence, vobj can never be NULL.
- Currently the check on resident pages and cached pages without the vm object
  lock held is racy and can do even more harm than good, as a page could
  be transitioning between these 2 pools and then be skipped entirely.
  Skip the checks as lookups on empty splay trees are very cheap.

Discussed with:	alc
Tested by:	flo
MFC after:	2 weeks
2013-02-10 01:04:10 +00:00
pfg
c03e3032d5 ext2fs: Replace redundant EXT2_MIN_BLOCK with EXT2_MIN_BLOCK_SIZE.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 21:09:44 +00:00
pfg
9694490e0d ext2fs: make e2fs_maxcontig local and remove tautological check.
e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs),

In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't have sense
to carry it in the in-memory superblock.

Also remove a pointless check for MAX(1, x) > 0.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 20:58:00 +00:00
pfg
f908ac5c6e Remove unused MAXSYMLINKLEN macro.
Reviewed by:	mckusick
PR:		kern/175794
MFC after:	1 week
2013-02-08 20:30:19 +00:00
kib
92d95b8406 Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by:	jilles
Discussed with:	bde
MFC after:	2 weeks
2013-02-07 14:53:33 +00:00
jhb
0fee3f66b8 Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
pfg
5e55b2c6f7 ext2fs: move assignment where it is not dead.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:26:34 +00:00
pfg
935c860d1b ext2fs: Remove unused em_e2fsb definition.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:23:56 +00:00
pfg
c6538dcc30 ext2fs: Remove useless rootino local variable.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:17:41 +00:00
pfg
28dd7f0e2d ext2fs: Correct off-by-one errors in FFTODT() and DDTOFT().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:13:05 +00:00
pfg
c181635a65 ext2fs: Use nitems().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:08:56 +00:00
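For readers unfamiliar with the macro adopted in c181635a65, the sketch below shows how nitems() replaces a hand-written element count. The macro definition matches the one in FreeBSD's sys/param.h; the table and `ex_lookup()` are hypothetical and unrelated to ext2fs.

```c
#include <assert.h>
#include <stddef.h>

#ifndef nitems
#define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical lookup table, used only for illustration. */
static const int ex_table[] = { 1, 2, 4, 8, 16 };

/* Bounds-checked lookup; the bound follows the array automatically. */
static int
ex_lookup(size_t idx)
{
	assert(idx < nitems(ex_table));
	return (ex_table[idx]);
}
```

Because the count is derived from the array itself, adding or removing an initializer cannot leave a stale length constant behind. The macro only works on true arrays, not on pointers.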
pfg
affc90ea66 ext2fs: Use EXT2_LINK_MAX instead of LINK_MAX
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:01:04 +00:00
pfg
e94b41487b ext2fs: general cleanup.
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
  write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
  member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
  struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
  e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
  and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-02 22:23:45 +00:00
kib
5e07d49c62 The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than
track the MNT_SYNCHRONOUS flag.  It is set to the latter at mount time
but not updated by MNT_UPDATE.

Use MNT_SYNCHRONOUS to decide to write the FAT updates synchronously.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:30:41 +00:00
kib
35907051bb Backup FATs were sometimes marked dirty by copying their first block
from the primary FAT, and then they were not marked clean on unmount.
Force marking them clean when appropriate.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:25:53 +00:00
kib
96b12145fb The directory entry for dotdot was corrupted in the FAT32 case when moving
a directory to a subdir of the root directory from somewhere else.

For all directory moves that change the parent directory, the dotdot
entry must be fixed up.  For msdosfs, the root directory is magic for
non-FAT32.  It is less magic for FAT32, but needs the same magic for
the dotdot fixup.  It didn't have it.

Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.

The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().

For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use the explicitly set pcl variable instead of relying on the
start cluster of the root directory typically having a value < 65536.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:06:06 +00:00
kib
ad92b9afc4 The mountmsdosfs() function had an insane sanity test, remove it.
Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero.  Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size.  This is
indicated by setting it to 0 and using only pm_HugeSectors.  But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check.  This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:01:03 +00:00
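The two-field size convention described in ad92b9afc4 can be sketched as follows. The function and parameter names are hypothetical, standing in for pm_Sectors and pm_HugeSectors, and the assertion encodes an assumption of this sketch (that at least one field is non-zero) rather than a check made by msdosfs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the volume size in sectors.  A non-zero
 * 16-bit field is honoured even on FAT32; a zero value means the size
 * lives in the 32-bit field instead.
 */
static uint32_t
ex_total_sectors(uint16_t sectors16, uint32_t huge_sectors)
{
	/* Assumption for this sketch: one of the fields carries the size. */
	assert(sectors16 != 0 || huge_sectors != 0);
	return (sectors16 != 0 ? sectors16 : huge_sectors);
}
```

A small FAT32 volume created by a tool that fills in the 16-bit field is therefore accepted instead of being rejected outright.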
kib
31d95b4c31 Fix a backwards comment in markvoldirty().
Submitted by:	bde
MFC after:	1 week
2013-02-01 17:58:37 +00:00
kib
5012e4bd24 Assert that the mbuf in the chain has a sane length. The proper place for
this check is somewhere in the network code, but this assertion has
already proven to be useful in catching what seem to be driver bugs
causing NFS to scramble random memory.

Discussed with:	rmacklem
MFC after:	1 week
2013-02-01 16:57:02 +00:00
kib
d622325e53 Be conservative and do not try to consume more bytes than was
requested from the server for the read operation.  Server shall not
reply with too large a size, but the client should be resilient too.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-27 09:34:25 +00:00
pfg
245e35ae97 Clean some 'svn:executable' properties in the tree.
Submitted by:	Christoph Mallon
MFC after:	3 days
2013-01-26 22:08:21 +00:00
pfg
440e8ae3c8 Cosmetic off-by-one
Technically, the case when all the blocks are released
is not a sanity check.
Move the comment further while here.

Suggested by:	bde
MFC after:	3 days
2013-01-26 21:50:52 +00:00
jhb
f2293255a9 Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
  portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
  NFS server.
- Use nanotime() for the modification time on a delegation to get as
  precise a time as possible.
- Use time_second instead of extracting the second from a call to
  getmicrotime().

Submitted by:	bde (3)
Reviewed by:	bde, rmacklem
MFC after:	2 weeks
2013-01-25 15:25:24 +00:00
pfg
f4f6188cae ext2fs: fix a check for negative block numbers.
The previous change accidentally left the subtraction we
were trying to avoid in case that i_blocks could become
negative.

Reported by:	bde
MFC after:	4 days
2013-01-23 14:29:29 +00:00
pfg
646ebf1c31 ext2fs: make some inode fields match the ext2 spec.
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.

Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.

MFC after:	1 week
2013-01-22 18:54:03 +00:00
pfg
7d48f835be ext2fs: temporarily disable the reallocation code.
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need to reduce the complexity.

This seems to help but is not a complete solution.

MFC after:	3 days
2013-01-22 18:36:31 +00:00
delphij
adf92df625 Make it possible to force async at server side on new NFS server, similar
to the old one's nfs.nfsrv.async.

Please note that by enabling this option (default is disabled), the system
could potentially have silent data corruption if the server crashes
before a write is committed to non-volatile storage, as the client side has
no way to tell if the data is already written.

Submitted by:	rmacklem
MFC after:	2 weeks
2013-01-18 19:42:08 +00:00
pfg
2b37b49e2c ext2fs: Add some DOINGASYNC check to match ffs.
This is mostly cosmetic.

Reviewed by:	bde
MFC after:	3 days
2013-01-18 19:11:17 +00:00
jhb
812f7427ff Use vfs_timestamp() to set file timestamps rather than invoking
getmicrotime() or getnanotime() directly in NFS.

Reviewed by:	rmacklem, bde
MFC after:	1 week
2013-01-18 18:43:38 +00:00
jhb
7cc7eee4d9 Remove a no-longer-used variable after the previous change to use
VA_UTIMES_NULL.

Submitted by:	bde, rmacklem
MFC after:	1 week
2013-01-17 18:45:20 +00:00
jhb
ecb4042c11 Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()
instead of comparing the desired time against the current time as a
heuristic.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-16 21:52:31 +00:00
kib
82d5c5773a Remove the filtering of the acceptable mount options for nullfs, added
in r245004.  Although the report was for noatime option which is
non-functional for the nullfs, other standard options like nosuid or
noexec are useful with it.

Reported by:	Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au>
MFC after:	3 days
2013-01-16 05:32:49 +00:00
jhb
e7637960eb - More properly handle interrupted NFS requests on an interruptible mount
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
  when krpc was committed.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-15 22:08:17 +00:00
kib
169c0a0a9f The current default size of the nullfs hash table used to lookup the
existing nullfs vnode by the lower vnode is only 16 slots.  Since the
default mode for the nullfs is to cache the vnodes, hash has extremely
huge chains.

Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.

Pointy hat to:	    kib
Diagnosed and reviewed by:	peter
Tested by:    peter, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:44:47 +00:00
kib
b94d892898 When nullfs mount is forcibly unmounted and nullfs vnode is reclaimed,
get back the leased write reference from the lower vnode.  There is no
other path which can correct v_writecount on the lowervp.

Reported by:	flo
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-01-10 18:24:48 +00:00
bapt
afe1d4e213 Add support for IO_APPEND flag in fuse
This makes open(..., O_APPEND) actually work on fuse filesystems.

Reviewed by:	attilio
2013-01-08 12:21:50 +00:00
pfg
947a420026 ext2fs: cleanup the dinode structure.
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.

Reviewed by:	bde
MFC after:	3 days
2013-01-07 03:36:32 +00:00
gleb
97e76936ec tmpfs: Replace directory entry linked list with RB-Tree.
Use file name hash as a tree key, handle duplicate keys.  Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search.  Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies).  Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead.  Other file systems handle it similarly.

Work around race-prone tn_readdir_last[pn] fields update.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c
2013-01-06 22:15:44 +00:00
kib
f74da69096 Fix reversed condition in the assertion.
Pointy hat to:	kib
MFC after:	13 days
2013-01-04 07:52:47 +00:00
kib
a7c71037df Add the "nocache" nullfs mount option, which disables the caching of
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.

Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarios.  The default is "cache".

Tested and benchmarked by:	pho (previous version)
MFC after:	2 weeks
2013-01-03 19:17:57 +00:00
kib
defbe57abe Remove the last use of the deprecated MNT_VNODE_FOREACH interface in
the tree.

With the help from:	mjg
Tested by:	Ronald Klop <ronald-freebsd8@klop.yi.org>
MFC after:	2 weeks
2013-01-03 19:01:56 +00:00
kib
c6bad3bef7 Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by:	Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after:	2 weeks
2012-12-23 22:43:27 +00:00
pfg
16216f308e More constant renaming in preparation for newer features.
We also try to make better use of the fs flags instead of
trying to adapt the code according to the fs structures. In
the case of subsecond timestamps and birthtime we now
check that the feature is explicitly enabled: previously
we only checked that the reserved space was available and
silently wrote them.

This approach is much safer, especially if the filesystem
happens to use embedded inodes or support EAs.

Discussed with:	Zheng Liu
MFC after:	5 days
2012-12-20 02:22:36 +00:00
rmacklem
1c6e1dc79b Add "nfsstat -m" support for the two new NFS mount options
added by r244042.
2012-12-09 22:23:50 +00:00
rmacklem
c82d89183d Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detected during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.
2012-12-08 22:52:39 +00:00
glebius
8e20fa5ae9 Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
rmacklem
d79bf0f49f Add an nfssvc() option to the kernel for the new NFS client
which dumps out the actual options being used by an NFS mount.
This will be used to implement a "-m" option for nfsstat(1).

Reviewed by:	alfred
MFC after:	2 weeks
2012-12-02 01:16:04 +00:00
pfg
b4fee55cdf Update some definitions or make them match NetBSD's headers.
Bring several definitions required for newer ext4 features.

Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it
is not being used yet and the new name is more compatible with
NetBSD and Linux.

This change is purely cosmetic and has no effect on the real
code.

Obtained from:	NetBSD
MFC after:	3 days
2012-11-28 15:48:32 +00:00
pfg
0077f34174 Partially bring r242520 to ext2fs.
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.

When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block.  Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.

The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocation was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.

Reported by:	bde
MFC after:	1 week
2012-11-28 00:36:40 +00:00
davide
f3a37c7422 - smbfs_rename() might return an error value without correctly upgrading
the vnode use count, and this might cause the kernel to panic if compiled
with WITNESS enabled.
- Be sure to put the '\0' terminator to the rpath string.

Sponsored by:	iXsystems inc.
2012-11-26 04:29:47 +00:00
davide
e52463677a - Remove reset of vpp pointer in some places, since it's not really
useful and has the side effect of obfuscating the code a bit.
- Remove spurious references to simple_lock.

Reported by:	attilio [1]
Sponsored by:	iXsystems inc.
2012-11-22 09:13:45 +00:00
davide
017de7d030 Until now, smbfs_fullpath() computed the full path starting from the
vnode and following back the chain of n_parent pointers up to the root,
without acquiring the locks of the n_parent vnodes analyzed during the
computation. This is immediately wrong because if the vnode lock is not
held there's no guarantee on the validity of the vnode pointer or the data.
In order to fix, store the whole path in the smbnode structure so that
smbfs_fullpath() can use this information.

Discussed with:		kib
Reported and tested by:		pho
Sponsored by:		iXsystems inc.
2012-11-22 08:58:29 +00:00
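The idea in 017de7d030 (cache the whole path in the node instead of walking unlocked parent pointers) can be sketched in userland C. This is only a model under stated assumptions: `demo_node`, `demo_node_init()`, `demo_node_fullpath()` and the backslash separator are hypothetical names and choices made for illustration, not the smbfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_PATH_MAX 256

/* Hypothetical stand-in for a per-file node that caches its own full path. */
struct demo_node {
    char path[DEMO_PATH_MAX];
};

/*
 * Build the child's full path once, at creation time, from the parent's
 * cached path plus the new component. Returns the path length, or -1 if
 * the result would not fit in the node.
 */
static int
demo_node_init(struct demo_node *np, const struct demo_node *parent,
    const char *name)
{
    int len;

    assert(np != NULL && name != NULL);
    len = snprintf(np->path, sizeof(np->path), "%s\\%s",
        parent != NULL ? parent->path : "", name);
    if (len < 0 || (size_t)len >= sizeof(np->path))
        return (-1);
    return (len);
}

/* Later queries read only the node's own storage; no parent is touched. */
static const char *
demo_node_fullpath(const struct demo_node *np)
{
    return (np->path);
}
```

The parent is dereferenced only while the child is being created, which is the one moment this sketch assumes the caller still holds it safely.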
kib
f31aa350da Remove the check and panic for an impossible condition. The NULL
lowervp vnode v_vnlock would cause panic due to NULL pointer
dereference much earlier.

MFC after:	1 week
2012-11-20 15:25:00 +00:00
attilio
b9a061c88d r16312 has not been accurate for many years (likely since VFS
received granular locking), but the comment present in UFS has been
copied incorrectly into other filesystems' code several times.

Remove comments that make no sense now.

Reviewed by:	kib
MFC after:	3 days
2012-11-19 22:43:45 +00:00
kib
801de09716 In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.

Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.

Requested and reviewed by:	pjd
Tested by:	pho
MFC after:	3 weeks
2012-11-16 08:25:06 +00:00
kib
32941467f0 Remove M_USE_RESERVE from the devfs cdp allocator, which is one of two
uses of M_USE_RESERVE in the kernel. This allocation is not special.

Reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
2012-11-14 19:50:21 +00:00
davide
b4e3eb0677 Get rid of some old debug code. It provides checks similar to the one
offered by RedZone so there's no need to keep it.

Sponsored by:	iXsystems inc.
2012-11-14 19:10:50 +00:00
davide
d7bac61929 Fix the lookup in the DOTDOT case in the same way as other filesystems do,
i.e. inlining the vn_vget_ino() algorithm.

Sponsored by:	iXsystems inc.
2012-11-14 18:43:58 +00:00
attilio
0a289d546b - Protect mnt_data and mnt_flags under the mount interlock
- Move mp->mnt_stat manipulation to where all of them happen

Reported by:	davide
Discussed with:	kib
Tested by:	flo
MFC after:	2 months
X-MFC:		241519, 242536,242616, 242727
2012-11-10 19:32:16 +00:00
attilio
d5d551ec46 Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.
2012-11-09 18:02:25 +00:00
attilio
1e93fc5eeb - Current caching mode is completely broken because it simply relies
on timing of the operations and not real lookup, bringing too many
  false positives. Remove the whole mechanism. If it needs to be
  implemented, next time it should really be done in the proper way.
- Fix VOP_GETATTR() to cope, without panicking, with userland bugs that
  change the type of file. Instead it treats the entry as if it did
  not exist.

Reported and tested by:	flo
MFC after:	2 months
X-MFC:		241519, 242536,242616
2012-11-08 00:32:49 +00:00
attilio
908519dd89 fuse_io* must also be able to crunch VDIR vnodes.
Update assert appropriately.

Reported and Tested by:	flo
MFC after:	2 months
X-MFC:		241519,242536
2012-11-05 15:23:54 +00:00
attilio
35cb5e8a85 Fix a bug where operations were carried out even if not implemented,
leading to handling of an invalid fdip object.

Reported and tested by:	flo
MFC after:	2 months
X-MFC:		241519
2012-11-03 23:32:32 +00:00
kib
f16ea99007 r241025 fixed the case where a binary executed from a nullfs mount
could still be opened for write from the lower filesystem.  There
is a symmetric situation where the binary could already have file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount.  Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode.  Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode.  Calling the VOPs instead of directly
accessing v_writecount provides the fix described in the previous
paragraph.

Tested by:	pho
MFC after:	3 weeks
2012-11-02 13:56:36 +00:00
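The "single donated write reference" rule described in f16ea99007 can be modelled in a few lines of userland C. The names `demo_vnode` and `demo_add_writecount()` are hypothetical and this is a simplified sketch of the bookkeeping, not the nullfs implementation: the upper node passes exactly one reference down when its own count leaves zero and takes it back when the count returns to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vnode with an optional lower (covered) vnode. */
struct demo_vnode {
    int writecount;
    struct demo_vnode *lower;
};

/*
 * Adjust the write count of vp by delta. If vp covers a lower vnode,
 * donate one reference on the 0 -> positive transition and withdraw it
 * on the positive -> 0 transition, so the lower vnode never holds more
 * than one reference on behalf of vp.
 */
static void
demo_add_writecount(struct demo_vnode *vp, int delta)
{
    int old;

    assert(vp != NULL);
    old = vp->writecount;
    vp->writecount += delta;
    assert(vp->writecount >= 0);

    if (vp->lower == NULL)
        return;
    if (old == 0 && vp->writecount > 0)
        vp->lower->writecount++;
    else if (old > 0 && vp->writecount == 0)
        vp->lower->writecount--;
}
```

With two writers on the upper node the lower count stays at one, which matches the commit's note that only one write reference can be donated.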
davide
ff8afcf2e9 - Do not put in the mntqueue half-constructed vnodes.
- Change the code so that it relies on vfs_hash rather than on a
  home-made hashtable.
- There's no need to inline fnv_32_buf().

Reviewed by:	delphij
Tested by:	pho
Sponsored by:	iXsystems inc.
2012-10-31 03:55:33 +00:00
davide
793cdde76e Fix panic due to page faults while in kernel mode, under conditions of
VM pressure. The reason is that in some codepaths pointers to stack
variables were passed from one thread to another.

In collaboration with:	pho
Reported by:	pho's stress2 suite
Sponsored by:	iXsystems inc.
2012-10-31 03:34:07 +00:00
davide
a7cdc19e4b Change the code to use %jd as printf() placeholder for uio_offset and
cast to intmax_t.

Suggested by:	pjd
Sponsored by:	iXsystems inc.
2012-10-31 02:54:44 +00:00
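The pattern adopted in a7cdc19e4b is standard C99: an offset type has no printf length modifier of its own, so the value is widened to intmax_t and printed with %jd. A minimal userland sketch follows; `format_offset()` is a hypothetical helper, and `off_t` stands in for the type of uio_offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a file offset portably. The cast to intmax_t matches the %jd
 * conversion regardless of how wide off_t is on the target platform.
 * Returns the number of characters snprintf() reports.
 */
static int
format_offset(char *buf, size_t len, off_t offset)
{
    assert(buf != NULL && len > 0);
    return (snprintf(buf, len, "offset=%jd", (intmax_t)offset));
}
```

Using %ld or %lld instead would tie the format string to one particular width of the offset type and produce warnings or wrong output elsewhere.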
davide
1206789da9 Fix build in case we have SMBVDEBUG turned on.
Reviewed by:	gnn
Approved by:	gnn
Sponsored by:	iXsystems inc.
2012-10-25 21:08:02 +00:00
davide
bac20b679d - Remove the references to the deprecated zalloc kernel interface
- Use M_ZERO flag in malloc() rather than bzero()
- malloc() with M_WAITOK can't return NULL so there's no need to check

Reviewed by:	alc
Approved by:	alc
2012-10-25 20:23:04 +00:00
kib
560aa751e0 Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
eadler
3f7a414911 remove duplicate semicolons where possible.
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:00:37 +00:00
ed
1fb45e9f39 Remove unneeded D_NEEDMINOR.
This is only needed when using clonelists. This got removed in r238693.
2012-10-18 19:28:31 +00:00
rmacklem
813fc27188 Add two new options to the nfssvc(2) syscall that allow
processes running as root to suspend/resume execution
of the kernel nfsd threads. An earlier version of this
patch was tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).

Reviewed by:	kib
MFC after:	2 weeks
2012-10-14 22:33:17 +00:00
kib
f327177d6f Grammar fixes.
Submitted by:	bf
MFC after:	1 week
2012-10-14 18:13:33 +00:00
kib
7cc759236b Replace the XXX comment with the proper description.
MFC after:	1 week
2012-10-14 17:07:34 +00:00
attilio
95ca52bfbf Rename s/DEBUG()/FS_DEBUG() and s/DEBUG2G()/FS_DEBUG2G() in order to
avoid a name clash in sparc64.

MFC after:	2 months
X-MFC:		r241519
2012-10-14 03:51:59 +00:00
attilio
af2d834e29 Import a FreeBSD port of the FUSE Linux module.
This was developed during two Summer of Code projects and was recently
revived by gnn.
The functionality in this commit entirely mirrors the content of the fusefs-kmod
port, which doesn't need to be installed anymore for -CURRENT setups.

In order to get some sparse technical notes, please refer to:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html

or to the project branch:
svn://svn.freebsd.org/base/projects/fuse/

which also contains the granular history of changes made during port
refinements. This commit does not come from the branch reintegration
itself because it seems svn is not behaving properly for this functionality
at the moment.

Partly Sponsored by:		Google, Summer of Code program 2005, 2011
Originally submitted by:	ilya, Csaba Henk <csaba-ml AT creo DOT hu >
In collaboration with:		pho
Tested by:			flo, gnn, Gustau Perez,
				Kevin Oberman <rkoberman AT gmail DOT com>
MFC after:			2 months
2012-10-13 23:54:26 +00:00
kib
8f845e475e Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
mdf
394f27b845 Fix up kernel sources to be ready for a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:30:49 +00:00
rmacklem
c071417ab7 Modify the NFSv4 client so that it can handle owner
and owner_group strings that consist entirely of
digits, interpreting them as the uid/gid number.
This change was needed since new (>= 3.3) Linux
servers reply with these strings by default.
This change is mandated by the rfc3530bis draft.
Reported on freebsd-stable@ under the Subject
heading "Problem with Linux >= 3.3 as NFSv4 server"
by Norbert Aschendorff on Aug. 20, 2012.

Tested by:	norbert.aschendorff at yahoo.de
Reviewed by:	jhb
MFC after:	2 weeks
2012-09-20 02:49:25 +00:00
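The rule in c071417ab7 (an owner or owner_group string made up entirely of digits is taken as the numeric id) can be sketched as a small parser. `owner_to_numeric_id()` is a hypothetical name and the 32-bit id limit is an assumption of this sketch; a real client would fall back to its usual name mapping when the function returns false.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * If the len-byte string consists only of decimal digits and the value
 * fits in 32 bits, store it in *idp and return true. Otherwise return
 * false so the caller can treat the string as a name.
 */
static bool
owner_to_numeric_id(const char *str, size_t len, uint32_t *idp)
{
    uint64_t val = 0;

    assert(str != NULL && idp != NULL);
    if (len == 0)
        return (false);
    for (size_t i = 0; i < len; i++) {
        if (!isdigit((unsigned char)str[i]))
            return (false);
        val = val * 10 + (uint64_t)(str[i] - '0');
        if (val > UINT32_MAX)
            return (false);     /* too large for a uid/gid */
    }
    *idp = (uint32_t)val;
    return (true);
}
```

The length is passed explicitly because attribute strings on the wire are counted, not NUL-terminated.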
ed
123cfec6ca Prefer __containerof() over member2struct().
The first does proper checking of the argument types, while the latter
does not.
2012-09-15 19:28:54 +00:00
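The difference named in 123cfec6ca can be illustrated with two userland macros. Both names below (`member_to_struct` and `container_of_checked`) are hypothetical stand-ins written for this sketch, not the kernel definitions, and the checked variant relies on the GNU C statement-expression and `__typeof__` extensions available in GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Unchecked variant: any pointer type is silently accepted. */
#define member_to_struct(type, member, ptr) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Checked variant: the pointer is first assigned to a temporary whose
 * type is derived from the member, so a pointer of the wrong type draws
 * a compiler diagnostic at the call site.
 */
#define container_of_checked(ptr, type, member) ({                   \
    const __typeof__(((type *)0)->member) *member_ptr_ = (ptr);      \
    (type *)((char *)member_ptr_ - offsetof(type, member));          \
})

struct demo_link {
    struct demo_link *next;
};

struct demo_item {
    int id;
    struct demo_link link;
};

/* Recover the enclosing item from a pointer to its embedded link. */
static struct demo_item *
demo_item_from_link(struct demo_link *lp)
{
    assert(lp != NULL);
    return (container_of_checked(lp, struct demo_item, link));
}
```

Passing, say, an `int *` to `container_of_checked()` for the `link` member is flagged as an incompatible pointer assignment, while `member_to_struct()` would compile it without complaint.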
kib
10608e7d85 The deadfs VOPs for vop_ioctl and vop_bmap call themselves recursively,
which is an elaborate way to cause a kernel panic. Change the VOPs
implementation to return EBADF for a reclaimed vnode.

While the calls to vop_bmap should not reach deadfs, it is indeed
possible for vop_ioctl, because the VOP locking protocol is to pass
the vnode to VOP unlocked. The actual panic was observed when ioctl
was called on procfs filedescriptor which pointed to an exited
process.

Reported by:	zont
Tested by:	pho
MFC after:	1 week
2012-09-13 13:05:45 +00:00
kevlo
422999da8c Add VFCF_READONLY flag that indicates ntfs and xfs file systems are
only supported as read-only.
2012-09-12 03:42:52 +00:00
kevlo
60ab143617 Prevent nump NULL pointer dereference in bmap_getlbns()
2012-09-11 09:38:32 +00:00
kevlo
261aee2945 Fix style nit
2012-09-11 08:36:41 +00:00
rmacklem
cbc3fb8c5b Add a simple printf() based debug facility to the new nfs client.
Use it for a printf() that can be harmlessly generated for mmap()'d
files. It will be used extensively for the NFSv4.1 client.
Debugging printf()s are enabled by setting vfs.nfs.debuglevel to
a non-zero value. The higher the value, the more debugging printf()s.

Reviewed by:	jhb
MFC after:	2 weeks
2012-09-09 21:00:45 +00:00
kib
3ed1c80d25 Allow shared lookups for nullfs mounts, if lower filesystem supports
it.  There are two problems that must be addressed for the use of shared
lookups to have a measurable effect on nullfs scalability:

1. When vfs_lookup() calls VOP_LOOKUP() for nullfs, which passes lookup
operation to lower fs, resulting vnode is often only shared-locked. Then
null_nodeget() cannot instantiate covering vnode for lower vnode, since
insmntque1() and null_hashins() require exclusive lock on the lower.

Change the assert that lower vnode is exclusively locked to only
require any lock.  If null hash failed to find pre-existing nullfs
vnode for lower vnode and the vnode is shared-locked, the lower vnode
lock is upgraded.

2. Nullfs reclaims its vnodes on deactivation. This is due to the
inability of nullfs to detect reclamation of the lower vnode.  Reclamation
of a nullfs vnode at deactivation time prevents a reference to the lower
vnode from becoming stale.

Change nullfs VOP_INACTIVE to not reclaim the vnode, instead use the
VFS_RECLAIM_LOWERVP to get notification and reclaim upper vnode
together with the reclamation of the lower vnode.

Note that nullfs reclamation procedure calls vput() on the lowervp
vnode, temporarily unlocking the vnode being reclaimed. This seems to be
fine for MPSAFE filesystems, but not-MPSAFE code often put partially
initialized vnode on some globally visible list, and later can decide
that half-constructed vnode is not needed.  If nullfs mount is created
above such filesystem, then other threads might catch such not
properly initialized vnode. Instead of trying to overcome this case,
e.g. by recursing the lower vnode lock in null_reclaim_lowervp(), I
decided to rely on nearby removal of the support for non-MPSAFE
filesystems.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:20:23 +00:00
pfg
ddea4de78e Add some basic definitions for a future htree implementation.
MFC after:	3 days
2012-08-24 01:12:07 +00:00
kevlo
c598ba51b7 Fix typo
2012-08-18 16:13:16 +00:00
mjg
b45a39ac78 Remove unused member of struct indir (in_exists) from UFS and EXT2 code.
Reviewed by:	mckusick
Approved by:	trasz (mentor)
MFC after:	1 week
2012-08-17 17:45:27 +00:00
hselasky
cd2aff7346 Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, since for example read, write, ioctl and
so on might be sleeping at the time "d_close" is called,
and the freed private data could then still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with:	phk
MFC after:	2 weeks
2012-08-15 16:19:39 +00:00
kib
a3d0fb0175 Do not leave invalid pages in the object after a short read for
network file systems (not only NFS proper). Short reads cause pages
other than the requested one, which were not filled by the read response,
to stay invalid.

Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As a result, invalid pages that were not requested are
freed even if the read RPC indicated success.

Noted and reviewed by:	alc
MFC after:	1 week
2012-08-14 11:45:47 +00:00
kib
cac2fe116f After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  The other big dependency of vm_page.h on
vm_param.h is the PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitly for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
kib
4259905d31 Reduce code duplication and exposure of direct access to struct
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other than the requested one, for VOP_GETPAGES().

Reviewed by:	alc
MFC after:	1 week
2012-08-04 18:16:43 +00:00
kib
92640c3632 The header uma_int.h is an internal uma header, unused by this source
file.  Do not include it needlessly.

Reviewed by:  alc
MFC after:    1 week
2012-08-04 18:12:54 +00:00
davidxu
c8c77f184e I am comparing the current pipe code with the one in 8.3-STABLE r236165.
I found that 8.3 is a historical BSD version that uses sockets to implement
FIFO pipes; it uses a per-file seqcount to compare with the writer generation
stored in the per-pipe object. The concept is that after all writers are gone,
the pipe enters the next generation, and all old readers that have not closed
the pipe should get the indication that the pipe is disconnected; the result
is that they should get EPIPE, SIGPIPE or POLLHUP in poll().
But a newcomer should not know that previous writers were gone; it
should treat it as a fresh session.
I am trying to bring the FIFO pipe back to the historical behavior. It is
still unclear whether a single EOF flag can represent SBS_CANTSENDMORE and
SBS_CANTRCVMORE, which the socket-based version is using, but I have run
the poll regression test in the tools directory, and the output is now the
same as the one on 8.3-STABLE.
I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1.
expected POLLHUP; got 0" might be bogus, because a newcomer should not
know that old writers were gone. I got the same behavior on Linux.
Our implementation always returns POLLIN for a disconnected pipe even when it
should return POLLHUP, but I think it is not wise to remove POLLIN for
compatibility reasons; this is our historical behavior.

Regression test: /usr/src/tools/regression/poll
2012-07-31 05:48:35 +00:00
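The generation concept described in c8c77f184e can be modelled with two counters. This is a simplified sketch under stated assumptions, not the pipe code: `demo_fifo`, `demo_reader` and the helper functions are hypothetical, the per-file seqcount is represented by `seen_gen`, and the writer generation advances only when the last writer closes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-pipe state: writer generation and live writer count. */
struct demo_fifo {
    unsigned wgen;
    unsigned writers;
};

/* Hypothetical per-file state: the generation seen when the reader opened. */
struct demo_reader {
    unsigned seen_gen;
};

static void
demo_writer_open(struct demo_fifo *fp)
{
    fp->writers++;
}

/* When the last writer goes away, the pipe enters the next generation. */
static void
demo_writer_close(struct demo_fifo *fp)
{
    assert(fp->writers > 0);
    if (--fp->writers == 0)
        fp->wgen++;
}

/* A reader records the generation that is current when it opens. */
static void
demo_reader_open(const struct demo_fifo *fp, struct demo_reader *rp)
{
    rp->seen_gen = fp->wgen;
}

/* Old readers see a disconnect; readers opened afterwards do not. */
static bool
demo_reader_sees_hangup(const struct demo_fifo *fp,
    const struct demo_reader *rp)
{
    return (fp->wgen != rp->seen_gen);
}
```

In this model a reader that was open while the writers left would report the hangup, and a reader that opens afterwards starts a fresh session, which is the distinction the commit message draws for test 6b.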
davidxu
d2b97b9193 When a thread is blocked in direct write state, it only sets the
PIPE_DIRECTW flag but not PIPE_WANTW, and the FIFO pipe code does not understand
this internal state. When a FIFO peer reader closes the pipe, it wants to notify
the writer, so it checks PIPE_WANTW; if it is not set, it skips calling wakeup(),
so the blocked writer never notices the close. In general, the writer should
return from the syscall with the EPIPE error code and may get a SIGPIPE signal.
Setting PIPE_WANTW fixes the problem; turning off direct write should fix the
problem too. This bug was found by PR/170203.

Another bug in the FIFO pipe code is that when a peer closes the pipe, the other
end, which is blocked in select() or poll(), is not notified, because the call to
pipeselwakeup() was missed.

A third problem was found in the poll regression test: the existing code cannot
pass tests 6b, 6c and 6d, but FreeBSD-4 works. This commit does not fix the
problem; I still need to study more to find the cause.

PR: 170203
Tested by: Garrett Copper < yanegomi at gmail dot com >
2012-07-31 02:00:37 +00:00
kevlo
e2ca2cfba2 Use NULL instead of 0 for pointers
2012-07-22 15:40:31 +00:00
brueffer
190886cd49 Simplify error handling by moving the allocation of np down to where it is
actually used.  While here, improve style a little.

Submitted by:	mjg
MFC after:	2 weeks
2012-07-16 22:07:29 +00:00
brueffer
275e546b68 Save a bzero() by using M_ZERO.
Obtained from:	Dragonfly BSD (change 4faaf07c3d7ddd120deed007370aaf4d90b72ebb)
MFC after:	2 weeks
2012-07-15 15:50:12 +00:00
attilio
b76b6f7fdf Remove a check on MNTK_UPDATE that is not really necessary as it is
handled in a code snippet above.
2012-07-10 00:23:25 +00:00
attilio
c7ea063227 - Remove the unused and not completed write support for NTFS.
- Fix a bug where vfs_mountedfrom() is called also when the filesystem
  is not mounted successfully.

Tested by:	pho
2012-07-10 00:01:00 +00:00
kevlo
1944317ce0 Fix a typo
2012-07-03 08:03:07 +00:00
kib
53224f018a Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is a 64bit quantity,
is always read and written under the mtxpool protection. This fixes
an apparently easy to trigger race where parallel lseek()s or lseek() and
read/write could destroy the file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
kib
80d58366a4 Do not override an error from uiomove() with (non-)error result from
bwrite().  VFS needs to know about EFAULT from uiomove() and does not
care much that partially filled block writeback after EFAULT was
successful.  Early return without error causes short write to be
reported to usermode.

Reported and tested by:	andreast
MFC after:	3 weeks
2012-07-02 09:53:08 +00:00
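The error-precedence rule from 80d58366a4 reduces to a small decision, sketched here in userland C. `demo_write_result()` is a hypothetical helper: `copy_error` stands in for the result of the copy from user space and `writeback_error` for the result of writing the partially filled block.

```c
#include <assert.h>
#include <errno.h>

/*
 * Combine the two results of a block write. A failure of the copy from
 * user space must reach the caller even when the writeback of the
 * partially filled block succeeded; the writeback status is reported
 * only if the copy itself was fine.
 */
static int
demo_write_result(int copy_error, int writeback_error)
{
    assert(copy_error >= 0 && writeback_error >= 0);
    if (copy_error != 0)
        return (copy_error);
    return (writeback_error);
}
```

The bug pattern being avoided is the plain assignment `error = writeback();` after a failed copy, which replaces EFAULT with 0.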
kib
09b19ea8ee Enable deadlock avoidance code for NFS client.
MFC after:	2 weeks
2012-06-21 09:26:06 +00:00
rmacklem
24def143f7 Fix the NFSv4 client for the case where mmap'd files are
written, but not msync'd by a process. A VOP_PUTPAGES()
called when VOP_RECLAIM() happens will usually fail, since
the NFSv4 Open has already been closed by VOP_INACTIVE().
Add a vm_object_page_clean() call to the NFSv4 client's
VOP_INACTIVE(), so that the write happens before the NFSv4
Open is closed. kib@ suggested using vgone() instead and
I will explore this, but this patch fixes things in the
meantime. For some reason, the VOP_PUTPAGES() is still
attempted in VOP_RECLAIM(), but having this fail doesn't
cause any problems except a "stateid0 in write" being logged.

Reviewed by:	kib
MFC after:	1 week
2012-06-18 22:17:28 +00:00
rmacklem
77d92cc9de Move the nfsrpc_close() call in ncl_reclaim() for the NFSv4 client
to below the vnode_destroy_vobject() call, since that is where
writes are flushed.

Suggested by:	kib
MFC after:	1 week
2012-06-17 18:34:04 +00:00
kib
0f85e0cb46 Improve handling of uiomove(9) errors for the NFS client.
Do not brelse() the buffer unconditionally with BIO_ERROR set if
uiomove() failed. The brelse() treats most buffers with BIO_ERROR as
B_INVAL, dropping their content.  Instead, if the write request
covered the whole buffer, remember the cached state and brelse() with
BIO_ERROR set only if the buffer was not cached previously.

Update the buffer dirtyoff/dirtyend based on the progress recorded by
uiomove() in passed struct uio, even in the presence of
error. Otherwise, usermode could see changed data in the backed pages,
but later the buffer is destroyed without write-back.

If uiomove() failed for IO_UNIT request, try to truncate the vnode
back to the pre-write state, and rewind the progress in passed uio
accordingly, following the FFS behaviour.

Reviewed by:	rmacklem (some time ago)
Tested by:	pho
MFC after:	1 month
2012-06-06 16:30:16 +00:00
kib
b4b050eda2 Capitalize start of sentence.
MFC after:	3 days
2012-05-30 14:00:23 +00:00
marcel
e9bb2ca35e Catch a corner case where ssegs could be 0 and thus i would be 0 and
we index suinfo out of bounds (i.e. -1).

Approved by:	gber
2012-05-28 16:33:58 +00:00
ed
241db0ddf5 Fix style and consistency:
- Use tabs, not spaces.
- Add tab after #define.
- Don't mix the use of BSD and ISO C unsigned integer types. Prefer the
  ISO C ones.
2012-05-27 09:34:47 +00:00
gleb
fe722ad5af Use C99-style initialization for struct dirent in preparation for
changing the structure.

Sponsored by:	Google Summer of Code 2011
2012-05-25 09:16:59 +00:00
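The motivation in fe722ad5af is that C99 designated initializers name each field, so they keep working when members are added or reordered. A minimal sketch follows; `struct demo_dirent` and `DEMO_DT_DIR` are hypothetical and only loosely mirror a directory entry, they are not the system definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DT_DIR 4   /* hypothetical "directory" type code */

/* Hypothetical directory entry layout, used only for this illustration. */
struct demo_dirent {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
    char     d_name[256];
};

/*
 * Each value is bound to a field by name. Reordering or extending the
 * structure does not silently shift values into the wrong members, and
 * any member not listed is zero-initialized.
 */
static const struct demo_dirent demo_dot_entry = {
    .d_fileno = 1,
    .d_reclen = sizeof(struct demo_dirent),
    .d_type = DEMO_DT_DIR,
    .d_namlen = 1,
    .d_name = "."
};
```

A positional initializer such as `{ 1, sizeof(...), 4, 1, "." }` would compile against a reordered structure as long as the types still converted, but with the values landing in different fields.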
mav
08333340b6 Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.
2012-05-24 18:19:23 +00:00
mav
96f3e42ce2 MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information for disk elements into the
CAM device database.

Sponsored by:   Spectra Logic Corporation
Sponsored by:   iXsystems, Inc.
Submitted by:   gibbs, will, mav
2012-05-24 14:07:44 +00:00