Commit Graph

2958 Commits

Author SHA1 Message Date
Attilio Rao
258bee160c Garbage collect HPFS bits which are now already completely disconnected
from the tree since few months (please note that the userland bits
were already disconnected since a long time, thus there is no need
to update the OLD* entries).

This is not targeted for MFC.
2013-03-02 14:54:33 +00:00
Jilles Tjoelker
6d6a91c50f nullfs: Improve f_flags in statfs().
Include some flags of the nullfs mount itself:
MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW.

This allows userland code calling statfs() or fstatfs() to see these flags.
In particular, this allows opendir() to detect that a -t nullfs -o union
mount needs deduplication (otherwise at least . and .. are returned twice)
and allows rtld to detect a -t nullfs -o noexec mount as noexec.

Turn off the MNT_ROOTFS flag from the underlying filesystem because the
nullfs mount is definitely not the root filesystem.

Reviewed by:	kib
MFC after:	1 week
2013-03-02 12:42:23 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Alan Cox
2c8472682c Eliminate a duplicate #include.
Sponsored by:	EMC / Isilon Storage Division
2013-02-26 07:00:24 +00:00
Attilio Rao
590f9303e5 Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
Warner Losh
b96f7e0a60 The request queue is already locked, so we don't need the splsofclock/splx
here to note future work.
2013-02-21 02:43:44 +00:00
Konstantin Belousov
bb7ca8229d Do not update the fsinfo block on each update of any fat block, this
is excessive. Postpone the flush of the fsinfo to VFS_SYNC(),
remembering the need for update with the flag MSDOSFS_FSIMOD, stored
in pm_flags.

FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as
the advisory hints, not requiring them to be correct.

Based on the patch from bde, modified by me.

Reviewed by: bde
MFC after:   2 weeks
2013-02-17 20:35:54 +00:00
Baptiste Daroussin
3c4223a65a Revert r246791 as it needs a security review first
Reported by:	gavin, rwatson
2013-02-14 15:17:53 +00:00
Baptiste Daroussin
f4365abd91 Allow fdescfs to be mounted from inside a jail
MFC after:	1 week
2013-02-14 13:03:15 +00:00
Pedro F. Giffuni
a9d1b29995 ext2fs: Use prototype declarations for function definitions
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-10 19:49:37 +00:00
Attilio Rao
5e60cb948e Remove a racy checks on resident and cached pages for
tmpfs_mapped{read, write}() functions:
- tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(),
  which check before-hand to work only on valid VREG vnodes.  Also the
  vnode is locked for the duration of the work, making vnode reclaiming
  impossible, during the operation. Hence, vobj can never be NULL.
- Currently check on resident pages and cached pages without vm object
  lock held is racy and can do even more harm than good, as a page could
  be transitioning between these 2 pools and then be skipped entirely.
  Skip the checks as lookups on empty splay trees are very cheap.

Discussed with:	alc
Tested by:	flo
MFC after:	2 weeks
2013-02-10 01:04:10 +00:00
Pedro F. Giffuni
4e1e0e2582 ext2fs: Replace redundant EXT2_MIN_BLOCK with EXT2_MIN_BLOCK_SIZE.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 21:09:44 +00:00
Pedro F. Giffuni
1a125d6d85 ext2fs: make e2fs_maxcontig local and remove tautological check.
e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs),

In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't have sense
to carry it in the in-memory superblock.

Also remove a pointless check for MAX(1, x) > 0.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 20:58:00 +00:00
Pedro F. Giffuni
a940ce65cd Remove unused MAXSYMLINKLEN macro.
Reviewed by:	mckusick
PR:		kern/175794
MFC after:	1 week
2013-02-08 20:30:19 +00:00
Konstantin Belousov
2ca4998342 Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by:	jilles
Discussed with:	bde
MFC after:	2 weeks
2013-02-07 14:53:33 +00:00
John Baldwin
a120a7a3cd Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
Pedro F. Giffuni
d427334435 ext2fs: move assignment where it is not dead.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:26:34 +00:00
Pedro F. Giffuni
80b6a61199 ext2fs: Remove unused em_e2fsb definition..
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:23:56 +00:00
Pedro F. Giffuni
555368dcf1 ext2fs: Remove useless rootino local variable.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:17:41 +00:00
Pedro F. Giffuni
ef024b0da9 ext2fs: Correct off-by-one errors in FFTODT() and DDTOFT().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:13:05 +00:00
Pedro F. Giffuni
666116e4a3 ext2fs: Use nitems().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:08:56 +00:00
Pedro F. Giffuni
fdc100e4c2 ext2fs: Use EXT2_LINK_MAX instead of LINK_MAX
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:01:04 +00:00
Pedro F. Giffuni
757224cbdb ext2fs: general cleanup.
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
  write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
  member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
  struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
  e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
  and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-02 22:23:45 +00:00
Konstantin Belousov
11fca81ccd The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than
track the MNT_SYNCHRONOUS flag.  It is set to the latter at mount time
but not updated by MNT_UPDATE.

Use MNT_SYNCHRONOUS to decide to write the FAT updates syncrhonously.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:30:41 +00:00
Konstantin Belousov
79fb7dd167 Backup FATs were sometimes marked dirty by copying their first block
from the primary FAT, and then they were not marked clean on unmount.
Force marking them clean when appropriate.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:25:53 +00:00
Konstantin Belousov
9ec062dddd The directory entry for dotdot was corrupted in the FAT32 case when moving
a directory to a subdir of the root directory from somewhere else.

For all directory moves that change the parent directory, the dotdot
entry must be fixed up.  For msdosfs, the root directory is magic for
non-FAT32.  It is less magic for FAT32, but needs the same magic for
the dotdot fixup.  It didn't have it.

Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.

The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().

For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use explicitly set pcl variable instead on relying on the
start cluster of the root directory typically has a value < 65536.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:06:06 +00:00
Konstantin Belousov
48efa33b49 The mountmsdosfs() function had an insane sanity test, remove it.
Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero.  Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size.  This is
indicated by setting it to 0 and using only pm_HugeSectors.  But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check.  This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.

Submitted by:	bde
MFC after:	1 week
2013-02-01 18:01:03 +00:00
Konstantin Belousov
a26b949f2d Fix a backwards comment in markvoldirty().
Submitted by:	bde
MFC after:	1 week
2013-02-01 17:58:37 +00:00
Konstantin Belousov
dd6035234a Assert that the mbuf in the chain has sane length. Proper place for
this check is somewhere in the network code, but this assertion
already proven to be useful in catching what seems to be driver bugs
causing NFS scrambling random memory.

Discussed with:	rmacklem
MFC after:	1 week
2013-02-01 16:57:02 +00:00
Konstantin Belousov
6168020f66 Be conservative and do not try to consume more bytes than was
requested from the server for the read operation.  Server shall not
reply with too large size, but client should be resilent too.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-27 09:34:25 +00:00
Pedro F. Giffuni
646a7fea0c Clean some 'svn:executable' properties in the tree.
Submitted by:	Christoph Mallon
MFC after:	3 days
2013-01-26 22:08:21 +00:00
Pedro F. Giffuni
879aeda7b6 Cosmetical off-by-one
Technically, the case when all the blocks are released
is not a sanity check.
Move further the comment while here.

Suggested by:	bde
MFC after:	3 days
2013-01-26 21:50:52 +00:00
John Baldwin
a89a2c8ba4 Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
  portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
  NFS server.
- Use nanotime() for the modification time on a delegation to get as
  precise a time as possible.
- Use time_second instead of extracting the second from a call to
  getmicrotime().

Submitted by:	bde (3)
Reviewed by:	bde, rmacklem
MFC after:	2 weeks
2013-01-25 15:25:24 +00:00
Pedro F. Giffuni
69017f8d8c ext2fs: fix a check for negative block numbers.
The previous change accidentally left the substraction we
were trying to avoid in case that i_blocks could become
negative.

Reported by:	bde
MFC after:	4 days
2013-01-23 14:29:29 +00:00
Pedro F. Giffuni
1d04725a7a ext2fs: make some inode fields match the ext2 spec.
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.

Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.

MFC after:	1 week
2013-01-22 18:54:03 +00:00
Pedro F. Giffuni
4b21c8fda9 ext2fs: temporarily disable the reallocation code.
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.

This seems to help but is not a complete solution.

MFC after:	3 days
2013-01-22 18:36:31 +00:00
Xin LI
e4558aacfc Make it possible to force async at server side on new NFS server, similar
to the old one's nfs.nfsrv.async.

Please note that by enabling this option (default is disabled), the system
could potentionally have silent data corruption if the server crashes
before write is committed to non-volatile storage, as the client side have
no way to tell if the data is already written.

Submitted by:	rmacklem
MFC after:	2 weeks
2013-01-18 19:42:08 +00:00
Pedro F. Giffuni
b187108f66 ext2fs: Add some DOINGASYNC check to match ffs.
This is mostly cosmetical.

Reviewed by:	bde
MFC after:	3 days
2013-01-18 19:11:17 +00:00
John Baldwin
d177f14da9 Use vfs_timestamp() to set file timestamps rather than invoking
getmicrotime() or getnanotime() directly in NFS.

Reviewed by:	rmacklem, bde
MFC after:	1 week
2013-01-18 18:43:38 +00:00
John Baldwin
39804bc89d Remove a no-longer-used variable after the previous change to use
VA_UTIMES_NULL.

Submitted by:	bde, rmacklem
MFC after:	1 week
2013-01-17 18:45:20 +00:00
John Baldwin
5055536eec Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()
instead of comparing the desired time against the current time as a
heuristic.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-16 21:52:31 +00:00
Konstantin Belousov
e8f966eeb8 Remove the filtering of the acceptable mount options for nullfs, added
in r245004.  Although the report was for noatime option which is
non-functional for the nullfs, other standard options like nosuid or
noexec are useful with it.

Reported by:	Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au>
MFC after:	3 days
2013-01-16 05:32:49 +00:00
John Baldwin
6910d7a0d8 - More properly handle interrupted NFS requests on an interruptible mount
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
  when krpc was committed.

Reviewed by:	rmacklem
MFC after:	1 week
2013-01-15 22:08:17 +00:00
Konstantin Belousov
603f963e56 The current default size of the nullfs hash table used to lookup the
existing nullfs vnode by the lower vnode is only 16 slots.  Since the
default mode for the nullfs is to cache the vnodes, hash has extremely
huge chains.

Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.

Pointy hat to:	    kib
Diagnosed and reviewed by:	peter
Tested by:    peter, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:44:47 +00:00
Konstantin Belousov
6b17595133 When nullfs mount is forcibly unmounted and nullfs vnode is reclaimed,
get back the leased write reference from the lower vnode.  There is no
other path which can correct v_writecount on the lowervp.

Reported by:	flo
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2013-01-10 18:24:48 +00:00
Baptiste Daroussin
3d94054c30 Add support for IO_APPEND flag in fuse
This make open(..., O_APPEND) actually works on fuse filesystem.

Reviewed by:	attilio
2013-01-08 12:21:50 +00:00
Pedro F. Giffuni
98f5c0d41a ext2fs: cleanup de dinode structure.
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.

Reviewed by:	bde
MFC after:	3 days
2013-01-07 03:36:32 +00:00
Gleb Kurtsou
4fd5efe79e tmpfs: Replace directory entry linked list with RB-Tree.
Use file name hash as a tree key, handle duplicate keys.  Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search.  Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies).  Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead.  Other file system handle it similarly.

Workaround race prone tn_readdir_last[pn] fields update.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c
2013-01-06 22:15:44 +00:00
Konstantin Belousov
268dd286a0 Fix reversed condition in the assertion.
Pointy hat to:	kib
MFC after:	13 days
2013-01-04 07:52:47 +00:00