changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients are set this new
flag in VFS_SET().
A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.
PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month
is excessive. Postpone the flush of the fsinfo to VFS_SYNC(),
remembering the need for update with the flag MSDOSFS_FSIMOD, stored
in pm_flags.
FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as
the advisory hints, not requiring them to be correct.
Based on the patch from bde, modified by me.
Reviewed by: bde
MFC after: 2 weeks
tmpfs_mapped{read, write}() functions:
- tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(),
which check before-hand to work only on valid VREG vnodes. Also the
vnode is locked for the duration of the work, making vnode reclaiming
impossible, during the operation. Hence, vobj can never be NULL.
- Currently check on resident pages and cached pages without vm object
lock held is racy and can do even more harm than good, as a page could
be transitioning between these 2 pools and then be skipped entirely.
Skip the checks as lookups on empty splay trees are very cheap.
Discussed with: alc
Tested by: flo
MFC after: 2 weeks
e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs),
In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't have sense
to carry it in the in-memory superblock.
Also remove a pointless check for MAX(1, x) > 0.
Submitted by: Christoph Mallon
MFC after: 2 weeks
Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.
Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).
This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks). The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland. This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.
Reviewed by: kib
MFC after: 1 month
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.
Submitted by: Christoph Mallon
MFC after: 2 weeks
track the MNT_SYNCHRONOUS flag. It is set to the latter at mount time
but not updated by MNT_UPDATE.
Use MNT_SYNCHRONOUS to decide to write the FAT updates syncrhonously.
Submitted by: bde
MFC after: 1 week
a directory to a subdir of the root directory from somewhere else.
For all directory moves that change the parent directory, the dotdot
entry must be fixed up. For msdosfs, the root directory is magic for
non-FAT32. It is less magic for FAT32, but needs the same magic for
the dotdot fixup. It didn't have it.
Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.
The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().
For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use explicitly set pcl variable instead on relying on the
start cluster of the root directory typically has a value < 65536.
Submitted by: bde
MFC after: 1 week
Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero. Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size. This is
indicated by setting it to 0 and using only pm_HugeSectors. But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check. This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.
Submitted by: bde
MFC after: 1 week
this check is somewhere in the network code, but this assertion
already proven to be useful in catching what seems to be driver bugs
causing NFS scrambling random memory.
Discussed with: rmacklem
MFC after: 1 week
requested from the server for the read operation. Server shall not
reply with too large size, but client should be resilent too.
Reviewed by: rmacklem
MFC after: 1 week
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
NFS server.
- Use nanotime() for the modification time on a delegation to get as
precise a time as possible.
- Use time_second instead of extracting the second from a call to
getmicrotime().
Submitted by: bde (3)
Reviewed by: bde, rmacklem
MFC after: 2 weeks
The previous change accidentally left the substraction we
were trying to avoid in case that i_blocks could become
negative.
Reported by: bde
MFC after: 4 days
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.
Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.
MFC after: 1 week
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.
This seems to help but is not a complete solution.
MFC after: 3 days
to the old one's nfs.nfsrv.async.
Please note that by enabling this option (default is disabled), the system
could potentionally have silent data corruption if the server crashes
before write is committed to non-volatile storage, as the client side have
no way to tell if the data is already written.
Submitted by: rmacklem
MFC after: 2 weeks
in r245004. Although the report was for noatime option which is
non-functional for the nullfs, other standard options like nosuid or
noexec are useful with it.
Reported by: Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au>
MFC after: 3 days
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
when krpc was committed.
Reviewed by: rmacklem
MFC after: 1 week
existing nullfs vnode by the lower vnode is only 16 slots. Since the
default mode for the nullfs is to cache the vnodes, hash has extremely
huge chains.
Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.
Pointy hat to: kib
Diagnosed and reviewed by: peter
Tested by: peter, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
get back the leased write reference from the lower vnode. There is no
other path which can correct v_writecount on the lowervp.
Reported by: flo
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.
Reviewed by: bde
MFC after: 3 days
Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.
Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead. Other file system handle it similarly.
Workaround race prone tn_readdir_last[pn] fields update.
Add tmpfs_dir_destroy() to free all dirents.
Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.
Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.
Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarious. The default is "cache".
Tested and benchmarked by: pho (previous version)
MFC after: 2 weeks