associated with the previous vnode (if any) associated with the target of
a rename(). Otherwise, a lookup of the target pathname concurrent with a
rename() could re-add a name cache entry after the namei(RENAME) lookup
in kern_renameat() had purged the target pathname.
MFC after: 2 weeks
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.
Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)
vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.
There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.
Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spam your console if data corruption is
imminent.)
Reviewed by: kib
MFC after: 3 weeks
tmpfs_nocacheread(). It is both unnecessary and a pessimization. It
results in either the page being zeroed twice or zeroed first and then
overwritten by an I/O operation.
MFC after: 3 weeks
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.
Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)
file's last accessed, modified, and changed times:
TMPFS_NODE_ACCESSED and TMPFS_NODE_CHANGED should be set unconditionally
in tmpfs_remove() without regard to the number of hard links to the file.
Otherwise, after the last directory entry for a file has been removed, a
process that still has the file open could read stale values for the last
accessed and changed times with fstat(2).
Similarly, tmpfs_close() should update the time-related fields even if all
directory entries for a file have been removed. In this case, the effect
is that the time-related fields will have values that are later than
expected. They will correspond to the time at which fstat(2) is called.
In collaboration with: kib
MFC after: 1 week
either overflow the supplied buffer, or cause uiomove fail.
Do not advance cached de when directory entry was not copied out.
Do not return EOF when no entries could be copied due to first entry
too large for supplied buffer, signal EINVAL instead.
Reported by: Beat G?tzi <beat chruetertee ch>
MFC after: 1 week
Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).
PR: kern/141305
Reported by: Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by: alc
Tested by: tools/regression/sockets/sendfile
MFC after: 2 weeks
Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.
This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.
MFC after: 1 month
to unconditionally set PG_REFERENCED on a page before sleeping. In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable. Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.
both node pointer and name component. This does the right thing for
hardlinks to the same node in the same directory.
Submitted by: Yoshihiro Ota <ota j email ne jp>
PR: kern/131356
MFC after: 2 weeks
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical. Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.
So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month
NODEV is more appropriate when va_rdev doesn't have a meaningful value.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.
Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
strees2 suite, to quote his letter, this change:
1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.
For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.
2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.
Submitted by: kib
- LK_RETRY prohibits vget() and vn_lock() to return error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
with the same way used in modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.
Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL deference
- Be more careful handling vnodes (*)
(*) This is a WIP
[1] by pjd via howardsu
Thanks kib@ for his valuable VFS related comments.
Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)
should finally fix fsx test case.
The printf's added here would be eventually turned into
assertions.
Submitted by: Mingyan Guo (mostly)
Approved by: re (tmpfs blanket)