network file systems (not only NFS proper). Short reads cause pages
other then the requested one, which were not filled by read response,
to stay invalid.
Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As result, not requested invalid pages are
freed even if the read RPC indicated success.
Noted and reviewed by: alc
MFC after: 1 week
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.
Suggested and reviewed by: alc
MFC after: 2 weeks
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other then the requested one, for VOP_GETPAGES().
Reviewed by: alc
MFC after: 1 week
VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().
VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.
The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.
Reported and tested by: gavin
Reviewed by: alc, rmacklem
MFC after: 1 week
managed pages that didn't already have that lock held. (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)
This allows a change in vm_page_remove()'s locking requirements. It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().
Discussed with: kib
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()
and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.
As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP
Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).
Reviewed by: alc, bde
Approved by: jeff (mentor)
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.
Noted by: rwatson
Pointy hat to: kib
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
synchronized by the lock on the object containing the page.
Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.
Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().
Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.
This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by: tegge@
MFC after: 7 days
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().
Sponsored by: Isilon Systems, Inc.
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.
Reviewed by: tegge@
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.
sufficient to guarantee that this race is not hit. The XLOCK will likely
have to be redesigned due to the way reference counting and mutexes work
in FreeBSD. We currently can not be guaranteed that xlock was not set
and cleared while we were blocked on the interlock while waiting to check
for XLOCK. This would lead us to reference a vnode which was not the
vnode we requested.
- Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
this condition is ever hit. It should not, since we should be retaining
a reference to the vnode in these cases. The reference would be sufficient
to block recycling.
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
not to the parent's smbnode, which may be freed during the lifetime
of the child if the mount is forcibly unmounted. umount -f should now
work properly (ie. not panic) on smbfs mounts.
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).
In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.
Sponsored by: DARPA & NAI Labs.
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.
Idea stolen from: BSD/OS