Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.
On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.
With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.
Reviewed by: jeff (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22665
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate
a status result for the first operation in the compound. Without this
patch, this will result in a bogus EBADXDR error return.
Returning EBADXDR is relatively harmless, but a correct reply of
NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct
minor version to use for a File Layout DS now that there can be NFSv4.2
DS servers.
mount_nfs.c still needs to be fixed for this, although how the mount fails
is only useful to help sysadmins isolate why a mount fails.
Found during testing of the NFSv4.2 client and server.
MFC after: 2 weeks
This is a preliminary commit of NFSv4.2 definitions that will be used by
subsequent commits which adds NFSv4.2 support to the NFS client and server.
There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
Since r355472 added code which clears the XATTRSUPPORT bit for non-NFSv4.2
mounts, it is now safe to set it.
There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
This commit completes updates to nfsproto.h required by the NFSv4.2.
This patch adds code to macros to clear attribute bits not supported
by NFSv4.2. For now, these bits are never set anyhow, but this prepares
the code for the addition of NFSv4.2 support in a future commit.
There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
This is a preliminary commit of NFSv4.2 definitions that will be used by
subsequent commits which adds NFSv4.2 support to the NFS client and server.
There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
This is a preliminary commit of NFSv4.2 definitions that will be used by
subsequent commits which adds NFSv4.2 support to the NFS client and server.
There will be a series of these preliminary commits that will prepare for
a major commit of the NFSv4.2 client/server changes currently found in
subversion under projects/nfsv42/sys.
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.
I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.
Reviewed by: bcr (manpages), kib
Differential Revision: https://reviews.freebsd.org/D22572
This allows bumping threadcount without taking the global devmtx lock.
In particular this eliminates contention on said lock while using bhyve
with multiple vms.
Reviewed by: kib
Tested by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22548
We might race with reclaim, and then this is no longer a nfs vnode, in
which case we do not need to handle deferred vnode_pager_setsize()
either.
Reported by: rk@ronald.org
PR: 242184
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
A crash was reported where the nr_client field was NULL during an upcall
to the nfsuserd daemon. Since nr_client == NULL only occurs when the
nfsuserd daemon is being shut down, it appeared to be caused by a race
between doing an upcall and the daemon shutting down.
By inspection two races were identified:
1 - The nfsrv_nfsuserd variable is used to indicate whether or not the
daemon is running. However it did not handle the intermediate phase
where the daemon is starting or stopping.
This was fixed by making nfsrv_nfsuserd tri-state and having the
functions that are called during start/stop to obey the intermediate
state.
2 - nfsrv_nfsuserd was checked to see that the daemon was running at
the beginning of an upcall, but nothing prevented the daemon from
being shut down while an upcall was still in progress.
This race probably caused the crash.
The patch fixes this by adding a count of upcalls in progress and
having the shut down function delay until this count goes to zero
before getting rid of nr_client and related data used by an upcall.
Tested by: avg (Panzura QA)
Reported by: avg
Reviewed by: avg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22377
Top-level kern_renameat() increases the writecount on the mount point,
which, together with tmpfs unmount suspending the mount, already
ensures that unmount cannot proceed while rename unlocks and relocks
all operated vnodes.
Remove vfs_busy() call from tmpfs_rename() which was done while
holding a vnode lock, creating the deadlock. The only intent of the
busy operation seems to be the prevention of unmount, which is already
ensured.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The pNFS server currently reports SpaceUsed (va_bytes) for the metadata
file. This in not correct, since the metadata file is always empty and,
as such, va_bytes is just the allocation for the empty file.
This patch adds va_bytes to the list of attributes acquired from the
DS for a file, so that it includes the allocated data size and is updated
when the file is written.
For files created on a pNFS server before this patch is applied, the
va_bytes value is estimated by rounding va_size up to a multiple of
BLKDEV_IOSIZE. Once the file is written after this patch has been
applied to the metadata server, the va_bytes returned for the file
will be correct.
This patch only affects a pNFS metadata server.
Found during testing of the NFSv4.2 pNFS server for the Allocate operation.
(Not yet in head/current.)
MFC after: 2 weeks
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.
There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).
Reviewed by: kib, jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22202
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.
Reviewed by: alc, dougm, kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22119
flag and use the same system.
This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116
Vast majority of uses the cache are just checking if there is an entry
present on process exit (and evicting it if so). Both checking and
eviction process are very expensive and put the lock protecting it high
up on the profile during poudriere -j 104.
Convert the linked list into a hash. This allows to almost always avoid
taking the lock in the first place (and consequently almost removes it
from the profile). Note only one lock is preserved as a split did not
meaningfully impact contention.
Should the cache be used for something it will still run into contention
issues. The code needs a rewrite, but should someone want to tidy it up
further the following can be done:
1) per-chain locks (or at least an array)
2) hashing by something else than just pid
Sponsored by: The FreeBSD Foundation
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked. This ensures that page fault always can find the
backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK
flag on NFS nodes.
The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well. Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.
Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.
Tested by: pho
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested. Use the dirent entry instead.
With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.
Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There are three more places in msdosfs_fat.c which might shift one
into the sign bit. While there, fix formatting of KASSERTs.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
vn_write already checks for vnode type to see if bwillwrite should be called.
This effectively reverts r244643.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21905
During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename. Read tn_parent
only once, then use the cached value.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
To be consistent with replacing the mtx_lock()/mtx_unlock() calls on
the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces
all mtx_assert() calls on these mutexes with macros as well.
This will simplify changing these locks to sx locks in a future commit.
However, this change may be delayed indefinitely, since it appears there
is a deadlock when vnode_pager_setsize() is called to shrink the size
and the NFS node lock is held.
There is no semantic change as a result of this commit.
Suggested by: kib
MFC after: 1 week
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simply the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.
I don't know when the future commit will happen and be MFC'd, so I have
set the MFC on this commit to one week so that it can be MFC'd at the same
time.
Suggested by: kib
MFC after: 1 week
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.
I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.
Suggested by: kib
MFC after: 1 week
When a file is unlinked, the denode is not reclaimed until the last
reference is dropped, but the directory entry is immediately up for reuse.
This is a problem later when createde goes to grab a denode for the newly
created entry -- we search the hash and find a dead denode, then return that
without even bumping the reference count and the data later gets truncated
when the the last reference to the unlinked file is dropped.
This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy
will do a sequence incredibly roughly like this:
open("/mnt/foo", ...) => fd 3
mmap()
unlink("/mnt/foo")
open("/mnt/foo", ...) => fd 4
write(4, ...)
close(4)
close(3)
and the resulting file would be truncated, but the write succeeded, as long
as a reference to the unlinked file had not been closed.
Some archaeology indicates that this bug has likely existed since msdosfs
was converted to use vfs_hash instead of a home rolled hash implementation
in r143570. Prior to that point, the hashget implementation would do a
refcnt check while searching and explicitly only return a denode with
de_refcnt != 0. vfs_hash did not yet have the callback that it does today,
so this slipped away and did not come back when it later grew that
functionality.
The comment indicating that we want to skip these denodes has been updated
to reflect where this is actually done. My repo-diving session seems to
indicate that the refcnt check was likely never actually below the comment,
to be pedantic, but instead a detail wrapped up in the hashget
implementation since the beginning of its inclusion into FreeBSD.
This bug was the cause behind the issue addressed in r352557.
Reported by: jhibbits
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21731
node lock when shrinking.
This is similar to r252528, applied to the above commit.
Apparently there is a race which makes necessary at least to keep the
n_size and pager size consistent when extending. Current suspect is
that iod threads perform vnode_pager_setsize() without taking the
vnode lock, which corrupts the file content.
Reported and tested by: Masachika ISHIZUKA <ish@amail.plala.or.jp>
Discussed with: rmacklem (related issues)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.
Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s
Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637
That revision addressed a Coverity CID that could lead to a buffer overflow,
but it had an off-by-one error in the buffer size check.
Reported by: Coverity
Coverity CID: 1405530
MFC after: 3 days
MFC-With: 351961
Sponsored by: The FreeBSD Foundation
* When unparenting a vnode, actually clear the flag. AFAIK this is basically
a no-op because we only unparent a vnode when reclaiming it or when
unlinking.
* There's no need to call fuse_vnode_setparent during reclaim, because we're
about to free the vnode data anyway.
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21630
vnode_pager_setsize() under the node mutex.
r248567 moved some calls of vnode_pager_setsize() after the node lock
is unlocked, do the rest now.
Reported and tested by: peterj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
of the reverse.
This fixes Linux sysctl(8) binary - it assumes the first two
directory entries are always "." and "..". There might be other
Linux apps affected by this.
NB it might be a good idea to rewrite it using queue(3).
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21550