Commit Graph

4185 Commits

Author SHA1 Message Date
Mateusz Guzik
2e310f6f72 pseudofs: hashed vncache
Vast majority of uses the cache are just checking if there is an entry
present on process exit (and evicting it if so). Both checking and
eviction process are very expensive and put the lock protecting it high
up on the profile during poudriere -j 104.

Convert the linked list into a hash. This allows to almost always avoid
taking the lock in the first place (and consequently almost removes it
from the profile). Note only one lock is preserved as a split did not
meaningfully impact contention.

Should the cache be used for something it will still run into contention
issues. The code needs a rewrite, but should someone want to tidy it up
further the following can be done:

1) per-chain locks (or at least an array)
2) hashing by something else than just pid

Sponsored by:	The FreeBSD Foundation
2019-10-22 22:52:53 +00:00
Konstantin Belousov
c6ba06d86c Fix interface between nfsclient and vnode pager.
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked.  This ensures that page fault always can find the
backing page if the object size check succeeded.  Set VV_VMSIZEVNLOCK
flag on NFS nodes.

The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well.  Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.

Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP.  When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.

Tested by:	pho
Discussed with:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:17:38 +00:00
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Jeff Roberson
63e9755548 (1/6) Replace busy checks with acquires where it is trival to do so.
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by:	kib, markj
Tested by:	pho
Sponsored by:	Netflix, Intel
Differential Revision:	https://reviews.freebsd.org/D21548
2019-10-15 03:35:11 +00:00
Mateusz Guzik
9c04e4c01e tmpfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:42:41 +00:00
Mateusz Guzik
48c426f226 pseudofs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:42:25 +00:00
Mateusz Guzik
be4cd6912f nullfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:42:04 +00:00
Mateusz Guzik
4c9ba39aea devfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:41:47 +00:00
Konstantin Belousov
e3fdd051f9 devfs_vptocnp(): correct the component name when node is not at top.
Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested.  Use the dirent entry instead.

With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.

Reported by:	Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-11 18:41:24 +00:00
Konstantin Belousov
53fcc6c960 Plug the rest of undef behavior places that were missed in r337456.
There are three more places in msdosfs_fat.c which might shift one
into the sign bit.  While there, fix formatting of KASSERTs.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-11 18:37:02 +00:00
Doug Moore
2288078c5e Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision:	https://reviews.freebsd.org/D21882
2019-10-08 07:14:21 +00:00
Mateusz Guzik
d511f93e45 nfsclient: add root vnode caching
See r353150.

Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D21646
2019-10-06 22:17:29 +00:00
Mateusz Guzik
7682d0be2b tmpfs: add root vnode caching
See r353150.

Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D21646
2019-10-06 22:17:11 +00:00
Mateusz Guzik
559ac49d41 devfs: add root vnode caching
See r353150.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:16:55 +00:00
Mateusz Guzik
dfa8dae493 devfs: plug redundant bwillwrite avoidance
vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21905
2019-10-05 17:44:33 +00:00
Konstantin Belousov
c5dac63c15 tmpfs_readdir(): unlock the locked node.
During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename.  Read tn_parent
only once, then use the cached value.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-03 19:55:05 +00:00
Konstantin Belousov
f7e69a6fa0 tmpfs_rename: style.
Reformat multi-line comments to follow style.
Also fix some typos.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-03 19:51:56 +00:00
Konstantin Belousov
d60ac9d561 Remove unnecessary vm/vm_page.h and vm/vm_pager.h includes from
tmpfs/tmpfs_vnodes.c.

Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21881
2019-10-03 08:25:09 +00:00
Rick Macklem
ee7201a725 Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros.
To be consistent with replacing the mtx_lock()/mtx_unlock() calls on
the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces
all mtx_assert() calls on these mutexes with macros as well.
This will simplify changing these locks to sx locks in a future commit.
However, this change may be delayed indefinitely, since it appears there
is a deadlock when vnode_pager_setsize() is called to shrink the size
and the NFS node lock is held.
There is no semantic change as a result of this commit.

Suggested by:	kib
MFC after:	1 week
2019-09-26 02:54:45 +00:00
Rick Macklem
b662b41e62 Replace all mtx_lock()/mtx_unlock() on the iod lock with macros.
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simply the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.

I don't know when the future commit will happen and be MFC'd, so I have
set the MFC on this commit to one week so that it can be MFC'd at the same
time.

Suggested by:	kib
MFC after:	1 week
2019-09-24 23:38:10 +00:00
Rick Macklem
5d85e12f44 Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by:	kib
MFC after:	1 week
2019-09-24 01:58:54 +00:00
Kyle Evans
5fdac75222 msdosfs: do not deget unlinked denodes
When a file is unlinked, the denode is not reclaimed until the last
reference is dropped, but the directory entry is immediately up for reuse.
This is a problem later when createde goes to grab a denode for the newly
created entry -- we search the hash and find a dead denode, then return that
without even bumping the reference count and the data later gets truncated
when the the last reference to the unlinked file is dropped.

This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy
will do a sequence incredibly roughly like this:

open("/mnt/foo", ...) => fd 3
mmap()
unlink("/mnt/foo")
open("/mnt/foo", ...) => fd 4
write(4, ...)
close(4)
close(3)

and the resulting file would be truncated, but the write succeeded, as long
as a reference to the unlinked file had not been closed.

Some archaeology indicates that this bug has likely existed since msdosfs
was converted to use vfs_hash instead of a home rolled hash implementation
in r143570. Prior to that point, the hashget implementation would do a
refcnt check while searching and explicitly only return a denode with
de_refcnt != 0. vfs_hash did not yet have the callback that it does today,
so this slipped away and did not come back when it later grew that
functionality.

The comment indicating that we want to skip these denodes has been updated
to reflect where this is actually done. My repo-diving session seems to
indicate that the refcnt check was likely never actually below the comment,
to be pedantic, but instead a detail wrapped up in the hashget
implementation since the beginning of its inclusion into FreeBSD.

This bug was the cause behind the issue addressed in r352557.

Reported by:	jhibbits
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21731
2019-09-20 20:47:10 +00:00
Konstantin Belousov
6fd583583b Further refine r352393, only call vnode_pager_setsize() outside the
node lock when shrinking.

This is similar to r252528, applied to the above commit.

Apparently there is a race which makes necessary at least to keep the
n_size and pager size consistent when extending.  Current suspect is
that iod threads perform vnode_pager_setsize() without taking the
vnode lock, which corrupts the file content.

Reported and tested by:	Masachika ISHIZUKA <ish@amail.plala.or.jp>
Discussed with:	rmacklem (related issues)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-17 18:41:39 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Alan Somers
320c848ff6 Fix an off-by-one error from r351961
That revision addressed a Coverity CID that could lead to a buffer overflow,
but it had an off-by-one error in the buffer size check.

Reported by:	Coverity
Coverity CID:	1405530
MFC after:	3 days
MFC-With:	351961
Sponsored by:	The FreeBSD Foundation
2019-09-16 16:41:01 +00:00
Alan Somers
42767f76af fusefs: fix some minor issues with fuse_vnode_setparent
* When unparenting a vnode, actually clear the flag. AFAIK this is basically
  a no-op because we only unparent a vnode when reclaiming it or when
  unlinking.

* There's no need to call fuse_vnode_setparent during reclaim, because we're
  about to free the vnode data anyway.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21630
2019-09-16 14:51:49 +00:00
Konstantin Belousov
1246ee664b nfscl_loadattrcache: fix rest of the cases to not call
vnode_pager_setsize() under the node mutex.

r248567 moved some calls of vnode_pager_setsize() after the node lock
is unlocked, do the rest now.

Reported and tested by:	peterj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-16 13:26:27 +00:00
Edward Tomasz Napierala
cf38985293 Make pseudofs(9) create directory entries in order, instead
of the reverse.

This fixes Linux sysctl(8) binary - it assumes the first two
directory entries are always "." and "..". There might be other
Linux apps affected by this.

NB it might be a good idea to rewrite it using queue(3).

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21550
2019-09-14 19:16:07 +00:00
Conrad Meyer
aaa3852435 buf: Add B_INVALONERR flag to discard data
Setting the B_INVALONERR flag before a synchronous write causes the buf
cache to forcibly invalidate contents if the write fails (BIO_ERROR).

This is intended to be used to allow layers above the buffer cache to make
more informed decisions about when discarding dirty buffers without
successful write is acceptable.

As a proof of concept, use in msdosfs to handle failures to mark the on-disk
'dirty' bit during rw mount or ro->rw update.

Extending this to other filesystems is left as future work.

PR:		210316
Reviewed by:	kib (with objections)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21539
2019-09-11 21:24:14 +00:00
Alan Somers
6c0c362075 fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.

Reviewed by:	emaste
MFC after:	3 days
MFC-With:	r350665
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21557
2019-09-11 19:29:40 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Ed Maste
4f0372f8cb msdosfsmount.h: fix ifdef comment 2019-09-09 18:35:17 +00:00
Alan Somers
16f8783452 Coverity fixes in fusefs(5)
CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap.
It could potentially have resulted in VOP_BMAP reporting too many
consecutive blocks.

CID 1404364 is much worse. It was an array access by an untrusted,
user-provided variable. It could potentially have resulted in a malicious
file system crashing the kernel or worse.

Reported by:	Coverity
Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21466
2019-09-06 19:40:11 +00:00
Conrad Meyer
454409a372 msdosfs: Remove redundant brelse() after r294954
Same automation.

No functional change.
2019-09-06 08:08:10 +00:00
Conrad Meyer
dadc0f97d0 cd9660: Remove redundant brelse() after r294954
Same automation.

No functional change.
2019-09-06 08:07:36 +00:00
Conrad Meyer
fe8b34563d ext2fs: Remove redundant brelse() after r294954
Coccinelle:

@ rule1 @
 identifier __error;
@@
 ...
 int __error;
 ...

@ rule2 depends on rule1 @
 identifier rule1.__error;
 identifier __bp;
@@

 __error =
(
 bread
|
 bread_gb
|
 breadn
|
 breadn_flags
)
 (..., &__bp);
 if (
(
 __error
|
 __error != 0
)
 ) {
 ...
- brelse(__bp);
 ...
 }

No functional change.
2019-09-06 08:07:12 +00:00
Rick Macklem
4ce21f37fd Delete the unused "nd" argument for nfsrv_proxyds().
The "nd" argument for nfsrv_proxyds() is no longer used by the function.
This patch deletes it. This allows a subsequent patch to delete the "nd"
argument from nfsvno_getattr(), since it's only use of "nd" was to pass it
to nfsrv_proxyds().
Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion
over why it might need "nd".

This patch is trivial and does not have any semantic effect.
2019-09-05 22:25:19 +00:00
Conrad Meyer
a6935d085c Remove long-dead BUF_ASSERT_{,UN}HELD assertions
These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons.  They're totally dead code, so go ahead and yank them now.

No functional change.
2019-09-05 21:43:33 +00:00
Conrad Meyer
f80cbeb292 msdosfs: Drop an unneeded brelse in bread error condition
After r294954, it is an invariant that bread returns non-NULL bp if and only
if the routine succeeded.  On error, it handles any buffer cleanup
internally.  So the brelse(NULL) here was just redundant.

No functional change.

Discussed with:	kib (extracted from a larger differential)
2019-09-05 21:30:52 +00:00
Rick Macklem
2e67077700 Delete the unused "nd" argument for nfsrv_checkdsattr().
The "nd" argument for nfsrv_checkdsattr() is no longer used by the function.
This patch deletes it. This allows subsequent patches to delete the "nd"
argument from nfsrv_proxyds(), since it's only use of "nd" was to pass it
to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(),
which passes "nd" to nfsrv_proxyds().
Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion
over why it might need "nd".

This patch is trivial and does not have any semantic effect.
Found by inspection while working on the NFSv4.2 server.
2019-09-04 22:37:28 +00:00
Kyle Evans
f99c5e8d28 pseudofs: make readdir work without a pid again
Specifically, the following was broken:

$ mount -t procfs procfs /proc
$ ls -l /proc

r351741 reworked readdir slightly to avoid pfs_node/pidhash LOR, but
inadvertently regressed pid == NO_PID; new pfs_lookup_proc() fails for the
obvious reasons, and later pfs_visible_proc doesn't capture the
pid == NO_PID -> return 1 aspect of pfs_visible. We can infact skip this
whole block if we're operating on a directory w/ NO_PID, as it's always
visible.

Reported by:	trasz
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D21518
2019-09-04 14:20:39 +00:00
Mateusz Guzik
f5791174df pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable)
Sponsored by:	The FreeBSD Foundation
2019-09-03 12:54:51 +00:00
Ed Maste
840aca2880 makefs: share msdosfsmount.h between kernel msdosfs and makefs
Sponsored by:	The FreeBSD Foundation
2019-09-01 16:55:33 +00:00
Mateusz Guzik
e0f4540a2a nullfs: reduce areas protected by vnode interlock in null_lock
Similarly to the other routine stop taking the interlock for the lower
vnode. The interlock for nullfs vnode is still taken to ensure
stability of ->v_data.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21480
2019-09-01 02:52:00 +00:00
Mateusz Guzik
13c73428dc nullfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
2019-08-30 00:30:03 +00:00
Mark Johnston
9222b82368 Remove unused VM page locking macros.
They were orphaned by r292373.

Reviewed by:	asomers
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21469
2019-08-29 22:13:15 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Mateusz Guzik
a89cd2a4bd tmpfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:35:23 +00:00
Mateusz Guzik
1e2f0ceb2f vfs: add VOP_NEED_INACTIVE
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after:   46309603301 (lockmgr:tmpfs)

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:34:24 +00:00
Alan Somers
5e63333052 fusefs: Fix some bugs regarding the size of the LISTXATTR list
* A small error in r338152 let to the returned size always being exactly
  eight bytes too large.

* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
  caller does not provide enough space, then the server should return ERANGE
  rather than return a truncated list.  That's true even though in FUSE's
  case the kernel doesn't provide space to the client at all; it simply
  requests a maximum size for the list.  We previously weren't handling the
  case where the server returns ERANGE even though the kernel requested as
  much size as the server had told us it needs; that can happen due to a
  race.

* We also need to ensure that a pathological server that always returns
  ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
  infinite loop in the kernel.  As of this commit, it will instead cause an
  infinite loop that exits and enters the kernel on each iteration, allowing
  signals to be processed.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21287
2019-08-28 04:19:37 +00:00