Commit Graph

4322 Commits

Author SHA1 Message Date
Mateusz Guzik
559ac49d41 devfs: add root vnode caching
See r353150.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21646
2019-10-06 22:16:55 +00:00
Mateusz Guzik
dfa8dae493 devfs: plug redundant bwillwrite avoidance
vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21905
2019-10-05 17:44:33 +00:00
Konstantin Belousov
c5dac63c15 tmpfs_readdir(): unlock the locked node.
During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename.  Read tn_parent
only once, then use the cached value.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-03 19:55:05 +00:00
Konstantin Belousov
f7e69a6fa0 tmpfs_rename: style.
Reformat multi-line comments to follow style.
Also fix some typos.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-03 19:51:56 +00:00
Konstantin Belousov
d60ac9d561 Remove unnecessary vm/vm_page.h and vm/vm_pager.h includes from
tmpfs/tmpfs_vnodes.c.

Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21881
2019-10-03 08:25:09 +00:00
Rick Macklem
ee7201a725 Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros.
To be consistent with replacing the mtx_lock()/mtx_unlock() calls on
the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces
all mtx_assert() calls on these mutexes with macros as well.
This will simplify changing these locks to sx locks in a future commit.
However, this change may be delayed indefinitely, since it appears there
is a deadlock when vnode_pager_setsize() is called to shrink the size
and the NFS node lock is held.
There is no semantic change as a result of this commit.

Suggested by:	kib
MFC after:	1 week
2019-09-26 02:54:45 +00:00
Rick Macklem
b662b41e62 Replace all mtx_lock()/mtx_unlock() on the iod lock with macros.
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simply the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.

I don't know when the future commit will happen and be MFC'd, so I have
set the MFC on this commit to one week so that it can be MFC'd at the same
time.

Suggested by:	kib
MFC after:	1 week
2019-09-24 23:38:10 +00:00
Rick Macklem
5d85e12f44 Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by:	kib
MFC after:	1 week
2019-09-24 01:58:54 +00:00
Kyle Evans
5fdac75222 msdosfs: do not deget unlinked denodes
When a file is unlinked, the denode is not reclaimed until the last
reference is dropped, but the directory entry is immediately up for reuse.
This is a problem later when createde goes to grab a denode for the newly
created entry -- we search the hash and find a dead denode, then return that
without even bumping the reference count and the data later gets truncated
when the the last reference to the unlinked file is dropped.

This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy
will do a sequence incredibly roughly like this:

open("/mnt/foo", ...) => fd 3
mmap()
unlink("/mnt/foo")
open("/mnt/foo", ...) => fd 4
write(4, ...)
close(4)
close(3)

and the resulting file would be truncated, but the write succeeded, as long
as a reference to the unlinked file had not been closed.

Some archaeology indicates that this bug has likely existed since msdosfs
was converted to use vfs_hash instead of a home rolled hash implementation
in r143570. Prior to that point, the hashget implementation would do a
refcnt check while searching and explicitly only return a denode with
de_refcnt != 0. vfs_hash did not yet have the callback that it does today,
so this slipped away and did not come back when it later grew that
functionality.

The comment indicating that we want to skip these denodes has been updated
to reflect where this is actually done. My repo-diving session seems to
indicate that the refcnt check was likely never actually below the comment,
to be pedantic, but instead a detail wrapped up in the hashget
implementation since the beginning of its inclusion into FreeBSD.

This bug was the cause behind the issue addressed in r352557.

Reported by:	jhibbits
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21731
2019-09-20 20:47:10 +00:00
Konstantin Belousov
6fd583583b Further refine r352393, only call vnode_pager_setsize() outside the
node lock when shrinking.

This is similar to r252528, applied to the above commit.

Apparently there is a race which makes necessary at least to keep the
n_size and pager size consistent when extending.  Current suspect is
that iod threads perform vnode_pager_setsize() without taking the
vnode lock, which corrupts the file content.

Reported and tested by:	Masachika ISHIZUKA <ish@amail.plala.or.jp>
Discussed with:	rmacklem (related issues)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-17 18:41:39 +00:00
Mateusz Guzik
4cace859c2 vfs: convert struct mount counters to per-cpu
There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before:   852393 ops/s
after:  76682077 ops/s

Reviewed by:	kib, jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21637
2019-09-16 21:37:47 +00:00
Alan Somers
320c848ff6 Fix an off-by-one error from r351961
That revision addressed a Coverity CID that could lead to a buffer overflow,
but it had an off-by-one error in the buffer size check.

Reported by:	Coverity
Coverity CID:	1405530
MFC after:	3 days
MFC-With:	351961
Sponsored by:	The FreeBSD Foundation
2019-09-16 16:41:01 +00:00
Alan Somers
42767f76af fusefs: fix some minor issues with fuse_vnode_setparent
* When unparenting a vnode, actually clear the flag. AFAIK this is basically
  a no-op because we only unparent a vnode when reclaiming it or when
  unlinking.

* There's no need to call fuse_vnode_setparent during reclaim, because we're
  about to free the vnode data anyway.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21630
2019-09-16 14:51:49 +00:00
Konstantin Belousov
1246ee664b nfscl_loadattrcache: fix rest of the cases to not call
vnode_pager_setsize() under the node mutex.

r248567 moved some calls of vnode_pager_setsize() after the node lock
is unlocked, do the rest now.

Reported and tested by:	peterj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-16 13:26:27 +00:00
Edward Tomasz Napierala
cf38985293 Make pseudofs(9) create directory entries in order, instead
of the reverse.

This fixes Linux sysctl(8) binary - it assumes the first two
directory entries are always "." and "..". There might be other
Linux apps affected by this.

NB it might be a good idea to rewrite it using queue(3).

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21550
2019-09-14 19:16:07 +00:00
Conrad Meyer
aaa3852435 buf: Add B_INVALONERR flag to discard data
Setting the B_INVALONERR flag before a synchronous write causes the buf
cache to forcibly invalidate contents if the write fails (BIO_ERROR).

This is intended to be used to allow layers above the buffer cache to make
more informed decisions about when discarding dirty buffers without
successful write is acceptable.

As a proof of concept, use in msdosfs to handle failures to mark the on-disk
'dirty' bit during rw mount or ro->rw update.

Extending this to other filesystems is left as future work.

PR:		210316
Reviewed by:	kib (with objections)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21539
2019-09-11 21:24:14 +00:00
Alan Somers
6c0c362075 fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.

Reviewed by:	emaste
MFC after:	3 days
MFC-With:	r350665
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21557
2019-09-11 19:29:40 +00:00
Mark Johnston
fee2a2fa39 Change synchonization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accomodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
Ed Maste
4f0372f8cb msdosfsmount.h: fix ifdef comment 2019-09-09 18:35:17 +00:00
Alan Somers
16f8783452 Coverity fixes in fusefs(5)
CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap.
It could potentially have resulted in VOP_BMAP reporting too many
consecutive blocks.

CID 1404364 is much worse. It was an array access by an untrusted,
user-provided variable. It could potentially have resulted in a malicious
file system crashing the kernel or worse.

Reported by:	Coverity
Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21466
2019-09-06 19:40:11 +00:00
Conrad Meyer
454409a372 msdosfs: Remove redundant brelse() after r294954
Same automation.

No functional change.
2019-09-06 08:08:10 +00:00
Conrad Meyer
dadc0f97d0 cd9660: Remove redundant brelse() after r294954
Same automation.

No functional change.
2019-09-06 08:07:36 +00:00
Conrad Meyer
fe8b34563d ext2fs: Remove redundant brelse() after r294954
Coccinelle:

@ rule1 @
 identifier __error;
@@
 ...
 int __error;
 ...

@ rule2 depends on rule1 @
 identifier rule1.__error;
 identifier __bp;
@@

 __error =
(
 bread
|
 bread_gb
|
 breadn
|
 breadn_flags
)
 (..., &__bp);
 if (
(
 __error
|
 __error != 0
)
 ) {
 ...
- brelse(__bp);
 ...
 }

No functional change.
2019-09-06 08:07:12 +00:00
Rick Macklem
4ce21f37fd Delete the unused "nd" argument for nfsrv_proxyds().
The "nd" argument for nfsrv_proxyds() is no longer used by the function.
This patch deletes it. This allows a subsequent patch to delete the "nd"
argument from nfsvno_getattr(), since it's only use of "nd" was to pass it
to nfsrv_proxyds().
Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion
over why it might need "nd".

This patch is trivial and does not have any semantic effect.
2019-09-05 22:25:19 +00:00
Conrad Meyer
a6935d085c Remove long-dead BUF_ASSERT_{,UN}HELD assertions
These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons.  They're totally dead code, so go ahead and yank them now.

No functional change.
2019-09-05 21:43:33 +00:00
Conrad Meyer
f80cbeb292 msdosfs: Drop an unneeded brelse in bread error condition
After r294954, it is an invariant that bread returns non-NULL bp if and only
if the routine succeeded.  On error, it handles any buffer cleanup
internally.  So the brelse(NULL) here was just redundant.

No functional change.

Discussed with:	kib (extracted from a larger differential)
2019-09-05 21:30:52 +00:00
Rick Macklem
2e67077700 Delete the unused "nd" argument for nfsrv_checkdsattr().
The "nd" argument for nfsrv_checkdsattr() is no longer used by the function.
This patch deletes it. This allows subsequent patches to delete the "nd"
argument from nfsrv_proxyds(), since it's only use of "nd" was to pass it
to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(),
which passes "nd" to nfsrv_proxyds().
Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion
over why it might need "nd".

This patch is trivial and does not have any semantic effect.
Found by inspection while working on the NFSv4.2 server.
2019-09-04 22:37:28 +00:00
Kyle Evans
f99c5e8d28 pseudofs: make readdir work without a pid again
Specifically, the following was broken:

$ mount -t procfs procfs /proc
$ ls -l /proc

r351741 reworked readdir slightly to avoid pfs_node/pidhash LOR, but
inadvertently regressed pid == NO_PID; new pfs_lookup_proc() fails for the
obvious reasons, and later pfs_visible_proc doesn't capture the
pid == NO_PID -> return 1 aspect of pfs_visible. We can infact skip this
whole block if we're operating on a directory w/ NO_PID, as it's always
visible.

Reported by:	trasz
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D21518
2019-09-04 14:20:39 +00:00
Mateusz Guzik
f5791174df pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable)
Sponsored by:	The FreeBSD Foundation
2019-09-03 12:54:51 +00:00
Ed Maste
840aca2880 makefs: share msdosfsmount.h between kernel msdosfs and makefs
Sponsored by:	The FreeBSD Foundation
2019-09-01 16:55:33 +00:00
Mateusz Guzik
e0f4540a2a nullfs: reduce areas protected by vnode interlock in null_lock
Similarly to the other routine stop taking the interlock for the lower
vnode. The interlock for nullfs vnode is still taken to ensure
stability of ->v_data.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21480
2019-09-01 02:52:00 +00:00
Mateusz Guzik
13c73428dc nullfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
2019-08-30 00:30:03 +00:00
Mark Johnston
9222b82368 Remove unused VM page locking macros.
They were orphaned by r292373.

Reviewed by:	asomers
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21469
2019-08-29 22:13:15 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Mateusz Guzik
a89cd2a4bd tmpfs: use VOP_NEED_INACTIVE
Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:35:23 +00:00
Mateusz Guzik
1e2f0ceb2f vfs: add VOP_NEED_INACTIVE
vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after:   46309603301 (lockmgr:tmpfs)

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21371
2019-08-28 20:34:24 +00:00
Alan Somers
5e63333052 fusefs: Fix some bugs regarding the size of the LISTXATTR list
* A small error in r338152 let to the returned size always being exactly
  eight bytes too large.

* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
  caller does not provide enough space, then the server should return ERANGE
  rather than return a truncated list.  That's true even though in FUSE's
  case the kernel doesn't provide space to the client at all; it simply
  requests a maximum size for the list.  We previously weren't handling the
  case where the server returns ERANGE even though the kernel requested as
  much size as the server had told us it needs; that can happen due to a
  race.

* We also need to ensure that a pathological server that always returns
  ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
  infinite loop in the kernel.  As of this commit, it will instead cause an
  infinite loop that exits and enters the kernel on each iteration, allowing
  signals to be processed.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21287
2019-08-28 04:19:37 +00:00
Mateusz Guzik
4840711516 unionfs: stop passing LK_INTERLOCK to VOP_UNLOCK
This is part of the preparation to remove flags argument from VOP_UNLOCK.
Also has a side effect of fixing stacking on top of nullfs broken by r351472.

Reported by:	cy
Sponsored by:	The FreeBSD Foundation
2019-08-27 20:51:17 +00:00
Mateusz Guzik
33d46a3cef nullfs: reduce areas protected by vnode interlock
Some places only take the interlock to hold the vnode, which was a requiremnt
before they started being manipulated with atomics. Use the newly introduced
vholdnz to bump the count.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21358
2019-08-25 05:13:15 +00:00
Ed Maste
b257feb247 msdosfs_fat: reduce diffs with NetBSD and makefs
Use pointer arithmetic (as now done in makefs, and in NetBSD) instead of
taking the address of array element.  No functional change, but this
makes it easier to compare different versions of this file.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21365
2019-08-22 16:06:52 +00:00
Mateusz Guzik
81f666e79d nullfs: lock the vnode with LK_SHARED in null_vptocnp
null_nodeget which follows almost always finds the target vnode in the hash,
avoiding insmntque1 altogether. Should it be needed, it already checks if the
lock needs to be upgraded.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20244
2019-08-21 23:24:40 +00:00
Ed Maste
476b0ab758 makefs: share denode.h between kernel msdosfs and makefs
There is no need to duplicate this file when it can be trivially
shared (just exposing sections previously under #ifdef _KERNEL).

MFC with:	r351273
Differential Revision:	The FreeBSD Foundation
2019-08-21 19:07:13 +00:00
Ed Maste
51e79affa3 makefs: share fat.h between kernel msdosfs and makefs
There is no reason to duplicate this file when it can be trivially
shared (just exposing one section previously under #ifdef _KERNEL).

Reviewed by:	imp, cem
MFC with:	r351273
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21346
2019-08-21 02:21:40 +00:00
Konstantin Belousov
de4e1aeb21 Fix an issue with executing tmpfs binary.
Suppose that a binary was executed from tmpfs mount, and the text
vnode was reclaimed while the binary was still running.  It is
possible during even the normal operations since tmpfs vnode'
vm_object has swap type, and no references on the vnode is held.  Also
assume that the text vnode was revived for some reason.  Then, on the
process exit or exec, unmapping of the text mapping tries to remove
the text reference from the vnode, but since it went from
recycle/instantiation cycle, there is no reference kept, and assertion
in VOP_UNSET_TEXT_CHECKED() triggers.

Fix this by keeping a use reference on the tmpfs vnode for each exec
reference.  This prevents the vnode reclamation while executable map
entry is active.

Do it by adding per-mount flag MNTK_TEXT_REFS that directs
vop_stdset_text() to add use ref on first vnode text use, and
per-vnode VI_TEXT_REF flag, to record the need on unref in
vop_stdunset_text() on last vnode text use going away.  Set
MNTK_TEXT_REFS for tmpfs mounts.

Reported by:	bdrewery
Tested by:	sbruno, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-08-18 20:36:11 +00:00
Alan Somers
3a79e8e772 fusefs: don't send the namespace during listextattr
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21280
2019-08-16 05:06:54 +00:00
Alan Somers
bf50749773 fusefs: Fix the size of fuse_getattr_in
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now.  But fusefs-lkl requires the
request's length to be correct.

Fix this bug, and also enhance the test suite to catch similar bugs.

PR:		239830
MFC after:	2 weeks
MFC-With:	350665
Sponsored by:	The FreeBSD Foundation
2019-08-14 20:45:00 +00:00
Alan Somers
0b4275accb fusefs: merge from projects/fuse2
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:

* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4

Performance enhancements include:

* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting

PR:		199934 216391 233783 234581 235773 235774 235775
PR:		236226 236231 236236 236291 236329 236381 236405
PR:		236327 236466 236472 236473 236474 236530 236557
PR:		236560 236844 237052 237181 237588 238565
Reviewed by:	bcr (man pages)
Reviewed by:	cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
		review on project branch)
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Pull Request:	https://reviews.freebsd.org/D21110
2019-08-07 00:38:26 +00:00
Alan Somers
427d205cb5 fusefs: remove superfluous counter_u64_zero
Reported by:	glebius
Sponsored by:	The FreeBSD Foundation
2019-08-06 00:50:25 +00:00
Konstantin Belousov
30d49d536b Try to decrease the number of bugs in unionfs after the VV_TEXT flag removal.
- Provide unionfs_add_writecount() which passes the writecount to the
  lower or upper vnode as appropriate.
- In unionfs VOP_RECLAIM() implementation, annulate unionfs
  writecounts from upper or lower vnode.  It is not clear that it is
  always correct to remove the all references from either lower or
  upper vnode, but we currently do not track which vnode get how many
  refs anyway.

Reported and tested by:	t_uemura@macome.co.jp
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-08-01 14:40:37 +00:00
Alan Somers
508abc9494 fusefs: fix the build after r350446
fuse needs to include an additional header after r350446

Sponsored by:	The FreeBSD Foundation
2019-07-31 21:48:35 +00:00
Alan Somers
58df81b339 MFHead @350426
Sponsored by:	The FreeBSD Foundation
2019-07-30 04:17:36 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Alan Somers
669a092af1 fusefs: fix panic when writing with O_DIRECT and using writeback cache
When a fusefs file system is mounted using the writeback cache, the cache
may still be bypassed by opening a file with O_DIRECT.  When writing with
O_DIRECT, the cache must be invalidated for the affected portion of the
file.  Fix some panics caused by inadvertently invalidating too much.

Sponsored by:	The FreeBSD Foundation
2019-07-28 15:17:32 +00:00
Alan Somers
a63915c2d7 MFHead @r350386
Sponsored by:	The FreeBSD Foundation
2019-07-28 04:02:22 +00:00
Alan Somers
ed74f781c9 fusefs: add a intr/nointr mount option
FUSE file systems can optionally support interrupting outstanding
operations.  However, the file system does not identify to the kernel at
mount time whether it's capable of doing that.  Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives.  That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts.  The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option.  This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Sponsored by:	The FreeBSD Foundation
2019-07-18 17:55:13 +00:00
Alan Somers
f05962453e fusefs: fix another semi-infinite loop bug regarding signal handling
fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate.  The reason is because msleep(9) would
return EINTR even for a masked signal.  One reason is when the thread is
stopped, which happens for example during sigexit().  Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

Sponsored by:	The FreeBSD Foundation
2019-07-18 15:30:00 +00:00
Alan Somers
d26d63a4af fusefs: multiple interruptility improvements
1) Don't explicitly not mask SIGKILL.  kern_sigprocmask won't allow it to be
   masked, anyway.

2) Fix an infinite loop bug.  If a process received both a maskable signal
   lower than 9 (like SIGINT) and then received SIGKILL,
   fticket_wait_answer would spin.  msleep would immediately return EINTR,
   but cursig would return SIGINT, so the sleep would get retried.  Fix it
   by explicitly checking whether SIGKILL has been received.

3) Abandon the sig_isfatal optimization introduced by r346357.  That
   optimization would cause fticket_wait_answer to return immediately,
   without waiting for a response from the server, if the process were going
   to exit anyway.  However, it's vulnerable to a race:

   1) fatal signal is received while fticket_wait_answer is sleeping.
   2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
   3) fticket_wait_answer determines that the signal was fatal and returns
      without waiting for a response.
   4) Another thread changes the signal to non-fatal.
   5) The first thread returns to userspace.  Instead of exiting, the
      process continues.
   6) The application receives EINTR, wrongly believes that the operation
      was successfully interrupted, and restarts it.  This could cause
      problems for non-idempotent operations like FUSE_RENAME.

Reported by:    kib (the race part)
Sponsored by:   The FreeBSD Foundation
2019-07-17 22:45:43 +00:00
Alan Somers
07e86257e6 fusefs: fix the build with some NODEBUG kernels
systm.h needs to be included before counter.h

Sponsored by:	The FreeBSD Foundation
2019-07-13 21:41:12 +00:00
Alan Somers
97b0512b23 projects/fuse2: build fixes
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64

Sponsored by:	The FreeBSD Foundation
2019-07-13 14:42:09 +00:00
Fedor Uporov
6ce04e595a Add additional check for 'blocks per group' and 'fragments per group' superblock fields.
These fields will not be equal only in case if bigalloc filesystem feature is turned on.
This feature is not supported for now.

Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-27-EXT2-12: Denial of Service in openat-0 (vm_fault_hold/ext2_clusteracct)

MFC after:	2 weeks
2019-07-07 08:58:02 +00:00
Fedor Uporov
c008656263 Remove ufs fragments logic.
The ext2fs fragments are different from ufs fragments.
In case of ext2fs the fragment should be equal or more then block size.
The values more than block size are used only in case of bigalloc feature, which is does not supported for now.

Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-22-EXT2-9: Denial of service in ftruncate-0 (ext2_balloc)

MFC after:	2 weeks
2019-07-07 08:56:13 +00:00
Fedor Uporov
590517d05a Remove unneeded mount point unlock call.
Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-11-EXT2-6: Denial Of Service in write-1 (ext2_balloc)

MFC after:	2 weeks
2019-07-07 08:53:52 +00:00
Alan Somers
7e1f5432f4 fusefs: don't leak memory of unsent operations on unmount
Sponsored by:	The FreeBSD Foundation
2019-06-28 18:48:02 +00:00
Alan Somers
8aafc8c389 [skip ci] update copyright headers in fusefs files
Sponsored by:	The FreeBSD Foundation
2019-06-28 04:18:10 +00:00
Alan Somers
7f49ce7a0b MFHead @349476
Sponsored by:	The FreeBSD Foundation
2019-06-27 23:50:54 +00:00
Alan Somers
c1afff113c fusefs: fix a memory leak regarding FUSE_INTERRUPT
We were leaking the fuse ticket if the original operation completed before
the daemon received the INTERRUPT operation.  Fixing this was easier than I
expected.

Sponsored by:	The FreeBSD Foundation
2019-06-27 22:24:56 +00:00
Alan Somers
435ecf40bb fusefs: recycle vnodes after their last unlink
Previously fusefs would never recycle vnodes.  After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them.  This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.

Sponsored by:	The FreeBSD Foundation
2019-06-27 20:18:12 +00:00
Alan Somers
38c8634635 fusefs: counter(9) variables should not be statically initialized
Reported by:	rpokala
Sponsored by:	The FreeBSD Foundation
2019-06-27 17:59:15 +00:00
Alan Somers
560a55d094 fusefs: convert statistical sysctls to use counter(9)
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.

Sponsored by:	The FreeBSD Foundation
2019-06-27 16:30:25 +00:00
Alan Somers
caeea8b4cc fusefs: fix some memory leaks
Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE.  There are still
leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the
server is legally allowed to never respond to a FUSE_INTERRUPT operation.

Sponsored by:	The FreeBSD Foundation
2019-06-27 00:00:48 +00:00
Alan Somers
f8ebf1cd7e fusefs: implement protocol 7.23's FUSE_WRITEBACK_CACHE option
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis.  If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache.  If not, then
they'll get the writethrough cache.  If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.

The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later.  However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.

This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
  bypassed.

Sponsored by:	The FreeBSD Foundation
2019-06-26 17:32:31 +00:00
Alan Somers
205696a17d fusefs: delete some unused mount options
The fusefs kernel module allegedly supported no_attrcache, no_readahed,
no_datacache, no_namecache, and no_mmap mount options, but the mount_fusefs
binary never did.  So there was no way to ever activate these options.
Delete them.  Some of them have alternatives:

no_attrcache: set the attr_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.
no_readahed: set max_readahead to 0 in the FUSE_INIT response.
no_datacache: set the vfs.fusefs.data_cache_mode sysctl to 0, or (coming
	soon) set the attr_valid time to 0 and set FUSE_AUTO_INVAL_DATA in
	the FUSE_INIT response.
no_namecache: set entry_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.

Sponsored by:	The FreeBSD Foundation
2019-06-26 15:15:24 +00:00
Alan Somers
fef464546c fusefs: implement the "time_gran" feature.
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23.  The client will use that granularity when
updating its cached timestamps during write.  This way the timestamps won't
appear to change following flush.

Sponsored by:	The FreeBSD Foundation
2019-06-26 02:09:22 +00:00
Alan Somers
0a8fe2d369 fusefs: set ctime during FUSE_SETATTR following a write
As of r349396 the kernel will internally update the mtime and ctime of files
on write.  It will also flush the mtime should a SETATTR happen before the
data cache gets flushed.  Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.

This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.

Sponsored by:	The FreeBSD Foundation
2019-06-26 00:03:37 +00:00
Alan Somers
788af9538a fusefs: automatically update mtime and ctime on write
Writing should implicitly update a file's mtime and ctime.  For fuse, the
server is supposed to do that.  But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE.  When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.

Sponsored by:	The FreeBSD Foundation
2019-06-25 23:40:18 +00:00
Alan Somers
0d3a88d76c fusefs: writes should update the file size, even when data_cache_mode=0
Writes that extend a file should update the file's size.  r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet.  Now that it is, we should always update the cached file
size during write.

Sponsored by:	The FreeBSD Foundation
2019-06-25 18:36:11 +00:00
Alan Somers
b9e2019755 fusefs: rewrite vop_getpages and vop_putpages
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache.  This has
several corollaries:

* Change the way we handle short reads _again_.  vfs_bio_getpages doesn't
  provide any way to handle unexpected short reads.  Plus, I found some more
  lock-order problems.  So now when the short read is detected we'll just
  clear the vnode's attribute cache, forcing the file size to be requeried
  the next time it's needed.  VOP_GETPAGES doesn't have any way to indicate
  a short read to the "caller", so we just bzero the rest of the page
  whenever a short read happens.

* Change the way we decide when to set the FUSE_WRITE_CACHE bit.  We now set
  it for clustered writes even when the writeback cache is not in use.

Sponsored by:   The FreeBSD Foundation
2019-06-25 17:24:43 +00:00
Hans Petter Selasky
43a9329e1b Free all allocated unit IDs in cuse(3) after the client character
devices have been destroyed to avoid creating character devices with
identical name.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:46:01 +00:00
Hans Petter Selasky
c7ffaed92e Fix for deadlock situation in cuse(3)
The final server unref should be done by the server thread to prevent
deadlock in the client cdevpriv destructor, which cannot destroy
itself.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:42:53 +00:00
Warner Losh
e5500f1efa Replay r349334 by markj accidentally reverted by r349352
Remove a lingering use of splbio().

The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-25 06:14:00 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Alan Somers
1734e205f3 fusefs: refine the short read fix from r349332
b_fsprivate1 needs to be initialized even for write operations, probably
because a buffer can be used to read, write, and read again with the final
read serviced by cache.

Sponsored by:	The FreeBSD Foundation
2019-06-24 20:08:28 +00:00
Mark Johnston
673c1c2944 Remove a lingering use of splbio().
The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-24 19:19:37 +00:00
Alan Somers
17575bad85 fusefs: improve the short read fix from r349279
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read.  And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.

Sponsored by:	The FreeBSD Foundation
2019-06-24 17:05:31 +00:00
Alan Somers
44f654fdc5 fusefs: fix corruption on short reads caused by r349279
Even if a short read is caused by EOF, it's still necessary to bzero the
remaining buffer, because that buffer could become valid as a result of a
future ftruncate or pwrite operation.

Reported by:	fsx
Sponsored by:	The FreeBSD Foundation
2019-06-21 23:29:29 +00:00
Alan Somers
aef22f2d75 fusefs: correctly handle short reads
A fuse server may return a short read for three reasons:

* The file is opened with FOPEN_DIRECT_IO.  In this case, the short read
  should be returned directly to userland.  We already handled this case
  correctly.

* The file was truncated server-side, and the read hit EOF.  In this case,
  the kernel should update the file size.  Fixed in the case of VOP_READ.
  Fixing this for VOP_GETPAGES is TODO.

* The file is opened in writeback mode, there are dirty buffers past what
  the server thinks is the file's EOF, and the read hit what the server
  thinks is the file's EOF.  In this case, the client is trying to read a
  hole, and should zero-fill it.  We already handled this case, and I added
  a test for it.

Sponsored by:	The FreeBSD Foundation
2019-06-21 21:44:31 +00:00
Alan Somers
87ff949a7b fusefs: raise protocol level to 7.23
None of the new features are implemented yet.  This commit just adds the new
protocol definitions and adds backwards-compatibility code for pre 7.23
servers.

Sponsored by:	The FreeBSD Foundation
2019-06-21 04:57:23 +00:00
Alan Somers
8f9b3ba718 fusefs: use standard integer types in fuse_kernel.h
This is a merge of Linux revision 4c82456eeb4da081dd63dc69e91aa6deabd29e03.
No functional change.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:17:27 +00:00
Alan Somers
b160acd1c0 fusefs: raise the protocol level to 7.21
Jumping from protocol 7.15 to 7.21 adds several new features.  While they're
all potentially useful, they're also all optional, and I'm not implementing
any right now because my highest priority lies in a later version.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:04:56 +00:00
Alan Somers
ecb489158c fusefs: diff reduction of fuse_kernel.h vs the upstream version
fuse_kernel.h is based on Linux's fuse.h.  In r349250 I modified
fuse_kernel.h by generating a diff of two versions of Linux's fuse.h and
applying it to our tree.  patch succeeded, but it put one chunk in the wrong
location.  This commit fixes that.  No functional changes.

Sponsored by:	The FreeBSD Foundation
2019-06-21 02:55:43 +00:00
Alan Somers
7cbb8e8a06 fusefs: raise protocol level to 7.15
This protocol level adds two new features: the ability for the server to
store or retrieve data into/from the client's cache.  But the messages
aren't defined soundly since they identify the file only by its inode,
without the generation number.  So it's possible for them to modify the
wrong file's cache.  Also, I don't know of any file systems in ports that
use these messages.  So I'm not implementing them.  I did add a (disabled)
test for the store message, however.

Sponsored by:	The FreeBSD Foundation
2019-06-20 23:32:25 +00:00
Alan Somers
bb23d43901 fusefs: trivially raise protocol level to 7.14
The only new feature is splice(2) support on /dev/fuse, which FreeBSD can't
support.

Sponsored by:	The FreeBSD Foundation
2019-06-20 23:12:19 +00:00
Alan Somers
38b06f8ac4 fcntl: fix overflow when setting F_READAHEAD
VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field.
However, fcntl allows you to set the seqcount in bytes to any nonnegative
31-bit value. The result can be a 16-bit overflow, which will be
sign-extended in functions like ffs_read. Fix this by sanitizing the
argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather
than INT16_MAX.

Also, fifos have overloaded the f_seqcount field for a completely different
purpose ever since r238936.  Formalize that by using a union type.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20710
2019-06-20 23:07:20 +00:00
Alan Somers
192a918194 fusefs: attempt to support servers as old as protocol 7.4
Previously we allowed servers as old as 7.1 to connect (there never was a
7.0).  However, we wrongly assumed a few things about protocols older than
7.8.  This commit attempts to support servers as old as 7.4 but no older.  I
added no new tests because I'm not sure there actually _are_ any servers
this old in the wild.

Sponsored by:	The FreeBSD Foundation
2019-06-20 22:21:42 +00:00
Alan Somers
2ffddc5ee9 fusefs: raise protocol level to 7.13
This protocol version adds one new feature: the ability for the server to
set the maximum number of background requests and a "congestion threshold"
with ill-defined properties.  I don't know of any fuse file systems in ports
that use this feature, so I'm not implementing it.

Sponsored by:	The FreeBSD Foundation
2019-06-20 21:29:28 +00:00
Alan Somers
a1c9f4ad0d fusefs: implement VOP_BMAP
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap.  Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.

The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.

Sponsored by:	The FreeBSD Foundation
2019-06-20 17:08:21 +00:00
Alan Somers
e532a99901 MFHead @349234
Sponsored by:	The FreeBSD Foundation
2019-06-20 15:56:08 +00:00
Alan Somers
84879e46c2 fusefs: multiple fixes related to the write cache
* Don't always write the last page synchronously.  That's not actually
  required.  It was probably just masking another bug that I fixed later,
  possibly in r349021.

* Enable the NotifyWriteback tests now that Writeback cache is working.

* Add a test to ensure that the write cache isn't flushed synchronously when
  in writeback mode.

Sponsored by:	The FreeBSD Foundation
2019-06-17 23:34:11 +00:00
Alan Somers
402b609c80 fusefs: use cluster_read for more readahead
fusefs will now use cluster_read.  This allows readahead of more than one
cache block.  However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.

Sponsored by:	The FreeBSD Foundation
2019-06-17 22:01:23 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Alan Somers
d569012f45 fusefs: implement non-clustered readahead
fusefs will now read ahead at most one cache block at a time (usually 64
KB).  Clustered reads are still TODO.  Individual file systems may disable
read ahead by setting fuse_init_out.max_readahead=0 during initialization.

Sponsored by:	The FreeBSD Foundation
2019-06-17 16:56:51 +00:00
Alan Somers
b5aaf286ea fusefs: fix the "write-through" of write-through cacheing
Our fusefs(5) module supports three cache modes: uncached, write-through,
and write-back.  However, the write-through mode (which is the default) has
never actually worked as its name suggests.  Rather, it's always been more
like "write-around".  It wrote directly, bypassing the cache.  The cache
would only be populated by a subsequent read of the same data.

This commit fixes that problem.  Now the write-through mode works as one
would expect: write(2) immediately adds data to the cache and then blocks
while the daemon processes the write operation.

A side effect of this change is that non-cache-block-aligned writes will now
incur a read-modify-write cycle of the cache block.  The old behavior
(bypassing write cache entirely) can still be achieved by opening a file
with O_DIRECT.

PR:		237588
Sponsored by:	The FreeBSD Foundation
2019-06-14 19:47:48 +00:00
Alan Somers
8eecd9ce05 fusefs: enable write clustering
Enable write clustering in fusefs whenever cache mode is set to writeback
and the "async" mount option is used.  With default values for MAXPHYS,
DFLTPHYS, and the fuse max_write mount parameter, that means sequential
writes will now be written 128KB at a time instead of 64KB.

Also, add a regression test for PR 238565, a panic during unmount that
probably affects UFS, ext2, and msdosfs as well as fusefs.

PR:		238565
Sponsored by:	The FreeBSD Foundation
2019-06-14 18:14:51 +00:00
Alan Somers
dff3a6b410 fusefs: fix a bug with WriteBack cacheing
An errant vfs_bio_clrbuf snuck in in r348931.  Surprisingly, it doesn't have
any effect most of the time.  But under some circumstances it cause the
buffer to behave in a write-only fashion.

Sponsored by:	The FreeBSD Foundation
2019-06-13 19:07:03 +00:00
Alan Somers
93c0c1d4ce fusefs: fix a page fault with writeback cacheing
When truncating a file downward through a dirty buffer, it's neccessary to
update the buffer's b->dirtyend.

Sponsored by:	The FreeBSD Foundation
2019-06-11 23:46:31 +00:00
Alan Somers
a87e0831ab fusefs: WIP fixing writeback cacheing
The current "writeback" cache mode, selected by the
vfs.fusefs.data_cache_mode sysctl, doesn't do writeback cacheing at all.  It
merely goes through the motions of using buf(9), but then writes every
buffer synchronously.  This commit:

* Enables delayed writes when the sysctl is set to writeback cacheing
* Fixes a cache-coherency problem when extending a file whose last page has
  just been written.
* Removes the "sync" mount option, which had been set unconditionally.
* Adjusts some SDT probes
* Adds several new tests that mimic what fsx does but with more control and
  without a real file system.  As I discover failures with fsx, I add
  regression tests to this file.
* Adds a test that ensures we can append to a file without reading any data
  from it.

This change is still incomplete.  Clustered writing is not yet supported,
and there are frequent "panic: vm_fault_hold: fault on nofault entry" panics
that I need to fix.

Sponsored by:	The FreeBSD Foundation
2019-06-11 16:32:33 +00:00
Alan Somers
ddc51e453e fusefs: remove some stuff that was copy/pasted from nfsclient
fusefs's I/O methods were originally copy/pasted from nfsclient.  This
commit removes some irrelevant parts, like stuff involving B_NEEDCOMMIT.

Sponsored by:	The FreeBSD Foundation
2019-06-06 20:35:41 +00:00
Alan Somers
0269ae4c19 MFHead @348740
Sponsored by:	The FreeBSD Foundation
2019-06-06 16:20:50 +00:00
Alan Somers
011bca9948 fusefs: simplify fuse_write_biobackend. No functional change.
Sponsored by:	The FreeBSD Foundation
2019-06-05 20:18:56 +00:00
Konstantin Belousov
3c93d22758 Manually clear text references on reclaim for nullfs and tmpfs.
Both filesystems do no use vnode_pager_dealloc() which would handle
this case otherwise.  Nullfs because vnode vm_object handle never
points to nullfs vnode.  Tmpfs because its vm_object is never vnode
object at all.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-06-05 20:16:25 +00:00
Alan Somers
a639731ba9 fusefs: respect RLIMIT_FSIZE
Sponsored by:	The FreeBSD Foundation
2019-06-03 23:24:07 +00:00
Alan Somers
6ff7f297f8 fusefs: don't require FUSE_EXPORT_SUPPORT for async invalidation
In r348560 I thought that FUSE_EXPORT_SUPPORT was required for cases where
the node to be invalidated (or the parent of the entry to be invalidated)
wasn't cached.  But I realize now that that's not the case.  During entry
invalidation, if the parent isn't in the vfs hash table, then it must've
been reclaimed.  And since fuse_vnop_reclaim does a cache_purge, that means
the entry to be invalidated has already been removed from the namecache.
And during inode invalidation, if the inode to be invalidated isn't in the
vfs hash table, then it too must've been reclaimed.  In that case it will
have no buffer cache to invalidate.

Sponsored by:	The FreeBSD Foundation
2019-06-03 20:45:32 +00:00
Alan Somers
eae1ae132c fusefs: support asynchronous cache invalidation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an inode's data cache and/or attributes.  This commit implements
that mechanism.  Unlike Linux's implementation, ours requires that the file
system also supports FUSE_EXPORT_SUPPORT (NFS-style lookups).  Otherwise the
invalidation operation will return EINVAL.

Sponsored by:	The FreeBSD Foundation
2019-06-03 17:34:01 +00:00
Alan Somers
c2d70d6e6f fusefs: support name cache invalidation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an entry from its name cache.  This commit implements that
mechanism.

Sponsored by:	The FreeBSD Foundation
2019-06-01 00:11:19 +00:00
Alan Somers
0d2bf48996 fusefs: check the vnode cache when looking up files for the NFS server
FUSE allows entries to be cached for a limited amount of time.  fusefs's
vnop_lookup method already implements that using the timeout functionality
of cache_lookup/cache_enter_time.  However, lookups for the NFS server go
through a separate path: vfs_vget.  That path can't use the same timeout
functionality because cache_lookup/cache_enter_time only work on pathnames,
whereas vfs_vget works by inode number.

This commit adds entry timeout information to the fuse vnode structure, and
checks it during vfs_vget.  This allows the NFS server to take advantage of
cached entries.  It's also the same path that FUSE's asynchronous cache
invalidation operations will use.

Sponsored by:	The FreeBSD Foundation
2019-05-31 21:22:58 +00:00
Rick Macklem
6aab442af9 Get rid of extraneous initialization.
Get rid of an extraneous initialization, mainly to keep a static analyser
happy. No semantic change.

PR:		238167
Submitted by:	Alexey Dokuchaev
2019-05-31 03:13:09 +00:00
Rick Macklem
26fd36b29d Clean up silly code case.
This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.

PR:		238167
2019-05-31 00:56:31 +00:00
Alan Somers
a4856c96d0 fusefs: raise protocol level to 7.12
This commit raises the protocol level and adds backwards-compatibility code
to handle structure size changes.  It doesn't implement any new features.
The new features added in protocol 7.12 are:

* server-side umask processing (which FreeBSD won't do)
* asynchronous inode and directory entry invalidation (which I'll do next)

Sponsored by:	The FreeBSD Foundation
2019-05-29 16:39:52 +00:00
Alan Somers
e039bafa87 fusefs: add comments explaining why 7.11 features aren't implemented
Protocol 7.11 adds two new features, but neither of them were defined
correctly.  FUSE_IOCTL messages don't work for 32-bit daemons on a 64-bit
host (fixed in protocol 7.16).  FUSE_POLL is basically unusable until 7.21.
Before 7.21, the client can't choose which events to register for; the
client registers for "something" and the server replies to say which events
the client is registered for.  Also, before 7.21 there was no way for a
client to deregister a file handle.

Sponsored by:	The FreeBSD Foundation
2019-05-29 02:03:08 +00:00
Alan Somers
9c62bc7045 fusefs: raise protocol level to 7.11
This commit adds the definitions for protocol 7.11 but doesn't yet implement
the new features.  The new features are optional, so they can come later.

Sponsored by:	The FreeBSD Foundation
2019-05-29 00:54:49 +00:00
Alan Somers
3f105d16a0 fusefs: raise protocol level to 7.10
Protocol version 7.10 has only one new feature, and I'm choosing not to
implement it, so this commit is basically a noop.  The sole new feature is
the FOPEN_NONSEEKABLE flag, which a fuse file system can return to indicate
that a certain file handle cannot be seeked.  However, I'm unaware of any
file system in ports that uses this flag.

Sponsored by:	The FreeBSD Foundation
2019-05-29 00:01:36 +00:00
Johannes Lundberg
1e363d64a5 pseudofs: Ignore unsupported commands in vop_setattr.
Users of pseudofs (e.g. lindebugfs), should be able to receive
input from command line via commands like "echo 1 > /path/to/file".
Currently this fails because sh tries to truncate the file first and
vop_setattr returns not supported error for this. This patch simply
ignores the error and returns 0 instead.

Reviewed by:	imp (mentor), asomers
Approved by:	imp (mentor), asomers
MFC after:	1 week
Differential Revision: D20451
2019-05-28 20:54:59 +00:00
Alan Somers
d4fd0c8148 fusefs: set the flags fields of fuse_write_in and fuse_read_in
These fields are supposed to contain the file descriptor flags as supplied
to open(2) or set by fcntl(2).  The feature is kindof useless on FreeBSD
since we don't supply all of these flags to fuse (because of the weak
relationship between struct file and struct vnode).  But we should at least
set the access mode flags (O_RDONLY, etc).

This is the last fusefs change needed to get full protocol 7.9 support.
There are still a few options we don't support for good reason (mandatory
file locking is dumb, flock support is broken in the protocol until 7.17,
etc), but there's nothing else to do at this protocol level.

Sponsored by:	The FreeBSD Foundation
2019-05-28 01:09:19 +00:00
Alan Somers
8aa24ed381 fusefs: flock(2) locks must be implemented in-kernel
If a FUSE file system sets the FUSE_POSIX_LOCKS flag then it can support
fcntl(2)-style locks directly.  However, the protocol does not adequately
support flock(2)-style locks until revision 7.17.  They must be implemented
locally in-kernel instead.  This unfortunately breaks the interoperability
of fcntl(2) and flock(2) locks for file systems that support the former.
C'est la vie.

Prior to this commit flock(2) would get sent to the server as a
fcntl(2)-style lock with the lock owner field set to stack garbage.

Sponsored by:	The FreeBSD Foundation
2019-05-28 00:03:46 +00:00
Alan Somers
9bcecf0fd7 fusefs: clear fuse_getattr_in.getattr_flags
Protocol 7.9 adds this field.  We could use it to store the file handle of
the file whose attributes we're requesting.  However, that requires extra
work at runtime to look up a file handle, and I'm not aware of any file
systems that care.  So it's easiest just to clear it.

Sponsored by:	The FreeBSD Foundation
2019-05-27 22:25:39 +00:00
Alan Somers
bda39894c5 fusefs: set FUSE_WRITE_CACHE when writing from cache
This bit tells the server that we're not sure which uid, gid, and/or pid
originated the write.  I don't know of a single file system that cares, but
it's part of the protocol.

Sponsored by:	The FreeBSD Foundation
2019-05-27 21:36:28 +00:00
Alan Somers
93fecd02a1 fusefs: misc build fixes
* Only build the tests on platforms with C++14 support
* Fix an undefined symbol error on lint builds
* Remove an unused function: fiov_clear

Sponsored by:	The FreeBSD Foundation
2019-05-25 21:40:27 +00:00
Alan Somers
65417f5e27 Remove "struct ucred*" argument from vtruncbuf
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it.  Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20377
2019-05-24 20:27:50 +00:00
Alan Somers
e97ae4ad2d fusefs: implement FUSE_ASYNC_READ
If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle.  Otherwise concurrent reads are not allowed.  This commit implements
it.  Previously we unconditionally disallowed concurrent reads.

Sponsored by:	The FreeBSD Foundation
2019-05-24 05:12:43 +00:00
Alan Somers
ad587bc5df fusefs: fix some garbage left behind by r348209
Sponsored by:	The FreeBSD Foundation
2019-05-24 00:56:50 +00:00
Alan Somers
e76986fde0 fusefs: fix exporting fuse filesystems with nfsd
A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:

* During read and write operations, implicitly do a FUSE_OPEN if there isn't
  already a valid file handle.  That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
  from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
  start from the beginning and read until it encounters the requested
  offset.

I've done only limited testing over NFS, so there are probably still some
more bugs.  Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.

Sponsored by:	The FreeBSD Foundation
2019-05-23 23:06:26 +00:00
Alan Somers
db7b0e747f fusefs: assume the mountpoint's generation is 0
This seems to be libfuse's behavior (its documentation notwithstanding).

Sponsored by:	The FreeBSD Foundation
2019-05-23 22:57:57 +00:00
Alan Somers
e5b50fe736 fusefs: Make fuse file systems NFS-exportable
This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3).  More work is needed to make the in-kernel nfsd work, because of
its stateless nature.  It doesn't open files prior to doing I/O.  Also, the
NFS-related VOPs currently ignore the entry cache.

Sponsored by:	The FreeBSD Foundation
2019-05-23 00:44:01 +00:00
Alan Somers
2013b723d3 fusefs: improve attribute cacheing
Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr.  There are still a few
calls elsewhere that happen as a result of a write.

Sponsored by:	The FreeBSD Foundation
2019-05-23 00:22:03 +00:00
Alan Somers
18a2264e27 fusefs: fix "recursing on non recursive lockmgr" panic
When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner.  When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf.  That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.

Sponsored by:	The FreeBSD Foundation
2019-05-22 23:30:51 +00:00
Alan Somers
b6b7fe7c7d fusefs: remove the vfs.fusefs.sync_resize syctl, correctly this time
In r347547 I intended to remove the vfs.fusefs.sync_resize sysctl, leaving
fusefs's behavior as though sync_resize had its default value.  But I forgot
that I had already turned off sync_resize in my development system's
/etc/sysctl.conf.

This commit complete removes the optional behavior that was formerly
controlled by sync_resize.  There's no need for explicitly calling
FUSE_SETATTR after every FUSE_WRITE that extends a file.  The daemon can
infer that the file is being extended.  If this sysctl was added as a
workaround for a buggy daemon, there's no clue as to what that daemon may
have been.

Sponsored by:	The FreeBSD Foundation
2019-05-22 19:49:25 +00:00
Conrad Meyer
daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Alan Somers
a6fac00c53 fusefs: Allow update mounts
Allow "mount -u" to change some mount options for fusefs.

Sponsored by:	The FreeBSD Foundation
2019-05-21 19:34:39 +00:00
Alan Somers
d311d6c467 fusefs: eliminate a superfluous fuse_node_setparent
Sponsored by:	The FreeBSD Foundation
2019-05-20 20:55:01 +00:00
Alan Somers
6f8114adcc fusefs: unset MNT_LOCAL
The kernel can't tell whether or not a fuse file system is truly local.  But
what really matters is two things:

1) Can I/O to a file system block indefinitely?
2) Can the file system bypass the O_BENEATH restriction during lookup?

For fuse, the answer to both of those question is yes.  So as far as the
kernel is concerned, it's a non-local file system.

Sponsored by:	The FreeBSD Foundation
2019-05-20 20:54:09 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Alan Somers
fe221e0177 fusefs: forward UTIME_NOW to the server
If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means.  Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.

PR:		237181
Sponsored by:	The FreeBSD Foundation
2019-05-16 23:17:39 +00:00
Alan Somers
e7f73af118 fusefs: allow the server to specify st_blksize
If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize .

Sponsored by:	The FreeBSD Foundation
2019-05-16 22:50:04 +00:00
Alan Somers
16bd2d47c7 fusefs: Upgrade FUSE protocol to version 7.9.
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8.  It doesn't
implement any of 7.9's new features yet.

Sponsored by:	The FreeBSD Foundation
2019-05-16 17:24:11 +00:00
Alan Somers
96192dfce0 fusefs: diff reduction vs the upstream sources
fuse_kernel.h defines the structures used by the FUSE protocol.  Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).

The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header.  Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD.  In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getexattr_in and fuse_getxattr_out structs.

Sponsored by:	The FreeBSD Foundation
2019-05-15 22:51:25 +00:00
Alan Somers
3d15b234a4 fusefs: don't track a file's size in two places
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning.  It was very confusing.  This commit eliminates the former.  It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.

Sponsored by:	The FreeBSD Foundation
2019-05-15 00:38:52 +00:00
Alan Somers
96658124d7 fusefs: eliminate superfluous FUSE_GETATTR when filesize=0
fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc.  But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero.  Fix by adding a distinct flag value.

Sponsored by:	The FreeBSD Foundation
2019-05-13 23:30:06 +00:00
Alan Somers
5940f822ae fusefs: remove the vfs.fusefs.data_cache_invalidate sysctl
This sysctl was added > 6.5 years ago and I don't know why.  The description
seems at odds with the code.  While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:57:21 +00:00
Alan Somers
fcefa6ef66 fusefs: remove the vfs.fusefs.mmap_enable sysctl
This sysctl was added > 6.5 years ago for no clear reason.  Perhaps it was
intended to gate an unstable feature?  But now there's no reason to globally
disable mmap.  I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:42:09 +00:00
Alan Somers
515183969d fusefs: remove the vfs.fusefs.refresh_size sysctl
This was added > 6.5 years ago with no evident reason why.  It probably had
something to do with the incomplete cached attribute implementation.  But
cache attributes work now.  I see no reason to retain this sysctl.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:31:10 +00:00
Alan Somers
4d09e76a73 fusefs: remove the vfs.fusefs.sync_resize syctl
This sysctl was added > 6.5 years ago for no clear purpose.  I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now.  Since there's no clear motivation for
this sysctl, it's best to remove it.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:47:31 +00:00
Alan Somers
bad4c94dc8 fusefs: remove the vfs.fusefs.fix_broken_io sysctl
This looks like it may have been a workaround for a specific buggy FUSE
filesystem.  However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:31:09 +00:00
Alan Somers
4abf87666a fusefs: reap dead sysctls
Remove the "sync_unmount" and "init_backgrounded" sysctls and the associated
options from mount_fusefs.  Add no backwards-compatibility hidden options to
mount_fusefs because these options never had any effect, and are therefore
unlikely to be used.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:03:46 +00:00
Alan Somers
7648bc9fee MFHead @347527
Sponsored by:	The FreeBSD Foundation
2019-05-13 18:25:55 +00:00
Alan Somers
8350cbd8d6 [skip ci] fusefs: remove an obsolete comment
Sponsored by:	The FreeBSD Foundation
2019-05-13 15:39:54 +00:00
Alan Somers
f82e92e52b fusefs: enhance an SDT probe added in r346998
Sponsored by:	The FreeBSD Foundation
2019-05-13 15:39:19 +00:00
Alan Somers
0a7c63e075 fusefs: Report the number of available ops in kevent(2)
Just like /dev/devctl, /dev/fuse will now report the number of operations
available for immediate read in the kevent.data field during kevent(2).

Sponsored by:	The FreeBSD Foundation
2019-05-12 15:27:18 +00:00
Alan Somers
3429092cd1 fusefs: support kqueue for /dev/fuse
/dev/fuse was already pollable with poll and select.  Add support for
kqueue, too.  And add tests for polling with poll, select, and kqueue.

Sponsored by:	The FreeBSD Foundation
2019-05-11 22:58:25 +00:00
Alan Somers
7e0aac2408 fusefs: return ENOTCONN instead of EIO if the daemon dies suddenly
If the daemon dies, return ENOTCONN for all operations that have already
been sent to the daemon, as well as any new ones.

Sponsored by:	The FreeBSD Foundation
2019-05-10 16:41:33 +00:00
Alan Somers
d5024ba275 fusefs: minor optimization to interrupted fuse operations
If the daemon is known to ignore FUSE_INTERRUPT, then we may as well block
all signals while waiting for a response.

Sponsored by:	The FreeBSD Foundation
2019-05-10 16:31:51 +00:00
Alan Somers
8b73a4c5ae fusefs: fix running multiple daemons concurrently
When a FUSE daemon dies or closes /dev/fuse, all of that daemon's pending
requests must be terminated.  Previously that was done in /dev/fuse's
.d_close method.  However, d_close only gets called on the *last* close of
the device.  That means that if multiple daemons were running concurrently,
all but the last daemon to close would leave their I/O hanging around.  The
problem was easily visible just by running "kyua -v parallelism=2 test" in
fusefs's test directory.

Fix this bug by terminating a daemon's pending I/O during /dev/fuse's
cdvpriv dtor method instead.  That method runs on every close of a file.

Also, fix some potential races in the tests:
* Clear SA_RESTART when registering the daemon's signal handler so read(2)
  will return EINTR.
* Wait for the daemon to die before unmounting the mountpoint, so we won't
  see an unwanted FUSE_DESTROY operation in the mock file system.

Sponsored by:	The FreeBSD Foundation
2019-05-10 15:02:29 +00:00
Alan Somers
d5ff268834 fusefs: create sockets with FUSE_MKNOD, not FUSE_CREATE
libfuse expects sockets to be created with FUSE_MKNOD, not FUSE_CREATE,
because that's how Linux does it.  My first attempt at creating sockets
(r346894) used FUSE_CREATE because FreeBSD uses VOP_CREATE for this purpose.
There are no backwards-compatibility concerns with this change, because
socket support hasn't yet been merged to head.

Sponsored by:	The FreeBSD Foundation
2019-05-09 16:25:01 +00:00
Alan Somers
002e54b0aa fusefs: clear a dir's attr cache when its contents change
Any change to a directory's contents should cause its mtime and ctime to be
updated by the FUSE daemon.  Clear its attribute cache so we'll get the new
attributs the next time that they're needed.  This affects the following
VOPs: VOP_CREATE, VOP_LINK, VOP_MKDIR, VOP_MKNOD, VOP_REMOVE, VOP_RMDIR, and
VOP_SYMLINK

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-09 01:16:34 +00:00
Alan Somers
8e45ec4e64 fusefs: fix a permission handling bug during VOP_RENAME
If the file to be renamed is a directory and it's going to get a new parent,
then the user must have write permissions to that directory, because the
".." dirent must be changed.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-08 22:28:13 +00:00
Alan Somers
d943c93e76 fusefs: allow non-owners to set timestamps to UTIME_NOW
utimensat should allow anybody with write access to set atime and mtime to
UTIME_NOW.

PR:		237181
Sponsored by:	The FreeBSD Foundation
2019-05-08 19:42:00 +00:00
Alan Somers
4ae3a56cb1 fusefs: updated cached attributes during VOP_LINK.
FUSE_LINK returns a new set of attributes.  fusefs should cache them just
like it does during other VOPs.  This is not only a matter of performance
but of correctness too; without caching the new attributes the vnode's nlink
value would be out-of-date.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-08 18:12:38 +00:00
Alan Somers
a2bdd7379b fusefs: drop suid after a successful chown by a non-root user
Drop sgid too.  Also, drop them after a successful chgrp.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 22:38:13 +00:00
Alan Somers
4e83d6555e fusefs: allow the null chown and null chgrp
Even an unprivileged user should be able to chown a file to its current
owner, or chgrp it to its current group.  Those are no-ops.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 01:27:23 +00:00
Alan Somers
1c8a5f5e39 fusefs: disable posix_fallocate
fuse file systems have far too much variability for the standard
posix_fallocate implementation to work.  A future protocol revision (7.19)
adds a FUSE_FALLOCATE operation, but we don't support that yet.  Better to
simply return EINVAL until then.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 00:03:05 +00:00
Alan Somers
3fa127896b fusefs: allow ftruncate on files without write permission
ftruncate should succeed as long as the file descriptor is writable, even if
the file doesn't have write permission.  This is important when combined
with O_CREAT.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 20:46:58 +00:00
Alan Somers
8cfb44315a fusefs: Fix another obscure permission handling bug
Don't allow unprivileged users to set SGID on files to whose group they
don't belong.  This is slightly different than what POSIX says we should do
(clear sgid on return from a successful chmod), but it matches what UFS
currently does.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 16:54:35 +00:00
Alan Somers
a90e32de25 fusefs: clear SUID & SGID after a successful write by a non-owner
Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 16:17:55 +00:00
Alan Somers
ac0a68e9cd fusefs: don't allow truncating irregular files on an read-only mount
The readonly mount check had a special case allowing the sizes of files to
be changed if they weren't regular files.  I don't know why.  Neither UFS,
ZFS, nor ext2 have such a special case, and I don't know when you would ever
change the size of a non-regular file anyway.

Sponsored by:	The FreeBSD Foundation
2019-05-06 15:20:18 +00:00
Konstantin Belousov
391918a3c1 Do not flush NFS node from NFS VOP_SET_TEXT().
The more appropriate place to do the flushing is VOP_OPEN().  This was
uncovered because VOP_SET_TEXT() is now called with the vnode'
vm_object rlocked, which is incompatible with the flush operations.

After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.

Sponsored by:	The FreeBSD Foundation
MFC after:	30 days
2019-05-06 08:49:43 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Alan Somers
e5ff3a7e28 fusefs: only root may set the sticky bit on a non-directory
PR:		216391
Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-04 16:27:58 +00:00
Alan Somers
61b0a927cb fusefs: use effective gid, not real gid, for FUSE operations
This is the gid used for stuff like setting the group of a newly created
file.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-04 02:11:28 +00:00
Alan Somers
72f03b7ccd fusefs: fix "returning with lock held" panics in fuse_vnode_alloc
These panics all lie in the error path.  The only one I've hit is caused by
a buggy FUSE server unexpectedly changing the type of a vnode.

Sponsored by:	The FreeBSD Foundation
2019-05-01 17:27:04 +00:00
Alan Somers
93198e64fb fusefs: fix a memory leak from r346979
PR:		216391
Sponsored by:	The FreeBSD Foundation
2019-05-01 17:24:53 +00:00
Alan Somers
474ba6fa3b fusefs: fix some permission checks with -o default_permissions
When mounted with -o default_permissions fusefs is supposed to validate all
permissions in the kernel, not the file system.  This commit fixes two
permissions that I had previously overlooked.

* Only root may chown a file
* Non-root users may only chgrp a file to a group to which they belong

PR:		216391
Sponsored by:	The FreeBSD Foundation
2019-05-01 00:00:49 +00:00
Alan Somers
ede571e40a fusefs: support unix-domain sockets
Also, fix the teardown of the Fifo.read_write test

Sponsored by:	The FreeBSD Foundation
2019-04-29 16:24:51 +00:00
Alan Somers
f9b0e30ba7 fusefs: FIFO support
Sponsored by:	The FreeBSD Foundation
2019-04-29 01:40:35 +00:00
Alan Somers
9c7ec33162 fusefs: fix a deadlock in VOP_PUTPAGES
As of r346162 fuse now invalidates the cache during writes.  But it can't do
that when writing from VOP_PUTPAGES, because the write is coming _from_ the
cache.  Trying to invalidate the cache in that situation causes a deadlock
in vm_object_page_remove, because the pages in question have already been
busied by the same thread.

PR:		235774
Sponsored by:	The FreeBSD Foundation
2019-04-26 19:47:43 +00:00
Alan Somers
102c7ac083 fusefs: handle ENOSYS for FUSE_INTERRUPT
Though it's not documented, Linux will interpret a FUSE_INTERRUPT response
of ENOSYS as "the file system does not support FUSE_INTERRUPT".
Subsequently it will never send FUSE_INTERRUPT again to the same mount
point.  This change matches Linux's behavior.

PR:		346357
Sponsored by:	The FreeBSD Foundation
2019-04-24 17:30:50 +00:00
Alan Somers
ebbfe00ec2 fusefs: interruptibility improvements suggested by kib
* Block stop signals in fticket_wait_answer
* Hold ps_mtx while checking signal disposition
* style(9) changes

PR:		346357
Reported by:	kib
Sponsored by:	The FreeBSD Foundation
2019-04-24 15:54:18 +00:00
Alan Somers
21d4686c5c fusefs: diff reduction between fuse_read_biobackend and ext_read
The main difference is to replace some custom logic with bread.  No
functional change at this point, but this is one step towards adding
readahead.

Sponsored by:	The FreeBSD Foundation
2019-04-23 22:34:32 +00:00
Alan Somers
bad3de4365 fusefs: use vfs_bio_clrbuf in fuse_vnode_setsize
Reuse fuse_vnode_setsize instead of reinventing the wheel.  This is what
ext2_ind_truncate does.

PR:		233783
Sponsored by:	The FreeBSD Foundation
2019-04-23 22:25:50 +00:00
Rick Macklem
a6f77c9a6e Add #ifdef INET as requested by bz@. 2019-04-21 22:53:51 +00:00
Alan Somers
419e7ff674 fusefs: rename the SDT probes from "fuse" to "fusefs"
This matches the new name of the kld.

Sponsored by:	The FreeBSD Foundation
2019-04-20 00:04:31 +00:00
Rick Macklem
b4372164ed Add support for the ModeSetMasked attribute to the NFSv4.1 server.
I do not know of an extant NFSv4.1 client that currently does a Setattr
operation for the ModeSetMasked, but it has been discussed on the linux-nfs
mailing list.
This patch adds support for doing a Setattr of ModeSetMasked, so that it
will work for any future NFSv4.1 client that chooses to do so.
Tested via a hacked FreeBSD NFSv4.1 client.

MFC after:	2 weeks
2019-04-19 23:35:08 +00:00
Rick Macklem
b4645807af Replace "vp" with NULL to make the code more readable.
At the time of this nfsv4_sattr() call, "vp == NULL", so this patch doesn't
change the semantics, but I think it makes the code more readable.
It also makes it consistent with the nfsv4_sattr() call a few lines above
this one. Found during code inspection.

MFC after:	2 weeks
2019-04-19 23:27:23 +00:00
Alan Somers
4423ae76ca fusefs: reap dead code
Sponsored by:	The FreeBSD Foundation
2019-04-19 23:04:07 +00:00
Alan Somers
268c28edbc fusefs: give priority to FUSE_INTERRUPT operations
When interrupting a FUSE operation, send the FUSE_INTERRUPT op to the daemon
ASAP, ahead of other unrelated operations.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-19 21:50:23 +00:00
Alan Somers
f0f7fc1be4 fusefs: fix interrupting FUSE_SETXATTR
fusefs's VOP_SETEXTATTR calls uiomove(9) before blocking, so it can't be
restarted.  It must be interrupted instead.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-19 20:31:12 +00:00
Alan Somers
3d070fdc76 fusefs: don't send FUSE_INTERRUPT for ops that are still in-kernel
If a pending FUSE operation hasn't yet been sent to the daemon, then there's
no reason to inform the daemon that it's been interrupted.  Instead, simply
remove it from the fuse message queue and set its status to EINTR or
ERESTART as appropriate.

PR:		346357
Sponsored by:	The FreeBSD Foundation
2019-04-19 15:05:32 +00:00
Rick Macklem
ea5776ec47 Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes.
During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.

MFC after:	2 weeks
2019-04-19 03:36:22 +00:00
Alan Somers
a154214620 fusefs: improvements to interruptibility
* If a process receives a fatal signal while blocked on a fuse operation,
  return ASAP without waiting for the operation to complete.  But still send
  the FUSE_INTERRUPT op to the daemon.
* Plug memory leaks from r346339

Interruptibility is now fully functional, but it could be better:
* Operations that haven't been sent to the server yet should be aborted
  without sending FUSE_INTERRUPT.
* It would be great if write operations could be made restartable.
  That would require delaying uiomove until the last possible moment, which
  would be sometime during fuse_device_read.
* It would be nice if we didn't have to guess which EAGAIN responses were
  for FUSE_INTERRUPT operations.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-18 19:16:34 +00:00
Hans Petter Selasky
db92a6cd51 Implement flag for telling cuse(3) clients if the peer is running in 32-bit
compat mode or not. This is useful when implementing compatibility ioctl(2)
handlers in userspace.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-18 19:04:07 +00:00
Alan Somers
723c776829 fusefs: WIP making FUSE operations interruptible
The fuse protocol includes a FUSE_INTERRUPT operation that the client can
send to the server to indicate that it wants to abort an in-progress
operation.  It's required to interrupt any syscall that is blocking on a
fuse operation.

This commit adds basic FUSE_INTERRUPT support.  If a process receives any
signal while it's blocking on a FUSE operation, it will send a
FUSE_INTERRUPT and wait for the original operation to complete.  But there
is still much to do:

* The current code will leak memory if the server ignores FUSE_INTERRUPT,
  which many do.  It will also leak memory if the server completes the
  original operation before it receives the FUSE_INTERRUPT.
* An interrupted read(2) will incorrectly appear to be successful.
* fusefs should return immediately for fatal signals.
* Operations that haven't been sent to the server yet should be aborted
  without sending FUSE_INTERRUPT.
* Test coverage should be better.
* It would be great if write operations could be made restartable.
  That would require delaying uiomove until the last possible moment, which
  would be sometime during fuse_device_read.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-17 23:32:38 +00:00
Fedor Uporov
84b89556b4 ext2fs: Initial version of DTrace support.
Commit forgotten file.

Reviewed by:    pfg, gnn
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19848
2019-04-16 11:37:15 +00:00