Commit Graph

273 Commits

Author SHA1 Message Date
Alan Somers
9338f18965 fusefs: add a dtrace probe that fires after mounting is complete
This probe is useful for showing the protocol options negotiated with a FUSE
server.

MFC after:	2 weeks
2020-03-30 14:03:35 +00:00
Alan Somers
b0ecfb42d1 fusefs: avoid cache corruption with buggy fuse servers
The FUSE protocol allows the client (kernel) to cache a file's size, if the
server (userspace daemon) allows it. A well-behaved daemon obviously should
not change a file's size while a client has it cached. But a buggy daemon
might. If the kernel ever detects that that has happened, then it should
invalidate the entire cache for that file. Previously, we would not only
cache stale data, but in the case of a file extension while we had the size
cached, we accidentally extended the cache with zeros.

PR:		244178
Reported by:	Ben RUBSON <ben.rubson@gmx.com>
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24012
2020-03-11 04:29:45 +00:00
Alan Somers
d970778e6f fusefs: fix fsync for files with multiple open handles
We were reusing a structure for multiple operations, but failing to
reinitialize one member.  The result is that a server that cares about FUSE
file handle IDs would see one correct FUSE_FSYNC operation, and one with the
FHID unset.

PR:		244431
Reported by:	Agata <chogata@gmail.com>
MFC after:	2 weeks
2020-03-09 01:57:21 +00:00
Pawel Biernacki
ef06a80cdb Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (8 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23627
2020-02-24 10:33:51 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
388820fbef fusefs: add missing CLTFLAG_MPSAFE annotation 2020-01-15 01:31:28 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Alan Somers
320c848ff6 Fix an off-by-one error from r351961
That revision addressed a Coverity CID that could lead to a buffer overflow,
but it had an off-by-one error in the buffer size check.

Reported by:	Coverity
Coverity CID:	1405530
MFC after:	3 days
MFC-With:	351961
Sponsored by:	The FreeBSD Foundation
2019-09-16 16:41:01 +00:00
Alan Somers
42767f76af fusefs: fix some minor issues with fuse_vnode_setparent
* When unparenting a vnode, actually clear the flag. AFAIK this is basically
  a no-op because we only unparent a vnode when reclaiming it or when
  unlinking.

* There's no need to call fuse_vnode_setparent during reclaim, because we're
  about to free the vnode data anyway.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21630
2019-09-16 14:51:49 +00:00
Alan Somers
6c0c362075 fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.

Reviewed by:	emaste
MFC after:	3 days
MFC-With:	r350665
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21557
2019-09-11 19:29:40 +00:00
Alan Somers
16f8783452 Coverity fixes in fusefs(5)
CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap.
It could potentially have resulted in VOP_BMAP reporting too many
consecutive blocks.

CID 1404364 is much worse. It was an array access by an untrusted,
user-provided variable. It could potentially have resulted in a malicious
file system crashing the kernel or worse.

Reported by:	Coverity
Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21466
2019-09-06 19:40:11 +00:00
Mark Johnston
9222b82368 Remove unused VM page locking macros.
They were orphaned by r292373.

Reviewed by:	asomers
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21469
2019-08-29 22:13:15 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode.  Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Alan Somers
5e63333052 fusefs: Fix some bugs regarding the size of the LISTXATTR list
* A small error in r338152 let to the returned size always being exactly
  eight bytes too large.

* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
  caller does not provide enough space, then the server should return ERANGE
  rather than return a truncated list.  That's true even though in FUSE's
  case the kernel doesn't provide space to the client at all; it simply
  requests a maximum size for the list.  We previously weren't handling the
  case where the server returns ERANGE even though the kernel requested as
  much size as the server had told us it needs; that can happen due to a
  race.

* We also need to ensure that a pathological server that always returns
  ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
  infinite loop in the kernel.  As of this commit, it will instead cause an
  infinite loop that exits and enters the kernel on each iteration, allowing
  signals to be processed.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21287
2019-08-28 04:19:37 +00:00
Alan Somers
3a79e8e772 fusefs: don't send the namespace during listextattr
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21280
2019-08-16 05:06:54 +00:00
Alan Somers
bf50749773 fusefs: Fix the size of fuse_getattr_in
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now.  But fusefs-lkl requires the
request's length to be correct.

Fix this bug, and also enhance the test suite to catch similar bugs.

PR:		239830
MFC after:	2 weeks
MFC-With:	350665
Sponsored by:	The FreeBSD Foundation
2019-08-14 20:45:00 +00:00
Alan Somers
427d205cb5 fusefs: remove superfluous counter_u64_zero
Reported by:	glebius
Sponsored by:	The FreeBSD Foundation
2019-08-06 00:50:25 +00:00
Alan Somers
508abc9494 fusefs: fix the build after r350446
fuse needs to include an additional header after r350446

Sponsored by:	The FreeBSD Foundation
2019-07-31 21:48:35 +00:00
Alan Somers
58df81b339 MFHead @350426
Sponsored by:	The FreeBSD Foundation
2019-07-30 04:17:36 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Alan Somers
669a092af1 fusefs: fix panic when writing with O_DIRECT and using writeback cache
When a fusefs file system is mounted using the writeback cache, the cache
may still be bypassed by opening a file with O_DIRECT.  When writing with
O_DIRECT, the cache must be invalidated for the affected portion of the
file.  Fix some panics caused by inadvertently invalidating too much.

Sponsored by:	The FreeBSD Foundation
2019-07-28 15:17:32 +00:00
Alan Somers
ed74f781c9 fusefs: add a intr/nointr mount option
FUSE file systems can optionally support interrupting outstanding
operations.  However, the file system does not identify to the kernel at
mount time whether it's capable of doing that.  Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives.  That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts.  The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option.  This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Sponsored by:	The FreeBSD Foundation
2019-07-18 17:55:13 +00:00
Alan Somers
f05962453e fusefs: fix another semi-infinite loop bug regarding signal handling
fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate.  The reason is because msleep(9) would
return EINTR even for a masked signal.  One reason is when the thread is
stopped, which happens for example during sigexit().  Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

Sponsored by:	The FreeBSD Foundation
2019-07-18 15:30:00 +00:00
Alan Somers
d26d63a4af fusefs: multiple interruptility improvements
1) Don't explicitly not mask SIGKILL.  kern_sigprocmask won't allow it to be
   masked, anyway.

2) Fix an infinite loop bug.  If a process received both a maskable signal
   lower than 9 (like SIGINT) and then received SIGKILL,
   fticket_wait_answer would spin.  msleep would immediately return EINTR,
   but cursig would return SIGINT, so the sleep would get retried.  Fix it
   by explicitly checking whether SIGKILL has been received.

3) Abandon the sig_isfatal optimization introduced by r346357.  That
   optimization would cause fticket_wait_answer to return immediately,
   without waiting for a response from the server, if the process were going
   to exit anyway.  However, it's vulnerable to a race:

   1) fatal signal is received while fticket_wait_answer is sleeping.
   2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
   3) fticket_wait_answer determines that the signal was fatal and returns
      without waiting for a response.
   4) Another thread changes the signal to non-fatal.
   5) The first thread returns to userspace.  Instead of exiting, the
      process continues.
   6) The application receives EINTR, wrongly believes that the operation
      was successfully interrupted, and restarts it.  This could cause
      problems for non-idempotent operations like FUSE_RENAME.

Reported by:    kib (the race part)
Sponsored by:   The FreeBSD Foundation
2019-07-17 22:45:43 +00:00
Alan Somers
07e86257e6 fusefs: fix the build with some NODEBUG kernels
systm.h needs to be included before counter.h

Sponsored by:	The FreeBSD Foundation
2019-07-13 21:41:12 +00:00
Alan Somers
97b0512b23 projects/fuse2: build fixes
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64

Sponsored by:	The FreeBSD Foundation
2019-07-13 14:42:09 +00:00
Alan Somers
7e1f5432f4 fusefs: don't leak memory of unsent operations on unmount
Sponsored by:	The FreeBSD Foundation
2019-06-28 18:48:02 +00:00
Alan Somers
8aafc8c389 [skip ci] update copyright headers in fusefs files
Sponsored by:	The FreeBSD Foundation
2019-06-28 04:18:10 +00:00
Alan Somers
c1afff113c fusefs: fix a memory leak regarding FUSE_INTERRUPT
We were leaking the fuse ticket if the original operation completed before
the daemon received the INTERRUPT operation.  Fixing this was easier than I
expected.

Sponsored by:	The FreeBSD Foundation
2019-06-27 22:24:56 +00:00
Alan Somers
435ecf40bb fusefs: recycle vnodes after their last unlink
Previously fusefs would never recycle vnodes.  After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them.  This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.

Sponsored by:	The FreeBSD Foundation
2019-06-27 20:18:12 +00:00
Alan Somers
38c8634635 fusefs: counter(9) variables should not be statically initialized
Reported by:	rpokala
Sponsored by:	The FreeBSD Foundation
2019-06-27 17:59:15 +00:00
Alan Somers
560a55d094 fusefs: convert statistical sysctls to use counter(9)
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.

Sponsored by:	The FreeBSD Foundation
2019-06-27 16:30:25 +00:00
Alan Somers
caeea8b4cc fusefs: fix some memory leaks
Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE.  There are still
leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the
server is legally allowed to never respond to a FUSE_INTERRUPT operation.

Sponsored by:	The FreeBSD Foundation
2019-06-27 00:00:48 +00:00
Alan Somers
f8ebf1cd7e fusefs: implement protocol 7.23's FUSE_WRITEBACK_CACHE option
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis.  If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache.  If not, then
they'll get the writethrough cache.  If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.

The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later.  However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.

This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
  bypassed.

Sponsored by:	The FreeBSD Foundation
2019-06-26 17:32:31 +00:00
Alan Somers
205696a17d fusefs: delete some unused mount options
The fusefs kernel module allegedly supported no_attrcache, no_readahed,
no_datacache, no_namecache, and no_mmap mount options, but the mount_fusefs
binary never did.  So there was no way to ever activate these options.
Delete them.  Some of them have alternatives:

no_attrcache: set the attr_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.
no_readahed: set max_readahead to 0 in the FUSE_INIT response.
no_datacache: set the vfs.fusefs.data_cache_mode sysctl to 0, or (coming
	soon) set the attr_valid time to 0 and set FUSE_AUTO_INVAL_DATA in
	the FUSE_INIT response.
no_namecache: set entry_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.

Sponsored by:	The FreeBSD Foundation
2019-06-26 15:15:24 +00:00
Alan Somers
fef464546c fusefs: implement the "time_gran" feature.
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23.  The client will use that granularity when
updating its cached timestamps during write.  This way the timestamps won't
appear to change following flush.

Sponsored by:	The FreeBSD Foundation
2019-06-26 02:09:22 +00:00
Alan Somers
0a8fe2d369 fusefs: set ctime during FUSE_SETATTR following a write
As of r349396 the kernel will internally update the mtime and ctime of files
on write.  It will also flush the mtime should a SETATTR happen before the
data cache gets flushed.  Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.

This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.

Sponsored by:	The FreeBSD Foundation
2019-06-26 00:03:37 +00:00
Alan Somers
788af9538a fusefs: automatically update mtime and ctime on write
Writing should implicitly update a file's mtime and ctime.  For fuse, the
server is supposed to do that.  But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE.  When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.

Sponsored by:	The FreeBSD Foundation
2019-06-25 23:40:18 +00:00
Alan Somers
0d3a88d76c fusefs: writes should update the file size, even when data_cache_mode=0
Writes that extend a file should update the file's size.  r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet.  Now that it is, we should always update the cached file
size during write.

Sponsored by:	The FreeBSD Foundation
2019-06-25 18:36:11 +00:00
Alan Somers
b9e2019755 fusefs: rewrite vop_getpages and vop_putpages
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache.  This has
several corollaries:

* Change the way we handle short reads _again_.  vfs_bio_getpages doesn't
  provide any way to handle unexpected short reads.  Plus, I found some more
  lock-order problems.  So now when the short read is detected we'll just
  clear the vnode's attribute cache, forcing the file size to be requeried
  the next time it's needed.  VOP_GETPAGES doesn't have any way to indicate
  a short read to the "caller", so we just bzero the rest of the page
  whenever a short read happens.

* Change the way we decide when to set the FUSE_WRITE_CACHE bit.  We now set
  it for clustered writes even when the writeback cache is not in use.

Sponsored by:   The FreeBSD Foundation
2019-06-25 17:24:43 +00:00
Alan Somers
1734e205f3 fusefs: refine the short read fix from r349332
b_fsprivate1 needs to be initialized even for write operations, probably
because a buffer can be used to read, write, and read again with the final
read serviced by cache.

Sponsored by:	The FreeBSD Foundation
2019-06-24 20:08:28 +00:00
Alan Somers
17575bad85 fusefs: improve the short read fix from r349279
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read.  And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.

Sponsored by:	The FreeBSD Foundation
2019-06-24 17:05:31 +00:00
Alan Somers
44f654fdc5 fusefs: fix corruption on short reads caused by r349279
Even if a short read is caused by EOF, it's still necessary to bzero the
remaining buffer, because that buffer could become valid as a result of a
future ftruncate or pwrite operation.

Reported by:	fsx
Sponsored by:	The FreeBSD Foundation
2019-06-21 23:29:29 +00:00
Alan Somers
aef22f2d75 fusefs: correctly handle short reads
A fuse server may return a short read for three reasons:

* The file is opened with FOPEN_DIRECT_IO.  In this case, the short read
  should be returned directly to userland.  We already handled this case
  correctly.

* The file was truncated server-side, and the read hit EOF.  In this case,
  the kernel should update the file size.  Fixed in the case of VOP_READ.
  Fixing this for VOP_GETPAGES is TODO.

* The file is opened in writeback mode, there are dirty buffers past what
  the server thinks is the file's EOF, and the read hit what the server
  thinks is the file's EOF.  In this case, the client is trying to read a
  hole, and should zero-fill it.  We already handled this case, and I added
  a test for it.

Sponsored by:	The FreeBSD Foundation
2019-06-21 21:44:31 +00:00
Alan Somers
87ff949a7b fusefs: raise protocol level to 7.23
None of the new features are implemented yet.  This commit just adds the new
protocol definitions and adds backwards-compatibility code for pre 7.23
servers.

Sponsored by:	The FreeBSD Foundation
2019-06-21 04:57:23 +00:00
Alan Somers
8f9b3ba718 fusefs: use standard integer types in fuse_kernel.h
This is a merge of Linux revision 4c82456eeb4da081dd63dc69e91aa6deabd29e03.
No functional change.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:17:27 +00:00
Alan Somers
b160acd1c0 fusefs: raise the protocol level to 7.21
Jumping from protocol 7.15 to 7.21 adds several new features.  While they're
all potentially useful, they're also all optional, and I'm not implementing
any right now because my highest priority lies in a later version.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:04:56 +00:00
Alan Somers
ecb489158c fusefs: diff reduction of fuse_kernel.h vs the upstream version
fuse_kernel.h is based on Linux's fuse.h.  In r349250 I modified
fuse_kernel.h by generating a diff of two versions of Linux's fuse.h and
applying it to our tree.  patch succeeded, but it put one chunk in the wrong
location.  This commit fixes that.  No functional changes.

Sponsored by:	The FreeBSD Foundation
2019-06-21 02:55:43 +00:00