Commit Graph

4119 Commits

Author SHA1 Message Date
Alan Somers
a63915c2d7 MFHead @r350386
Sponsored by:	The FreeBSD Foundation
2019-07-28 04:02:22 +00:00
Alan Somers
ed74f781c9 fusefs: add an intr/nointr mount option
FUSE file systems can optionally support interrupting outstanding
operations.  However, the file system does not identify to the kernel at
mount time whether it's capable of doing that.  Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives.  That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts.  The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.

Fix the signal delivery logic by making interruptibility an opt-in mount
option.  This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.

Bump __FreeBSD_version due to the new mount option.

Sponsored by:	The FreeBSD Foundation
2019-07-18 17:55:13 +00:00
Alan Somers
f05962453e fusefs: fix another semi-infinite loop bug regarding signal handling
fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate.  The reason is that msleep(9) can
return EINTR even for a masked signal, for example when the thread is
stopped, as happens during sigexit().  Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.

Sponsored by:	The FreeBSD Foundation
2019-07-18 15:30:00 +00:00
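
Note on f05962453e: the "give up on the second interruption" rule can be sketched as a small userspace model. This is not the kernel code. `wait_answer_model`, `sleep_fn`, and `always_eintr` are hypothetical names, and the sleep callback only stands in for msleep(9).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for msleep(9): returns 0 once the answer has
 * arrived, or an errno value such as EINTR when the sleep is cut short. */
typedef int (*sleep_fn)(void *ctx);

/* Model of the wait loop.  The first interruption records one simulated
 * FUSE_INTERRUPT and keeps waiting; a second interruption, for any
 * reason, returns immediately instead of retrying forever. */
int
wait_answer_model(sleep_fn do_sleep, void *ctx, int *interrupts_sent)
{
    int interrupted = 0;

    assert(do_sleep != NULL && interrupts_sent != NULL);
    for (;;) {
        int err = do_sleep(ctx);

        if (err == 0)
            return (0);         /* answer arrived */
        if (interrupted)
            return (err);       /* second interruption: stop waiting */
        interrupted = 1;
        (*interrupts_sent)++;   /* stand-in for queueing FUSE_INTERRUPT */
    }
}

/* Demonstration stub: a sleep that is always interrupted.  ctx points
 * to an int that counts how often the stub was called. */
int
always_eintr(void *ctx)
{
    (*(int *)ctx)++;
    return (EINTR);
}
```

With a sleep that never succeeds, the model makes exactly two sleep attempts and then returns the error, rather than spinning.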
Alan Somers
d26d63a4af fusefs: multiple interruptibility improvements
1) Stop explicitly leaving SIGKILL out of the signal mask.  kern_sigprocmask
   won't allow it to be masked, anyway.

2) Fix an infinite loop bug.  If a process received a maskable signal
   lower than 9 (like SIGINT) and then received SIGKILL,
   fticket_wait_answer would spin.  msleep would immediately return EINTR,
   but cursig would return SIGINT, so the sleep would get retried.  Fix it
   by explicitly checking whether SIGKILL has been received.

3) Abandon the sig_isfatal optimization introduced by r346357.  That
   optimization would cause fticket_wait_answer to return immediately,
   without waiting for a response from the server, if the process were going
   to exit anyway.  However, it's vulnerable to a race:

   1) fatal signal is received while fticket_wait_answer is sleeping.
   2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
   3) fticket_wait_answer determines that the signal was fatal and returns
      without waiting for a response.
   4) Another thread changes the signal to non-fatal.
   5) The first thread returns to userspace.  Instead of exiting, the
      process continues.
   6) The application receives EINTR, wrongly believes that the operation
      was successfully interrupted, and restarts it.  This could cause
      problems for non-idempotent operations like FUSE_RENAME.

Reported by:    kib (the race part)
Sponsored by:   The FreeBSD Foundation
2019-07-17 22:45:43 +00:00
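
Note on d26d63a4af, item 2: a minimal userspace model of why asking only for the "current" signal misses a pending SIGKILL. It assumes, as the commit text implies, that the lower-numbered signal is reported first. `cursig_model`, `sig_bit`, and the two `must_stop_waiting_*` helpers are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* Pending signals modeled as a bitmask: bit (n - 1) set means that
 * signal number n is pending. */
typedef uint32_t sigmask_model_t;

sigmask_model_t
sig_bit(int signo)
{
    assert(signo >= 1 && signo <= 32);
    return (UINT32_C(1) << (signo - 1));
}

/* Simplified stand-in for cursig(): report the lowest-numbered pending
 * signal, or 0 if nothing is pending (an assumption of this sketch). */
int
cursig_model(sigmask_model_t pending)
{
    for (int signo = 1; signo <= 32; signo++) {
        if (pending & sig_bit(signo))
            return (signo);
    }
    return (0);
}

/* Old logic in this model: stop waiting only if the reported signal is
 * SIGKILL.  A pending SIGINT hides the SIGKILL, so the sleep is retried. */
bool
must_stop_waiting_old(sigmask_model_t pending)
{
    return (cursig_model(pending) == SIGKILL);
}

/* Fixed logic: check explicitly whether SIGKILL is pending. */
bool
must_stop_waiting_new(sigmask_model_t pending)
{
    return ((pending & sig_bit(SIGKILL)) != 0);
}
```

With both SIGINT and SIGKILL pending, the old check keeps retrying while the explicit membership test stops the wait.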
Alan Somers
07e86257e6 fusefs: fix the build with some NODEBUG kernels
systm.h needs to be included before counter.h

Sponsored by:	The FreeBSD Foundation
2019-07-13 21:41:12 +00:00
Alan Somers
97b0512b23 projects/fuse2: build fixes
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64

Sponsored by:	The FreeBSD Foundation
2019-07-13 14:42:09 +00:00
Fedor Uporov
6ce04e595a Add additional check for 'blocks per group' and 'fragments per group' superblock fields.
These fields differ only when the bigalloc filesystem feature is turned on.
That feature is not supported for now.

Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-27-EXT2-12: Denial of Service in openat-0 (vm_fault_hold/ext2_clusteracct)

MFC after:	2 weeks
2019-07-07 08:58:02 +00:00
Fedor Uporov
c008656263 Remove ufs fragments logic.
The ext2fs fragments are different from ufs fragments.
In ext2fs, the fragment size must be equal to or greater than the block size.
Values greater than the block size are used only with the bigalloc feature, which is not supported for now.

Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-22-EXT2-9: Denial of service in ftruncate-0 (ext2_balloc)

MFC after:	2 weeks
2019-07-07 08:56:13 +00:00
Fedor Uporov
590517d05a Remove unneeded mount point unlock call.
Reported by:    Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:    FS-11-EXT2-6: Denial Of Service in write-1 (ext2_balloc)

MFC after:	2 weeks
2019-07-07 08:53:52 +00:00
Alan Somers
7e1f5432f4 fusefs: don't leak memory of unsent operations on unmount
Sponsored by:	The FreeBSD Foundation
2019-06-28 18:48:02 +00:00
Alan Somers
8aafc8c389 [skip ci] update copyright headers in fusefs files
Sponsored by:	The FreeBSD Foundation
2019-06-28 04:18:10 +00:00
Alan Somers
7f49ce7a0b MFHead @349476
Sponsored by:	The FreeBSD Foundation
2019-06-27 23:50:54 +00:00
Alan Somers
c1afff113c fusefs: fix a memory leak regarding FUSE_INTERRUPT
We were leaking the fuse ticket if the original operation completed before
the daemon received the INTERRUPT operation.  Fixing this was easier than I
expected.

Sponsored by:	The FreeBSD Foundation
2019-06-27 22:24:56 +00:00
Alan Somers
435ecf40bb fusefs: recycle vnodes after their last unlink
Previously fusefs would never recycle vnodes.  After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them.  This commit
essentially activates and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.

Sponsored by:	The FreeBSD Foundation
2019-06-27 20:18:12 +00:00
Alan Somers
38c8634635 fusefs: counter(9) variables should not be statically initialized
Reported by:	rpokala
Sponsored by:	The FreeBSD Foundation
2019-06-27 17:59:15 +00:00
Alan Somers
560a55d094 fusefs: convert statistical sysctls to use counter(9)
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.

Sponsored by:	The FreeBSD Foundation
2019-06-27 16:30:25 +00:00
Alan Somers
caeea8b4cc fusefs: fix some memory leaks
Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE.  There are still
leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the
server is legally allowed to never respond to a FUSE_INTERRUPT operation.

Sponsored by:	The FreeBSD Foundation
2019-06-27 00:00:48 +00:00
Alan Somers
f8ebf1cd7e fusefs: implement protocol 7.23's FUSE_WRITEBACK_CACHE option
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis.  If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache.  If not, then
they'll get the writethrough cache.  If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.

The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later.  However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.

This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
  bypassed.

Sponsored by:	The FreeBSD Foundation
2019-06-26 17:32:31 +00:00
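
Note on f8ebf1cd7e: the selection rules described above can be sketched as one pure function. The flag values mirror Linux's fuse.h and are restated locally. The numeric sysctl values (0 = uncached, 1 = write-through, 2 = write-back), the order of the checks, and the name `effective_cache_mode` are assumptions of this sketch, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as found in Linux's fuse.h, restated for the sketch. */
#define FUSE_WRITEBACK_CACHE (1u << 16) /* fuse_init_out.flags */
#define FOPEN_DIRECT_IO      (1u << 0)  /* fuse_open_out.open_flags */

/* Assumed meaning of the legacy vfs.fusefs.data_cache_mode values. */
enum cache_mode {
    CACHE_NONE = 0,
    CACHE_WRITETHROUGH = 1,
    CACHE_WRITEBACK = 2
};

/* Pick the cache mode for one open file.  proto_minor is the server's
 * protocol minor version (23 for 7.23). */
enum cache_mode
effective_cache_mode(uint32_t proto_minor, uint32_t init_flags,
    uint32_t open_flags, int sysctl_mode)
{
    /* A file opened with FOPEN_DIRECT_IO bypasses the cache. */
    if (open_flags & FOPEN_DIRECT_IO)
        return (CACHE_NONE);

    /* Servers older than 7.23 cannot negotiate; use the sysctl. */
    if (proto_minor < 23) {
        assert(sysctl_mode >= 0 && sysctl_mode <= 2);
        return ((enum cache_mode)sysctl_mode);
    }

    /* 7.23 and later: the server chooses per mountpoint at FUSE_INIT. */
    if (init_flags & FUSE_WRITEBACK_CACHE)
        return (CACHE_WRITEBACK);
    return (CACHE_WRITETHROUGH);
}
```

Note that for a 7.23 server the `sysctl_mode` argument has no effect, which matches the statement that the old sysctl is ignored for newer servers.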
Alan Somers
205696a17d fusefs: delete some unused mount options
The fusefs kernel module allegedly supported no_attrcache, no_readahed,
no_datacache, no_namecache, and no_mmap mount options, but the mount_fusefs
binary never did.  So there was no way to ever activate these options.
Delete them.  Some of them have alternatives:

no_attrcache: set the attr_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.
no_readahed: set max_readahead to 0 in the FUSE_INIT response.
no_datacache: set the vfs.fusefs.data_cache_mode sysctl to 0, or (coming
	soon) set the attr_valid time to 0 and set FUSE_AUTO_INVAL_DATA in
	the FUSE_INIT response.
no_namecache: set entry_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
	responses.

Sponsored by:	The FreeBSD Foundation
2019-06-26 15:15:24 +00:00
Alan Somers
fef464546c fusefs: implement the "time_gran" feature.
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23.  The client will use that granularity when
updating its cached timestamps during write.  This way the timestamps won't
appear to change following flush.

Sponsored by:	The FreeBSD Foundation
2019-06-26 02:09:22 +00:00
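
Note on fef464546c: applying a timestamp granularity can be sketched as follows. This is a userspace illustration. It assumes the granularity is given in nanoseconds, lies between 1 and 1000000000, and that timestamps are rounded down. `truncate_to_gran` is a hypothetical name.

```c
#include <assert.h>
#include <time.h>

/* Round a timestamp down to a multiple of gran_ns nanoseconds.
 * Assumes 1 <= gran_ns <= 1000000000 and a normalized tv_nsec. */
struct timespec
truncate_to_gran(struct timespec ts, long gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= 1000000000L);
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);

    /* Drop the part of tv_nsec that the server cannot represent. */
    ts.tv_nsec -= ts.tv_nsec % gran_ns;
    return (ts);
}
```

If the client caches an already-truncated timestamp, the value it reports before a flush matches what a server with that granularity would store afterwards.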
Alan Somers
0a8fe2d369 fusefs: set ctime during FUSE_SETATTR following a write
As of r349396 the kernel will internally update the mtime and ctime of files
on write.  It will also flush the mtime should a SETATTR happen before the
data cache gets flushed.  Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.

This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.

Sponsored by:	The FreeBSD Foundation
2019-06-26 00:03:37 +00:00
Alan Somers
788af9538a fusefs: automatically update mtime and ctime on write
Writing should implicitly update a file's mtime and ctime.  For fuse, the
server is supposed to do that.  But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE.  When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.

Sponsored by:	The FreeBSD Foundation
2019-06-25 23:40:18 +00:00
Alan Somers
0d3a88d76c fusefs: writes should update the file size, even when data_cache_mode=0
Writes that extend a file should update the file's size.  r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet.  Now that it is, we should always update the cached file
size during write.

Sponsored by:	The FreeBSD Foundation
2019-06-25 18:36:11 +00:00
Alan Somers
b9e2019755 fusefs: rewrite vop_getpages and vop_putpages
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache.  This has
several corollaries:

* Change the way we handle short reads _again_.  vfs_bio_getpages doesn't
  provide any way to handle unexpected short reads.  Plus, I found some more
  lock-order problems.  So now when the short read is detected we'll just
  clear the vnode's attribute cache, forcing the file size to be requeried
  the next time it's needed.  VOP_GETPAGES doesn't have any way to indicate
  a short read to the "caller", so we just bzero the rest of the page
  whenever a short read happens.

* Change the way we decide when to set the FUSE_WRITE_CACHE bit.  We now set
  it for clustered writes even when the writeback cache is not in use.

Sponsored by:   The FreeBSD Foundation
2019-06-25 17:24:43 +00:00
Hans Petter Selasky
43a9329e1b Free all allocated unit IDs in cuse(3) after the client character
devices have been destroyed to avoid creating character devices with
identical names.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:46:01 +00:00
Hans Petter Selasky
c7ffaed92e Fix for deadlock situation in cuse(3)
The final server unref should be done by the server thread to prevent
deadlock in the client cdevpriv destructor, which cannot destroy
itself.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-25 11:42:53 +00:00
Warner Losh
e5500f1efa Replay r349334 by markj accidentally reverted by r349352
Remove a lingering use of splbio().

The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-25 06:14:00 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance, and are just for ancient ARM
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw NAND. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Alan Somers
1734e205f3 fusefs: refine the short read fix from r349332
b_fsprivate1 needs to be initialized even for write operations, probably
because a buffer can be used to read, write, and read again with the final
read serviced by cache.

Sponsored by:	The FreeBSD Foundation
2019-06-24 20:08:28 +00:00
Mark Johnston
673c1c2944 Remove a lingering use of splbio().
The buffer must be locked by the caller.  No functional change
intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-06-24 19:19:37 +00:00
Alan Somers
17575bad85 fusefs: improve the short read fix from r349279
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read.  And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.

Sponsored by:	The FreeBSD Foundation
2019-06-24 17:05:31 +00:00
Alan Somers
44f654fdc5 fusefs: fix corruption on short reads caused by r349279
Even if a short read is caused by EOF, it's still necessary to bzero the
remaining buffer, because that buffer could become valid as a result of a
future ftruncate or pwrite operation.

Reported by:	fsx
Sponsored by:	The FreeBSD Foundation
2019-06-21 23:29:29 +00:00
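
Note on 44f654fdc5: zero-filling the tail of a buffer after a short read can be sketched in plain C. This is only an illustration of the idea. `zero_fill_short_read` is a hypothetical helper and does not touch the kernel's struct buf.

```c
#include <assert.h>
#include <string.h>

/* After a read that returned only nread of bufsize bytes, clear the
 * remainder so stale memory can never be exposed if that range later
 * becomes valid (for example after ftruncate or pwrite extends the file). */
void
zero_fill_short_read(unsigned char *buf, size_t bufsize, size_t nread)
{
    assert(buf != NULL && nread <= bufsize);
    memset(buf + nread, 0, bufsize - nread);
}
```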
Alan Somers
aef22f2d75 fusefs: correctly handle short reads
A fuse server may return a short read for three reasons:

* The file is opened with FOPEN_DIRECT_IO.  In this case, the short read
  should be returned directly to userland.  We already handled this case
  correctly.

* The file was truncated server-side, and the read hit EOF.  In this case,
  the kernel should update the file size.  Fixed in the case of VOP_READ.
  Fixing this for VOP_GETPAGES is TODO.

* The file is opened in writeback mode, there are dirty buffers past what
  the server thinks is the file's EOF, and the read hit what the server
  thinks is the file's EOF.  In this case, the client is trying to read a
  hole, and should zero-fill it.  We already handled this case, and I added
  a test for it.

Sponsored by:	The FreeBSD Foundation
2019-06-21 21:44:31 +00:00
Alan Somers
87ff949a7b fusefs: raise protocol level to 7.23
None of the new features are implemented yet.  This commit just adds the new
protocol definitions and adds backwards-compatibility code for pre-7.23
servers.

Sponsored by:	The FreeBSD Foundation
2019-06-21 04:57:23 +00:00
Alan Somers
8f9b3ba718 fusefs: use standard integer types in fuse_kernel.h
This is a merge of Linux revision 4c82456eeb4da081dd63dc69e91aa6deabd29e03.
No functional change.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:17:27 +00:00
Alan Somers
b160acd1c0 fusefs: raise the protocol level to 7.21
Jumping from protocol 7.15 to 7.21 adds several new features.  While they're
all potentially useful, they're also all optional, and I'm not implementing
any right now because my highest priority lies in a later version.

Sponsored by:	The FreeBSD Foundation
2019-06-21 03:04:56 +00:00
Alan Somers
ecb489158c fusefs: diff reduction of fuse_kernel.h vs the upstream version
fuse_kernel.h is based on Linux's fuse.h.  In r349250 I modified
fuse_kernel.h by generating a diff of two versions of Linux's fuse.h and
applying it to our tree.  patch succeeded, but it put one chunk in the wrong
location.  This commit fixes that.  No functional changes.

Sponsored by:	The FreeBSD Foundation
2019-06-21 02:55:43 +00:00
Alan Somers
7cbb8e8a06 fusefs: raise protocol level to 7.15
This protocol level adds two new features: the ability for the server to
store or retrieve data into/from the client's cache.  But the messages
aren't defined soundly since they identify the file only by its inode,
without the generation number.  So it's possible for them to modify the
wrong file's cache.  Also, I don't know of any file systems in ports that
use these messages.  So I'm not implementing them.  I did add a (disabled)
test for the store message, however.

Sponsored by:	The FreeBSD Foundation
2019-06-20 23:32:25 +00:00
Alan Somers
bb23d43901 fusefs: trivially raise protocol level to 7.14
The only new feature is splice(2) support on /dev/fuse, which FreeBSD can't
support.

Sponsored by:	The FreeBSD Foundation
2019-06-20 23:12:19 +00:00
Alan Somers
38b06f8ac4 fcntl: fix overflow when setting F_READAHEAD
VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field.
However, fcntl allows you to set the seqcount in bytes to any nonnegative
31-bit value. The result can be a 16-bit overflow, which will be
sign-extended in functions like ffs_read. Fix this by sanitizing the
argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather
than INT16_MAX.

Also, fifos have overloaded the f_seqcount field for a completely different
purpose ever since r238936.  Formalize that by using a union type.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20710
2019-06-20 23:07:20 +00:00
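
Note on 38b06f8ac4: the sanitizing step can be sketched as a byte-to-block conversion followed by a clamp. This is a userspace illustration, not the kern_fcntl code. The value 0x7F for IO_SEQMAX, the round-up conversion, and the name `readahead_bytes_to_seqcount` are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of FreeBSD's IO_SEQMAX, restated for the sketch. */
#define IO_SEQMAX_MODEL 0x7F

/* Convert a readahead request in bytes into a block count that is
 * guaranteed to fit in a 16-bit field. */
int16_t
readahead_bytes_to_seqcount(int32_t bytes, int32_t bsize)
{
    int64_t blocks;

    assert(bytes >= 0 && bsize > 0);

    /* Round up to whole blocks; 64-bit math avoids overflow in the sum. */
    blocks = ((int64_t)bytes + bsize - 1) / bsize;

    /* Clamp before narrowing.  Without the clamp, INT32_MAX bytes with
     * 16 KB blocks gives 131072 blocks, which does not fit in int16_t. */
    if (blocks > IO_SEQMAX_MODEL)
        blocks = IO_SEQMAX_MODEL;
    return ((int16_t)blocks);
}
```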
Alan Somers
192a918194 fusefs: attempt to support servers as old as protocol 7.4
Previously we allowed servers as old as 7.1 to connect (there never was a
7.0).  However, we wrongly assumed a few things about protocols older than
7.8.  This commit attempts to support servers as old as 7.4 but no older.  I
added no new tests because I'm not sure there actually _are_ any servers
this old in the wild.

Sponsored by:	The FreeBSD Foundation
2019-06-20 22:21:42 +00:00
Alan Somers
2ffddc5ee9 fusefs: raise protocol level to 7.13
This protocol version adds one new feature: the ability for the server to
set the maximum number of background requests and a "congestion threshold"
with ill-defined properties.  I don't know of any fuse file systems in ports
that use this feature, so I'm not implementing it.

Sponsored by:	The FreeBSD Foundation
2019-06-20 21:29:28 +00:00
Alan Somers
a1c9f4ad0d fusefs: implement VOP_BMAP
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap.  Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.

The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.

Sponsored by:	The FreeBSD Foundation
2019-06-20 17:08:21 +00:00
Alan Somers
e532a99901 MFHead @349234
Sponsored by:	The FreeBSD Foundation
2019-06-20 15:56:08 +00:00
Alan Somers
84879e46c2 fusefs: multiple fixes related to the write cache
* Don't always write the last page synchronously.  That's not actually
  required.  It was probably just masking another bug that I fixed later,
  possibly in r349021.

* Enable the NotifyWriteback tests now that Writeback cache is working.

* Add a test to ensure that the write cache isn't flushed synchronously when
  in writeback mode.

Sponsored by:	The FreeBSD Foundation
2019-06-17 23:34:11 +00:00
Alan Somers
402b609c80 fusefs: use cluster_read for more readahead
fusefs will now use cluster_read.  This allows readahead of more than one
cache block.  However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.

Sponsored by:	The FreeBSD Foundation
2019-06-17 22:01:23 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Alan Somers
d569012f45 fusefs: implement non-clustered readahead
fusefs will now read ahead at most one cache block at a time (usually 64
KB).  Clustered reads are still TODO.  Individual file systems may disable
read ahead by setting fuse_init_out.max_readahead=0 during initialization.

Sponsored by:	The FreeBSD Foundation
2019-06-17 16:56:51 +00:00
Alan Somers
b5aaf286ea fusefs: fix the "write-through" of write-through caching
Our fusefs(5) module supports three cache modes: uncached, write-through,
and write-back.  However, the write-through mode (which is the default) has
never actually worked as its name suggests.  Rather, it's always been more
like "write-around".  It wrote directly, bypassing the cache.  The cache
would only be populated by a subsequent read of the same data.

This commit fixes that problem.  Now the write-through mode works as one
would expect: write(2) immediately adds data to the cache and then blocks
while the daemon processes the write operation.

A side effect of this change is that non-cache-block-aligned writes will now
incur a read-modify-write cycle of the cache block.  The old behavior
(bypassing write cache entirely) can still be achieved by opening a file
with O_DIRECT.

PR:		237588
Sponsored by:	The FreeBSD Foundation
2019-06-14 19:47:48 +00:00
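
Note on b5aaf286ea: a toy model of write-through with a single cache block, showing the read-modify-write that a partial write to an uncached block incurs. Everything here (`struct wt_model`, `write_through`, the 8-byte block size) is hypothetical and only illustrates the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLKSZ 8 /* tiny cache block, for illustration only */

struct wt_model {
    unsigned char backing[BLKSZ]; /* data as stored by the daemon */
    unsigned char cache[BLKSZ];   /* the single cached block */
    bool cache_valid;
    int daemon_reads;             /* simulated read upcalls */
    int daemon_writes;            /* simulated write upcalls */
};

/* Write-through: update the cache first, then hand the same bytes to
 * the daemon before returning. */
void
write_through(struct wt_model *m, size_t off, const void *data, size_t len)
{
    assert(m != NULL && data != NULL);
    assert(off <= BLKSZ && len <= BLKSZ - off);

    /* A partial write to an uncached block must first fetch the whole
     * block: this is the read-modify-write cycle. */
    if (!m->cache_valid && len < BLKSZ) {
        memcpy(m->cache, m->backing, BLKSZ);
        m->daemon_reads++;
    }
    memcpy(m->cache + off, data, len);
    m->cache_valid = true;

    /* The caller blocks here while the daemon processes the write. */
    memcpy(m->backing + off, data, len);
    m->daemon_writes++;
}
```

In this model a 2-byte write into an uncached block costs one read plus one write to the daemon, while a later full-block write costs only the write.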
Alan Somers
8eecd9ce05 fusefs: enable write clustering
Enable write clustering in fusefs whenever cache mode is set to writeback
and the "async" mount option is used.  With default values for MAXPHYS,
DFLTPHYS, and the fuse max_write mount parameter, that means sequential
writes will now be written 128KB at a time instead of 64KB.

Also, add a regression test for PR 238565, a panic during unmount that
probably affects UFS, ext2, and msdosfs as well as fusefs.

PR:		238565
Sponsored by:	The FreeBSD Foundation
2019-06-14 18:14:51 +00:00