Commit Graph

333 Commits

Author SHA1 Message Date
Alan Somers
18ed2ce77a fusefs: fix the build without INVARIANTS after 00134a0789
MFC after:	2 weeks
MFC with:	00134a0789
Reported by:	se
2022-02-04 18:44:27 -07:00
Alan Somers
00134a0789 fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory.  That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:

* The file system _does_ change the d_off field for the last directory
  entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
  NFS.

Rather than doing a poor job of exporting such file systems, it's better
just to refuse.

Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.

MFC after:	3 weeks.
Reviewed by:	rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
2022-02-04 16:31:05 -07:00
Alan Somers
4a6526d84a fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle.  But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR.  That means reading the
directory all the way from the first entry.  Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:

* The file system _does_ change the d_off field for the last directory
  entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
  NFS.

Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.

MFC after:	3 weeks
Reviewed by:	rmacklem
2022-02-04 16:30:58 -07:00
Alan Somers
d088dc76e1 Fix NFS exports of FUSE file systems for big directories
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle.  Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server.  Instead, it must
read the directory from 0 each time.  It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for.  That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.

MFC after:	3 weeks
Reviewed by:	rmacklem
2022-02-04 16:30:49 -07:00
Mark Johnston
3d8562348c fusefs: Address -Wunused-but-set-variable warnings
Reviewed by:	asomers
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33957
2022-01-20 08:25:00 -05:00
Alan Somers
89d57b94d7 fusefs: implement VOP_DEALLOCATE
MFC after:	Never
Reviewed by:	khng
Differential Revision: https://reviews.freebsd.org/D33800
2022-01-18 21:13:02 -07:00
Alan Somers
398c88c758 fusefs: implement VOP_ALLOCATE
Now posix_fallocate will be correctly forwarded to fuse file system
servers, for those that support it.

MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33389
2021-12-31 21:05:28 -07:00
Alan Somers
1613087a81 fusefs: fix .. lookups when the parent has been reclaimed.
By default, FUSE file systems are assumed not to support lookups for "."
and "..".  They must opt-in to that.  To cope with this limitation, the
fusefs kernel module caches every fuse vnode's parent's inode number,
and uses that during VOP_LOOKUP for "..".  But if the parent's vnode has
been reclaimed that won't be possible.  Previously we paniced in this
situation.  Now, we'll return ESTALE instead.  Or, if the file system
has opted into ".." lookups, we'll just do that instead.

This commit also fixes VOP_LOOKUP to respect the cache timeout for ".."
lookups, if the FUSE file system specified a finite timeout.

PR:		259974
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33239
2021-12-31 20:38:27 -07:00
Alan Somers
5169832c96 fusefs: copy_file_range must update file timestamps
If FUSE_COPY_FILE_RANGE returns successfully, update the atime of the
source and the mtime and ctime of the destination.

MFC after:	2 weeks
Reviewers:	pfg
Differential Revision: https://reviews.freebsd.org/D33159
2021-12-31 17:43:57 -07:00
Alan Somers
13d593a5b0 Fix a race in fusefs that can corrupt a file's size.
VOPs like VOP_SETATTR can change a file's size, with the vnode
exclusively locked.  But VOPs like VOP_LOOKUP look up the file size from
the server without the vnode locked.  So a race is possible.  For
example:

1) One thread calls VOP_SETATTR to truncate a file.  It locks the vnode
   and sends FUSE_SETATTR to the server.
2) A second thread calls VOP_LOOKUP and fetches the file's attributes from
   the server.  Then it blocks trying to acquire the vnode lock.
3) FUSE_SETATTR returns and the first thread releases the vnode lock.
4) The second thread acquires the vnode lock and caches the file's
   attributes, which are now out-of-date.

Fix this race by recording a timestamp in the vnode of the last time
that its filesize was modified.  Check that timestamp during VOP_LOOKUP
and VFS_VGET.  If it's newer than the time at which FUSE_LOOKUP was
issued to the server, ignore the attributes returned by FUSE_LOOKUP.

PR:		259071
Reported by:	Agata <chogata@moosefs.pro>
Reviewed by:	pfg
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D33158
2021-12-31 17:38:42 -07:00
Alan Somers
b214fcceac Change VOP_READDIR's cookies argument to a **uint64_t
The cookies argument is only used by the NFS server.  NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits.  Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures.  Change it to 64 bits on all architectures.  This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.

PR:             260375
Reviewed by:    rmacklem
Differential Revision: https://reviews.freebsd.org/D33404
2021-12-15 20:54:57 -07:00
Bjoern A. Zeeb
df38ada293 modules: increase MAXMODNAME and provide backward compat
With various firmware files used by graphics and wireless drivers
we are exceeding the current 32 character module name (file path
in kldxref) length.
In order to overcome this issue bump it to the maximum path length
for the next version.
To be able to MFC provide backward compat support for another version
of the struct as the offsets for the second half change due to the
array size increase.

MAXMODNAME being defined to MAXPATHLEN needs param.h to be
included first.  With only 7 modules (or LinuxKPI module.h) not
doing that adjust them rather than including param.h in module.h [1].

Reported by:	Greg V (greg unrelenting.technology)
Sponsored by:	The FreeBSD Foundation
Suggested by:	imp [1]
MFC after:	10 days
Reviewed by:	imp (and others to different level)
Differential Revision:	https://reviews.freebsd.org/D32383
2021-12-09 18:09:53 +00:00
Alan Somers
41ae9f9e64 fusefs: invalidate the cache during copy_file_range
FUSE_COPY_FILE_RANGE instructs the server to write data to a file.
fusefs must invalidate any cached data within the written range.

PR:		260242
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33280
2021-12-06 21:41:50 -07:00
Alan Somers
dc433e1530 fusefs: inline fuse_io_dispatch
This function was always confusing, because it created an H-shaped
callgraph: two functions called in and left via different paths based on
which which called.

MFC after: 2 weeks
2021-12-06 21:41:50 -07:00
Alan Somers
25927e068f fusefs: correctly handle an inode that changes file types
Correctly handle the situation where a FUSE server unlinks a file, then
creates a new file of a different type but with the same inode number.
Previously fuse_vnop_lookup in this situation would return EAGAIN.  But
since it didn't call vgone(), the vnode couldn't be reused right away.
Fix this by immediately calling vgone() and reallocating a new vnode.

This problem can occur in three code paths, during VOP_LOOKUP,
VOP_SETATTR, or following FUSE_GETATTR, which usually happens during
VOP_GETATTR but can occur during other vops, too.  Note that the correct
response actually doesn't depend on whether the entry cache has expired.
In fact, during VOP_LOOKUP, we can't even tell.  Either it has expired
already, or else the vnode got reclaimed by vnlru.

Also, correct the error code during the VOP_SETATTR path.

PR:		258022
Reported by:	chogata@moosefs.pro
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33283
2021-12-06 21:36:46 -07:00
Alan Somers
91972cfcdd fusefs: update atime on reads when using cached attributes
When using cached attributes, whether or not the data cache is enabled,
fusefs must update a file's atime whenever it reads from it, so long as
it wasn't mounted with -o noatime.  Update it in-kernel, and flush it to
the server on close or during the next setattr operation.

The downside is that close() will now frequently trigger a FUSE_SETATTR
upcall.  But if you care about performance, you should be using
-o noatime anyway.

MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D33145
2021-11-28 18:53:31 -07:00
Alan Somers
65d70b3bae fusefs: fix copy_file_range when extending a file
When copy_file_range extends a file, it must update the cached file
size.

MFC after:	2 weeks
Reviewed by:	rmacklem, pfg
Differential Revision: https://reviews.freebsd.org/D33151
2021-11-28 18:35:58 -07:00
Alan Somers
8fbae6c7bd fusefs: delete a redundant getnanouptime
It's been redundant since SVN r346060 added another getnanouptime just
above.

MFC after:	2 weeks
2021-11-28 16:05:30 -07:00
Mateusz Guzik
7e1d3eefd4 vfs: remove the unused thread argument from NDINIT*
See b4a58fbf64 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.
2021-11-25 22:50:42 +00:00
Mateusz Guzik
b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Alan Somers
032a5bd55b fusefs: Fix a bug during VOP_STRATEGY when the server changes file size
If the FUSE server tells the kernel that a file's size has changed, then
the kernel must invalidate any portion of that file in cache.  But the
kernel can't do that during VOP_STRATEGY, because the file's buffers are
already locked.  Instead, proceed with the write.

PR:		256937
Reported by:	Agata <chogata@moosefs.pro>
Tested by:	Agata <chogata@moosefs.pro>
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D32332
2021-10-06 14:07:33 -06:00
Alan Somers
7430017b99 fusefs: fix a recurse-on-non-recursive lockmgr panic
fuse_vnop_bmap needs to know the file's size in order to calculate the
optimum amount of readahead.  If the file's size is unknown, it must ask
the FUSE server.  But if the file's data was previously cached and the
server reports that its size has shrunk, fusefs must invalidate the
cached data.  That's not possible during VOP_BMAP because the buffer
object is already locked.

Fix the panic by not querying the FUSE server for the file's size during
VOP_BMAP if we don't need it.  That's also a a slight performance
optimization.

PR:		256937
Reported by:	Agata <chogata@moosefs.pro>
Tested by:	Agata <chogata@moosefs.pro>
MFC after:	2 weeks
2021-10-06 14:07:33 -06:00
Alan Somers
5d94aaacb5 fusefs: quiet some cache-related warnings
If the FUSE server does something that would make our cache incoherent,
we should print a warning to the user.  However, we previously warned in
some situations when we shouldn't, such as if the file's size changed on
the server _after_ our own attribute cache had expired.  This change
suppresses the warning in cases like that.  It also moves the warning
logic to a single place within the code.

PR:		256936
Reported by:	Agata <chogata@moosefs.pro>
Tested by:	Agata <chogata@moosefs.pro>, jSML4ThWwBID69YC@protonmail.com
MFC after:	2 weeks
2021-10-06 14:07:33 -06:00
Alan Somers
7124d2bc3a fusefs: implement FUSE_NO_OPEN_SUPPORT and FUSE_NO_OPENDIR_SUPPORT
For file systems that allow it, fusefs will skip FUSE_OPEN,
FUSE_RELEASE, FUSE_OPENDIR, and FUSE_RELEASEDIR operations, a minor
optimization.

MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D32141
2021-09-26 21:57:29 -06:00
Alan Somers
a3a1ce3794 fusefs: diff reduction in fuse_kernel.h
Synchronize formatting and documentation in fuse_kernel.h with upstream
sources.

MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision:	https://reviews.freebsd.org/D32141
2021-09-26 21:57:07 -06:00
Alan Somers
4f917847c9 fusefs: don't panic if FUSE_GETATTR fails durint VOP_GETPAGES
During VOP_GETPAGES, fusefs needs to determine the file's length, which
could require a FUSE_GETATTR operation.  If that fails, it's better to
SIGBUS than panic.

MFC after:	1 week
Sponsored by:	Axcient
Reviewed by: 	markj, kib
Differential Revision: https://reviews.freebsd.org/D31994
2021-09-21 14:01:06 -06:00
Konstantin Belousov
197a4f29f3 buffer pager: allow get_blksize method to return error
Reported and reviewed by:	asomers
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31998
2021-09-17 20:29:55 +03:00
Alan Somers
18b19f8c6e fusefs: correctly set lock owner during FUSE_SETLK
During FUSE_SETLK, the owner field should uniquely identify the calling
process.  The fusefs module now sets it to the process's pid.
Previously, it expected the calling process to set it directly, which
was wrong.

libfuse also apparently expects the owner field to be set during
FUSE_GETLK, though I'm not sure why.

PR:		256005
Reported by:	Agata <chogata@moosefs.pro>
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D30622
2021-06-25 20:40:08 -06:00
Alan Somers
5403f2c163 fusefs: ensure that FUSE ops' headers' unique values are actually unique
Every FUSE operation has a unique value in its header.  As the name
implies, these values are supposed to be unique among all outstanding
operations.  And since FUSE_INTERRUPT is asynchronous and racy, it is
desirable that the unique values be unique among all operations that are
"close in time".

Ensure that they are actually unique by incrementing them whenever we
reuse a fuse_dispatcher object, for example during fsync, write, and
listextattr.

PR:		244686
MFC after:	2 weeks
Reviewed by:	pfg
Differential Revision: https://reviews.freebsd.org/D30810
2021-06-19 14:45:29 -06:00
Alan Somers
b97c7abc1a fusefs: delete dead code
It was always dead, accidentally included in SVN r345876.

MFC after:	2 weeks
Reviewed by:	pfg
2021-06-19 14:45:04 -06:00
gAlfonso-bit
9b876fbd50 Simplify fuse_device_filt_write
It always returns 1, so why bother having a variable.

MFC after:	2 weeks
MFC with:	7b8622fa22
Pull Request:	https://github.com/freebsd/freebsd-src/pull/478
2021-06-16 15:54:24 -06:00
Alan Somers
7b8622fa22 fusefs: support EVFILT_WRITE on /dev/fuse
/dev/fuse is always ready for writing, so it's kind of dumb to poll it.
But some applications do it anyway.  Better to return ready than EINVAL.

MFC after:	2 weeks
Reviewed by:	emaste, pfg
Differential Revision: https://reviews.freebsd.org/D30784
2021-06-16 13:34:14 -06:00
Alan Somers
0b9a5c6fa1 fusefs: improve warnings about buggy FUSE servers
The fusefs driver will print warning messages about FUSE servers that
commit protocol violations.  Previously it would print those warnings on
every violation, but that could spam the console.  Now it will print
each warning no more than once per lifetime of the mount.  There is also
now a dtrace probe for each violation.

MFC after:	2 weeks
Sponsored by:	Axcient
Reviewed by:	emaste, pfg
Differential Revision: https://reviews.freebsd.org/D30780
2021-06-16 13:31:31 -06:00
Alan Somers
d63e6bc256 fusefs: delete dead code
Delete two fields in the per-mountpoint struct that have never been
used.

MFC after:	2 weeks
Sponsored by:	Axcient
2021-06-15 13:34:01 -06:00
Alan Somers
9c5aac8f2e fusefs: fix a dead store in fuse_vnop_advlock
kevans actually caught this in the original review and I fixed it, but
then I committed an older copy of the branch.  Whoops.

Reported by:	kevans
MFC after:	13 days
MFC with:	929acdb19a
Differential Revision:	https://reviews.freebsd.org/D29031
2021-03-19 19:38:57 -06:00
Alan Somers
929acdb19a fusefs: fix two bugs regarding fcntl file locks
1) F_SETLKW (blocking) operations would be sent to the FUSE server as
   F_SETLK (non-blocking).

2) Release operations, F_SETLK with lk_type = F_UNLCK, would simply
   return EINVAL.

PR:		253500
Reported by:	John Millikin <jmillikin@gmail.com>
MFC after:	2 weeks
2021-03-18 17:09:10 -06:00
Konstantin Belousov
2bfd8992c7 vnode: move write cluster support data to inodes.
The data is only needed by filesystems that
1. use buffer cache
2. utilize clustering write support.

Requested by:	mjg
Reviewed by:	asomers (previous version), fsu (ext2 parts), mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Alan Somers
71befc3506 fusefs: set d_off during VOP_READDIR
This allows d_off to be used with lseek to position the file so that
getdirentries(2) will return the next entry.  It is not used by
readdir(3).

PR:		253411
Reported by:	John Millikin <jmillikin@gmail.com>
Reviewed by:	cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28605
2021-02-12 21:50:52 -07:00
Alan Somers
17a82e6af8 Fix vnode locking bug in fuse_vnop_copy_file_range
MFC-With:	92bbfe1f0d
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27938
2021-01-03 11:16:20 -07:00
Alan Somers
34477e25c1 fusefs: only check vnode locks with DEBUG_VFS_LOCKS
MFC-With:	37df9d3bba
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27939
2021-01-03 09:19:00 -07:00
Alan Somers
542711e520 Fix a vnode locking bug in fuse_vnop_advlock.
Must lock the vnode before accessing the fufh table.  Also, check for
invalid parameters earlier.  Bug introduced by r346170.

MFC after:	2 weeks

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27936
2021-01-03 09:16:23 -07:00
Alan Somers
92bbfe1f0d fusefs: implement FUSE_COPY_FILE_RANGE.
This updates the FUSE protocol to 7.28, though most of the new features
are optional and are not yet implemented.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27818
2021-01-01 10:18:23 -07:00
Alan Somers
37df9d3bba fusefs: update FUSE protocol to 7.24 and implement FUSE_LSEEK
FUSE_LSEEK reports holes on fuse file systems, and is used for example
by bsdtar.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision: https://reviews.freebsd.org/D27804
2020-12-31 08:51:47 -07:00
Alan Somers
4f4111d2c5 fusefs: delete some dead code
The original fusefs GSoC project seems to have envisioned exchanging two
types of messages with FUSE servers.  Perhaps vectored and non-vectored?
But in practice only one type has ever been used.  Delete the other type.

Reviewed by:		cem
Differential Revision:	https://reviews.freebsd.org/D27770
2020-12-28 19:05:35 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Edward Tomasz Napierala
e3b1c847a4 Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  Should come handy for AppImages.

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26959
2020-11-09 08:53:15 +00:00
Mateusz Guzik
ab21ed17ed vfs: drop the de facto curthread argument from VOP_INACTIVE 2020-10-20 07:19:03 +00:00
Alan Somers
a62772a78e fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".

However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.

Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.

PR:		247276
Reported by:	trapexit@spawn.link
Reviewed by:	cem
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26485
2020-09-24 16:27:53 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00