* Don't always write the last page synchronously. That's not actually
required. It was probably just masking another bug that I fixed later,
possibly in r349021.
* Enable the NotifyWriteback tests now that Writeback cache is working.
* Add a test to ensure that the write cache isn't flushed synchronously when
in writeback mode.
Sponsored by: The FreeBSD Foundation
fusefs will now use cluster_read. This allows readahead of more than one
cache block. However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.
Sponsored by: The FreeBSD Foundation
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193
fusefs will now read ahead at most one cache block at a time (usually 64
KB). Clustered reads are still TODO. Individual file systems may disable
read ahead by setting fuse_init_out.max_readahead=0 during initialization.
Sponsored by: The FreeBSD Foundation
Our fusefs(5) module supports three cache modes: uncached, write-through,
and write-back. However, the write-through mode (which is the default) has
never actually worked as its name suggests. Rather, it's always been more
like "write-around". It wrote directly, bypassing the cache. The cache
would only be populated by a subsequent read of the same data.
This commit fixes that problem. Now the write-through mode works as one
would expect: write(2) immediately adds data to the cache and then blocks
while the daemon processes the write operation.
A side effect of this change is that non-cache-block-aligned writes will now
incur a read-modify-write cycle of the cache block. The old behavior
(bypassing write cache entirely) can still be achieved by opening a file
with O_DIRECT.
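The read-modify-write side effect can be illustrated with a toy userland model. This is only a sketch, not the fusefs code: the class name, the 64 KB block size, and the bytearray that stands in for the daemon are all assumptions made for the example.

```python
BLOCK_SIZE = 65536  # assumed cache block size; the real value depends on the mount


class WriteThroughCache:
    """Toy model: a dict of cache blocks in front of a bytearray 'daemon' store."""

    def __init__(self, backing: bytearray):
        self.backing = backing      # stands in for the FUSE daemon's storage
        self.blocks = {}            # block number -> bytearray(BLOCK_SIZE)
        self.backing_reads = 0      # counts read-modify-write fills

    def _fill(self, blkno: int) -> bytearray:
        start = blkno * BLOCK_SIZE
        data = bytearray(self.backing[start:start + BLOCK_SIZE])
        data.extend(bytes(BLOCK_SIZE - len(data)))  # zero-pad past EOF
        self.backing_reads += 1
        return data

    def write(self, offset: int, data: bytes) -> None:
        pos = 0
        while pos < len(data):
            blkno, in_blk = divmod(offset + pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - in_blk, len(data) - pos)
            if blkno not in self.blocks:
                if n == BLOCK_SIZE:
                    # Whole-block write: nothing to preserve, no read needed.
                    self.blocks[blkno] = bytearray(BLOCK_SIZE)
                else:
                    # Partial write: read the block first (read-modify-write).
                    self.blocks[blkno] = self._fill(blkno)
            self.blocks[blkno][in_blk:in_blk + n] = data[pos:pos + n]
            pos += n
        # Write-through: push the data to the backing store before returning.
        end = offset + len(data)
        if end > len(self.backing):
            self.backing.extend(bytes(end - len(self.backing)))
        self.backing[offset:end] = data
```

In this model a block-aligned, block-sized write never touches the backing store for reading, while a 5-byte write into an uncached block costs one full block read first.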
PR: 237588
Sponsored by: The FreeBSD Foundation
Enable write clustering in fusefs whenever cache mode is set to writeback
and the "async" mount option is used. With default values for MAXPHYS,
DFLTPHYS, and the fuse max_write mount parameter, that means sequential
writes will now be written 128KB at a time instead of 64KB.
Also, add a regression test for PR 238565, a panic during unmount that
probably affects UFS, ext2, and msdosfs as well as fusefs.
PR: 238565
Sponsored by: The FreeBSD Foundation
An errant vfs_bio_clrbuf snuck in in r348931. Surprisingly, it doesn't have
any effect most of the time. But under some circumstances it causes the
buffer to behave in a write-only fashion.
Sponsored by: The FreeBSD Foundation
The current "writeback" cache mode, selected by the
vfs.fusefs.data_cache_mode sysctl, doesn't do writeback caching at all. It
merely goes through the motions of using buf(9), but then writes every
buffer synchronously. This commit:
* Enables delayed writes when the sysctl is set to writeback caching
* Fixes a cache-coherency problem when extending a file whose last page has
just been written.
* Removes the "sync" mount option, which had been set unconditionally.
* Adjusts some SDT probes
* Adds several new tests that mimic what fsx does but with more control and
without a real file system. As I discover failures with fsx, I add
regression tests to this file.
* Adds a test that ensures we can append to a file without reading any data
from it.
This change is still incomplete. Clustered writing is not yet supported,
and there are frequent "panic: vm_fault_hold: fault on nofault entry" panics
that I need to fix.
Sponsored by: The FreeBSD Foundation
fusefs's I/O methods were originally copy/pasted from nfsclient. This
commit removes some irrelevant parts, like stuff involving B_NEEDCOMMIT.
Sponsored by: The FreeBSD Foundation
Neither filesystem uses vnode_pager_dealloc(), which would otherwise handle
this case. Nullfs because the vnode vm_object handle never
points to a nullfs vnode. Tmpfs because its vm_object is never a vnode
object at all.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In r348560 I thought that FUSE_EXPORT_SUPPORT was required for cases where
the node to be invalidated (or the parent of the entry to be invalidated)
wasn't cached. But I realize now that that's not the case. During entry
invalidation, if the parent isn't in the vfs hash table, then it must've
been reclaimed. And since fuse_vnop_reclaim does a cache_purge, that means
the entry to be invalidated has already been removed from the namecache.
And during inode invalidation, if the inode to be invalidated isn't in the
vfs hash table, then it too must've been reclaimed. In that case it will
have no buffer cache to invalidate.
Sponsored by: The FreeBSD Foundation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an inode's data cache and/or attributes. This commit implements
that mechanism. Unlike Linux's implementation, ours requires that the file
system also supports FUSE_EXPORT_SUPPORT (NFS-style lookups). Otherwise the
invalidation operation will return EINVAL.
Sponsored by: The FreeBSD Foundation
Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an entry from its name cache. This commit implements that
mechanism.
Sponsored by: The FreeBSD Foundation
FUSE allows entries to be cached for a limited amount of time. fusefs's
vnop_lookup method already implements that using the timeout functionality
of cache_lookup/cache_enter_time. However, lookups for the NFS server go
through a separate path: vfs_vget. That path can't use the same timeout
functionality because cache_lookup/cache_enter_time only work on pathnames,
whereas vfs_vget works by inode number.
This commit adds entry timeout information to the fuse vnode structure, and
checks it during vfs_vget. This allows the NFS server to take advantage of
cached entries. It's also the same path that FUSE's asynchronous cache
invalidation operations will use.
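The check described above might look roughly like the following userland sketch. The FuseVnode class, the field name entry_cache_timeout, and the dict standing in for the vfs hash table are assumptions for illustration, not the kernel's actual structures.

```python
import time
from dataclasses import dataclass


@dataclass
class FuseVnode:
    """Hypothetical stand-in for the per-vnode fuse data."""
    ino: int
    entry_cache_timeout: float   # absolute monotonic time when the entry expires


def vget_cached(vfs_hash, ino, now=None):
    """Return the cached vnode for ino if its entry is still valid, else None."""
    now = time.monotonic() if now is None else now
    vp = vfs_hash.get(ino)
    if vp is not None and now < vp.entry_cache_timeout:
        return vp          # cache hit: no round trip to the daemon
    return None            # missing or expired: the caller must ask the daemon
```

Storing the expiry on the vnode itself is what lets a lookup keyed by inode number honor the timeout without going through the pathname-based name cache.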
Sponsored by: The FreeBSD Foundation
This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.
PR: 238167
This commit raises the protocol level and adds backwards-compatibility code
to handle structure size changes. It doesn't implement any new features.
The new features added in protocol 7.12 are:
* server-side umask processing (which FreeBSD won't do)
* asynchronous inode and directory entry invalidation (which I'll do next)
Sponsored by: The FreeBSD Foundation
Protocol 7.11 adds two new features, but neither of them was defined
correctly. FUSE_IOCTL messages don't work for 32-bit daemons on a 64-bit
host (fixed in protocol 7.16). FUSE_POLL is basically unusable until 7.21.
Before 7.21, the client can't choose which events to register for; the
client registers for "something" and the server replies to say which events
the client is registered for. Also, before 7.21 there was no way for a
client to deregister a file handle.
Sponsored by: The FreeBSD Foundation
This commit adds the definitions for protocol 7.11 but doesn't yet implement
the new features. The new features are optional, so they can come later.
Sponsored by: The FreeBSD Foundation
Protocol version 7.10 has only one new feature, and I'm choosing not to
implement it, so this commit is basically a noop. The sole new feature is
the FOPEN_NONSEEKABLE flag, which a fuse file system can return to indicate
that a certain file handle cannot be seeked. However, I'm unaware of any
file system in ports that uses this flag.
Sponsored by: The FreeBSD Foundation
Users of pseudofs (e.g. lindebugfs) should be able to receive
input from the command line via commands like "echo 1 > /path/to/file".
Currently this fails because sh tries to truncate the file first and
vop_setattr returns a "not supported" error for this. This patch simply
ignores the error and returns 0 instead.
Reviewed by: imp (mentor), asomers
Approved by: imp (mentor), asomers
MFC after: 1 week
Differential Revision: D20451
These fields are supposed to contain the file descriptor flags as supplied
to open(2) or set by fcntl(2). The feature is kind of useless on FreeBSD
since we don't supply all of these flags to fuse (because of the weak
relationship between struct file and struct vnode). But we should at least
set the access mode flags (O_RDONLY, etc).
This is the last fusefs change needed to get full protocol 7.9 support.
There are still a few options we don't support for good reason (mandatory
file locking is dumb, flock support is broken in the protocol until 7.17,
etc), but there's nothing else to do at this protocol level.
Sponsored by: The FreeBSD Foundation
If a FUSE file system sets the FUSE_POSIX_LOCKS flag then it can support
fcntl(2)-style locks directly. However, the protocol does not adequately
support flock(2)-style locks until revision 7.17. They must be implemented
locally in-kernel instead. This unfortunately breaks the interoperability
of fcntl(2) and flock(2) locks for file systems that support the former.
C'est la vie.
Prior to this commit flock(2) would get sent to the server as a
fcntl(2)-style lock with the lock owner field set to stack garbage.
Sponsored by: The FreeBSD Foundation
Protocol 7.9 adds this field. We could use it to store the file handle of
the file whose attributes we're requesting. However, that requires extra
work at runtime to look up a file handle, and I'm not aware of any file
systems that care. So it's easiest just to clear it.
Sponsored by: The FreeBSD Foundation
This bit tells the server that we're not sure which uid, gid, and/or pid
originated the write. I don't know of a single file system that cares, but
it's part of the protocol.
Sponsored by: The FreeBSD Foundation
* Only build the tests on platforms with C++14 support
* Fix an undefined symbol error on lint builds
* Remove an unused function: fiov_clear
Sponsored by: The FreeBSD Foundation
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it. Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20377
If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle. Otherwise concurrent reads are not allowed. This commit implements
it. Previously we unconditionally disallowed concurrent reads.
Sponsored by: The FreeBSD Foundation
A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:
* During read and write operations, implicitly do a FUSE_OPEN if there isn't
already a valid file handle. That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
start from the beginning and read until it encounters the requested
offset.
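The restart-and-skip behavior in the second bullet can be sketched as below. This is a simplified model under stated assumptions: the function name is made up, and each directory entry is reduced to a (name, next_offset) pair as a daemon might report it.

```python
def readdir_after_implicit_open(read_from_start, requested_offset):
    """read_from_start() yields (name, next_offset) pairs from a fresh open.

    Return the names of the entries that come after requested_offset.
    """
    found = requested_offset == 0   # offset 0 means "start of directory"
    result = []
    for name, next_offset in read_from_start():
        if found:
            result.append(name)
        elif next_offset == requested_offset:
            found = True            # everything after this entry is wanted
    return result
```

The cost of this approach is that resuming deep inside a large directory re-reads every earlier entry, which is the price of not trusting offsets from a previous open.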
I've done only limited testing over NFS, so there are probably still some
more bugs. Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.
Sponsored by: The FreeBSD Foundation
This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3). More work is needed to make the in-kernel nfsd work, because of
its stateless nature. It doesn't open files prior to doing I/O. Also, the
NFS-related VOPs currently ignore the entry cache.
Sponsored by: The FreeBSD Foundation
Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr. There are still a few
calls elsewhere that happen as a result of a write.
Sponsored by: The FreeBSD Foundation
When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner. When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf. That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.
Sponsored by: The FreeBSD Foundation
In r347547 I intended to remove the vfs.fusefs.sync_resize sysctl, leaving
fusefs's behavior as though sync_resize had its default value. But I forgot
that I had already turned off sync_resize in my development system's
/etc/sysctl.conf.
This commit completely removes the optional behavior that was formerly
controlled by sync_resize. There's no need for explicitly calling
FUSE_SETATTR after every FUSE_WRITE that extends a file. The daemon can
infer that the file is being extended. If this sysctl was added as a
workaround for a buggy daemon, there's no clue as to what that daemon may
have been.
Sponsored by: The FreeBSD Foundation
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.
Like r348026, these CUs did not show up in tinderbox as missing the include.
Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon
The kernel can't tell whether or not a fuse file system is truly local. But
what really matters is two things:
1) Can I/O to a file system block indefinitely?
2) Can the file system bypass the O_BENEATH restriction during lookup?
For fuse, the answer to both of those questions is yes. So as far as the
kernel is concerned, it's a non-local file system.
Sponsored by: The FreeBSD Foundation
This allows replacing "sys/eventhandler.h" includes with "sys/_eventhandler.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means. Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.
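The all-or-none rule can be expressed as a small decision on the FUSE_SETATTR "valid" mask. The flag values below are the ones defined in fuse_kernel.h for protocol 7.9; the function itself is a hypothetical sketch, not the kernel's code.

```python
# Flag values from the FUSE protocol's fuse_kernel.h (7.9 and later).
FATTR_ATIME = 1 << 4
FATTR_MTIME = 1 << 5
FATTR_ATIME_NOW = 1 << 7
FATTR_MTIME_NOW = 1 << 8


def setattr_valid_bits(atime_is_now: bool, mtime_is_now: bool) -> int:
    """Return the 'valid' mask for a FUSE_SETATTR that updates both timestamps."""
    valid = FATTR_ATIME | FATTR_MTIME
    if atime_is_now and mtime_is_now:
        # Both are "now": let the server decide what "now" means.
        valid |= FATTR_ATIME_NOW | FATTR_MTIME_NOW
    # Otherwise the client supplies explicit timestamps for both fields.
    return valid
```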
PR: 237181
Sponsored by: The FreeBSD Foundation
If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize.
Sponsored by: The FreeBSD Foundation
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8. It doesn't
implement any of 7.9's new features yet.
Sponsored by: The FreeBSD Foundation
fuse_kernel.h defines the structures used by the FUSE protocol. Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).
The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header. Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD. In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getxattr_in and fuse_getxattr_out structs.
Sponsored by: The FreeBSD Foundation
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning. It was very confusing. This commit eliminates the former. It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.
Sponsored by: The FreeBSD Foundation
fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc). But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero. Fix by adding a distinct flag value.
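A minimal model of the fix, assuming a separate boolean in place of the overloaded zero. The class and the size_known field are illustrative names only; the log does not say what the real flag is called.

```python
class VnodeData:
    def __init__(self):
        self.filesize = 0
        self.size_known = False   # hypothetical flag; replaces "filesize == 0"


def refresh_size(vd, getattr_op):
    """Fetch the size from the daemon only if it has never been fetched."""
    if not vd.size_known:
        vd.filesize = getattr_op()
        vd.size_known = True
    return vd.filesize
```

With the old sentinel, an empty file would trigger getattr_op on every call; with a distinct flag it is called once.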
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago and I don't know why. The description
seems at odds with the code. While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago for no clear reason. Perhaps it was
intended to gate an unstable feature? But now there's no reason to globally
disable mmap. I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.
Sponsored by: The FreeBSD Foundation
This was added > 6.5 years ago with no evident reason why. It probably had
something to do with the incomplete cached attribute implementation. But
cache attributes work now. I see no reason to retain this sysctl.
Sponsored by: The FreeBSD Foundation
This sysctl was added > 6.5 years ago for no clear purpose. I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now. Since there's no clear motivation for
this sysctl, it's best to remove it.
Sponsored by: The FreeBSD Foundation
This looks like it may have been a workaround for a specific buggy FUSE
filesystem. However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.
Sponsored by: The FreeBSD Foundation
Remove the "sync_unmount" and "init_backgrounded" sysctls and the associated
options from mount_fusefs. Add no backwards-compatibility hidden options to
mount_fusefs because these options never had any effect, and are therefore
unlikely to be used.
Sponsored by: The FreeBSD Foundation
Just like /dev/devctl, /dev/fuse will now report the number of operations
available for immediate read in the kevent.data field during kevent(2).
Sponsored by: The FreeBSD Foundation
/dev/fuse was already pollable with poll and select. Add support for
kqueue, too. And add tests for polling with poll, select, and kqueue.
Sponsored by: The FreeBSD Foundation
If the daemon dies, return ENOTCONN for all operations that have already
been sent to the daemon, as well as any new ones.
Sponsored by: The FreeBSD Foundation
If the daemon is known to ignore FUSE_INTERRUPT, then we may as well block
all signals while waiting for a response.
Sponsored by: The FreeBSD Foundation
When a FUSE daemon dies or closes /dev/fuse, all of that daemon's pending
requests must be terminated. Previously that was done in /dev/fuse's
.d_close method. However, d_close only gets called on the *last* close of
the device. That means that if multiple daemons were running concurrently,
all but the last daemon to close would leave their I/O hanging around. The
problem was easily visible just by running "kyua -v parallelism=2 test" in
fusefs's test directory.
Fix this bug by terminating a daemon's pending I/O during /dev/fuse's
cdevpriv dtor method instead. That method runs on every close of a file.
Also, fix some potential races in the tests:
* Clear SA_RESTART when registering the daemon's signal handler so read(2)
will return EINTR.
* Wait for the daemon to die before unmounting the mountpoint, so we won't
see an unwanted FUSE_DESTROY operation in the mock file system.
Sponsored by: The FreeBSD Foundation
libfuse expects sockets to be created with FUSE_MKNOD, not FUSE_CREATE,
because that's how Linux does it. My first attempt at creating sockets
(r346894) used FUSE_CREATE because FreeBSD uses VOP_CREATE for this purpose.
There are no backwards-compatibility concerns with this change, because
socket support hasn't yet been merged to head.
Sponsored by: The FreeBSD Foundation
Any change to a directory's contents should cause its mtime and ctime to be
updated by the FUSE daemon. Clear its attribute cache so we'll get the new
attributes the next time that they're needed. This affects the following
VOPs: VOP_CREATE, VOP_LINK, VOP_MKDIR, VOP_MKNOD, VOP_REMOVE, VOP_RMDIR, and
VOP_SYMLINK.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
If the file to be renamed is a directory and it's going to get a new parent,
then the user must have write permissions to that directory, because the
".." dirent must be changed.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
FUSE_LINK returns a new set of attributes. fusefs should cache them just
like it does during other VOPs. This is not only a matter of performance
but of correctness too; without caching the new attributes the vnode's nlink
value would be out-of-date.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
Even an unprivileged user should be able to chown a file to its current
owner, or chgrp it to its current group. Those are no-ops.
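As a rough sketch, the no-op rule combines with the other -o default_permissions ownership checks in this log (only root may chown; non-root users may only chgrp to a group they belong to) like this. The function and parameter names are hypothetical, and the real kernel check covers more cases than this model.

```python
def may_chown(cred_uid, cred_groups, file_uid, file_gid, new_uid, new_gid):
    """Return True if the caller may set the file's owner/group as requested."""
    if cred_uid == 0:
        return True                   # root may do anything
    if new_uid != file_uid:
        return False                  # only root may change the owner
    if new_gid != file_gid and new_gid not in cred_groups:
        return False                  # may only chgrp to a group we belong to
    return True                       # includes the no-op case
```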
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
fuse file systems have far too much variability for the standard
posix_fallocate implementation to work. A future protocol revision (7.19)
adds a FUSE_FALLOCATE operation, but we don't support that yet. Better to
simply return EINVAL until then.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
ftruncate should succeed as long as the file descriptor is writable, even if
the file doesn't have write permission. This is important when combined
with O_CREAT.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
Don't allow unprivileged users to set SGID on files to whose group they
don't belong. This is slightly different than what POSIX says we should do
(clear sgid on return from a successful chmod), but it matches what UFS
currently does.
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
The readonly mount check had a special case allowing the sizes of files to
be changed if they weren't regular files. I don't know why. Neither UFS,
ZFS, nor ext2 have such a special case, and I don't know when you would ever
change the size of a non-regular file anyway.
Sponsored by: The FreeBSD Foundation
The more appropriate place to do the flushing is VOP_OPEN(). This was
uncovered because VOP_SET_TEXT() is now called with the vnode's
vm_object rlocked, which is incompatible with the flush operations.
After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.
Sponsored by: The FreeBSD Foundation
MFC after: 30 days
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.
The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal.
The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.
nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.
On the text vnode's vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuits the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.
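The sign convention for v_writecount can be modeled in a few lines. This is a simplified sketch that ignores locking, nullfs bypass, and the map-entry bookkeeping; the class and method names only loosely mirror the VOPs named above.

```python
class Vnode:
    def __init__(self):
        self.writecount = 0   # >0: open for write; <0: number of text references

    def add_writecount(self, inc: int) -> bool:
        """Model of taking or dropping write references."""
        if self.writecount < 0:
            return False      # in use as text: the caller would see ETXTBSY
        self.writecount += inc
        return True

    def set_text(self) -> bool:
        """Model of adding a text reference for exec."""
        if self.writecount > 0:
            return False      # open for write: the caller would see ETXTBSY
        self.writecount -= 1
        return True

    def unset_text(self) -> None:
        """Model of dropping a text reference when the mapping goes away."""
        if self.writecount < 0:
            self.writecount += 1
```

Because one signed counter encodes both states, "is text" and "has writers" cannot both be true at once, which is the mutual exclusion that VV_TEXT used to express.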
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
These panics all lie in the error path. The only one I've hit is caused by
a buggy FUSE server unexpectedly changing the type of a vnode.
Sponsored by: The FreeBSD Foundation
When mounted with -o default_permissions fusefs is supposed to validate all
permissions in the kernel, not the file system. This commit fixes two
permissions that I had previously overlooked.
* Only root may chown a file
* Non-root users may only chgrp a file to a group to which they belong
PR: 216391
Sponsored by: The FreeBSD Foundation
As of r346162 fuse now invalidates the cache during writes. But it can't do
that when writing from VOP_PUTPAGES, because the write is coming _from_ the
cache. Trying to invalidate the cache in that situation causes a deadlock
in vm_object_page_remove, because the pages in question have already been
busied by the same thread.
PR: 235774
Sponsored by: The FreeBSD Foundation
Though it's not documented, Linux will interpret a FUSE_INTERRUPT response
of ENOSYS as "the file system does not support FUSE_INTERRUPT".
Subsequently it will never send FUSE_INTERRUPT again to the same mount
point. This change matches Linux's behavior.
PR: 346357
Sponsored by: The FreeBSD Foundation
* Block stop signals in fticket_wait_answer
* Hold ps_mtx while checking signal disposition
* style(9) changes
PR: 346357
Reported by: kib
Sponsored by: The FreeBSD Foundation
The main difference is to replace some custom logic with bread. No
functional change at this point, but this is one step towards adding
readahead.
Sponsored by: The FreeBSD Foundation
I do not know of an extant NFSv4.1 client that currently does a Setattr
operation for the ModeSetMasked, but it has been discussed on the linux-nfs
mailing list.
This patch adds support for doing a Setattr of ModeSetMasked, so that it
will work for any future NFSv4.1 client that chooses to do so.
Tested via a hacked FreeBSD NFSv4.1 client.
MFC after: 2 weeks
At the time of this nfsv4_sattr() call, "vp == NULL", so this patch doesn't
change the semantics, but I think it makes the code more readable.
It also makes it consistent with the nfsv4_sattr() call a few lines above
this one. Found during code inspection.
MFC after: 2 weeks
When interrupting a FUSE operation, send the FUSE_INTERRUPT op to the daemon
ASAP, ahead of other unrelated operations.
PR: 236530
Sponsored by: The FreeBSD Foundation
fusefs's VOP_SETEXTATTR calls uiomove(9) before blocking, so it can't be
restarted. It must be interrupted instead.
PR: 236530
Sponsored by: The FreeBSD Foundation
If a pending FUSE operation hasn't yet been sent to the daemon, then there's
no reason to inform the daemon that it's been interrupted. Instead, simply
remove it from the fuse message queue and set its status to EINTR or
ERESTART as appropriate.
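A sketch of that decision, assuming a simple queue of not-yet-sent tickets. The Ticket class and the interrupt function are hypothetical; ERESTART is defined locally with the value the FreeBSD kernel uses, since Python's errno module does not expose it on every platform.

```python
import errno
from collections import deque

ERESTART = -1  # FreeBSD kernel-internal value, defined here for the example


class Ticket:
    def __init__(self, unique):
        self.unique = unique
        self.sent = False     # True once the daemon has read the request
        self.status = None


def interrupt(queue, ticket, restartable):
    """Return True if a FUSE_INTERRUPT must still be sent to the daemon."""
    if not ticket.sent:
        queue.remove(ticket)  # the daemon never saw it; just drop it
        ticket.status = ERESTART if restartable else errno.EINTR
        return False
    return True
```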
PR: 346357
Sponsored by: The FreeBSD Foundation
During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is in anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.
MFC after: 2 weeks
* If a process receives a fatal signal while blocked on a fuse operation,
return ASAP without waiting for the operation to complete. But still send
the FUSE_INTERRUPT op to the daemon.
* Plug memory leaks from r346339
Interruptibility is now fully functional, but it could be better:
* Operations that haven't been sent to the server yet should be aborted
without sending FUSE_INTERRUPT.
* It would be great if write operations could be made restartable.
That would require delaying uiomove until the last possible moment, which
would be sometime during fuse_device_read.
* It would be nice if we didn't have to guess which EAGAIN responses were
for FUSE_INTERRUPT operations.
PR: 236530
Sponsored by: The FreeBSD Foundation
compat mode or not. This is useful when implementing compatibility ioctl(2)
handlers in userspace.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The fuse protocol includes a FUSE_INTERRUPT operation that the client can
send to the server to indicate that it wants to abort an in-progress
operation. It's required to interrupt any syscall that is blocking on a
fuse operation.
This commit adds basic FUSE_INTERRUPT support. If a process receives any
signal while it's blocking on a FUSE operation, it will send a
FUSE_INTERRUPT and wait for the original operation to complete. But there
is still much to do:
* The current code will leak memory if the server ignores FUSE_INTERRUPT,
which many do. It will also leak memory if the server completes the
original operation before it receives the FUSE_INTERRUPT.
* An interrupted read(2) will incorrectly appear to be successful.
* fusefs should return immediately for fatal signals.
* Operations that haven't been sent to the server yet should be aborted
without sending FUSE_INTERRUPT.
* Test coverage should be better.
* It would be great if write operations could be made restartable.
That would require delaying uiomove until the last possible moment, which
would be sometime during fuse_device_read.
PR: 236530
Sponsored by: The FreeBSD Foundation
r340744 broke the NFSv4 client, because it replaced pfind_locked() with a
call to pfind(), since pfind() acquires the sx lock for the pid hash and
the NFSv4 client already holds a mutex when it does the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall() to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().
Reviewed by: kib, mjg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19887
PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
The patch also includes the addition of #ifdef INET, INET6 as requested
by bz@.
PR: 223036
Reviewed by: bz, rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19839
fusefs's default cache mode is "writethrough", although it currently works
more like "write-around"; writes bypass the cache completely. Since writes
bypass the cache, they were leaving stale previously-read data in the cache.
This commit invalidates that stale data. It also adds a new global
v_inval_buf_range method, like vtruncbuf but for a range of a file.
PR: 235774
Reported by: cem
Sponsored by: The FreeBSD Foundation
Otherwise we might dereference NULL vp->v_data after
VP_TO_TMPFS_NODE().
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
For many FUSE opcodes, an error of ENOSYS has special meaning. fusefs
already handled some of those; this commit adds handling for the remainder:
* FUSE_FSYNC, FUSE_FSYNCDIR: ENOSYS means "success, and automatically return
success without calling the daemon from now on"
* All extattr operations: ENOSYS means "fail EOPNOTSUPP, and automatically
do it without calling the daemon from now on"
PR: 236557
Sponsored by: The FreeBSD Foundation
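The ENOSYS handling described above amounts to remembering, per mount, which
opcodes the daemon does not implement. The following is a minimal user-space
sketch of that idea, not the fusefs code: the Mount class and its dispatch
method are hypothetical, and the extattr opcodes are listed by their FUSE
protocol names.

```python
import errno

# Opcodes for which ENOSYS means "treat as success from now on".
ENOSYS_MEANS_SUCCESS = {"FUSE_FSYNC", "FUSE_FSYNCDIR"}
# Extattr opcodes, for which ENOSYS means "fail with EOPNOTSUPP from now on".
ENOSYS_MEANS_EOPNOTSUPP = {
    "FUSE_SETXATTR", "FUSE_GETXATTR", "FUSE_LISTXATTR", "FUSE_REMOVEXATTR",
}


class Mount:
    """Hypothetical per-mount state; daemon is a callable opcode -> errno."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.not_implemented = set()  # opcodes the daemon answered with ENOSYS

    def dispatch(self, opcode):
        if opcode not in self.not_implemented:
            error = self.daemon(opcode)
            if error != errno.ENOSYS:
                return error
            if opcode not in ENOSYS_MEANS_SUCCESS | ENOSYS_MEANS_EOPNOTSUPP:
                return error  # no special meaning modeled in this sketch
            self.not_implemented.add(opcode)
        # From here on the daemon is no longer consulted for this opcode.
        if opcode in ENOSYS_MEANS_SUCCESS:
            return 0
        return errno.EOPNOTSUPP
```

After the first ENOSYS reply, repeated calls for the same opcode are answered
locally, which saves a round trip to the daemon each time.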
The fuse protocol is designed with security in mind. It prevents users from
spying on each other's activities. And it doesn't grant users any
privileges that they didn't already have. So it's appropriate to make it
available to everyone. Plus, it's necessary in order for kyua to run tests
as an unprivileged user.
Sponsored by: The FreeBSD Foundation
fusefs tracks each vnode's parent. The rename code was already correctly
updating it. Delete a comment that said otherwise, and add a regression
test for it.
Sponsored by: The FreeBSD Foundation
Don't panic if the server changes the file type of a file without us first
deleting it. That could indicate a buggy server, but it could also be the
result of one of several race conditions. Return EAGAIN as we do elsewhere.
Sponsored by: The FreeBSD Foundation
When the entry cache expires, it's only necessary to purge the cache.
Disappearing a vnode also purges the attribute cache, which is unnecessary,
and invalidates the data cache, which could be harmful.
Sponsored by: The FreeBSD Foundation
I got most of -o default_permissions working in r346088. This commit adds
sticky bit checks. One downside is that sometimes there will be an extra
FUSE_GETATTR call for the parent directory during unlink or rename. But in
actual use I think those attributes will almost always be cached.
PR: 216391
Sponsored by: The FreeBSD Foundation
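The sticky bit rule needs the parent directory's mode and owner, which is why
the commit above mentions an extra FUSE_GETATTR for the parent. A minimal
sketch of the conventional rule follows; sticky_allows_removal is a
hypothetical helper, not a fusefs function, and it assumes that ordinary
directory write and search checks have already passed.

```python
import stat


def sticky_allows_removal(dir_mode, dir_uid, file_uid, cred_uid):
    """Return True if cred_uid may unlink or rename an entry in the directory.

    Only the sticky-bit rule is modeled here.
    """
    if not dir_mode & stat.S_ISVTX:
        return True   # no sticky bit: nothing extra to check
    if cred_uid == 0:
        return True   # superuser bypasses the restriction
    # Otherwise the caller must own the directory or the file itself.
    return cred_uid in (dir_uid, file_uid)
```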
fuse_vnop_lookup was using a FUSE_GETATTR operation when looking up "." and
"..", even though the only information it needed was the file type and file
size. "." and ".." are obviously always going to be directories; there's no
need to double check.
Sponsored by: The FreeBSD Foundation
fuse_vnop_lookup contained an awkward hack meant to reduce daemon activity
during long lookup chains. However, the hack is no longer necessary now
that we properly cache file attributes. Also, I'm 99% certain that it
could've bypassed permission checks when using openat to open a file
relative to a directory that lacks execute permission.
Sponsored by: The FreeBSD Foundation
* Eliminate fuse_access_param. Whatever it was supposed to do, it seems
like it was never complete. The only real function it ever seems to have
had was a minor performance optimization, which I've already eliminated.
* Make extended attribute operations obey the allow_other mount option.
* Allow unprivileged access to the SYSTEM extattr namespace when
-o default_permissions is not in use.
* Disallow setextattr and deleteextattr on read-only mounts.
* Add tests for a few more error cases.
Sponsored by: The FreeBSD Foundation
Normally all permission checking is done in the fuse server. But when -o
default_permissions is used, it should be done in the kernel instead. This
commit adds appropriate permission checks throughout fusefs when -o
default_permissions is used. However, sticky bit checks aren't working yet.
I'll handle those in a follow-up commit.
There are no checks for file flags, because those aren't supported by our
version of the FUSE protocol. Nor is there any support for ACLs, though
that could be added if there were any demand.
PR: 216391
Reported by: hiyorin@gmail.com
Sponsored by: The FreeBSD Foundation
The FUSE protocol includes a way for a server to tell the client that a
negative lookup response is cacheable for a certain amount of time.
PR: 236226
Sponsored by: The FreeBSD Foundation
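A negative lookup cache of the kind described above could look roughly like
this in user space. The NegativeCache class is hypothetical, and the
entry_valid parameter simply stands in for the validity period supplied by
the server; time is passed in explicitly to keep the sketch deterministic.

```python
class NegativeCache:
    """Toy cache of names the server reported as nonexistent (hypothetical)."""

    def __init__(self):
        self.expiry = {}  # (parent inode, name) -> absolute expiration time

    def record(self, parent, name, entry_valid, now):
        # A zero validity period means the negative result must not be cached.
        if entry_valid > 0:
            self.expiry[(parent, name)] = now + entry_valid

    def is_known_missing(self, parent, name, now):
        deadline = self.expiry.get((parent, name))
        if deadline is None:
            return False
        if now >= deadline:
            del self.expiry[(parent, name)]  # expired: ask the server again
            return False
        return True
```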
Provide a convenience function to avoid the hack with filling fake
struct vop_fsync_args and then calling vop_stdfsync().
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If truncate(2) is performed on an msdosfs file and extends the file by a
system-dependent large amount, the fs creates a corresponding number of
dirty delayed-write buffers, which can consume all buffers. Such buffers
cannot be flushed by the bufdaemon because the ftruncate() thread owns
the vnode lock. So the system runs out of free buffers, and even the
truncate() thread starves, which means deadlock because it owns the
vnode lock.
Fix this by doing a vnode fsync in extendfile() when a low memory or low
buffers condition is detected, which flushes all dirty buffers belonging
to the file being extended.
Note that the more usual fallback to bawrite() does not work
acceptably in this situation, because it would only allow one buffer
to be recycled. Other filesystems, most importantly UFS, do not allow
userspace to create an arbitrary number of dirty delayed-write buffers
without feedback, so bawrite() is good enough for them.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Follow-up to r346046. These two commits implement fuse cache timeouts for
both entries and attributes. They also remove the
vfs.fusefs.lookup_cache_enable sysctl, which is no longer needed now that
cache timeouts are honored.
PR: 235773
Sponsored by: The FreeBSD Foundation
The FUSE protocol allows the server to specify the timeout period for the
client's attribute and entry caches. This commit implements the timeout
period for the attribute cache. The entry cache's timeout period is
currently disabled because it panics, and is guarded by the
vfs.fusefs.lookup_cache_expire sysctl.
PR: 235773
Reported by: cem
Sponsored by: The FreeBSD Foundation
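The attribute cache timeout described above can be sketched as a cache that
refetches lazily once the server-supplied validity period has elapsed. The
AttrCache class and its callback are hypothetical, and time is passed in
explicitly so the behavior is easy to follow.

```python
class AttrCache:
    """Toy per-vnode attribute cache (hypothetical names throughout)."""

    def __init__(self, getattr_from_daemon):
        # Callback returning (attrs, attr_valid); stands in for FUSE_GETATTR.
        self.getattr_from_daemon = getattr_from_daemon
        self.attrs = None
        self.expires = 0.0

    def getattr(self, now):
        if self.attrs is not None and now < self.expires:
            return self.attrs          # still valid: no daemon round trip
        attrs, attr_valid = self.getattr_from_daemon()
        if attr_valid > 0:
            self.attrs, self.expires = attrs, now + attr_valid
        else:
            self.attrs = None          # zero validity: do not cache
        return attrs
```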
FUSE_LOOKUP, FUSE_GETATTR, FUSE_SETATTR, FUSE_MKDIR, FUSE_LINK,
FUSE_SYMLINK, FUSE_MKNOD, and FUSE_CREATE all return file attributes with a
cache validity period. fusefs will now cache the attributes, if the server
returns a non-zero cache validity period.
This change does _not_ implement finite attr cache timeouts. That will
follow as part of PR 235773.
PR: 235775
Reported by: cem
Sponsored by: The FreeBSD Foundation
The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get
updates to the username<->uid and groupname<->gid mappings.
A change to AF_LOCAL last year had to be reverted, since it could result
in vnode locking issues on the AF_LOCAL socket.
This patch adds INET6 support and the required #ifdef INET and INET6
to the code.
Requested by: bz
PR: 205193
Reviewed by: bz, rgrimes
MFC after: 2 weeks
Differential Revision: http://reviews.freebsd.org/D19218
Don't page fault if the file descriptor provided with "-o fd" is invalid.
This is a merge of r345419 from the projects/fuse2 branch.
Reviewed by: ngie
Tested by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19836
VOP_ACCESS was never fully implemented in fusefs. This change:
* Removes the FACCESS_DO_ACCESS flag, which pretty much disabled the whole
vop.
* Removes a quixotic special case for VEXEC on regular files. I don't know
why that was in there.
* Removes another confusing special case for VADMIN.
* Removes the FACCESS_NOCHECKSPY flag. It seemed to be a performance
optimization, but I'm unconvinced that it was a net positive.
* Updates test cases.
This change does NOT implement -o default_permissions. That will be handled
separately.
PR: 236291
Sponsored by: The FreeBSD Foundation
When -o allow_other is not in use, fusefs is supposed to prevent access to
the filesystem by any user other than the one who owns the daemon. Our
fusefs implementation was only enforcing that restriction at the mountpoint
itself. That was usually good enough because lookup usually descends from
the mountpoint. However, there are cases when it doesn't, such as when
using openat relative to a file beneath the mountpoint.
PR: 237052
Sponsored by: The FreeBSD Foundation
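The point of the fix above is that the owner check has to run for every vnode
operation, not only at the mountpoint. A rough user-space sketch follows;
FuseMount and vnode_op are hypothetical, and returning EPERM is an assumption
made for illustration.

```python
import errno


class FuseMount:
    """Hypothetical mount state: the daemon's uid and the allow_other option."""

    def __init__(self, daemon_uid, allow_other=False):
        self.daemon_uid = daemon_uid
        self.allow_other = allow_other

    def check_user(self, cred_uid):
        """Return 0 if the caller may use this mount, else an errno value."""
        if self.allow_other or cred_uid == self.daemon_uid:
            return 0
        return errno.EPERM  # the errno choice here is an assumption


def vnode_op(mount, cred_uid, op):
    """Run op() only after the per-mount user check passes, for any vnode."""
    error = mount.check_user(cred_uid)
    if error:
        return error
    return op()
```

Because the check sits in front of each operation, an openat relative to a
file beneath the mountpoint is refused just as a lookup from the mountpoint
would be.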
r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)
I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018).
This is being reverted in preparation for an update that adds AF_INET6
support to the code.
If a fuse file system returns FOPEN_KEEP_CACHE in the open or create
response, then the client is supposed to _not_ clear its caches for that
file. I don't know why clearing the caches would be the default given that
there's a separate flag to bypass the cache altogether, but that's the way
it is. fusefs(5) will now honor this flag.
Our behavior is slightly different than Linux's because we reuse file
handles. That means that open(2) won't clear the cache if there's a
reusable file handle, even if the file server wouldn't have sent
FOPEN_KEEP_CACHE had we opened a new file handle like Linux does.
PR: 236560
Sponsored by: The FreeBSD Foundation
This bug was long present, but was exacerbated by r345876.
The problem is that fiov_refresh was bzero()ing a buffer _before_ it
reallocated that buffer. That's obviously the wrong order. I fixed the
order in r345876, which exposed the main problem. Previously, the first 160
bytes of the buffer were getting bzero()ed when it was first allocated in
fiov_init. Subsequently, as that buffer got recycled between callers, the
portion used by the _previous_ caller was getting bzero()ed by the current
caller in fiov_refresh. The problem was never visible simply because no
caller was trying to use more than 160 bytes.
Now the buffer gets properly bzero()ed both at initialization time and any
time it gets enlarged or reallocated.
Sponsored by: The FreeBSD Foundation
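The ordering problem described above can be reproduced with a toy buffer
model. This is not the fuse_iov code: the Iov class is hypothetical, and the
0xAA fill stands in for whatever uninitialized bytes a reallocation may
return.

```python
GARBAGE = 0xAA  # stands in for uninitialized memory returned by realloc


class Iov:
    """Toy model of a reusable request buffer (hypothetical, not fuse_iov)."""

    def __init__(self, size):
        self.buf = bytearray(size)  # zeroed at initialization time
        self.len = size

    def _resize(self, size):
        if size > len(self.buf):
            # Newly allocated bytes are NOT zeroed, as with a plain realloc.
            self.buf.extend([GARBAGE] * (size - len(self.buf)))
        self.len = size

    def refresh_buggy(self, size):
        self.buf[:self.len] = bytes(self.len)  # zeroes the previous caller's span
        self._resize(size)                     # then grows, leaving garbage

    def refresh_fixed(self, size):
        self._resize(size)
        self.buf[:self.len] = bytes(self.len)  # zeroes the span this caller uses
```

With the buggy order, a caller that asks for more than the initial 160 bytes
sees garbage in the newly grown tail; with the fixed order the whole span is
zeroed.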
There are a few places that use hand-crafted versions of the macros
from sys/netinet/in.h, making it difficult to alter the values used by
these macros. Correct that by replacing the hand-crafted code with proper
macro usage.
Reviewed by: karels, kristof
Approved by: bde (mentor)
MFC after: 3 weeks
Sponsored by: John Gilmore
Differential Revision: https://reviews.freebsd.org/D19317
If a FUSE daemon returns FOPEN_DIRECT_IO when a file is opened, then it's
allowed to write less data than was requested during a FUSE_WRITE operation
on that file handle. fusefs should simply return a short write to userland.
The old code attempted to resend the unsent data. Not only was that
incorrect behavior, but it did it in an ineffective way, by attempting to
"rewind" the uio and uiomove the unsent data again.
This commit correctly handles short writes by returning directly to
userland if FOPEN_DIRECT_IO was set. If it wasn't set (making the short
write technically a protocol violation), then we resend the unsent data.
But instead of rewinding the uio, just resend the data that's already in the
kernel.
That necessitated a few changes to fuse_ipc.c to reduce the amount of bzero
activity. fusefs may be marginally faster as a result.
PR: 236381
Sponsored by: The FreeBSD Foundation
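The short-write policy described above can be summarized in a few lines. The
sketch below is a user-space simplification: fuse_write and daemon_write are
hypothetical names, each request carries all remaining data, and giving up
when the daemon accepts zero bytes is an assumption added to keep the loop
finite.

```python
def fuse_write(daemon_write, data, direct_io):
    """Send data to the daemon and return the number of bytes written.

    daemon_write(chunk) returns how many bytes of chunk the daemon accepted.
    """
    sent = 0
    while sent < len(data):
        # Resend from data already held here; nothing is copied in again.
        accepted = daemon_write(data[sent:])
        sent += accepted
        if direct_io:
            break  # short writes are legal here: report them to the caller
        if accepted == 0:
            break  # assumption: give up rather than loop forever
    return sent
```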
The original fusefs import, r238402, contained a bug in fuse_vnop_close that
could close a directory's file handle while there were still other open file
descriptors. The code looks deliberate, but there is no explanation for it.
This necessitated a workaround in fuse_vnop_readdir that would open a new
file handle if, "for some mysterious reason", that vnode didn't have any
open file handles. r345781 had the effect of causing the workaround to
panic, making the problem more visible.
This commit removes the workaround and the original bug, which also fixes
the panic.
Sponsored by: The FreeBSD Foundation
The FUSE protocol says that FUSE_FLUSH should be sent every time a file
descriptor is closed. That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close. However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.
There are two purposes for FUSE_FLUSH. One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side. The other is to release POSIX file locks (which fusefs(5) does
not yet support).
PR: 236405, 236327
Sponsored by: The FreeBSD Foundation
Despite the call to vtruncbuf() from detrunc(), which zeroes the part of
the partial page after EOF, stale data can still be retained and then
revived when the file is enlarged. If the filesystem block size is greater
than the page size, a partial block might keep other after-EOF pages
wired, and they then get reused. Fix it by zeroing the whole part of the
partial buffer after EOF, not relying on vnode_pager_setsize().
PR: 236977
Reported by: asomers
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
During truncate, fusefs was discarding entire cached blocks, but it wasn't
zeroing out the unused portion of a final partial block. This resulted in
reads returning stale data.
PR: 233783
Reported by: fsx
Sponsored by: The FreeBSD Foundation
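The truncation fix above has two parts: discard blocks entirely past the new
EOF, and zero the tail of the final partial block. A toy model follows;
truncate_cache and the tiny 8-byte block size are assumptions chosen so the
effect is easy to see, not the fusefs implementation.

```python
TOY_BLOCK_SIZE = 8  # tiny block size, for illustration only


def truncate_cache(blocks, new_size):
    """Shrink a cached file, given as {block number: bytearray}, to new_size."""
    last = new_size // TOY_BLOCK_SIZE   # block containing the new EOF
    tail = new_size % TOY_BLOCK_SIZE    # valid bytes within that block
    for blkno in list(blocks):
        if blkno > last or (blkno == last and tail == 0):
            del blocks[blkno]           # entirely past EOF: discard
    if tail and last in blocks:
        # Zero the unused remainder of the final partial block so that a
        # later extension of the file cannot expose the old contents.
        blocks[last][tail:] = bytes(TOY_BLOCK_SIZE - tail)
```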
This commit cleans up after recent commits, especially 345766, 345768, and
345781. There is no functional change. The most important change is to add
comments documenting why we can't send flags like O_APPEND in
FUSE_WRITE_OPEN.
PR: 236340
Sponsored by: The FreeBSD Foundation
In particular:
- suspend the mount around vflush() to prevent new writes from arriving
after the vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.
It is not clear to me how best to handle writeable mappings on rw->ro for
tmpfs. Other filesystems, which use the vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is. In
particular, the existing mappings continue to work as long as the
application only accesses resident pages, but changes are not flushed
to the file.
For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving a single copy of the mapped pages, so it cannot be set
to OBJT_DEAD. Alternatives for making rw mappings ro could be either
invalidating them entirely, or marking them as CoW.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737
By default, FUSE performs authorization in the server. That means that it's
insecure for the client to reuse FUSE file handles between different users,
groups, or processes. Linux handles this problem by creating a different
FUSE file handle for every file descriptor. FreeBSD can't, due to
differences in our VFS design.
This commit adds credential information to each fuse_filehandle. During
open(2), fusefs will now only reuse a file handle if it matches the exact
same access mode, pid, uid, and gid of the calling process.
PR: 236844
Sponsored by: The FreeBSD Foundation
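The reuse rule above can be sketched as an exact-match search over a vnode's
file handles. The Vnode class and FileHandle tuple below are hypothetical
stand-ins for the kernel structures, and creating a new tuple stands in for
sending FUSE_OPEN to the daemon.

```python
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "fh_id mode pid uid gid")


class Vnode:
    """Toy vnode holding a list of file handles (hypothetical)."""

    def __init__(self):
        self.handles = []
        self.next_id = 1

    def open(self, mode, pid, uid, gid):
        """Reuse a handle only on an exact (mode, pid, uid, gid) match."""
        for fh in self.handles:
            if (fh.mode, fh.pid, fh.uid, fh.gid) == (mode, pid, uid, gid):
                return fh
        # No match: a new handle is needed (stands in for FUSE_OPEN).
        fh = FileHandle(self.next_id, mode, pid, uid, gid)
        self.next_id += 1
        self.handles.append(fh)
        return fh
```

A handle opened by one user is never handed to another, so the server's own
authorization decision for each open is not bypassed.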
O_EXEC is useful for fexecve(2) and fchdir(2). Treat it as another fufh
type alongside the existing RDONLY, WRONLY, and RDWR. Prior to r345742 this
would've caused a memory and performance penalty.
PR: 236329
Sponsored by: The FreeBSD Foundation
r345742 replaced fusefs's fufh array with a fufh list. But it left a few
array idioms in place. This commit replaces those idioms with more
efficient list idioms. One location is in fuse_filehandle_close, which now
takes a pointer argument. Three other locations are places that had to loop
over all of a vnode's fuse filehandles.
Sponsored by: The FreeBSD Foundation