Commit Graph

4234 Commits

Author SHA1 Message Date
Alan Somers
d311d6c467 fusefs: eliminate a superfluous fuse_node_setparent
Sponsored by:	The FreeBSD Foundation
2019-05-20 20:55:01 +00:00
Alan Somers
6f8114adcc fusefs: unset MNT_LOCAL
The kernel can't tell whether or not a fuse file system is truly local.  But
what really matters is two things:

1) Can I/O to a file system block indefinitely?
2) Can the file system bypass the O_BENEATH restriction during lookup?

For fuse, the answer to both of those question is yes.  So as far as the
kernel is concerned, it's a non-local file system.

Sponsored by:	The FreeBSD Foundation
2019-05-20 20:54:09 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Alan Somers
fe221e0177 fusefs: forward UTIME_NOW to the server
If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means.  Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.

PR:		237181
Sponsored by:	The FreeBSD Foundation
2019-05-16 23:17:39 +00:00
Alan Somers
e7f73af118 fusefs: allow the server to specify st_blksize
If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize .

Sponsored by:	The FreeBSD Foundation
2019-05-16 22:50:04 +00:00
Alan Somers
16bd2d47c7 fusefs: Upgrade FUSE protocol to version 7.9.
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8.  It doesn't
implement any of 7.9's new features yet.

Sponsored by:	The FreeBSD Foundation
2019-05-16 17:24:11 +00:00
Alan Somers
96192dfce0 fusefs: diff reduction vs the upstream sources
fuse_kernel.h defines the structures used by the FUSE protocol.  Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).

The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header.  Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD.  In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getexattr_in and fuse_getxattr_out structs.

Sponsored by:	The FreeBSD Foundation
2019-05-15 22:51:25 +00:00
Alan Somers
3d15b234a4 fusefs: don't track a file's size in two places
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning.  It was very confusing.  This commit eliminates the former.  It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.

Sponsored by:	The FreeBSD Foundation
2019-05-15 00:38:52 +00:00
Alan Somers
96658124d7 fusefs: eliminate superfluous FUSE_GETATTR when filesize=0
fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc.  But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero.  Fix by adding a distinct flag value.

Sponsored by:	The FreeBSD Foundation
2019-05-13 23:30:06 +00:00
Alan Somers
5940f822ae fusefs: remove the vfs.fusefs.data_cache_invalidate sysctl
This sysctl was added > 6.5 years ago and I don't know why.  The description
seems at odds with the code.  While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:57:21 +00:00
Alan Somers
fcefa6ef66 fusefs: remove the vfs.fusefs.mmap_enable sysctl
This sysctl was added > 6.5 years ago for no clear reason.  Perhaps it was
intended to gate an unstable feature?  But now there's no reason to globally
disable mmap.  I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:42:09 +00:00
Alan Somers
515183969d fusefs: remove the vfs.fusefs.refresh_size sysctl
This was added > 6.5 years ago with no evident reason why.  It probably had
something to do with the incomplete cached attribute implementation.  But
cache attributes work now.  I see no reason to retain this sysctl.

Sponsored by:	The FreeBSD Foundation
2019-05-13 20:31:10 +00:00
Alan Somers
4d09e76a73 fusefs: remove the vfs.fusefs.sync_resize syctl
This sysctl was added > 6.5 years ago for no clear purpose.  I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now.  Since there's no clear motivation for
this sysctl, it's best to remove it.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:47:31 +00:00
Alan Somers
bad4c94dc8 fusefs: remove the vfs.fusefs.fix_broken_io sysctl
This looks like it may have been a workaround for a specific buggy FUSE
filesystem.  However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:31:09 +00:00
Alan Somers
4abf87666a fusefs: reap dead sysctls
Remove the "sync_unmount" and "init_backgrounded" sysctls and the associated
options from mount_fusefs.  Add no backwards-compatibility hidden options to
mount_fusefs because these options never had any effect, and are therefore
unlikely to be used.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:03:46 +00:00
Alan Somers
7648bc9fee MFHead @347527
Sponsored by:	The FreeBSD Foundation
2019-05-13 18:25:55 +00:00
Alan Somers
8350cbd8d6 [skip ci] fusefs: remove an obsolete comment
Sponsored by:	The FreeBSD Foundation
2019-05-13 15:39:54 +00:00
Alan Somers
f82e92e52b fusefs: enhance an SDT probe added in r346998
Sponsored by:	The FreeBSD Foundation
2019-05-13 15:39:19 +00:00
Alan Somers
0a7c63e075 fusefs: Report the number of available ops in kevent(2)
Just like /dev/devctl, /dev/fuse will now report the number of operations
available for immediate read in the kevent.data field during kevent(2).

Sponsored by:	The FreeBSD Foundation
2019-05-12 15:27:18 +00:00
Alan Somers
3429092cd1 fusefs: support kqueue for /dev/fuse
/dev/fuse was already pollable with poll and select.  Add support for
kqueue, too.  And add tests for polling with poll, select, and kqueue.

Sponsored by:	The FreeBSD Foundation
2019-05-11 22:58:25 +00:00
Alan Somers
7e0aac2408 fusefs: return ENOTCONN instead of EIO if the daemon dies suddenly
If the daemon dies, return ENOTCONN for all operations that have already
been sent to the daemon, as well as any new ones.

Sponsored by:	The FreeBSD Foundation
2019-05-10 16:41:33 +00:00
Alan Somers
d5024ba275 fusefs: minor optimization to interrupted fuse operations
If the daemon is known to ignore FUSE_INTERRUPT, then we may as well block
all signals while waiting for a response.

Sponsored by:	The FreeBSD Foundation
2019-05-10 16:31:51 +00:00
Alan Somers
8b73a4c5ae fusefs: fix running multiple daemons concurrently
When a FUSE daemon dies or closes /dev/fuse, all of that daemon's pending
requests must be terminated.  Previously that was done in /dev/fuse's
.d_close method.  However, d_close only gets called on the *last* close of
the device.  That means that if multiple daemons were running concurrently,
all but the last daemon to close would leave their I/O hanging around.  The
problem was easily visible just by running "kyua -v parallelism=2 test" in
fusefs's test directory.

Fix this bug by terminating a daemon's pending I/O during /dev/fuse's
cdvpriv dtor method instead.  That method runs on every close of a file.

Also, fix some potential races in the tests:
* Clear SA_RESTART when registering the daemon's signal handler so read(2)
  will return EINTR.
* Wait for the daemon to die before unmounting the mountpoint, so we won't
  see an unwanted FUSE_DESTROY operation in the mock file system.

Sponsored by:	The FreeBSD Foundation
2019-05-10 15:02:29 +00:00
Alan Somers
d5ff268834 fusefs: create sockets with FUSE_MKNOD, not FUSE_CREATE
libfuse expects sockets to be created with FUSE_MKNOD, not FUSE_CREATE,
because that's how Linux does it.  My first attempt at creating sockets
(r346894) used FUSE_CREATE because FreeBSD uses VOP_CREATE for this purpose.
There are no backwards-compatibility concerns with this change, because
socket support hasn't yet been merged to head.

Sponsored by:	The FreeBSD Foundation
2019-05-09 16:25:01 +00:00
Alan Somers
002e54b0aa fusefs: clear a dir's attr cache when its contents change
Any change to a directory's contents should cause its mtime and ctime to be
updated by the FUSE daemon.  Clear its attribute cache so we'll get the new
attributs the next time that they're needed.  This affects the following
VOPs: VOP_CREATE, VOP_LINK, VOP_MKDIR, VOP_MKNOD, VOP_REMOVE, VOP_RMDIR, and
VOP_SYMLINK

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-09 01:16:34 +00:00
Alan Somers
8e45ec4e64 fusefs: fix a permission handling bug during VOP_RENAME
If the file to be renamed is a directory and it's going to get a new parent,
then the user must have write permissions to that directory, because the
".." dirent must be changed.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-08 22:28:13 +00:00
Alan Somers
d943c93e76 fusefs: allow non-owners to set timestamps to UTIME_NOW
utimensat should allow anybody with write access to set atime and mtime to
UTIME_NOW.

PR:		237181
Sponsored by:	The FreeBSD Foundation
2019-05-08 19:42:00 +00:00
Alan Somers
4ae3a56cb1 fusefs: updated cached attributes during VOP_LINK.
FUSE_LINK returns a new set of attributes.  fusefs should cache them just
like it does during other VOPs.  This is not only a matter of performance
but of correctness too; without caching the new attributes the vnode's nlink
value would be out-of-date.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-08 18:12:38 +00:00
Alan Somers
a2bdd7379b fusefs: drop suid after a successful chown by a non-root user
Drop sgid too.  Also, drop them after a successful chgrp.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 22:38:13 +00:00
Alan Somers
4e83d6555e fusefs: allow the null chown and null chgrp
Even an unprivileged user should be able to chown a file to its current
owner, or chgrp it to its current group.  Those are no-ops.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 01:27:23 +00:00
Alan Somers
1c8a5f5e39 fusefs: disable posix_fallocate
fuse file systems have far too much variability for the standard
posix_fallocate implementation to work.  A future protocol revision (7.19)
adds a FUSE_FALLOCATE operation, but we don't support that yet.  Better to
simply return EINVAL until then.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-07 00:03:05 +00:00
Alan Somers
3fa127896b fusefs: allow ftruncate on files without write permission
ftruncate should succeed as long as the file descriptor is writable, even if
the file doesn't have write permission.  This is important when combined
with O_CREAT.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 20:46:58 +00:00
Alan Somers
8cfb44315a fusefs: Fix another obscure permission handling bug
Don't allow unprivileged users to set SGID on files to whose group they
don't belong.  This is slightly different than what POSIX says we should do
(clear sgid on return from a successful chmod), but it matches what UFS
currently does.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 16:54:35 +00:00
Alan Somers
a90e32de25 fusefs: clear SUID & SGID after a successful write by a non-owner
Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 16:17:55 +00:00
Alan Somers
ac0a68e9cd fusefs: don't allow truncating irregular files on an read-only mount
The readonly mount check had a special case allowing the sizes of files to
be changed if they weren't regular files.  I don't know why.  Neither UFS,
ZFS, nor ext2 have such a special case, and I don't know when you would ever
change the size of a non-regular file anyway.

Sponsored by:	The FreeBSD Foundation
2019-05-06 15:20:18 +00:00
Konstantin Belousov
391918a3c1 Do not flush NFS node from NFS VOP_SET_TEXT().
The more appropriate place to do the flushing is VOP_OPEN().  This was
uncovered because VOP_SET_TEXT() is now called with the vnode'
vm_object rlocked, which is incompatible with the flush operations.

After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.

Sponsored by:	The FreeBSD Foundation
MFC after:	30 days
2019-05-06 08:49:43 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Alan Somers
e5ff3a7e28 fusefs: only root may set the sticky bit on a non-directory
PR:		216391
Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-04 16:27:58 +00:00
Alan Somers
61b0a927cb fusefs: use effective gid, not real gid, for FUSE operations
This is the gid used for stuff like setting the group of a newly created
file.

Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-04 02:11:28 +00:00
Alan Somers
72f03b7ccd fusefs: fix "returning with lock held" panics in fuse_vnode_alloc
These panics all lie in the error path.  The only one I've hit is caused by
a buggy FUSE server unexpectedly changing the type of a vnode.

Sponsored by:	The FreeBSD Foundation
2019-05-01 17:27:04 +00:00
Alan Somers
93198e64fb fusefs: fix a memory leak from r346979
PR:		216391
Sponsored by:	The FreeBSD Foundation
2019-05-01 17:24:53 +00:00
Alan Somers
474ba6fa3b fusefs: fix some permission checks with -o default_permissions
When mounted with -o default_permissions fusefs is supposed to validate all
permissions in the kernel, not the file system.  This commit fixes two
permissions that I had previously overlooked.

* Only root may chown a file
* Non-root users may only chgrp a file to a group to which they belong

PR:		216391
Sponsored by:	The FreeBSD Foundation
2019-05-01 00:00:49 +00:00
Alan Somers
ede571e40a fusefs: support unix-domain sockets
Also, fix the teardown of the Fifo.read_write test

Sponsored by:	The FreeBSD Foundation
2019-04-29 16:24:51 +00:00
Alan Somers
f9b0e30ba7 fusefs: FIFO support
Sponsored by:	The FreeBSD Foundation
2019-04-29 01:40:35 +00:00
Alan Somers
9c7ec33162 fusefs: fix a deadlock in VOP_PUTPAGES
As of r346162 fuse now invalidates the cache during writes.  But it can't do
that when writing from VOP_PUTPAGES, because the write is coming _from_ the
cache.  Trying to invalidate the cache in that situation causes a deadlock
in vm_object_page_remove, because the pages in question have already been
busied by the same thread.

PR:		235774
Sponsored by:	The FreeBSD Foundation
2019-04-26 19:47:43 +00:00
Alan Somers
102c7ac083 fusefs: handle ENOSYS for FUSE_INTERRUPT
Though it's not documented, Linux will interpret a FUSE_INTERRUPT response
of ENOSYS as "the file system does not support FUSE_INTERRUPT".
Subsequently it will never send FUSE_INTERRUPT again to the same mount
point.  This change matches Linux's behavior.

PR:		346357
Sponsored by:	The FreeBSD Foundation
2019-04-24 17:30:50 +00:00
Alan Somers
ebbfe00ec2 fusefs: interruptibility improvements suggested by kib
* Block stop signals in fticket_wait_answer
* Hold ps_mtx while checking signal disposition
* style(9) changes

PR:		346357
Reported by:	kib
Sponsored by:	The FreeBSD Foundation
2019-04-24 15:54:18 +00:00
Alan Somers
21d4686c5c fusefs: diff reduction between fuse_read_biobackend and ext_read
The main difference is to replace some custom logic with bread.  No
functional change at this point, but this is one step towards adding
readahead.

Sponsored by:	The FreeBSD Foundation
2019-04-23 22:34:32 +00:00
Alan Somers
bad3de4365 fusefs: use vfs_bio_clrbuf in fuse_vnode_setsize
Reuse fuse_vnode_setsize instead of reinventing the wheel.  This is what
ext2_ind_truncate does.

PR:		233783
Sponsored by:	The FreeBSD Foundation
2019-04-23 22:25:50 +00:00
Rick Macklem
a6f77c9a6e Add #ifdef INET as requested by bz@. 2019-04-21 22:53:51 +00:00
Alan Somers
419e7ff674 fusefs: rename the SDT probes from "fuse" to "fusefs"
This matches the new name of the kld.

Sponsored by:	The FreeBSD Foundation
2019-04-20 00:04:31 +00:00
Rick Macklem
b4372164ed Add support for the ModeSetMasked attribute to the NFSv4.1 server.
I do not know of an extant NFSv4.1 client that currently does a Setattr
operation for the ModeSetMasked, but it has been discussed on the linux-nfs
mailing list.
This patch adds support for doing a Setattr of ModeSetMasked, so that it
will work for any future NFSv4.1 client that chooses to do so.
Tested via a hacked FreeBSD NFSv4.1 client.

MFC after:	2 weeks
2019-04-19 23:35:08 +00:00
Rick Macklem
b4645807af Replace "vp" with NULL to make the code more readable.
At the time of this nfsv4_sattr() call, "vp == NULL", so this patch doesn't
change the semantics, but I think it makes the code more readable.
It also makes it consistent with the nfsv4_sattr() call a few lines above
this one. Found during code inspection.

MFC after:	2 weeks
2019-04-19 23:27:23 +00:00
Alan Somers
4423ae76ca fusefs: reap dead code
Sponsored by:	The FreeBSD Foundation
2019-04-19 23:04:07 +00:00
Alan Somers
268c28edbc fusefs: give priority to FUSE_INTERRUPT operations
When interrupting a FUSE operation, send the FUSE_INTERRUPT op to the daemon
ASAP, ahead of other unrelated operations.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-19 21:50:23 +00:00
Alan Somers
f0f7fc1be4 fusefs: fix interrupting FUSE_SETXATTR
fusefs's VOP_SETEXTATTR calls uiomove(9) before blocking, so it can't be
restarted.  It must be interrupted instead.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-19 20:31:12 +00:00
Alan Somers
3d070fdc76 fusefs: don't send FUSE_INTERRUPT for ops that are still in-kernel
If a pending FUSE operation hasn't yet been sent to the daemon, then there's
no reason to inform the daemon that it's been interrupted.  Instead, simply
remove it from the fuse message queue and set its status to EINTR or
ERESTART as appropriate.

PR:		346357
Sponsored by:	The FreeBSD Foundation
2019-04-19 15:05:32 +00:00
Rick Macklem
ea5776ec47 Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes.
During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.

MFC after:	2 weeks
2019-04-19 03:36:22 +00:00
Alan Somers
a154214620 fusefs: improvements to interruptibility
* If a process receives a fatal signal while blocked on a fuse operation,
  return ASAP without waiting for the operation to complete.  But still send
  the FUSE_INTERRUPT op to the daemon.
* Plug memory leaks from r346339

Interruptibility is now fully functional, but it could be better:
* Operations that haven't been sent to the server yet should be aborted
  without sending FUSE_INTERRUPT.
* It would be great if write operations could be made restartable.
  That would require delaying uiomove until the last possible moment, which
  would be sometime during fuse_device_read.
* It would be nice if we didn't have to guess which EAGAIN responses were
  for FUSE_INTERRUPT operations.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-18 19:16:34 +00:00
Hans Petter Selasky
db92a6cd51 Implement flag for telling cuse(3) clients if the peer is running in 32-bit
compat mode or not. This is useful when implementing compatibility ioctl(2)
handlers in userspace.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-18 19:04:07 +00:00
Alan Somers
723c776829 fusefs: WIP making FUSE operations interruptible
The fuse protocol includes a FUSE_INTERRUPT operation that the client can
send to the server to indicate that it wants to abort an in-progress
operation.  It's required to interrupt any syscall that is blocking on a
fuse operation.

This commit adds basic FUSE_INTERRUPT support.  If a process receives any
signal while it's blocking on a FUSE operation, it will send a
FUSE_INTERRUPT and wait for the original operation to complete.  But there
is still much to do:

* The current code will leak memory if the server ignores FUSE_INTERRUPT,
  which many do.  It will also leak memory if the server completes the
  original operation before it receives the FUSE_INTERRUPT.
* An interrupted read(2) will incorrectly appear to be successful.
* fusefs should return immediately for fatal signals.
* Operations that haven't been sent to the server yet should be aborted
  without sending FUSE_INTERRUPT.
* Test coverage should be better.
* It would be great if write operations could be made restartable.
  That would require delaying uiomove until the last possible moment, which
  would be sometime during fuse_device_read.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-17 23:32:38 +00:00
Fedor Uporov
84b89556b4 ext2fs: Initial version of DTrace support.
Commit forgotten file.

Reviewed by:    pfg, gnn
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19848
2019-04-16 11:37:15 +00:00
Fedor Uporov
ebc94b66fe ext2fs: Initial version of DTrace support.
Reviewed by:    pfg, gnn
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19848
2019-04-16 11:20:10 +00:00
Rick Macklem
eeb1f3ed51 Fix the NFSv4 client to safely find processes.
r340744 broke the NFSv4 client, because it replaced pfind_locked() with a
call to pfind(), since pfind() acquires the sx lock for the pid hash and
the NFSv4 already holds a mutex when it does the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().

Reviewed by:	kib, mjg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19887
2019-04-15 01:27:15 +00:00
Rick Macklem
ed2f100170 Add support for INET6 addresses to the kernel code that dumps open/lock state.
PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
The patch also includes the addition of #ifdef INET, INET6 as requested
by bz@.

PR:		223036
Reviewed by:	bz, rgrimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19839
2019-04-13 22:00:09 +00:00
Alan Somers
f067b60946 fusefs: implement VOP_ADVLOCK
PR:		234581
Sponsored by:	The FreeBSD Foundation
2019-04-12 23:22:27 +00:00
Alan Somers
6af6fdcea7 fusefs: evict invalidated cache contents during write-through
fusefs's default cache mode is "writethrough", although it currently works
more like "write-around"; writes bypass the cache completely.  Since writes
bypass the cache, they were leaving stale previously-read data in the cache.
This commit invalidates that stale data.  It also adds a new global
v_inval_buf_range method, like vtruncbuf but for a range of a file.

PR:		235774
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-04-12 19:05:06 +00:00
Konstantin Belousov
28ce2bc1b5 Ignore doomed vnodes in tmpfs_update_mtime().
Otherwise we might dereference NULL vp->v_data after
VP_TO_TMPFS_NODE().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-12 17:11:50 +00:00
Alan Somers
1f4a83f981 fusefs: Handle ENOSYS for all remaining opcodes
For many FUSE opcodes, an error of ENOSYS has special meaning.  fusefs
already handled some of those; this commit adds handling for the remainder:

* FUSE_FSYNC, FUSE_FSYNCDIR: ENOSYS means "success, and automatically return
  success without calling the daemon from now on"
* All extattr operations: ENOSYS means "fail EOPNOTSUPP, and automatically
  do it without calling the daemon from now on"

PR:		236557
Sponsored by:	The FreeBSD Foundation
2019-04-12 00:15:36 +00:00
Alan Somers
b4227f34e8 fusefs: /dev/fuse should be 0666
The fuse protocol is designed with security in mind.  It prevents users from
spying on each others' activities.  And it doesn't grant users any
privileges that they didn't already have.  So it's appropriate to make it
available to everyone.  Plus, it's necessary in order for kyua to run tests
as an unprivileged user.

Sponsored by:	The FreeBSD Foundation
2019-04-11 22:43:19 +00:00
Alan Somers
c9c34c2057 fusefs: test that we reparent a vnode during rename
fusefs tracks each vnode's parent.  The rename code was already correctly
updating it.  Delete a comment that said otherwise, and add a regression
test for it.

Sponsored by:	The FreeBSD Foundation
2019-04-11 22:34:28 +00:00
Alan Somers
64f31d4f3b fusefs: fix a panic in a stale vnode situation
Don't panic if the server changes the file type of a file without us first
deleting it.  That could indicate a buggy server, but it could also be the
result of one of several race conditions.  Return EAGAIN as we do elsewhere.

Sponsored by:	The FreeBSD Foundation
2019-04-11 22:32:34 +00:00
Alan Somers
4683b90591 fusefs: don't disappear a vnode on entry cache expiration
When the entry cache expires, it's only necessary to purge the cache.
Disappearing a vnode also purges the attribute cache, which is unnecessary,
and invalidates the data cache, which could be harmful.

Sponsored by:	The FreeBSD Foundation
2019-04-11 21:13:54 +00:00
Alan Somers
6124fd7106 fusefs: Finish supporting -o default_permissions
I got most of -o default_permissions working in r346088.  This commit adds
sticky bit checks.  One downside is that sometimes there will be an extra
FUSE_GETATTR call for the parent directory during unlink or rename.  But in
actual use I think those attributes will almost always be cached.

PR:		216391
Sponsored by:	The FreeBSD Foundation
2019-04-11 21:00:40 +00:00
Alan Somers
dc14d593a6 fusefs: use vn_vget_ino_gen in fuse_vnop_lookup
vn_vget_ino_gen is a helper function added in r268606 to simplify cases just
like this.

Sponsored by:	The FreeBSD Foundation
2019-04-11 17:20:15 +00:00
Alan Somers
438b8a6fa2 fusefs: eliminate a superfluous FUSE_GETATTR from VOP_LOOKUP
fuse_vnop_lookup was using a FUSE_GETATTR operation when looking up "." and
"..", even though the only information it needed was the file type and file
size.  "." and ".." are obviously always going to be directories; there's no
need to double check.

Sponsored by:	The FreeBSD Foundation
2019-04-11 05:11:02 +00:00
Alan Somers
73825da397 fusefs: remove "early permission check hack"
fuse_vnop_lookup contained an awkward hack meant to reduce daemon activity
during long lookup chains.  However, the hack is no longer necessary now
that we properly cache file attributes.  Also, I'm 99% certain that it
could've bypassed permission checks when using openat to open a file
relative to a directory that lacks execute permission.

Sponsored by:	The FreeBSD Foundation
2019-04-10 21:46:59 +00:00
Alan Somers
666f8543bb fusefs: various cleanups
* Eliminate fuse_access_param.  Whatever it was supposed to do, it seems
  like it was never complete.  The only real function it ever seems to have
  had was a minor performance optimization, which I've already eliminated.
* Make extended attribute operations obey the allow_other mount option.
* Allow unprivileged access to the SYSTEM extattr namespace when
  -o default_permissions is not in use.
* Disallow setextattr and deleteextattr on read-only mounts.
* Add tests for a few more error cases.

Sponsored by:	The FreeBSD Foundation
2019-04-10 21:10:21 +00:00
Alan Somers
ff4fbdf548 fusefs: WIP supporting -o default_permissions
Normally all permission checking is done in the fuse server.  But when -o
default_permissions is used, it should be done in the kernel instead.  This
commit adds appropriate permission checks through fusefs when -o
default_permissions is used.  However, sticky bit checks aren't working yet.
I'll handle those in a follow-up commit.

There are no checks for file flags, because those aren't supported by our
version of the FUSE protocol.  Nor is there any support for ACLs, though
that could be added if there were any demand.

PR:		216391
Reported by:	hiyorin@gmail.com
Sponsored by:	The FreeBSD Foundation
2019-04-10 17:31:00 +00:00
Alan Somers
44f10c6e40 fusefs: cache negative lookups
The FUSE protocol includes a way for a server to tell the client that a
negative lookup response is cacheable for a certain amount of time.

PR:		236226
Sponsored by:	The FreeBSD Foundation
2019-04-09 21:22:02 +00:00
Konstantin Belousov
ae90941431 Add vn_fsync_buf().
Provide a convenience function to avoid the hack with filling fake
struct vop_fsync_args and then calling vop_stdfsync().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-09 20:20:04 +00:00
Konstantin Belousov
997febb1e7 Fix dirty buf exhaustion easily triggered with msdosfs.
If truncate(2) is performed on msdosfs file, which extends the file by
system-depended large amount, fs creates corresponding amount of dirty
delayed-write buffers, which can consume all buffers.  Such buffers
cannot be flushed by the bufdaemon because the ftruncate() thread owns
the vnode lock.  So the system runs out of free buffers, and even
truncate() thread starves, which means deadlock because it owns the
vnode lock.

Fix this by doing vnode fsync in extendfile() when low memory or low
buffers condition detected, which flushes all dirty buffers belonging
to the file being extended.

Note that the more usual fallback to bawrite() does not work
acceptable in this situation, because it would only allow one buffer
to be recycled.  Other filesystems, most important UFS, do not allow
userspace to create arbitrary amount of dirty delayed-write buffers
without feedback, so bawrite() is good enough for them.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-09 19:55:02 +00:00
Alan Somers
ccb75e4939 fusefs: implement entry cache timeouts
Follow-up to r346046.  These two commits implement fuse cache timeouts for
both entries and attributes.  They also remove the vfs.fusefs.lookup_cache
enable sysctl, which is no longer needed now that cache timeouts are
honored.

PR:		235773
Sponsored by:	The FreeBSD Foundation
2019-04-09 17:23:34 +00:00
Alan Somers
3f2c630c74 fusefs: implement attribute cache timeouts
The FUSE protocol allows the server to specify the timeout period for the
client's attribute and entry caches.  This commit implements the timeout
period for the attribute cache.  The entry cache's timeout period is
currently disabled because it panics, and is guarded by the
vfs.fusefs.lookup_cache_expire sysctl.

PR:		235773
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-04-09 00:47:38 +00:00
Alan Somers
cad677915f fusefs: cache file attributes
FUSE_LOOKUP, FUSE_GETATTR, FUSE_SETATTR, FUSE_MKDIR, FUSE_LINK,
FUSE_SYMLINK, FUSE_MKNOD, and FUSE_CREATE all return file attributes with a
cache validity period.  fusefs will now cache the attributes, if the server
returns a non-zero cache validity period.

This change does _not_ implement finite attr cache timeouts.  That will
follow as part of PR 235773.

PR:		235775
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-04-08 18:45:41 +00:00
Rick Macklem
80405bcf79 Add INET6 support for the upcalls to the nfsuserd daemon.
The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get
updates to the username<->uid and groupname<->gid mappings.
A change to AF_LOCAL last year had to be reverted, since it could result
in vnode locking issues on the AF_LOCAL socket.
This patch adds INET6 support and the required #ifdef INET and INET6
to the code.

Requested by:	bz
PR:		205193
Reviewed by:	bz, rgrimes
MFC after:	2 weeks
Differential Revision:	http://reviews.freebsd.org/D19218
2019-04-06 21:53:46 +00:00
Alan Somers
2c338af141 fusefs: fix a panic on mount
Don't page fault if the file descriptor provided with "-o fd" is invalid.
This is a merge of r345419 from the projects/fuse2 branch.

Reviewed by:	ngie
Tested by:	Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after:	2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19836
2019-04-06 18:04:04 +00:00
Alan Somers
caf5f57d2d fusefs: implement VOP_ACCESS
VOP_ACCESS was never fully implemented in fusefs.  This change:
* Removes the FACCESS_DO_ACCESS flag, which pretty much disabled the whole
  vop.
* Removes a quixotic special case for VEXEC on regular files.  I don't know
  why that was in there.
* Removes another confusing special case for VADMIN.
* Removes the FACCESS_NOCHECKSPY flag.  It seemed to be a performance
  optimization, but I'm unconvinced that it was a net positive.
* Updates test cases.

This change does NOT implement -o default_permissions.  That will be handled
separately.

PR:		236291
Sponsored by:	The FreeBSD Foundation
2019-04-05 18:37:48 +00:00
Alan Somers
efa23d9784 fusefs: enforce -onoallow_other even beneath the mountpoint
When -o allow_other is not in use, fusefs is supposed to prevent access to
the filesystem by any user other than the one who owns the daemon.  Our
fusefs implementation was only enforcing that restriction at the mountpoint
itself.  That was usually good enough because lookup usually descends from
the mountpoint.  However, there are cases when it doesn't, such as when
using openat relative to a file beneath the mountpoint.

PR:		237052
Sponsored by:	The FreeBSD Foundation
2019-04-05 17:21:23 +00:00
Alan Somers
140bb4927a fusefs: correctly return EROFS from VOP_ACCESS
Sponsored by:	The FreeBSD Foundation
2019-04-05 15:33:43 +00:00
Rick Macklem
02c8dd7d72 Revert r320698, since the related userland changes were reverted by r338192.
r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)

I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018)

This is being reverted in preparation for an update to include AF_INET6
support to the code.
2019-04-04 23:30:27 +00:00
Alan Somers
a7e81cb3db fusefs: properly handle FOPEN_KEEP_CACHE
If a fuse file system returne FOPEN_KEEP_CACHE in the open or create
response, then the client is supposed to _not_ clear its caches for that
file.  I don't know why clearing the caches would be the default given that
there's a separate flag to bypass the cache altogether, but that's the way
it is.  fusefs(5) will now honor this flag.

Our behavior is slightly different than Linux's because we reuse file
handles.  That means that open(2) wont't clear the cache if there's a
reusable file handle, even if the file server wouldn't have sent
FOPEN_KEEP_CACHE had we opened a new file handle like Linux does.

PR:		236560
Sponsored by:	The FreeBSD Foundation
2019-04-04 20:30:14 +00:00
Alan Somers
8d013bec7a fusefs: fix some uninitialized memory references
This bug was long present, but was exacerbated by r345876.

The problem is that fiov_refresh was bzero()ing a buffer _before_ it
reallocated that buffer.  That's obviously the wrong order.  I fixed the
order in r345876, which exposed the main problem.  Previously, the first 160
bytes of the buffer were getting bzero()ed when it was first allocated in
fiov_init.  Subsequently, as that buffer got recycled between callers, the
portion used by the _previous_ caller was getting bzero()ed by the current
caller in fiov_refresh.  The problem was never visible simply because no
caller was trying to use more than 160 bytes.

Now the buffer gets properly bzero()ed both at initialization time and any
time it gets enlarged or reallocated.

Sponsored by:	The FreeBSD Foundation
2019-04-04 20:24:58 +00:00
Rodney W. Grimes
6c1c6ae537 Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
Alan Somers
9a696dc6bb MFHead@r345880 2019-04-04 18:26:32 +00:00
Alan Somers
12292a99ac fusefs: correctly handle short writes
If a FUSE daemon returns FOPEN_DIRECT_IO when a file is opened, then it's
allowed to write less data than was requested during a FUSE_WRITE operation
on that file handle.  fusefs should simply return a short write to userland.

The old code attempted to resend the unsent data.  Not only was that
incorrect behavior, but it did it in an ineffective way, by attempting to
"rewind" the uio and uiomove the unsent data again.

This commit correctly handles short writes by returning directly to
userland if FOPEN_DIRECT_IO was set.  If it wasn't set (making the short
write technically a protocol violation), then we resend the unsent data.
But instead of rewinding the uio, just resend the data that's already in the
kernel.

That necessitated a few changes to fuse_ipc.c to reduce the amount of bzero
activity.  fusefs may be marginally faster as a result.

PR:		236381
Sponsored by:	The FreeBSD Foundation
2019-04-04 16:51:34 +00:00
Alan Somers
35cf0e7e56 fusefs: fix a panic in VOP_READDIR
The original fusefs import, r238402, contained a bug in fuse_vnop_close that
could close a directory's file handle while there were still other open file
descriptors.  The code looks deliberate, but there is no explanation for it.
This necessitated a workaround in fuse_vnop_readdir that would open a new
file handle if, "for some mysterious reason", that vnode didn't have any
open file handles.  r345781 had the effect of causing the workaround to
panic, making the problem more visible.

This commit removes the workaround and the original bug, which also fixes
the panic.

Sponsored by:	The FreeBSD Foundation
2019-04-03 20:57:43 +00:00
Alan Somers
9f10f423a9 fusefs: send FUSE_FLUSH during VOP_CLOSE
The FUSE protocol says that FUSE_FLUSH should be send every time a file
descriptor is closed.  That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close.  However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.

There are two purposes for FUSE_FLUSH.  One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side.  The other is to release POSIX file locks (which fusefs(5) does
not yet support).

PR:		236405, 236327
Sponsored by:	The FreeBSD Foundation
2019-04-03 19:59:45 +00:00
Konstantin Belousov
c973ee9e06 msdosfs: zero tail of the last block on truncation for VREG vnodes as well.
Despite the call to vtruncbuf() from detrunc(), which results in
zeroing part of the partial page after EOF, there still is a
possibility to retain the stale data which is revived on file
enlargement.  If the filesystem block size is greater than the page
size, partial block might keep other after-EOF pages wired and they
get reused then.  Fix it by zeroing whole part of the partial buffer
after EOF, not relying on vnode_pager_setsize().

PR:	236977
Reported by:	asomers
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-03 17:02:18 +00:00
Alan Somers
e312493b37 fusefs: during ftruncate, discard cached data past truncation point
During truncate, fusefs was discarding entire cached blocks, but it wasn't
zeroing out the unused portion of a final partial block.  This resulted in
reads returning stale data.

PR:		233783
Reported by:	fsx
Sponsored by:	The FreeBSD Foundation
2019-04-03 02:29:56 +00:00
Alan Somers
d3a8f2dd09 fusefs: fix a just-introduced panic in readdir
r345808 changed the interface of fuse_filehandle_open, but failed to update
one caller.

Sponsored by:	The FreeBSD Foundation
2019-04-02 19:20:55 +00:00
Alan Somers
9e4448719b fusefs: cleanup and refactor some recent commits
This commit cleans up after recent commits, especially 345766, 345768, and
345781.  There is no functional change.  The most important change is to add
comments documenting why we can't send flags like O_APPEND in
FUSE_WRITE_OPEN.

PR:		236340
Sponsored by:	The FreeBSD Foundation
2019-04-02 18:09:40 +00:00
Konstantin Belousov
5c4ce6fac2 tmpfs: plug holes on rw->ro mount update.
In particular:
- suspend the mount around vflush() to avoid new writes come after the
  vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.

It is not clear to me how to handle writeable mappings on rw->ro for
tmpfs best.  Other filesystems, which use vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is.  In
particular, the existing mappings continue to work as far as
application only accesses resident pages, but changes are not flushed
to file.

For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving single copy of the mapped pages, so it cannot be set
to OBJT_DEAD.  Alternatives for making rw mappings ro could be either
invalidating them at all, or marking as CoW.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:59:04 +00:00
Konstantin Belousov
e1cdc30faa tmpfs: ignore tmpfs_set_status() if mount point is read-only.
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:49:32 +00:00
Konstantin Belousov
ae26575394 Block creation of the new nodes for read-only tmpfs mounts.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:41:26 +00:00
Alan Somers
f8d4af104b fusefs: send FUSE_OPEN for every open(2) with unique credentials
By default, FUSE performs authorization in the server.  That means that it's
insecure for the client to reuse FUSE file handles between different users,
groups, or processes.  Linux handles this problem by creating a different
FUSE file handle for every file descriptor.  FreeBSD can't, due to
differences in our VFS design.

This commit adds credential information to each fuse_filehandle.  During
open(2), fusefs will now only reuse a file handle if it matches the exact
same access mode, pid, uid, and gid of the calling process.

PR:		236844
Sponsored by:	The FreeBSD Foundation
2019-04-01 20:42:15 +00:00
Alan Somers
363a74163b fusefs: allow opening files O_EXEC
O_EXEC is useful for fexecve(2) and fchdir(2).  Treat it as another fufh
type alongside the existing RDONLY, WRONLY, and RDWR.  Prior to r345742 this
would've caused a memory and performance penalty.

PR:		236329
Sponsored by:	The FreeBSD Foundation
2019-04-01 16:36:02 +00:00
Alan Somers
4a6d5507f7 fusefs: fix an inverted error check in my last commit
This should be merged alongside 345766

Sponsored by:	The FreeBSD Foundation
2019-04-01 16:15:29 +00:00
Alan Somers
5ec10aa527 fusefs: replace obsolete array idioms
r345742 replaced fusefs's fufh array with a fufh list.  But it left a few
array idioms in place.  This commit replaces those idioms with more
efficient list idioms.  One location is in fuse_filehandle_close, which now
takes a pointer argument.  Three other locations are places that had to loop
over all of a vnode's fuse filehandles.

Sponsored by:	The FreeBSD Foundation
2019-04-01 14:23:43 +00:00
Alan Somers
1cedd6dfac fusefs: replace the fufh table with a linked list
The FUSE protocol allows each open file descriptor to have a unique file
handle.  On FreeBSD, these file handles must all be stored in the vnode.
The old method (also used by OSX and OpenBSD) is to store them all in a
small array.  But that limits the total number that can be stored.  This
commit replaces the array with a linked list (a technique also used by
Illumos).  There is not yet any change in functionality, but this is the
first step to fixing several bugs.

PR:		236329, 236340, 236381, 236560, 236844
Discussed with:	cem
Sponsored by:	The FreeBSD Foundation
2019-03-31 03:19:10 +00:00
Alan Somers
5fccbf313a fusefs: don't force direct io for files opened O_WRONLY
Previously fusefs would treat any file opened O_WRONLY as though the
FOPEN_DIRECT_IO flag were set, in an attempt to avoid issuing reads as part
of a RMW write operation on a cached part of the file.  However, the FUSE
protocol explicitly allows reads of write-only files for precisely that
reason.

Sponsored by:	The FreeBSD Foundation
2019-03-30 00:57:07 +00:00
Alan Somers
f220ef0b35 fix the GENERIC-NODEBUG build after r345675
Submitted by:	cy
Reported by:	cy, Michael Butler <imb@protected-networks.net>
MFC after:	2 weeks
X-MFC-With:	345675
2019-03-29 14:07:30 +00:00
Alan Somers
415e34c4d5 MFHead@r345677 2019-03-29 03:25:20 +00:00
Alan Somers
080518d810 fusefs: convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:

1. Totally redundant with dtrace FBT probes. These I deleted.
2. Print textual information, usually error messages. These I converted to
   SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
   printf statements except they can be enabled at runtime with dtrace. They
   can be filtered by FILE and/or by priority.
3. More complicated probes that print detailed information. These I
   converted into ad-hoc SDT probes.

Also, de-inline fuse_internal_cache_attrs.  It's big enough to be a regular
function, and this way it gets a dtrace FBT probe.

This commit is a merge of r345304, r344914, r344703, and r344664 from
projects/fuse2.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19667
2019-03-29 02:13:06 +00:00
Alan Somers
98852a32af fusefs: fix error handling in fuse_vnop_strategy
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-03-28 21:57:42 +00:00
Alan Somers
f203d1734d fusefs: don't ignore errors in fuse_vnode_refreshsize
Reported by:	Coverity
Coverity CID:	1368622
Sponsored by:	The FreeBSD Foundation
2019-03-27 16:45:30 +00:00
Alan Somers
019dca0199 fusefs: delete dead code in fuse_vnop_setattr
The dead code in question was a broken and incomplete attempt to support the
default_permissions mount option during VOP_SETATTR.  There wasn't anything
there worth saving; I'll have to rewrite it later.

Reported by:	Coverity
Coverity CID:	1008668
Sponsored by:	The FreeBSD Foundation
2019-03-27 16:19:02 +00:00
Alan Somers
3885d4091d fusefs: fix a derefence-after-null-check
Reported by:	Coverity
Coverity CID:	1017940
Sponsored by:	The FreeBSD Foundation
2019-03-27 14:15:35 +00:00
Alan Somers
e0bec057db fusefs: correctly set fuse_release_in.flags in an error path
fuse_vnop_create must close the newly created file if it can't allocate a
vnode.  When it does so, it must use the same file flags for FUSE_RELEASE as
it used for FUSE_OPEN or FUSE_CREATE.

Reported by:	Coverity
Coverity CID:	1066204
Sponsored by:	The FreeBSD Foundation
2019-03-27 02:57:59 +00:00
Alan Somers
4a4282cb06 FUSEFS: during FUSE_READDIR, set the read size correctly.
The old formula was unnecessarily restrictive.

Sponsored by:	The FreeBSD Foundation
2019-03-27 02:01:34 +00:00
Alan Somers
3ba6a4d473 fusefs: set fuse_init_in->max_readahead correctly
The old value was correct only by coincidence.

Sponsored by:	The FreeBSD Foundation
2019-03-27 01:49:35 +00:00
Alan Somers
fd2749f25d fusefs: delete dead code
This change also inlines several previously #define'd symbols that didn't
really have the meanings indicated by the comments.

Sponsored by:	The FreeBSD Foundation
2019-03-26 03:02:45 +00:00
Maxim Sobolev
4f20706113 Refine r345425: get rid of superfluous helper macro that I have added.
MFC after:	2 weeks
2019-03-26 01:28:10 +00:00
Allan Jude
b4b3e3498b Make TMPFS_PAGES_MINRESERVED a kernel option
TMPFS_PAGES_MINRESERVED controls how much memory is reserved for the system
and not used by tmpfs.

On very small memory systems, the default value may be too high and this
prevents these small memory systems from using reroot, which is required
for them to install firmware updates.

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by:	mizhka
Differential Revision:	https://reviews.freebsd.org/D13583
2019-03-25 07:46:20 +00:00
Alan Somers
19ef317d62 fusefs: fallback to MKNOD/OPEN if a filesystem doesn't support CREATE
If a FUSE filesystem returns ENOSYS for FUSE_CREATE, then fallback to
FUSE_MKNOD/FUSE_OPEN.

Also, fix a memory leak in the error path of fuse_vnop_create.  And do a
little cleanup in fuse_vnop_open.

PR:		199934
Reported by:	samm@os2.kiev.ua
Sponsored by:	The FreeBSD Foundation
2019-03-23 00:22:29 +00:00
Maxim Sobolev
ac1a10efad Make it possible to update TMPFS mount point from read-only to read-write
and vice versa.

Reviewed by:	delphij
Approved by:	delphij
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19682
2019-03-22 21:31:21 +00:00
Alan Somers
bf4d70841f fusefs: support VOP_MKNOD
PR:		236236
Sponsored by:	The FreeBSD Foundation
2019-03-22 19:08:48 +00:00
Alan Somers
8ba190efeb fusefs: fix a panic on mount
Don't page fault if the file descriptor provided with "-o fd" is invalid.

Sponsored by:	The FreeBSD Foundation
2019-03-22 17:53:13 +00:00
Alan Somers
6248288e97 fusefs: correctly handle cacheable negative LOOKUP responses
The FUSE protocol allows for LOOKUP to return a cacheable negative response,
which means that the file doesn't exist and the kernel can cache its
nonexistence.  As of this commit fusefs doesn't cache the nonexistence, but
it does correctly handle such responses.  Prior to this commit attempting to
create a file, even with O_CREAT would fail with ENOENT if the daemon
returned a cacheable negative response.

PR:		236231
Sponsored by:	The FreeBSD Foundation
2019-03-21 23:31:10 +00:00
Alan Somers
915012e0d0 fusefs: Don't treat fsync the same as fdatasync
For an unknown reason, fusefs was _always_ sending the fdatasync operation
instead of fsync.  Now it correctly sends one or the other.

Also, remove the Fsync.fsync_metadata_only test, along with the recently
removed Fsync.nop.  They should never have been added.  The kernel shouldn't
keep track of which files have dirty data; that's the daemon's job.

PR:		236473
Sponsored by:	The FreeBSD Foundation
2019-03-21 23:01:56 +00:00
Alan Somers
90612f3c38 fusefs: VOP_FSYNC should be synchronous -- sometimes
I committed too hastily in r345390.  There are cases, not directly reachable
from userland, where VOP_FSYNC ought to be asynchronous.  This commit fixes
fusefs to handle VOP_FSYNC synchronously if and only if the VFS requests it.

PR:		236474
X-MFC-With:	345390
Sponsored by:	The FreeBSD Foundation
2019-03-21 22:17:10 +00:00
Alan Somers
cc34f2f66a fusefs: VOP_FSYNC should be synchronous
returning asynchronously pretty much defeats the point of fsync

PR:		236474
Sponsored by:	The FreeBSD Foundation
2019-03-21 21:53:55 +00:00
Konstantin Belousov
7ae3486e6d nullfs: fix unmounts when filesystem is active.
If vflush() did not completely flushed the mount vnodes queue, either
retry for forced unmounts, or give up for non-forced.  This situation
can occur when new vnodes are instantiated while vflush() worked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-21 13:30:48 +00:00
Alan Somers
f9856d0813 MFHead @345353 2019-03-20 23:32:37 +00:00
Alan Somers
123af6ec70 Rename fuse(4) to fusefs(4)
This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".

Reviewed by:	cem, rgrimes
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19649
2019-03-20 21:48:43 +00:00
Alan Somers
7e4844f7d9 fuse(4): remove more debugging printfs
I missed these in r344664.  They're basically useless because they can only
be controlled at compile-time.  Also, de-inline fuse_internal_cache_attrs.
It's big enough to be a regular function, and this way it gets a dtrace FBT
probe.

Sponsored by:	The FreeBSD Foundation
2019-03-19 17:49:15 +00:00
Alan Somers
2aaf9152a8 MFHead@r345275 2019-03-18 19:21:53 +00:00
Fedor Uporov
0204d1c793 Remove unneeded mount point unlock function calls.
The ext2_nodealloccg() function unlocks the mount point
in case of successful node allocation.
The additional unlocks are not required and should be removed.

PR:		236452
Reported by:	pho
MFC after:	3 days
2019-03-15 11:49:46 +00:00
Edward Tomasz Napierala
2df8bd90c8 Drop unused 'p' argument to nfsv4_strtogid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:07:47 +00:00
Edward Tomasz Napierala
c703cba811 Drop unused 'p' argument to nfsv4_gidtostr().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:05:11 +00:00
Edward Tomasz Napierala
0658ac3943 Drop unused 'p' argument to nfsv4_strtouid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:02:52 +00:00
Edward Tomasz Napierala
0f86b94a56 Drop unused 'p' argument to nfsv4_uidtostr().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 14:59:08 +00:00
Edward Tomasz Napierala
f32bf2922f Drop unused 'p' argument to nfsrv_getuser().
Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19455
2019-03-12 14:53:53 +00:00
Simon J. Gerraty
f5fdf82d82 Add _PC_ACL_* to vop_stdpathconf
This avoid EINVAL from tmpfs etc.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D19512
2019-03-11 20:40:56 +00:00
Alan Somers
84c4fd1f48 fuse(4): add dtrace probe for illegal short writes
Sponsored by:	The FreeBSD Foundation
2019-03-08 02:00:49 +00:00
Conrad Meyer
9a6a45d850 fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf
On GENERIC kernels with empty loader.conf, there is no functional change.
DFLTPHYS and MAXBSIZE are both 64kB at the moment.  This change allows
larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
than the default.

Suggested by:	ken@
2019-03-07 00:55:49 +00:00
Conrad Meyer
e7df98863b FUSE: Prevent trivial panic
When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.

For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.

This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.

PR:		236329
Reported by:	asomers
Reviewed by:	asomers
Sponsored by:	Dell EMC Isilon
2019-03-06 22:56:49 +00:00
Alan Somers
4cbb4f8886 fuse(4): add tests related to FUSE_MKNOD
PR:		236236
Sponsored by:	The FreeBSD Foundation
2019-03-05 00:27:54 +00:00
Edward Tomasz Napierala
01c27978f5 Don't pass td to nfsvno_open().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-04 14:50:00 +00:00
Edward Tomasz Napierala
127152fe56 Don't pass td to nfsvno_createsub().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-04 14:30:53 +00:00
Edward Tomasz Napierala
5edc9102dc Don't pass td to nfsd_fhtovp(), it's unused.
Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19421
2019-03-04 13:18:04 +00:00
Edward Tomasz Napierala
af444b18ed Push down the thread argument in NFS server code, using curthread
instead of passing it explicitly. No functional changes

Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19419
2019-03-04 13:12:23 +00:00
Edward Tomasz Napierala
113aa93390 Push down td in nfsrvd_dorpc() - make it use curthread instead
of it being explicitly passed as an argument. No functional changes.

The big picture here is that I want to get rid of the 'td' argument
being passed everywhere, and this is the first piece that affects
the NFS server.

Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19417
2019-03-04 13:02:36 +00:00
Fedor Uporov
9441309ae0 Fix double free in case of mount error.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-9-EXT3-2: Denial Of Service in nmount-5 (vm_fault_hold)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19385
2019-03-04 11:33:49 +00:00
Fedor Uporov
3eed9f20d4 Do not read the on-disk inode in case of vnode allocation.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-6-EXT2-4: Denial Of Service in mkdir-0 (ext2_mkdir/vn_rdwr)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19327
2019-03-04 11:27:47 +00:00
Fedor Uporov
736da5176d Fix integer overflow possibility.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-2-EXT2-1: Out-of-Bounds Write in nmount (ext2_vget)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19326
2019-03-04 11:19:21 +00:00
Fedor Uporov
4ff6603ab3 Do not panic if inode bitmap is corrupted.
admbug:         804
Reported by:    Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19325
2019-03-04 11:12:19 +00:00
Fedor Uporov
80a4a9716b Validate block bitmaps.
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19324
2019-03-04 11:01:23 +00:00
Fedor Uporov
daa2d62da2 Add additional on-disk inode checks.
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19323
2019-03-04 10:55:01 +00:00
Fedor Uporov
6e38bf94e5 Make superblock reading logic more strict.
Add more on-disk superblock consistency checks to ext2_compute_sb_data() function.
It should decrease the probability of mounting filesystems with corrupted superblock data.

Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19322
2019-03-04 10:42:25 +00:00
Alan Somers
c02ccc7e44 Fix typos from r344664
Sponsored by:	The FreeBSD Foundation
2019-03-01 15:49:11 +00:00
Alan Somers
cf16949867 fuse(4): convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags.  They fell into three basic groups:

1) Totally redundant with dtrace FBT probes.  These I deleted.
2) Print textual information, usually error messages.  These I converted to
   SDT probes of the form fuse:fuse:FILE:trace.  They work just like the old
   printf statements except they can be enabled at runtime with dtrace.
   They can be filtered by FILE and/or by priority.
3) More complicated probes that print detailed information.  These I
   converted into ad-hoc SDT probes.

Sponsored by:	The FreeBSD Foundation
2019-02-28 19:27:54 +00:00
Conrad Meyer
f6ebb68395 fuse: Fix a regression introduced in r337165
On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
use a buf cache block size in excess of permitted size.  This did not affect
most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
The issue was discovered and reported using a custom kernel with a DFLTPHYS
of 512kB.

PR:		230260 (comment #9)
Reported by:	ken@
MFC after:	π/𝑒 weeks
2019-02-21 02:41:57 +00:00
Matt Macy
81167243b4 PFS: Bump NAMELEN and don't require clients to be sleepable
- debugfs consumers expect to be able to export names more than 48 characters

- debugfs consumers expect to be able to hold locks across calls and are able
  to handle allocation failures

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19256
2019-02-20 20:55:02 +00:00
Conrad Meyer
02295caf43 Fuse: whitespace and style(9) cleanup
Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse.  Also fix some style(9) warts while here.  Not 100% cleaned up, but
somewhat less painful to look at and edit.

No functional change.
2019-02-20 02:49:26 +00:00
Conrad Meyer
bd4cb2a46d fuse: add descriptions for remaining sysctls
(Except reclaim revoked; I don't know what that goal of that one is.)
2019-02-20 02:48:59 +00:00
Edward Tomasz Napierala
c9172fb4f1 Work around the "nfscl: bad open cnt on server" assertion
that can happen when rerooting into NFSv4 rootfs with kernel
built with INVARIANTS.

I've talked to rmacklem@ (back in 2017), and while the root cause
is still unknown, the case guarded by assertion (nfscl_doclose()
being called from VOP_INACTIVE) is believed to be safe, and the
whole thing seems to run just fine.

Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-02-19 12:45:37 +00:00
Conrad Meyer
3c324b9465 FUSE: Refresh cached file size when it changes (lookup)
The cached fvdat->filesize is indepedent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.

As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).

We do something similar in fuse_vnop_getattr already.

PR:		230258 (as reported in description; other issues explored in
			comments are not all resolved)
Reported by:	MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by:	Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)
2019-02-15 22:55:13 +00:00
Conrad Meyer
c4af8b173a FUSE: The FUSE design expects writethrough caching
At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.

Prior to this change, we implement and default to writeback caching.  This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.

In this change, add cache mode as an extension of cache enable/disable.  The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).

For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data.  WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement.  Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.

Refs:
  * https://sourceforge.net/p/fuse/mailman/message/8902254/
  * https://github.com/vgough/encfs/issues/315

PR:		230258 (inspired by)
2019-02-15 22:52:49 +00:00
Conrad Meyer
194e691aaf FUSE: Only "dirty" cached file size when data is dirty
Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR).  In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).

PR:		230258 (related)
2019-02-15 22:51:09 +00:00
Conrad Meyer
09176f096b FUSE: Respect userspace FS "do-not-cache" of path components
The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:50:31 +00:00
Conrad Meyer
78a7722fbc FUSE: Respect userspace FS "do-not-cache" of file attributes
The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.

One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy.  It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?

Some minor APIs changed to facilitate this:

1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0.  It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).

3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.

4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache).  The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:49:15 +00:00
Konstantin Belousov
b9662886ef Un null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call.
The vp vnode is unlocked during the execution of the VOP method and
can be reclaimed, zeroing vp->v_data.  Caching allows to use the
correct mount point.

Reported and tested by:	pho
PR: 235549
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:20:18 +00:00
Konstantin Belousov
25728e8411 Before using VTONULL(), check that the covered vnode belongs to nullfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:17:31 +00:00
Konstantin Belousov
930cc2dbef Some style for nullfs_mount(). Also use bool type for isvnunlocked.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:15:29 +00:00
Pedro F. Giffuni
771ec59bb7 ext2fs: Add some extra consistency checks for the superblock.
Maliciously formed, or badly corrupted, filesystems can cause kernel
panics.  In general, such acts of foot-shooting can only be accomplished
by root, but in a world with VM images that is  moving towards automated
mounts it is important to have some form of prevention.

Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert
of Fraunhofer FKIE.
Incidentaly this should also fix a memory corruption issue reported by
Dr Silvio Cesare of InfoSect.

Huge thanks to all reseachers for making us aware of the issue.

admbug:		872, 891
Reviewed by:	fsu
Obtained from:	NetBSD (with minor changes)
MFC after:	3 days
2019-01-25 22:22:29 +00:00
Mark Johnston
d9463dd4f3 nfs: Zero the buffers exported by NFSSVC_DUMPCLIENTS and DUMPLOCKS.
Note that these interfaces are available only to root.

admbugs:	765
Reported by:	Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by:	rmacklem
MFC after:	1 day
Security:	Kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
2019-01-21 23:54:33 +00:00
Oleksandr Tymoshenko
52b2c8e242 [smbfs] Allow semicolon in mounts that support long names
Semicolon is a legal character in long names but not in 8.3 format.
Move it to respective character set.

PR:		140068
Submitted by:	tom@uffner.com
MFC after:	3 weeks
2019-01-20 05:52:16 +00:00
Gleb Smirnoff
756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
Kirk McKusick
c0029546f8 When loading an inode from disk, verify that its mode is valid.
If invalid, return EINVAL. Note that inode check-hashes greatly
reduce the chance that these errors will go undetected.

Reported by:  Christopher Krah <krah@protonmail.com>
Reported as:  FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read)
Reviewed by:  kib
MFC after:    1 week
Sponsored by: Netflix

M    sys/fs/ext2fs/ext2_vnops.c
M    sys/kern/vfs_subr.c
M    sys/ufs/ffs/ffs_snapshot.c
M    sys/ufs/ufs/ufs_vnops.c
2018-12-27 07:18:53 +00:00
Bruce Evans
416e232cc6 Fix clobbering of the fatchain cache for clustered i/o's when full
clustering is not done.  The bug caused extreme slowness for large
files in some cases.

There is no way to tell VOP_BMAP() how many blocks are wanted, so for
all file systems it has to waste time in some cases by searching for
more contiguous blocks than will be accessed.  For msdosfs, it also
clobbered the fatchain cache in these cases by advancing the cache to
point to the chain entry for block that won't be read.  This makes
the cache useless for the next sequential i/o (or VOP_BMAP()), so the
fat chain is searched from the beginning.  The cache only has 1 relevant
entry, so it is similarly useless for random i/o.

Fix this by only advancing the cache to point to the chain entry for
the first block that will be read.  Clustering uses results from
VOP_BMAP(), so when more than 1 block is read by clustering, the cache
is not advanced as optimally as before, but it is at most 1 cluster
size behind and searching the chain through the blocks for this cluster
doesn't take too long.
2018-12-21 21:17:45 +00:00
Bruce Evans
8ec22c4d65 Quick fix for initialization of mnt_iosize_max. (This limit controls
mainly clustering and read-ahead.)  Copy the initialization from ffs,
and also copy a couple of lines of ffs's nearby style for initialization
order and whitespace.

A correct fix would de-duplicate the initialization and fix bitrot in it
instead of adding another instance of the duplication.  Complications to
use the size preferred by the device have been reduced to hard-coding
slightly pessimal and/or inconsistent defaults, using large code that was
almost needed to support the complications.

For msdosfs, the result was that mnt_iosize_max was DFTLPHYS (64K) but is
now MAXPHYS (128K).
2018-12-21 20:12:43 +00:00
Rick Macklem
23114c6c2a Fix the NFSv4 server to obey vfs.nfsd.nfs_privport.
When the NFSv4 server was coded, I believed that the specification authors
did not want NFSv4 servers to require a client to use a reserved port#.
However, recently it has been noted that the Linux NFSv4 server does support
a check for a reserved port#.
Since both the FreeBSD and Linux NFSv4 clients use a reserved port# by
default, enabling vfs.nfsd.nfs_privport to require a reserved port# for
NFSv4 the same as it does for NFSv2, 3 seems reasonable.
The only case where this could cause a POLA violation is a FreeBSD NFSv4
server with vfs.nfsd.nfs_privport set, but with NFSv4 clients doing mounts
without using a reserved port# (< 1024).

Tested by:	chaz.newton58@gmail.com
PR:		234106
MFC after:	1 week
2018-12-20 22:21:41 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Mark Johnston
352aaa5122 Plug memory disclosures via ptrace(2).
On some architectures, the structures returned by PT_GET*REGS were not
fully populated and could contain uninitialized stack memory.  The same
issue existed with the register files in procfs.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18421
2018-12-03 20:54:17 +00:00
Mark Johnston
fee65dfc37 Ensure the dirent remains initialized when dirent.d_fileno is unset.
Reported by:	rmacklem
MFC with:	r340856
Sponsored by:	The FreeBSD Foundation
2018-11-23 23:07:49 +00:00
Mark Johnston
6d2e2df764 Ensure that directory entry padding bytes are zeroed.
Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace.  With the ino64 work there
are also some explicit pad fields in struct dirent.  Add a subroutine
to clear these bytes and use it in the in-tree filesystems.  The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-23 22:24:59 +00:00
Rick Macklem
f86bce1770 Make sure the NFS readdir client fills in all "struct dirent" data.
The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't
filling in parts of the readdir reply, such as d_pad[01] and the bytes
at the end of d_name within d_reclen. As such, data left in a buffer cache
block could be leaked to userland in the readdir reply.
This patch makes sure all of the data is filled in.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib, markj
MFC after:	2 weeks
2018-11-23 00:17:47 +00:00
Mateusz Guzik
53011553fa proc: convert pfind & friends to use pidhash locks and other cleanup
pfind_locked is retired as it relied on allproc which unnecessarily
restricts locking of the hash.

Sponsored by:	The FreeBSD Foundation
2018-11-21 20:15:56 +00:00
Mateusz Guzik
30e0cf499f tmpfs: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-20 15:14:30 +00:00
Rick Macklem
75772b69f2 Improve sanity checking for the dircount hint argument to
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.

MFC after:	1 week
2018-11-20 01:59:57 +00:00
Rick Macklem
778f29833b nfsm_advance() would panic() when the offs argument was negative.
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.

MFC after:	1 week
2018-11-20 01:56:34 +00:00
Rick Macklem
1d171e7971 r304026 added code that started statistics gathering for an operation
before the operation number (the variable called "op") was sanity checked.
This patch moves the code down to below the range sanity check for "op".
2018-11-20 01:52:45 +00:00
Mark Johnston
3d2a0fe762 Remove comments made obsolete by the ino64 work.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-19 17:33:44 +00:00
Konstantin Belousov
1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Rick Macklem
6ad8a6eaa4 Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end.
Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error);
in many places. This patch replaces these code sequenences with a "goto out;"
and does the NFSVOPUNLOCK(); return (error); at the end of the function
in order to make the vnode locking simpler.
This patch does not change the semantics of nfs_advlock().

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D17853
2018-11-06 22:50:50 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Brooks Davis
1493c2ee62 Make vop_symlink take a const target path.
This will enable callers to take const paths as part of syscall
decleration improvements.

Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17805
2018-11-02 14:42:36 +00:00
Rick Macklem
881a9516a2 Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.

Tested by:	rlibby
PR:		232673
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17757
2018-11-01 15:27:22 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Konstantin Belousov
8ff7fad1d7 Only call sigdeferstop() for NFS.
Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17658
2018-10-23 21:43:41 +00:00
Andriy Gapon
ca8f3d1ca2 nfsrvd_readdirplus: for some errors, do not fail the entire request
Instead, a failing entry is skipped.
This change consist of two logical changes.

A failure to vget or lookup an entry is considered to be a result of a
concurrent removal, which is the only reasonable explanation given that
the filesystem is busied.  So, the entry would be silently skipped.

In the case of a failure to get attributes of an entry for an NFSv3
request, the entry would be silently skipped.  There can be legitimate
reasons for the failure, but NFSv3 does not provide any means to report
the error, so we have two options: either fail the whole request or
ignore the failed entry.  Traditionally, the old NFS server used the
latter option, so the code is reverted to it.  Making the whole
directory unreadable because of a single entry seems to be unpractical.

Additionally, some bits of code are slightly re-arranged to account for
the new control flow and to honor style(9).

Reviewed by:	rmacklem
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15424
2018-10-22 15:33:05 +00:00
Rick Macklem
910ccc7727 Fix the pNFS server's reporting of disk space usage for the "#<path>" case.
The pNFS server would report the total disk space used and free for all
of the DSs, even when certain DSs are assigned to the file system via
the "#<path>" suffix used in the "nfsd -p" option argument.
This patch fixes this case. It only reports usage for the file system
that the argument vnode resides on. This is consistent with the non-pNFS
NFSv4 server. In NFSv4 it is possible to have subtrees on other file
systems, but these are not included in the usage information for NFSv4.

Approved by:	re (gjb)
2018-10-09 01:10:50 +00:00
Brooks Davis
9bc603bd20 Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.
A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by:	re (gjb)
2018-10-04 23:55:03 +00:00
Brooks Davis
23f2e22802 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Reviewed by:	kib
Approved by:	re (rgrimes, gjb)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Review:	https://reviews.freebsd.org/D17388
2018-10-03 20:39:48 +00:00
Mark Murray
19fa89e938 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
Fedor Uporov
28f4f62303 FUSE extattrs: fix issue when neither uio nor size were not passed to VOP_* (cosmetic only).
Reviewed by:    cem, pfg
MFC after:      2 weeks

Differential Revision:	https://reviews.freebsd.org/D13737
2018-08-21 18:50:29 +00:00
Fedor Uporov
493b4a8ccd FUSE extattrs: fix issue when neither uio nor size were not passed to VOP_*.
The requested size was returned incorrectly in case uio == NULL from listextattr because the
nameprefix/name conversion was not applied.
Also, make a_size/uio returning logic more unified with other filesystems.

Reviewed by:    cem, pfg
MFC after:      2 weeks

Differential Revision:	https://reviews.freebsd.org/D13528
2018-08-21 18:39:47 +00:00
Fedor Uporov
4c1e1d2bcc Change unused inodes counters behavior in the cylinder groups.
Make it more close to native ext4 implementation to avoid fsck errors.
2018-08-21 18:39:29 +00:00
Fedor Uporov
e49d64a7a7 Fix directory blocks checksum updating logic.
Count dirent tail in the searchslot logic in case of directory block search.
Add htree root csum update function call in case of rename.
2018-08-21 18:39:02 +00:00
Rick Macklem
fdab4d3b29 Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr().
When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr()
done while the vnodes were locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes the LORs by moving the vn_start_write() calls up to before
where the vnodes are locked. For "tvp", the vn_start_write() probaby isn't
necessary, because NFS mounts can't be suspended. However, I think doing
so is harmless.
Thanks go to kib@ for letting me know that I had introduced these LORs.
This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8)
is used to recover a mirrored DS.
2018-08-18 19:14:06 +00:00
Rick Macklem
3e5ba2e187 Fix LORs between vn_start_write() and vn_lock() in the pNFS server.
When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.
2018-08-17 21:12:16 +00:00
Rick Macklem
9fbb0faf4f Don't set a file's size for the MDS file of a pNFS service.
When a pNFS service is running, the size of the files created on the MDS
are normally 0, since the data is written to the data files on the DS(s).
However, without this patch, if a Setattr with a non-zero size was done by
a client, the MDS file was set to that size.  This was thought to be benign,
but it turns out that files with a non-zero size plus extended attributes
can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this
panic() has not been isolated, this patch avoids the panic() and leaves
the MDS files in a consistent state of always having a size == 0.
Note that these MDS files never store data. The patch also includes an
unnecessary initialization of savsize in case some compiler or static
analyser complains it might not be initialized.
This patch only affects the NFS server when pNFS is enabled via the "-p"
command line option on nfsd.
2018-08-17 12:32:38 +00:00
Jamie Gritton
284001a222 Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.

Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES).  These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do.  The first use
is obsolete along with jail(2), but keep them for the second-read-only use.

Differential Revision:	D14791
2018-08-16 18:40:16 +00:00
Conrad Meyer
5cb27f0813 FUSE: Document global sysctl knobs
So that I don't have to keep grepping around the codebase to remember what each
one does.  And maybe it saves someone else some time.

Fix a trivial whitespace issue while here.

No functional change.

Sponsored by:	Dell EMC Isilon
2018-08-15 17:41:19 +00:00
Toomas Soome
527d337fdb cd9660 pointer sign issues and missing __packed attribute
The isonum_* functions are defined to take unsigend char* as an argument,
but the structure fields are defined as char. Change to u_char where needed.

Probably the full structure should be changed, but I'm not sure about the
side affects.

While there, add __packed attribute.

Differential Revision:	https://reviews.freebsd.org/D16564
2018-08-15 06:42:31 +00:00
Rick Macklem
41df1b5b47 Assorted fixes to handling of LayoutRecall callbacks, mostly error handling.
After a re-read of the appropriate section of RFC5661, I decided that a
few things should be changed related to LayoutRecall callback handling.
Here are the things fixed by this patch.
- For two of the three cases that LayoutRecall is done, I now think
  setting the clora_changed argument false is correct.
- All errors other than NFSERR_DELAY returned by LayoutRecall appear
  permanent, so don't retry for any of them. (NFSERR_DELAY is retried by
  newnfs_request(), so it is not affected by this patch.)
- Instead of waiting "forever" (actually until the process is SIGTERM'd)
  for Layouts to be returned during a mirror copy, fail and return
  ENXIO after about 1minute.
  Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself,
  but did not make sense when done via find(1).
This patch only affects the pNFS server.
2018-08-08 20:21:45 +00:00
Pedro F. Giffuni
c820acbf0a msdosfs: fixes for Undefined Behavior.
These were found by the Undefined Behaviour GsoC project at NetBSD:

Do not change signedness bit with left shift.
While there avoid signed integer overflow.
Address both issues with using unsigned type.

msdosfs_fat.c:512:42, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:521:44, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:14, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:744:24, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'
msdosfs_fat.c:840:13, left shift of 1 by 31 places cannot be represented
in type 'int'
msdosfs_fat.c:840:36, signed integer overflow: -2147483648 - 1 cannot be
represented in type 'int [20]'

Detected with micro-UBSan in the user mode.

Hinted from:	NetBSD (CVS 1.33)
MFC after:	2 weeks
Differenctial Revision:	https://reviews.freebsd.org/D16615
2018-08-08 15:08:22 +00:00
Fedor Uporov
53288b712d Split the dir_index and dir_nlink features.
Do not allow to create more that EXT4_LINK_MAX links to directory in case
if the dir_nlink is not set, like it is done in the fresh e2fsprogs updates.

MFC after:      3 months
2018-08-08 12:08:46 +00:00
Fedor Uporov
17c7b27f55 Fix directory blocks checksum updating logic.
The checksum updating functions were not called in case of dir index inode splitting
and in case of dir entry removing, when the entry was first in the block.
Fix and move the dir entry adding logic when i_count == 0 to new function.

MFC after:      3 months
2018-08-08 12:07:45 +00:00
Conrad Meyer
3dc1c7d6bc FUSE: Remove some set-but-not-used variables
No functional change.
2018-08-08 04:46:03 +00:00
Rick Macklem
93df87f208 Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply.
The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY,
but exempts certain NFSv4 operations. However, for callback RPCs, there
should not be any exemptions at this time. The code would have erroneously
exempted the CBRECALL callback, since it has the same operation number as
the CLOSE operation.
This patch fixes this by checking for a callback RPC (indicated by clp != NULL)
and not checking for exempt operations for callbacks.
This would have only affected the NFSv4 server when delegations are enabled
(they are not enabled by default) and the client replies to CBRECALL with
NFSERR_DELAY. This may never actually happen.
Spotted during code inspection.

MFC after:	2 weeks
2018-08-07 21:29:14 +00:00
Rick Macklem
25705dd5d0 Copy all bits of a file handle in case there is padding in the structure.
At least on x86, fhandle_t is a packed structure, so I believe an
assignment will copy all the bits. However, for some current/future
architectures, there might be padding in the structure that doesn't get
copied via an assignment.
Since NFS assumes a file handle is an opaque blob of bits that can be
compared via memcmp()/bcmp(), all the bits including any padding must be
copied.
This patch replaces the assignments with a call to a byte copy function.
Spotted during code inspection.
2018-08-05 19:21:50 +00:00
Rick Macklem
ac0d649588 Silence newer gcc warnings.
Newer versions of gcc generate "might not be initialized" warnings for
several variables in nfsrpc_doiods().  I have checked and all of these
variables are assigned values before they are used.
In the one case of "tdrpc", it could have passed garbage as an argument
to nfscl_dofflayoutio() when mirrorcnt is one. However nfscl_dofflayoutio() only
uses the argument when mirrorcnt > 1, so it wasn't actually broken.
This patch initializes "tdrpc" to avoid confusion and initializes the rest
to make the compiler happy.

Requested by:	mmacy
2018-08-02 20:10:59 +00:00
Conrad Meyer
dab6195cd3 FUSE: Bump maximum IO size to enable more performant operation
Various components restrict size of IO passed up to the userspace filesystem
based on the mount's f_iosize value.  The previous default of PAGE_SIZE
is anemic, even for normal filesystems, but especially considering every
FUSE operation involves a kernel <-> userspace IPC upcall.

Bump to DFLTPHYS (currently 64kB) to match other FUSE implementations.

Anecdotally, Jakub reports IO read performance increased from 600 MB/s ->
2700 MB/s with a basic RAM-backed FUSE filesystem.

PR:		230260
Reported by:	Peter (MooseFS) <freebsd AT moosefs.com>
Tested by:	Jakub Kruszona-Zawadzki <acid AT moosefs.com>
MFC after:	3 days
2018-08-02 19:25:43 +00:00
Ed Maste
195e6c50d3 msdosfs: trim EOL whitespace 2018-07-31 12:44:28 +00:00
Ed Maste
a6274b81d5 cd9660: replace bcopy/bzero with C standard equivalents
To reduce diffs against NetBSD.
2018-07-31 12:36:46 +00:00
Ed Maste
22e56aea3f msdosfs: use same max filesize #define as NetBSD and move to header
For use by makefs msdosfs support.

Obtained from:	NetBSD denode.h 1.6
Sponsored by:	The FreeBSD Foundation
2018-07-30 20:36:51 +00:00
Rick Macklem
743d528198 Silence newer gcc warnings.
Newer versions of gcc generate "set, but not used" warnings.
Add __unused macros to silence these warnings.
Although the variables are not being used, they are values parsed from
arguments to callback RPCs that might be needed in the future.

Requested by:	mmacy
2018-07-30 20:25:32 +00:00
Rick Macklem
8014c97147 Silence newer gcc warnings.
Newer versions of gcc generate "set, but not used" warnings in the NFS server.
Add __unused macros to silence these warnings.

Requested by:	mmacy
2018-07-29 21:51:17 +00:00
Rick Macklem
a3e709cd33 Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7.
I believe that a ReclaimComplete with rca_one_fs == TRUE is only
to be used after a file system has been transferred to a different
file server.  However, RFC5661 is somewhat vague w.r.t. this and
the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE
and one with ReclaimComplete with rca_one_fs == FALSE.
Therefore, just ignore the rca_one_fs == TRUE operation and return
NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP.
This allows the ESXi 6.7 NFSv4.1 client to do a mount.
After discussion on the NFSv4 IETF working group mailing list, doing this
along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE
was an appropriate way to handle this.
The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was
done may be used to disable replies of NFS4ERR_GRACE for non-reclaim
state operations in a future commit.

This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts
work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are
violations of RFC-5661 and should not be used.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com, daniel@ftml.net (earlier version)
MFC after:	2 weeks
Relnotes:	yes
2018-07-28 20:21:04 +00:00
Eitan Adler
33f4bccaa6 Use https over http for FreeBSD pages 2018-07-27 10:40:48 +00:00
Ed Maste
6ae00e306f Revert msdosfs MAKEFS #ifdef changes from r319870
These changes are not needed for current msdosfs makefs WIP.

Submitted by:	Siva Mahadevan
Sponsored by:	The FreeBSD Foundation
2018-07-24 21:10:17 +00:00
Rick Macklem
cecf6c6e9c Set CLSET_TIMEOUT on TCP connections to pNFS DSs.
Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of
specifying a timeout on each RPC. This is done so that SO_SNDTIMEO
is set on the TCP socket as well as specifying a time limit when
waiting for an RPC reply.  Useful if the send queue for the TCP
connection has become constipated, due to a failed DS.
The choice of lease_duration / 4 is fairly arbitrary, but seems to work
ok, with a lower bound of 10sec.
For client connections to a DS, set the retry limit to vfs.nfsd.dsretries,
which is 2 by default.
This patch should only affect pNFS connections to DSs.
This patch requires r336542.

MFC after:	2 weeks
2018-07-21 01:33:07 +00:00
Alan Somers
5717aa2d2a Allow mounting FUSE filesystems in jails
Reviewed by:	jamie
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D16371
2018-07-20 21:35:31 +00:00
Rick Macklem
5d54f186bb Modify the reasons for not issuing a delegation in the NFSv4.1 server.
The ESXi NFSv4.1 client will generate warning messages when the reason for
not issuing a delegation is two. Two refers to a resource limit and I do
not see why it would be considered invalid. However it probably was not the
best choice of reason for not issuing a delegation.
This patch changes the reasons used to ones that the ESXi client doesn't
complain about. This change does not affect the FreeBSD client and does
not appear to affect behaviour of the Linux NFSv4.1 client.
RFC5661 defines these "reasons" but does not give any guidance w.r.t. which
ones are more appropriate to return to a client.

Tested by:	andreas.nagy@frequentis.com
PR:		226650
MFC after:	2 weeks
2018-07-16 21:32:50 +00:00
Rick Macklem
5da3882447 Shut down the TCP connection to a DS in the pNFS client when Renew fails.
When a NFSv4.1 client mount using pNFS detects a failure trying to do a
Renew (actually just a Sequence operation), the code would simply try
again and again and again every 30sec.
This would tie up the "nfscl" thread, which should also be doing other
things like Renews on other DSs and the MDS.
This patch adds code which closes down the TCP connection and marks it
defunct when Renew detects an failure to communicate with the DS, so
further Renews will not be attempted until a new working TCP connection to
the DS is established.
It also makes the call to nfscl_cancelreqs() unconditional, since
nfscl_cancelreqs() checks the NFSCLDS_SAMECONN flag and does so while holding
the lock.
This fix only applies to the NFSv4.1 client whne using pNFS and without it
the only effect would have been an "nfscl" thread busy doing Renew attempts
on an unresponsive DS.

MFC after:	2 weeks
2018-07-15 18:54:44 +00:00
Rick Macklem
89c64a3a4f Fix the pNFS client when mirrors aren't on the same machine.
Without this patch, the client side NFSv4.1 pNFS code erroneously did writes
and commits to both DS mirrors using the TCP connection of the first one.
For my test setup this worked, since I have both DSs running on the same
machine, but it would have failed when the DSs are on separate machines.
This patch fixes the code to use the correct TCP connection for each DS.
This patch should only affect the NFSv4.1 client when using "pnfs" mounts
to mirrored DSs.

MFC after:	2 weeks
2018-07-14 19:51:44 +00:00
Rick Macklem
0e7bd20bb2 Close down the TCP connection to a pNFS DS when it is disabled.
So long as the TCP connection to a pNFS DS isn't shared with other DSs,
it can be closed down when the DS is being disabled in the pNFS client.
This causes any RPCs in progress to fail.
This patch only affects the NFSv4.1 pNFS client when errors occur
while doing I/O on a DS.

MFC after:	2 weeks
2018-07-13 20:03:05 +00:00
Rick Macklem
83f526de6a Change the pNFS client so that it does not report an NFSERR_STALE from
an I/O attempt on a DS to the server via LayoutReturn.

The current FreeBSD client can generate these errors for an operational
DS while doing a recovery of a mirror after a mirrored DS has been repaired.
I am not sure why these errors occur, but my best current guess is a race
between the Layout Recall issued by the kernel code run from pnfsdscopymr(8)
and a Read operation on the DS for the file bing copied.
The errrors are not fatal, since the client falls back on doing I/O through
the MDS, which can do the I/O successfully as a proxy. (The fact that the
MDS can do this indicates that the file does still exist on the functioning
DS.)
This patch only affects behaviour of the pNFS client and only when using
Flexible File layouts.

MFC after:	2 weeks
2018-07-13 12:39:27 +00:00
Rick Macklem
a6fed5f514 Modify the NFSv4.1 pNFS client to use separate TCP connections for DSs.
Without this patch, the NFSv4.1 pNFS client shared a single TCP connection
for all DSs that resided on the same machine. This made disabling one of
the DSs impossible. Although unlikely, it is possible that the storage
subsystem has failed in such a way that the storage for one DS on a machine
is no longer functioning correctly, but the storage used by another DS on
the same machine is still ok. For this case, it would be nice if a system
can fail one of the DSs without failing them all.
This patch changes the default behaviour to use separate TCP connections
for each DS even if they reside on the same machine.
I do not believe that this will be a problem for extant pNFS servers, but
a sysctl can be set to restore the old behaviour if this change causes a
problem for an extant pNFS server.
This patch only affects the NFSv4.1 pNFS client.

MFC after:	2 weeks
2018-07-12 20:46:22 +00:00
Rick Macklem
8361de2544 Ignore the cookie verifier for NFSv4.1 when the cookie is 0.
RFC5661 states that the cookie verifier should be 0 when the cookie is 0.
However, the wording is somewhat unclear and a recent discussion on the
nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore
the cookie verifier's value when the dirctory offset cookie is 0.
This patch deletes the check for this that would return NFSERR_BAD_COOKIE
when the verifier was not 0.
This was found during testing of the ESXi client against the NFSv4.1 server.

Reported by:	daniel@ftml.net (via packet trace)
MFC after:	2 weeks
2018-07-11 23:23:29 +00:00
Rick Macklem
de9a1a70ab Add support for a "forced" pnfsdskill to the pNFS server kernel code.
The pnfsdskill(8) command will normally fail if there is no valid mirror
for the DS to be disabled. However, a system administrator may need to
disable a DS which does not have a valid mirror so that the nfsd threads
can be terminated. This patch adds the kernel code needed by pnfsdskill(8)
to implement this "forced" case of disabling a DS.
This patch only affects the pNFS server.
2018-07-09 19:58:01 +00:00
Rick Macklem
acc6e58def Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied.
If a mirrored DS is being recovered that has a lot of large sparse files,
pnfsdscopymr(8) would use a lot of space on the recovered mirror since it
would write the "holes" in the file being mirrored.
This patch adds code to check for a "hole" and skip doing the write.
The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K.
I think that most file server file systems will be using a blocksize at
least this large. If the file server is using a smaller blocksize and
smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased.
The block of 0s is malloc()d, since pnfsdcopymr(8) should be an infrequent
occurrence.
2018-07-08 18:15:55 +00:00
Rick Macklem
ed66a76bca Fix handling of the hybrid DS case for a pNFS server.
After the addition of the "#mds_path" suffix for a DS specification on the
"-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS
file system and DSs that store files for all DSs. This is what I referred
to as "hybrid" above.
At first, I didn't think this hybrid case would be useful, but I now believe
that some system administrators may fine it useful.
This patch modifies the file storage assignment algorithm so that it
makes the "#mds_path" DSs take priority and the all file systems DSs
are now only used for MDS file systems with no "#mds_path" DS servers.
This only affects the pNFS server for this "hybrid" case.
2018-07-07 19:27:49 +00:00
Rick Macklem
5b500ea949 Change the pNFS server so that it does not disable a mirrored DS for
an NFSERR_STALE error reported via a LayoutReturn.

The current FreeBSD client can generate these errors for an operational
DS while doing a recovery of a mirror after a mirrored DS has been repaired.
I am not sure why these errors occur, but my best current guess is a race
between the Layout Recall issued by the kernel code run from pnfsdscopymr(8)
and a Read operation on the DS for the file bing copied.
The errors are not fatal, since the client falls back on doing I/O through
the MDS, which can do the I/O successfully as a proxy. (The fact that the
MDS can do this indicates that the file does still exist on the functioning
DS.)
This change only affects the pNFS server and only when a client does a
LayoutReturn with the NFSERR_STALE error report.
2018-07-06 19:18:45 +00:00
Rick Macklem
ff3b992f38 Fix the pNFS server so that it handles the "#mds_path" check for mirrors.
The recently added feature of the pNFS server will set an fsid for the
MDS file system to define the file system a DS should store files for.
For a case where a DS handling all file systems has failed, it was possible
for the code to check for a mirror with a specified fs, even though
nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror.
This patch adds a check for nfsdev_mdsisset != 0 to avoid this.
It only affects the pNFS server for a rare case. Found via code inspection.
2018-07-04 19:46:26 +00:00
Rick Macklem
2f32675c83 Add an optional feature to the pNFS server.
Without this patch, the pNFS server distributes the data storage files across
all of the specified DSs.
A tester noted that it would be nice if a system administrator could control
which DSs are used to store the file data for a given exported MDS file system.
This patch adds the kernel support to do this. It also makes a slight semantic
change to nfsv4_findmirror(), since some uses of it no longer require that
the DS being searched for have a current mirror.
A patch that will be committed in a few minutes will modify the nfsd daemon
to support this feature.
The patch should only affect sites using the pNFS server (specified via the
"-p" command line option for nfsd.

Suggested by:	james.rose@framestore.com
2018-07-02 19:21:33 +00:00
Rick Macklem
1aabf3fd5e Fix the pNFS server for a case where mirror level equals number of DSs.
If a pNFS service was set up where the number of DSs equals the mirror level
and then a DS was disabled, the service would create files with duplicate
entries for the same DS. This bug occurred because I didn't realize that
TAILQ_FOREACH_FROM() would start at the beginning of the list when the
inital value of the variable was NULL.
This patch also changes the pNFS server DS file creation code so that it
creates entrie(s) with 0.0.0.0 IP address when it cannot create mirror level
files due to lack of DSs.
The patch only affects the pNFS service and only when it was created with
a number of DSs equal to the mirror level and mirroring is enabled.
2018-06-29 12:41:36 +00:00
Rick Macklem
9f4c522e6b Set the slotid and ND_HASSLOTID flag for NFSv4.1 sequenced operations.
Most NFSv4.1 compound RPCs start with a Sequence operation. For these
cases, save the slotid and note that it is saved by setting ND_HASSLOTID.
This is used by r335568 to free up the session slot and disable it.

MFC after:	2 weeks
2018-06-23 00:48:45 +00:00