Commit Graph

3909 Commits

Author SHA1 Message Date
Konstantin Belousov
5c4ce6fac2 tmpfs: plug holes on rw->ro mount update.
In particular:
- suspend the mount around vflush() to avoid new writes come after the
  vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.

It is not clear to me how to handle writeable mappings on rw->ro for
tmpfs best.  Other filesystems, which use vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is.  In
particular, the existing mappings continue to work as far as
application only accesses resident pages, but changes are not flushed
to file.

For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving single copy of the mapped pages, so it cannot be set
to OBJT_DEAD.  Alternatives for making rw mappings ro could be either
invalidating them at all, or marking as CoW.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:59:04 +00:00
Konstantin Belousov
e1cdc30faa tmpfs: ignore tmpfs_set_status() if mount point is read-only.
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:49:32 +00:00
Konstantin Belousov
ae26575394 Block creation of the new nodes for read-only tmpfs mounts.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:41:26 +00:00
Alan Somers
f220ef0b35 fix the GENERIC-NODEBUG build after r345675
Submitted by:	cy
Reported by:	cy, Michael Butler <imb@protected-networks.net>
MFC after:	2 weeks
X-MFC-With:	345675
2019-03-29 14:07:30 +00:00
Alan Somers
080518d810 fusefs: convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:

1. Totally redundant with dtrace FBT probes. These I deleted.
2. Print textual information, usually error messages. These I converted to
   SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
   printf statements except they can be enabled at runtime with dtrace. They
   can be filtered by FILE and/or by priority.
3. More complicated probes that print detailed information. These I
   converted into ad-hoc SDT probes.

Also, de-inline fuse_internal_cache_attrs.  It's big enough to be a regular
function, and this way it gets a dtrace FBT probe.

This commit is a merge of r345304, r344914, r344703, and r344664 from
projects/fuse2.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19667
2019-03-29 02:13:06 +00:00
Maxim Sobolev
4f20706113 Refine r345425: get rid of superfluous helper macro that I have added.
MFC after:	2 weeks
2019-03-26 01:28:10 +00:00
Allan Jude
b4b3e3498b Make TMPFS_PAGES_MINRESERVED a kernel option
TMPFS_PAGES_MINRESERVED controls how much memory is reserved for the system
and not used by tmpfs.

On very small memory systems, the default value may be too high and this
prevents these small memory systems from using reroot, which is required
for them to install firmware updates.

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by:	mizhka
Differential Revision:	https://reviews.freebsd.org/D13583
2019-03-25 07:46:20 +00:00
Maxim Sobolev
ac1a10efad Make it possible to update TMPFS mount point from read-only to read-write
and vice versa.

Reviewed by:	delphij
Approved by:	delphij
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19682
2019-03-22 21:31:21 +00:00
Konstantin Belousov
7ae3486e6d nullfs: fix unmounts when filesystem is active.
If vflush() did not completely flushed the mount vnodes queue, either
retry for forced unmounts, or give up for non-forced.  This situation
can occur when new vnodes are instantiated while vflush() worked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-21 13:30:48 +00:00
Alan Somers
123af6ec70 Rename fuse(4) to fusefs(4)
This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".

Reviewed by:	cem, rgrimes
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19649
2019-03-20 21:48:43 +00:00
Alan Somers
7e4844f7d9 fuse(4): remove more debugging printfs
I missed these in r344664.  They're basically useless because they can only
be controlled at compile-time.  Also, de-inline fuse_internal_cache_attrs.
It's big enough to be a regular function, and this way it gets a dtrace FBT
probe.

Sponsored by:	The FreeBSD Foundation
2019-03-19 17:49:15 +00:00
Alan Somers
2aaf9152a8 MFHead@r345275 2019-03-18 19:21:53 +00:00
Fedor Uporov
0204d1c793 Remove unneeded mount point unlock function calls.
The ext2_nodealloccg() function unlocks the mount point
in case of successful node allocation.
The additional unlocks are not required and should be removed.

PR:		236452
Reported by:	pho
MFC after:	3 days
2019-03-15 11:49:46 +00:00
Edward Tomasz Napierala
2df8bd90c8 Drop unused 'p' argument to nfsv4_strtogid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:07:47 +00:00
Edward Tomasz Napierala
c703cba811 Drop unused 'p' argument to nfsv4_gidtostr().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:05:11 +00:00
Edward Tomasz Napierala
0658ac3943 Drop unused 'p' argument to nfsv4_strtouid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:02:52 +00:00
Edward Tomasz Napierala
0f86b94a56 Drop unused 'p' argument to nfsv4_uidtostr().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 14:59:08 +00:00
Edward Tomasz Napierala
f32bf2922f Drop unused 'p' argument to nfsrv_getuser().
Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19455
2019-03-12 14:53:53 +00:00
Simon J. Gerraty
f5fdf82d82 Add _PC_ACL_* to vop_stdpathconf
This avoid EINVAL from tmpfs etc.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D19512
2019-03-11 20:40:56 +00:00
Alan Somers
84c4fd1f48 fuse(4): add dtrace probe for illegal short writes
Sponsored by:	The FreeBSD Foundation
2019-03-08 02:00:49 +00:00
Conrad Meyer
9a6a45d850 fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf
On GENERIC kernels with empty loader.conf, there is no functional change.
DFLTPHYS and MAXBSIZE are both 64kB at the moment.  This change allows
larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
than the default.

Suggested by:	ken@
2019-03-07 00:55:49 +00:00
Conrad Meyer
e7df98863b FUSE: Prevent trivial panic
When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.

For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.

This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.

PR:		236329
Reported by:	asomers
Reviewed by:	asomers
Sponsored by:	Dell EMC Isilon
2019-03-06 22:56:49 +00:00
Alan Somers
4cbb4f8886 fuse(4): add tests related to FUSE_MKNOD
PR:		236236
Sponsored by:	The FreeBSD Foundation
2019-03-05 00:27:54 +00:00
Edward Tomasz Napierala
01c27978f5 Don't pass td to nfsvno_open().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-04 14:50:00 +00:00
Edward Tomasz Napierala
127152fe56 Don't pass td to nfsvno_createsub().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-04 14:30:53 +00:00
Edward Tomasz Napierala
5edc9102dc Don't pass td to nfsd_fhtovp(), it's unused.
Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19421
2019-03-04 13:18:04 +00:00
Edward Tomasz Napierala
af444b18ed Push down the thread argument in NFS server code, using curthread
instead of passing it explicitly. No functional changes

Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19419
2019-03-04 13:12:23 +00:00
Edward Tomasz Napierala
113aa93390 Push down td in nfsrvd_dorpc() - make it use curthread instead
of it being explicitly passed as an argument. No functional changes.

The big picture here is that I want to get rid of the 'td' argument
being passed everywhere, and this is the first piece that affects
the NFS server.

Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19417
2019-03-04 13:02:36 +00:00
Fedor Uporov
9441309ae0 Fix double free in case of mount error.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-9-EXT3-2: Denial Of Service in nmount-5 (vm_fault_hold)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19385
2019-03-04 11:33:49 +00:00
Fedor Uporov
3eed9f20d4 Do not read the on-disk inode in case of vnode allocation.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-6-EXT2-4: Denial Of Service in mkdir-0 (ext2_mkdir/vn_rdwr)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19327
2019-03-04 11:27:47 +00:00
Fedor Uporov
736da5176d Fix integer overflow possibility.
Reported by:    Christopher Krah <krah@protonmail.com>
Reported as:    FS-2-EXT2-1: Out-of-Bounds Write in nmount (ext2_vget)
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19326
2019-03-04 11:19:21 +00:00
Fedor Uporov
4ff6603ab3 Do not panic if inode bitmap is corrupted.
admbug:         804
Reported by:    Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19325
2019-03-04 11:12:19 +00:00
Fedor Uporov
80a4a9716b Validate block bitmaps.
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19324
2019-03-04 11:01:23 +00:00
Fedor Uporov
daa2d62da2 Add additional on-disk inode checks.
Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19323
2019-03-04 10:55:01 +00:00
Fedor Uporov
6e38bf94e5 Make superblock reading logic more strict.
Add more on-disk superblock consistency checks to ext2_compute_sb_data() function.
It should decrease the probability of mounting filesystems with corrupted superblock data.

Reviewed by:    pfg
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19322
2019-03-04 10:42:25 +00:00
Alan Somers
c02ccc7e44 Fix typos from r344664
Sponsored by:	The FreeBSD Foundation
2019-03-01 15:49:11 +00:00
Alan Somers
cf16949867 fuse(4): convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags.  They fell into three basic groups:

1) Totally redundant with dtrace FBT probes.  These I deleted.
2) Print textual information, usually error messages.  These I converted to
   SDT probes of the form fuse:fuse:FILE:trace.  They work just like the old
   printf statements except they can be enabled at runtime with dtrace.
   They can be filtered by FILE and/or by priority.
3) More complicated probes that print detailed information.  These I
   converted into ad-hoc SDT probes.

Sponsored by:	The FreeBSD Foundation
2019-02-28 19:27:54 +00:00
Conrad Meyer
f6ebb68395 fuse: Fix a regression introduced in r337165
On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
use a buf cache block size in excess of permitted size.  This did not affect
most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
The issue was discovered and reported using a custom kernel with a DFLTPHYS
of 512kB.

PR:		230260 (comment #9)
Reported by:	ken@
MFC after:	π/𝑒 weeks
2019-02-21 02:41:57 +00:00
Matt Macy
81167243b4 PFS: Bump NAMELEN and don't require clients to be sleepable
- debugfs consumers expect to be able to export names more than 48 characters

- debugfs consumers expect to be able to hold locks across calls and are able
  to handle allocation failures

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19256
2019-02-20 20:55:02 +00:00
Conrad Meyer
02295caf43 Fuse: whitespace and style(9) cleanup
Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse.  Also fix some style(9) warts while here.  Not 100% cleaned up, but
somewhat less painful to look at and edit.

No functional change.
2019-02-20 02:49:26 +00:00
Conrad Meyer
bd4cb2a46d fuse: add descriptions for remaining sysctls
(Except reclaim revoked; I don't know what that goal of that one is.)
2019-02-20 02:48:59 +00:00
Edward Tomasz Napierala
c9172fb4f1 Work around the "nfscl: bad open cnt on server" assertion
that can happen when rerooting into NFSv4 rootfs with kernel
built with INVARIANTS.

I've talked to rmacklem@ (back in 2017), and while the root cause
is still unknown, the case guarded by assertion (nfscl_doclose()
being called from VOP_INACTIVE) is believed to be safe, and the
whole thing seems to run just fine.

Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-02-19 12:45:37 +00:00
Conrad Meyer
3c324b9465 FUSE: Refresh cached file size when it changes (lookup)
The cached fvdat->filesize is indepedent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.

As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).

We do something similar in fuse_vnop_getattr already.

PR:		230258 (as reported in description; other issues explored in
			comments are not all resolved)
Reported by:	MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by:	Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)
2019-02-15 22:55:13 +00:00
Conrad Meyer
c4af8b173a FUSE: The FUSE design expects writethrough caching
At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.

Prior to this change, we implement and default to writeback caching.  This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.

In this change, add cache mode as an extension of cache enable/disable.  The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).

For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data.  WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement.  Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.

Refs:
  * https://sourceforge.net/p/fuse/mailman/message/8902254/
  * https://github.com/vgough/encfs/issues/315

PR:		230258 (inspired by)
2019-02-15 22:52:49 +00:00
Conrad Meyer
194e691aaf FUSE: Only "dirty" cached file size when data is dirty
Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR).  In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).

PR:		230258 (related)
2019-02-15 22:51:09 +00:00
Conrad Meyer
09176f096b FUSE: Respect userspace FS "do-not-cache" of path components
The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:50:31 +00:00
Conrad Meyer
78a7722fbc FUSE: Respect userspace FS "do-not-cache" of file attributes
The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.

One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy.  It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?

Some minor APIs changed to facilitate this:

1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0.  It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).

3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.

4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache).  The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:49:15 +00:00
Konstantin Belousov
b9662886ef Un null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call.
The vp vnode is unlocked during the execution of the VOP method and
can be reclaimed, zeroing vp->v_data.  Caching allows to use the
correct mount point.

Reported and tested by:	pho
PR: 235549
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:20:18 +00:00
Konstantin Belousov
25728e8411 Before using VTONULL(), check that the covered vnode belongs to nullfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:17:31 +00:00
Konstantin Belousov
930cc2dbef Some style for nullfs_mount(). Also use bool type for isvnunlocked.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:15:29 +00:00