When the entry cache expires, it's only necessary to purge the cache.
Disappearing a vnode also purges the attribute cache, which is unnecessary,
and invalidates the data cache, which could be harmful.
Sponsored by: The FreeBSD Foundation
I got most of -o default_permissions working in r346088. This commit adds
sticky bit checks. One downside is that sometimes there will be an extra
FUSE_GETATTR call for the parent directory during unlink or rename. But in
actual use I think those attributes will almost always be cached.
PR: 216391
Sponsored by: The FreeBSD Foundation
fuse_vnop_lookup was using a FUSE_GETATTR operation when looking up "." and
"..", even though the only information it needed was the file type and file
size. "." and ".." are obviously always going to be directories; there's no
need to double check.
Sponsored by: The FreeBSD Foundation
fuse_vnop_lookup contained an awkward hack meant to reduce daemon activity
during long lookup chains. However, the hack is no longer necessary now
that we properly cache file attributes. Also, I'm 99% certain that it
could've bypassed permission checks when using openat to open a file
relative to a directory that lacks execute permission.
Sponsored by: The FreeBSD Foundation
* Eliminate fuse_access_param. Whatever it was supposed to do, it seems
like it was never complete. The only real function it ever seems to have
had was a minor performance optimization, which I've already eliminated.
* Make extended attribute operations obey the allow_other mount option.
* Allow unprivileged access to the SYSTEM extattr namespace when
-o default_permissions is not in use.
* Disallow setextattr and deleteextattr on read-only mounts.
* Add tests for a few more error cases.
Sponsored by: The FreeBSD Foundation
Normally all permission checking is done in the fuse server. But when -o
default_permissions is used, it should be done in the kernel instead. This
commit adds appropriate permission checks through fusefs when -o
default_permissions is used. However, sticky bit checks aren't working yet.
I'll handle those in a follow-up commit.
There are no checks for file flags, because those aren't supported by our
version of the FUSE protocol. Nor is there any support for ACLs, though
that could be added if there were any demand.
PR: 216391
Reported by: hiyorin@gmail.com
Sponsored by: The FreeBSD Foundation
The FUSE protocol includes a way for a server to tell the client that a
negative lookup response is cacheable for a certain amount of time.
PR: 236226
Sponsored by: The FreeBSD Foundation
Follow-up to r346046. These two commits implement fuse cache timeouts for
both entries and attributes. They also remove the vfs.fusefs.lookup_cache
enable sysctl, which is no longer needed now that cache timeouts are
honored.
PR: 235773
Sponsored by: The FreeBSD Foundation
The FUSE protocol allows the server to specify the timeout period for the
client's attribute and entry caches. This commit implements the timeout
period for the attribute cache. The entry cache's timeout period is
currently disabled because it panics, and is guarded by the
vfs.fusefs.lookup_cache_expire sysctl.
PR: 235773
Reported by: cem
Sponsored by: The FreeBSD Foundation
FUSE_LOOKUP, FUSE_GETATTR, FUSE_SETATTR, FUSE_MKDIR, FUSE_LINK,
FUSE_SYMLINK, FUSE_MKNOD, and FUSE_CREATE all return file attributes with a
cache validity period. fusefs will now cache the attributes, if the server
returns a non-zero cache validity period.
This change does _not_ implement finite attr cache timeouts. That will
follow as part of PR 235773.
PR: 235775
Reported by: cem
Sponsored by: The FreeBSD Foundation
VOP_ACCESS was never fully implemented in fusefs. This change:
* Removes the FACCESS_DO_ACCESS flag, which pretty much disabled the whole
vop.
* Removes a quixotic special case for VEXEC on regular files. I don't know
why that was in there.
* Removes another confusing special case for VADMIN.
* Removes the FACCESS_NOCHECKSPY flag. It seemed to be a performance
optimization, but I'm unconvinced that it was a net positive.
* Updates test cases.
This change does NOT implement -o default_permissions. That will be handled
separately.
PR: 236291
Sponsored by: The FreeBSD Foundation
When -o allow_other is not in use, fusefs is supposed to prevent access to
the filesystem by any user other than the one who owns the daemon. Our
fusefs implementation was only enforcing that restriction at the mountpoint
itself. That was usually good enough because lookup usually descends from
the mountpoint. However, there are cases when it doesn't, such as when
using openat relative to a file beneath the mountpoint.
PR: 237052
Sponsored by: The FreeBSD Foundation
If a fuse file system returne FOPEN_KEEP_CACHE in the open or create
response, then the client is supposed to _not_ clear its caches for that
file. I don't know why clearing the caches would be the default given that
there's a separate flag to bypass the cache altogether, but that's the way
it is. fusefs(5) will now honor this flag.
Our behavior is slightly different than Linux's because we reuse file
handles. That means that open(2) wont't clear the cache if there's a
reusable file handle, even if the file server wouldn't have sent
FOPEN_KEEP_CACHE had we opened a new file handle like Linux does.
PR: 236560
Sponsored by: The FreeBSD Foundation
This bug was long present, but was exacerbated by r345876.
The problem is that fiov_refresh was bzero()ing a buffer _before_ it
reallocated that buffer. That's obviously the wrong order. I fixed the
order in r345876, which exposed the main problem. Previously, the first 160
bytes of the buffer were getting bzero()ed when it was first allocated in
fiov_init. Subsequently, as that buffer got recycled between callers, the
portion used by the _previous_ caller was getting bzero()ed by the current
caller in fiov_refresh. The problem was never visible simply because no
caller was trying to use more than 160 bytes.
Now the buffer gets properly bzero()ed both at initialization time and any
time it gets enlarged or reallocated.
Sponsored by: The FreeBSD Foundation
If a FUSE daemon returns FOPEN_DIRECT_IO when a file is opened, then it's
allowed to write less data than was requested during a FUSE_WRITE operation
on that file handle. fusefs should simply return a short write to userland.
The old code attempted to resend the unsent data. Not only was that
incorrect behavior, but it did it in an ineffective way, by attempting to
"rewind" the uio and uiomove the unsent data again.
This commit correctly handles short writes by returning directly to
userland if FOPEN_DIRECT_IO was set. If it wasn't set (making the short
write technically a protocol violation), then we resend the unsent data.
But instead of rewinding the uio, just resend the data that's already in the
kernel.
That necessitated a few changes to fuse_ipc.c to reduce the amount of bzero
activity. fusefs may be marginally faster as a result.
PR: 236381
Sponsored by: The FreeBSD Foundation
The code enabled when "DEBUG" is defined uses mem_alloc(), which is a
malloc(.., M_RPC, M_WAITOK | M_ZERO), but then calls gss_release_buffer()
which does a free(.., M_GSSAPI) to free the memory.
This patch fixes the problem by replacing mem_alloc() with a
malloc(.., M_GSSAPI, M_WAITOK | M_ZERO).
This bug affects almost no one, since the sources are not normally built
with "DEBUG" defined.
Submitted by: peter@ifm.liu.se
MFC after: 2 weeks
provider grows, GELI will expand automatically and will move the metadata
to the new location of the last sector.
This functionality is turned on by default. It can be turned off with the
-R flag, but it is not recommended - if the underlying provider grows and
automatic expansion is turned off, it won't be possible to attach this
provider again, as the metadata is no longer located in the last sector.
If the automatic expansion is turned off and the underlying provider grows,
GELI will only log a message with the previous size of the provider, so
recovery can be easier.
Obtained from: Fudo Security
In r337703 DTS files were updated to Linux 4.18, including Linux commit
4d8b032d3c03f4e9788a18bbb51b10e6c9e8a56b which removed the `phy_id`
property from am335x-bone-common (as the property was deprecated).
Use `phy-handle` via fdt_get_phyaddr, keeping the existing code as a
fallback for old DTBs.
PR: 236624
Submitted by: manu, Gerald Aryeetey <aryeeteygerald_rogers.com>
Reported by: Gerald Aryeetey
Reviewed by: manu
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19814
The original fusefs import, r238402, contained a bug in fuse_vnop_close that
could close a directory's file handle while there were still other open file
descriptors. The code looks deliberate, but there is no explanation for it.
This necessitated a workaround in fuse_vnop_readdir that would open a new
file handle if, "for some mysterious reason", that vnode didn't have any
open file handles. r345781 had the effect of causing the workaround to
panic, making the problem more visible.
This commit removes the workaround and the original bug, which also fixes
the panic.
Sponsored by: The FreeBSD Foundation
The FUSE protocol says that FUSE_FLUSH should be send every time a file
descriptor is closed. That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close. However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.
There are two purposes for FUSE_FLUSH. One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side. The other is to release POSIX file locks (which fusefs(5) does
not yet support).
PR: 236405, 236327
Sponsored by: The FreeBSD Foundation
Despite the call to vtruncbuf() from detrunc(), which results in
zeroing part of the partial page after EOF, there still is a
possibility to retain the stale data which is revived on file
enlargement. If the filesystem block size is greater than the page
size, partial block might keep other after-EOF pages wired and they
get reused then. Fix it by zeroing whole part of the partial buffer
after EOF, not relying on vnode_pager_setsize().
PR: 236977
Reported by: asomers
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Harvesting has to compete for the TPM chip with userspace.
Before this change the callout could hijack an unread buffer
causing a userspace call to the TPM to fail.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: delphij
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19712
The e5500 has an FPU, but lacks the optional fsqrt instruction. This
instruction gets emulated in the kernel, but the emulation uses stale data,
from the last switch out, and does not return the result of the operation
immediately. Fix both of these conditions by saving and restoring the FPRs
around the emulation point.
MFC after: 1 week
MFC with: r345829
The current approach of injecting manifest into mac_veriexec is to
verify the integrity of it in userspace (veriexec (8)) and pass its
entries into kernel using a char device (/dev/veriexec).
This requires verifying root partition integrity in loader,
for example by using memory disk and checking its hash.
Otherwise if rootfs is compromised an attacker could inject their own data.
This patch introduces an option to parse manifest in kernel based on envs.
The loader sets manifest path and digest.
EVENTHANDLER is used to launch the module right after the rootfs is mounted.
It has to be done this way, since one might want to verify integrity of the init file.
This means that manifest is required to be present on the root partition.
Note that the envs have to be set right before boot to make sure that no one can spoof them.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: sjg
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19281
This fix was committed less than 2 months after the code was forked into the
powerpc kernel. Though powerpc doesn't use quad-precision floating point,
or need it for emulation, the changes do look like correctness fixes
overall.
This was found while trying to get fsqrt emulation working on e5500, which
does have a real FPU, but lacks the fsqrt instruction. This is not the
complete fix, the rest is to be committed separately.
MFC after: 1 week
During truncate, fusefs was discarding entire cached blocks, but it wasn't
zeroing out the unused portion of a final partial block. This resulted in
reads returning stale data.
PR: 233783
Reported by: fsx
Sponsored by: The FreeBSD Foundation
When a new client structure was allocated, it was added to the list
so that it was visible to other threads before the expiry time was
initialized, with only a single reference count.
The caller would increment the reference count, but it was possible
for another thread to decrement the reference count to zero and free
the structure before the caller incremented the reference count.
This could occur because the expiry time was still set to zero when
the new client structure was inserted in the list and the list was
unlocked.
This patch fixes the race by initializing the reference count to two
and initializing all fields, including the expiry time, before inserting
it in the list.
Tested by: peter@ifm.liu.se
PR: 235582
MFC after: 2 weeks
Before this I suppose it was impossible load CAM-based NVMe as module.
Plus this appeared to be needed to build r345815 without NVMe driver.
MFC after: 2 weeks
interrupt enable are not fatal.
The firmware sets up all the interrupt enables based on run time
configuration, which means the information in the enables is more
accurate than what's compiled into the driver. This change also allows
the fatal bits to be updated without any changes in the driver in some
cases.
MFC after: 1 week
Sponsored by: Chelsio Communications
This commit cleans up after recent commits, especially 345766, 345768, and
345781. There is no functional change. The most important change is to add
comments documenting why we can't send flags like O_APPEND in
FUSE_WRITE_OPEN.
PR: 236340
Sponsored by: The FreeBSD Foundation
In particular:
- suspend the mount around vflush() to avoid new writes come after the
vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.
It is not clear to me how to handle writeable mappings on rw->ro for
tmpfs best. Other filesystems, which use vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is. In
particular, the existing mappings continue to work as far as
application only accesses resident pages, but changes are not flushed
to file.
For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving single copy of the mapped pages, so it cannot be set
to OBJT_DEAD. Alternatives for making rw mappings ro could be either
invalidating them at all, or marking as CoW.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737
o Fix bug in PLIC_ENABLE macro when irq >= 32.
Tested on the real hardware, which is HiFive Unleashed board.
Thanks to SiFive, Inc. for the board provided.
Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19775
* Crank the OPAL state machine during the receive loop, to make sure the
pollers are executed
* Add a proper detach function, so the module can be unloaded and reloaded
at runtime.
It still doesn't reliably work 100% of the time on POWER9, and it appears
timing and/or cache related. It may work on POWER8 now.
MFC after: 2 weeks
Since OPAL_GET_MSG does not discriminate between message types, asynchronous
completion events may be received in the OPAL_GET_MSG call, which dequeues
them from the list, thus preventing OPAL_CHECK_ASYNC_COMPLETION from
succeeding. Handle this case by integrating with the messaging framework.