Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace. With the ino64 work there
are also some explicit pad fields in struct dirent. Add a subroutine
to clear these bytes and use it in the in-tree filesystems. The
NFS client is omitted for now as it was fixed separately in r340787.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature. Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.
Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs. At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.
Submitted by: Jack Halford <jack@gandi.net>
Reviewed by: mckusick, pfg, rmacklem (previous versions)
Sponsored by: Gandi.net
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17917
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of. This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.
Reviewed by: kib
Differential Revision: D14681
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
The previous limit of 24 was somewhat restrictive, and with this change
ceil(log2(sizeof(struct pfs_node))) is the same as before in both the ILP32
and LP64 models, so the malloc zone used for allocations of struct pfs_node
is the same as before.
Approved by: des
changed from %d to a long-width type.
Use uintmax_t casting and %ju to futureproof the format string against
potential changes with either the #define or the implementation-specific
definition for offsetof(..).
fixed size for the name buffer PFS_NAMELEN.
As r318736 was commited (ino64 project) the size of the permanent part
of the struct dirent was changed, so calulate PFS_DELEN properly.
If some process' nodes were accessed using procfs and the process
cannot exit properly at the time modunload event is reported to the
pseudofs-backed filesystem, the assertion in pfs_vncache_unload() is
triggered. Assertion is correct, the cache should be cleaned.
Approved by: des (pseudofs maintainer)
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Owning Giant in the init/uninit is accidental due to the moment where
VFS modules initialization is performed, and is not enforced by the
VFS interface. The Giant lock does not prevent a parallel execution
of the code, it is VFS which implements the proper protocol.
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
pfs_visible(). The recursion does not cause deadlock because the sx
implementation does not prefer exclusive waiters over the shared, but
this is an implementation detail.
Reported by: pho, Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: jhb
Tested by: pho
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
compiler interprets this as an undefined behaviour. Instead, ensure
that the sum of uio_offset and uio_resid is below OFF_MAX using the
operation which cannot overflow.
Reported and tested by: pho
Discussed with: bde
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
sources from uio. Both uio_offset and offset, and uio_resid and resid
have the same types for some time.
Add check for buflen overflow by comparing the buflen with both offset
and resid (vs. comparing with offset only, as it is currently done).
Reported and tested by: pho
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.
Discussed with: bde, das (previous versions)
MFC after: 1 month
consequence sbuf_len() will return -1 for buffers which had the error
status set prior to sbuf_finish() call. This causes a problem in
pfs_read() which purposely uses a fixed size sbuf to discard bytes which
are not needed to fulfill the read request.
Work around the problem by using the full buffer length when
sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size
is always full.
PR: kern/163076
Approved by: des
MFC after: 2 weeks
nullfs. The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.
Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.
Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.
Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.
Reviewed by: kib
MFC after: 3 days
Assert this.
In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.
Verify that dvp is not reclaimed before calling cache_enter().
Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks
larger than MAXPHYS + 1. This fixes a problem with cat(1) when it
uses a large I/O buffer.
Reported by: Fernando Apesteguía
Suggested by: jilles
Reviewed by: des
Approved by: trasz (mentor)
never been inserted into the pfs_vncache list. Since pfs_vncache_free()
does not anticipate this case, it decrements pfs_vncache_entries
unconditionally; if the vnode was not in the list, pfs_vncache_entries
will no longer reflect the actual number of list entries. This may cause
size of the cache to exceed the configured maximum. It may also trigger
a panic during module unload or system shutdown.
Do not decrement pfs_vncache_entries for the vnode that was not in the
list.
Submitted by: tegge
Reviewed by: des
MFC after: 1 week
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.
Noted and reviewed by: tegge
Approved by: des (pseudofs part)
MFC after: 3 days
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.
Tested by: pho
Reviewed by: jhb, bde (as part of the review of the bigger patch)
be NULL or derefenced memory may become free at arbitrary moment.
Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.
Reported by: georg at dts su
Reviewed by: des (pseudofs)
MFC after: 2 weeks
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.
pfs_vncache_free() that removes pvd from the list, while it is not yet
put on the list.
Prevent the invalid removal from the list by clearing pvd_next and
pvd_prev for the newly allocated pvd, and only move pfs_vncache list
head when the pvd was at the head.
Suggested and approved by: des
MFC after: 2 weeks
may need to lock arbitrary vnodes, causing either lock order reversal or
recursive vnode lock acquisition.
Tested by: pho
Approved by: des
MFC after: 2 weeks
do pfs_vncache_alloc() for the same pfs_node and pid. In this case, we
could end up with two vnodes for the pair. Recheck the cache under the
locked pfs_vncache_mutex after all sleeping operations are done [1].
This case mostly cannot happen now because pseudofs uses exclusive vnode
locking for lookup. But it does drop the vnode lock for dotdot lookups,
and Marcus' pseudofs_vptocnp implementation is vulnerable too.
Do not call free() on the struct pfs_vdata after insmntque() failure,
because vp->v_data points to the structure, and pseudofs_reclaim()
frees it by the call to pfs_vncache_free().
Tested by: pho [1]
Approved by: des
MFC after: 2 weeks
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)