This ensures that pipe_poll() and the pipe kqueue filters observe
PIPE_EOF and set EV_EOF accordingly. As a result an extra call to
knote() after setting PIPE_EOF is unnecessary.
Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24528
The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X
port. The first of these macros was a no-op for FreeBSD and the second
is no longer used.
This patch gets rid of them. It also deletes the "mbuf_t" typedef which
is no longer used in the FreeBSD code from nfskpiport.h
This patch should not change semantics.
RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION
should first consist of a CreateSession operation using the extant ClientID.
If that fails, then a full recovery beginning with the ExchangeID operation
is to be done.
Without this patch, the FreeBSD client did not attempt the CreateSession
operation with the extant ClientID and went directly to a full recovery
beginning with ExchangeID. I have had this patch several years, but since
no extant NFSv4.n server required the CreateSession with extant ClientID,
I have never committed it.
I an committing it now, since I suspect some future NFSv4.n server will
require this and it should not negatively impact recovery for extant NFSv4.n
servers, since they should all return NFSERR_STATECLIENTID for this first
CreateSession.
The patched client has been tested for recovery against both the FreeBSD
and Linux NFSv4.n servers and no problems have been observed.
MFC after: 1 month
The typedef mbuf_t was used for the Mac OS/X port of the code long ago.
Since this port is no longer used and the use of mbuf_t obscures what
the code does (and is not consistent with style(9)), it is no longer needed.
This patch replaces all instances of mbuf_t with "struct mbuf *", so that
it is no longer used.
This patch should not result in any semantic change.
Ryan Moeller reported crashes in the NFS server that appear to be
caused by stack corruption in nfsrv_compound(). It appears that
the stack got corrupted just after a NFSv4.1 Lookup that crosses
a server mount point.
Although it is just a "theory" at this point, the most obvious way
the stack could get corrupted would be if nfsvno_checkexp() somehow
acquires an export with a bogus nes_numsecflavor value. This would
cause the copying of the secflavors to run off the end of the array,
which is allocated on the stack below where the corruption occurs.
This sanity check is simple to do and would stop the stack corruption
if the theory is correct. Otherwise, doing the sanity check seems to
be a reasonable safety belt to add to the code.
Reported by: freqlabs
MFC after: 2 weeks
I missed the "atomic" field of the RemoveExtendedAttribute operation's
reply when I implemented it. It worked between FreeBSD client and server,
since it was missed for both, but it did not conform to RFC 8276.
This patch adds the field for both client and server.
Thanks go to Frank for doing interoperability testing of the extended
attribute support against patches for Linux.
Submitted by: Frank van der Linden <fllinden@amazon.com>
Reported by: Frank van der Linden <fllinden@amazon.com>
I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.
Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by: Frank van der Linden <fllinder@amazon.com>
The file handle affinity code was configured to be used by both the
old and new NFS servers. This no longer makes sense, since there is
only one NFS server.
This patch copies a majority of the code in sys/nfs/nfs_fha.c and
sys/nfs/nfs_fha.h into sys/fs/nfsserver/nfs_fha_new.c and
sys/fs/nfsserver/nfs_fha_new.h, so that the files in sys/nfs can be
deleted. The code is simplified by deleting the function callback pointers
used to call functions in either the old or new NFS server and they were
replaced by calls to the functions.
As well as a cleanup, this re-organization simplifies the changes
required for handling of external page mbufs, which is required for KERN_TLS.
This patch should not result in a semantic change to file handle affinity.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since r359757, r359780, r359785, r359810, r359811 have removed all uses
of these macros, this patch deleted the macros from the .h files.
My eventual goal is deleting nfskpiport.h, but that will take some more
editting to replace uses of the remaining macros.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.
With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed on a future commit.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D22933
Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client
was able to open and write to a file when the file's permissions were
not set to allow the owner write access.
Since NFS servers check file permissions on every write RPC, it is standard
practice to allow the owner of the file to do writes, regardless of
file permissions. This provides POSIX like behaviour, since POSIX only
checks permissions upon open(2).
The traditional way NFS clients handle this is to check access via the
Access operation/RPC and use that to determine if an open(2) on the
client is allowed.
It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a
POSIX open) operation to fail with NFSERR_ACCES if the file is not being
created and file permissions do not allow owner access, unlike NFSv3.
Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this
behaviour, this patch changes the FreeBSD NFSv4 server to do the same.
A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the
NFSv4 server to its previous behaviour.
Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct
behaviour with the access check for file owner in Open enabled, it is enabled
by default.
Reported by: luoqi.chen@gmail.com
MFC after: 2 weeks
Peter reported that his dmesg was getting cluttered with
nfsrv_cache_session: no session
messages when he rebooted his NFS server and they did not seem useful.
He was correct, in that these messages are "normal" and expected when
NFSv4.1 or NFSv4.2 are mounted and the server is rebooted.
This patch silences the printf() during the grace period after a reboot.
It also adds the client IP address to the printf(), so that the message
is more useful if/when it occurs. If this happens outside of the
server's grace period, it does indicate something is not working correctly.
Instead of adding yet another nd_XXX argument, the arguments for
nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *".
Reported by: pen@lysator.liu.se
MFC after: 2 weeks
Modern debuggers and process tracers use ptrace() rather than procfs
for debugging. ptrace() has a supserset of functionality available
via procfs and new debugging features are only added to ptrace().
While the two debugging services share some fields in struct proc,
they each use dedicated fields and separate code. This results in
extra complexity to support a feature that hasn't been enabled in the
default install for several years.
PR: 244939 (exp-run)
Reviewed by: kib, mjg (earlier version)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23837
Implement one mutex per cuse(3) server instance which also cover the
clients belonging to the given server instance.
This should significantly reduce the mutex congestion inside the
cuse(3) kernel module when multiple servers are in use.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Attempting to use ioctls on /proc/<pid>/mem to control a process will
trigger warnings on the console. The <sys/pioctl.h> include file will
also now emit a compile-time warning when used from userland.
Reviewed by: emaste
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23822
The FUSE protocol allows the client (kernel) to cache a file's size, if the
server (userspace daemon) allows it. A well-behaved daemon obviously should
not change a file's size while a client has it cached. But a buggy daemon
might. If the kernel ever detects that that has happened, then it should
invalidate the entire cache for that file. Previously, we would not only
cache stale data, but in the case of a file extension while we had the size
cached, we accidentally extended the cache with zeros.
PR: 244178
Reported by: Ben RUBSON <ben.rubson@gmx.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24012
Return ENOMEM if one of the buffer cannot be created even with the
minimal size. This should avoid subsequent spurious ENOMEM errors
from write(2) when buffer cannot be allocated on the fly, after we
reported that the pipe was create succesfully.
Reported by: Keno Fischer <keno@juliacomputing.com>
Reviewed by: markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23993
We were reusing a structure for multiple operations, but failing to
reinitialize one member. The result is that a server that cares about FUSE
file handle IDs would see one correct FUSE_FSYNC operation, and one with the
FHID unset.
PR: 244431
Reported by: Agata <chogata@gmail.com>
MFC after: 2 weeks
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.
Reviewed by: kib, mckusick
Approved by: imp (mentor)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23787
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23629
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23627
If node attribute returned in the reply for read rpc indicate
truncation, and it happens that the vnode is exclusively locked,
update of the node attributes would try to shrink vnode size. Since
during the read some vnode pages were busied by the reading thread,
vnode_pager_setsize() deadlocks waiting for the busy state owned by
the caller.
Use a thread-local flag to indicate that NFS read owns some (s)busy
pages states and postpone the call to vnode_pager_setsize() until the
thread relinguishes the ownership.
Diagnosed by: rlibby
Tested by: pho, rlibby
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Reviewed by: kib, trasz
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23640
At the time opt-in was introduced adding yourself as a writer was esrializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.
Vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT which
has is done regardless.
Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.
which disables tracking mtime updates due to writes through the shared
mapped areas backed by tmpfs files. This removes periodic scans which
downgrades rw mapped pages to ro to note the writes.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23432
It was generated to be just a jumping off point to tmpfs_itimes.
While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.
This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.
This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.
[0] https://pubs.opengroup.org/onlinepubs/9699919799/
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247
Around a generic call to null_nodeget(), there is nothing that would
prevent the unmount of the nullfs mp until we process to the
insmntque1() point. Calculate the VV_ROOT flag after insmntque1() to
not access mp->mnt_data before we have an exclusively locked vnode
from this mount point on the mp vnode list.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Nullfs needs to know the root vnode of the lower fs during the
operation. Currently it caches the upper vnode of it, which is also
the root of the nullfs mount. On unmount, nullfs calls vflush() with
rootrefs == 1, and aborts non-forced unmount if there are any more
vnodes instantiated during vflush(). This means that the reference to
the root vnode after failed non-forced unmount could be lost and
nullm_rootvp points to the freed memory.
Fix it by storing the reference for lower vnode instead, which is kept
intact during vflush(). nullfs_root() now instantiates the upper
vnode of lower root. Care about VV_ROOT flag in null_nodeget().
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In order to do so we need to install the msdosfs headers to the bootstrap
sysroot and avoid includes of kernel headers that may not exist on every
host (e.g. sys/lockmgr.h). This change should allow bootstrapping of makefs
on FreeBSD 11+ as well as Linux and macOS.
We also have to avoid using the IO_SYNC macro since that may not be
available. In makefs it is only used to switch between calling
bwrite() and bdwrite() which both call the same function. Therefore we
can simply always call bwrite().
For our CheriBSD builds we always bootstrap makefs by setting
LOCAL_XTOOL_DIRS='lib/libnetbsd usr.sbin/makefs' and use the makefs binary
from the build tree to create a bootable disk image.
Reviewed By: brooks
Differential Revision: https://reviews.freebsd.org/D23201
The PR reported a crash that occurred when a file was removed while
client(s) were actively doing lock operations on it.
Since nfsvno_getvp() will return NULL when the file does not exist,
the bug was obvious and easy to fix via this patch. It is a little
surprising that this wasn't found sooner, but I guess the above
case rarely occurs.
Tested by: iron.udjin@gmail.com
PR: 242768
Reported by: iron.udjin@gmail.com
MFC after: 2 weeks
The vnode pager does not want the object lock held. Moving this out allows
further object lock scope reduction in callers. While here add some missing
paging in progress calls and an assert. The object handle is now protected
explicitly with pip.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23033