Commit Graph

4477 Commits

Author SHA1 Message Date
Mateusz Guzik
c09f799271 tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics 2021-01-25 22:40:15 +00:00
Mateusz Guzik
cc96f92a57 atomic: make atomic_store_ptr type-aware 2021-01-25 22:40:15 +00:00
Alex Richardson
8d55837dc1 qeueue.h: Add {SLIST,STAILQ,LIST,TAILQ}_END()
We provide these for compat with other queue.h headers since some software
assumes it exists (e.g. the libevent contrib code), but we are not
encouraging their use (NULL should be used instead).

This fixes the following warning (which should arguable be an error since
it results in a function call to an undefined function):

.../contrib/libevent/buffer.c:495:16: warning: implicit declaration of function 'LIST_END' is invalid in C99 [-Wimplicit-function-declaration]
             cbent != LIST_END(&buffer->callbacks);
                      ^
.../contrib/libevent/buffer.c:495:13: warning: comparison between pointer and integer ('struct evbuffer_cb_entry *' and 'int') [-Wpointer-integer-compare]
             cbent != LIST_END(&buffer->callbacks);
             ~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed By:	jhb
Differential Revision: https://reviews.freebsd.org/D27151
2021-01-25 15:09:35 +00:00
Konstantin Belousov
bd01a69f48 nfs_write(): do not call ncl_pager_setsize() after clearing TDP2_SBPAGES
This might unnecessary truncate file undoing extension done by the write.

Reported by:	Yasuhiro Kimura <yasu@utahime.org>
Reviewed by:	rmacklem
Tested by:	rmacklem, Yasuhiro Kimura <yasu@utahime.org>
MFC after:	6 days
Sponsored by:	The FreeBSD Foundation
2021-01-25 01:02:03 +02:00
Konstantin Belousov
aa8c1f8d84 nfs client: block vnode_pager_setsize() calls from nfscl_loadattrcache in nfs_write
Otherwise writing thread might wait on sbusy state of the pages which were
busied by itself, similarly to nfs_read().  But also we need to clear
NVNSETSZKSIP flag possibly set by ncl_pager_setsize(), to not undo
extension done by write.

Reported by:	bdrewery
Reviewed by:	rmacklem
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28306
2021-01-23 17:24:32 +02:00
Mateusz Guzik
618029af50 tmpfs: add support for lockless symlink lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Mateusz Guzik
739ecbcf1c cache: add symlink support to lockless lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Konstantin Belousov
2d1e4220eb tmpfs_reclaim: detach unlinked node on dereferencing.
Otherwise it is dereferenced one extra time at unmount, if it survives
long enough.  One way to hold the reference on such node is to keep it
open.

tmpfs_vptocnp() now needs to account for the possibility that unlocked
node was removed from the list.

Reported by:	danfe
Tested by:	danfe, pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:51:37 +02:00
Konstantin Belousov
685265ecfb tmpfs_reclaim: style
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-01-14 14:43:13 +02:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Rick Macklem
148a227bf8 nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs
Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of
the arguments for the M_EXTPG case.

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28053
2021-01-10 13:50:15 -08:00
Konstantin Belousov
ac2576b9f7 tmpfs open: assert that there is no double-init of f_data.
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:36 +02:00
Konstantin Belousov
9f200bc47b tmpfs_free_tmp(): explicitly assert that tmp is locked
Despite TMPFS_UNLOCK() is done in both paths later, unlocking not locked
mutex provides different failure mode.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:48:29 +02:00
Konstantin Belousov
42bebbda9e tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c
This malloc type is only used in this file.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-01-10 04:44:55 +02:00
Alan Somers
17a82e6af8 Fix vnode locking bug in fuse_vnop_copy_file_range
MFC-With:	92bbfe1f0d
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27938
2021-01-03 11:16:20 -07:00
Mark Johnston
90f580b954 Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry.  Several filesystems were not initializing
the field, which ends up being copied out to userland.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27792
2021-01-03 11:50:31 -05:00
Alan Somers
34477e25c1 fusefs: only check vnode locks with DEBUG_VFS_LOCKS
MFC-With:	37df9d3bba
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27939
2021-01-03 09:19:00 -07:00
Alan Somers
542711e520 Fix a vnode locking bug in fuse_vnop_advlock.
Must lock the vnode before accessing the fufh table.  Also, check for
invalid parameters earlier.  Bug introduced by r346170.

MFC after:	2 weeks

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27936
2021-01-03 09:16:23 -07:00
Mateusz Guzik
3e506a67bb vfs: add v_irflag accessors
Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27793
2021-01-03 06:50:06 +00:00
Konstantin Belousov
51a9b978e7 nfs server: improve use of the VFS KPI
In particular, do not assume that vn_start_write() returns the same mp
as it was passed in, or never returns error.

Also be more accurate to return NULL vp and mp when error occured, to
catch wrong control flow easier.

Stop checking for NULL mp before calling vn_finished_write(), NULL mp
is handled transparently by the function.

Reviewed by:	rmacklem
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27881
2021-01-02 20:17:12 +02:00
Rick Macklem
dc78533a52 nfsd: fix NFSv4.0 seqid handling for ERELOOKUP
Commit 774a36851e fixed the NFS server so that it could handle
ERELOOKUP returns from VOP calls by redoing the operation/RPC.
However, for NFSv4.0, redoing an Open would increment
the open_owner's seqid multiple times, breaking the protocol.
This patch sets a new flag called ND_ERELOOKUP on the RPC when
a redo is in progress.  Then the code that increments the seqid
avoids the seqid increment/check when the flag is set, since
it indicates this has already been done for the Open.
2021-01-01 14:21:51 -08:00
Rick Macklem
774a36851e nfsd: fix NFS server for ERELOOKUP
r367672 modified UFS such that certain VOPs, such as
VOP_CREATE() will intermittently return ERELOOKUP.
When this happens, the entire system call, or NFS
operation in the case of the NFS server, must be redone.

This patch adds that support to the NFS server by rolling
back the state of the NFS request arguments and NFS
reply arguments mbuf lists to the condition they were
in before the operation and then redoing the operation.

Tested by:	pho
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D27875
2021-01-01 13:55:51 -08:00
Alan Somers
92bbfe1f0d fusefs: implement FUSE_COPY_FILE_RANGE.
This updates the FUSE protocol to 7.28, though most of the new features
are optional and are not yet implemented.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D27818
2021-01-01 10:18:23 -07:00
Mateusz Guzik
d71965127f tmpfs: use VNPASS when asserting on a vnode in tmpfs_read_pgcache 2021-01-01 03:23:01 +00:00
Alan Somers
37df9d3bba fusefs: update FUSE protocol to 7.24 and implement FUSE_LSEEK
FUSE_LSEEK reports holes on fuse file systems, and is used for example
by bsdtar.

MFC after:	2 weeks
Relnotes:	yes
Reviewed by:	cem
Differential Revision: https://reviews.freebsd.org/D27804
2020-12-31 08:51:47 -07:00
Edward Tomasz Napierala
4ddb3cc597 devfs(4): defer freeing until we drop devmtx ("cdev")
Before r332974 the old code would sometimes cause a rare lock order
reversal against pagequeue, which looked roughly like this:

witness_checkorder()
__mtx_lock-flags()
vm_page_alloc()
uma_small_alloc()
keg_alloc_slab()
keg_fetch-slab()
zone_fetch-slab()
zone_import()
zone_alloc_bucket()
uma_zalloc_arg()
bucket_alloc()
uma_zfree_arg()
free()
devfs_metoo()
devfs_populate_loop()
devfs_populate()
devfs_rioctl()
VOP_IOCTL_APV()
VOP_IOCTL()
vn_ioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()

Since r332974 the original problem no longer exists, but it still
makes sense to move things out of the - often congested - lock.

Reviewed By:	kib, markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27334
2020-12-29 13:47:36 +00:00
Alan Somers
4f4111d2c5 fusefs: delete some dead code
The original fusefs GSoC project seems to have envisioned exchanging two
types of messages with FUSE servers.  Perhaps vectored and non-vectored?
But in practice only one type has ever been used.  Delete the other type.

Reviewed by:		cem
Differential Revision:	https://reviews.freebsd.org/D27770
2020-12-28 19:05:35 +00:00
Mark Johnston
599f904463 msdosfs: Fix a leak of dirent padding bytes
This was missed in r340856 / commit
6d2e2df764.  Three bytes from the kernel
stack may be leaked when reading directory entries.

Reported by:	Syed Faraz Abrar <faraz@elttam.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-12-27 17:01:44 -05:00
Rick Macklem
665b1365fe Add a new "tlscertname" NFS mount option.
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake.  For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.

This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided.  This alernate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
2020-12-23 13:42:55 -08:00
Brooks Davis
52e63ec2f1 VFS_QUOTACTL: Remove needless casts of arg
The argument is a void * so there's no need to cast it to caddr_t.

Update documentation to match function decleration.

Reviewed by:	freqlabs
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27093
2020-12-17 21:58:10 +00:00
Kirk McKusick
645027c89d In ext2fs, BA_CLRBUF is used in ext2_balloc() not UFS_BALLOC().
Noted by:     kib
MFC after:    3 days
Sponsored by: Netflix
2020-12-08 00:49:31 +00:00
Kirk McKusick
bb3c01ec79 Document the BA_CLRBUF flag used in ufs and ext2fs filesystems.
Suggested by: kib
MFC after:    3 days
Sponsored by: Netflix
2020-12-06 20:50:21 +00:00
Konstantin Belousov
cd85379104 Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible.  Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*).  Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys.  Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight.  Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by:	imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27225
2020-11-28 12:12:51 +00:00
Konstantin Belousov
f7af6e5e54 nullfs: provide custom bypass for VOP_READ_PGCACHE().
Normal bypass expects locked vnode, which is not true for
VOP_READ_PGCACHE().  Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taked by null_reclaim() when
setting v_data to NULL.

Reported and tested by:	pho
Reviewed by:	markj, mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D27327
2020-11-26 18:16:32 +00:00
Konstantin Belousov
6936779347 msdosfs: suspend around unmount or remount rw->ro.
This also eliminates unsafe use of VFS_SYNC(MNT_WAIT).

Requested by:	mckusick
Discussed with:	imp
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D27269
2020-11-20 15:19:30 +00:00
Konstantin Belousov
1b3cb4dc04 msdosfs: Add trivial support for suspension.
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D27269
2020-11-20 12:31:02 +00:00
Conrad Meyer
c1c4d0e9a8 msdosfs(5): Fix debug-only format string
No functional change; MSDOSFS_DEBUG isn't a real build option, so this isn't
covered by LINT kernels.
2020-11-18 20:20:03 +00:00
Alan Somers
ac8c4a61af nfs: Mark unused statistics variable as reserved
FreeBSD's NFS exporter has long exported some unused statistics fields.
Revision r366992 removed them from nfsstat. This revision renames those
fields in the kernel's exported structures to make it clear to other
consumers that they are unused.

Reported by:	emaste
Reviewed by:	emaste
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D27258
2020-11-18 04:35:49 +00:00
Conrad Meyer
85078b8573 Split out cwd/root/jail, cmask state from filedesc table
No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by:	kib
Discussed with:	markj, mjg
Differential Revision:	https://reviews.freebsd.org/D27037
2020-11-17 21:14:13 +00:00
Edward Tomasz Napierala
e3b1c847a4 Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  Should come handy for AppImages.

Reviewed by:	asomers
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26959
2020-11-09 08:53:15 +00:00
Mateusz Guzik
f24aa01f9d tmpfs: reorder struct tmpfs_node to shrink it by 8 bytes
The reduction (232 -> 224 bytes) allows UMA to fit one more item (17 -> 18)
per slab as reported in vm.uma.TMPFS_node.keg.ipers.
2020-11-05 11:24:45 +00:00
Conrad Meyer
20172854ab Add sbuf streaming mode to pseudofs(9), use in linprocfs(5)
Add a pseudofs node flag 'PFS_AUTODRAIN', which automatically emits sbuf
contents to the caller when the sbuf buffer fills.  This is only
permissible if the corresponding PFS node fill function can sleep
whenever it appends to the sbuf.

linprocfs' /proc/self/maps node happens to meet this requirement.
Streaming out the file as it is composed avoids truncating the output
and also avoids preallocating a very large buffer.

Reviewed by:	markj; earlier version: emaste, kib, trasz
Differential Revision:	https://reviews.freebsd.org/D27047
2020-11-05 06:48:51 +00:00
Mateusz Guzik
7c58c37ebb tmpfs: change tmpfs dirent zone into a malloc type
It is 64 bytes.
2020-10-30 14:07:25 +00:00
Mateusz Guzik
4bfebc8d2c cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename 2020-10-30 10:46:35 +00:00
Edward Tomasz Napierala
e3c51151a0 Make it possible to mount nullfs(5) using plain mount(8)
instead of mount_nullfs(8).

Obviously you'd need to force mount(8) to not call
mount_nullfs(8) to make use of it.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26934
2020-10-29 15:28:15 +00:00
Edward Tomasz Napierala
bce7ee9d41 Drop "All rights reserved" from all my stuff. This includes
Foundation copyrights, approved by emaste@.  It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.

Reviewed by:	emaste, imp, gbe (manpages)
Differential Revision:	https://reviews.freebsd.org/D26980
2020-10-28 13:46:11 +00:00
Mateusz Guzik
25fb30bd9a vfs: drop spurious cache_purge on rmdir
The removed directory gets cache_purged which is sufficient to remove any entries
related to the parent.

Note only tmpfs, ufs and zfs are patched.
2020-10-23 15:50:49 +00:00
Hans Petter Selasky
a71074e0af Fix for loading cuse.ko via rc.d . Make sure we declare the cuse(3)
module by name and not only by the version information, so that
"kldstat -q -m cuse" works.

Found by:		Goran Mekic <meka@tilda.center>
MFC after:		1 week
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-23 08:44:53 +00:00
Mateusz Guzik
ab21ed17ed vfs: drop the de facto curthread argument from VOP_INACTIVE 2020-10-20 07:19:03 +00:00
Mateusz Guzik
8ecd87a3e7 vfs: drop spurious cred argument from VOP_VPTOCNP 2020-10-20 07:18:27 +00:00
Konstantin Belousov
6b56b0ca93 nullfs: ensure correct lock is taken after bypass.
If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime.  In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.

Check for the condition and acquire nullfs vnode lock if detected.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-10-19 19:23:22 +00:00
Edward Tomasz Napierala
ce764cbd1c Bump pseudofs size limit from 128kB to 1MB. The old limit could result
in process' memory maps being truncated.

PR:		237883
Submitted by:	dchagin
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20575
2020-10-16 09:58:10 +00:00
Mateusz Guzik
eb88fed446 cache: fix vexec panic when racing against vgone
Use of dead_vnodeops would result in a panic instead of returning the intended
EOPNOTSUPP error.

While here make sure to abort, not just try to return a partial result.
The former allows the regular lookup to restart from scratch, while the latter
makes it stuck with an unusable vnode.

Reported by:	kevans
2020-10-09 19:10:00 +00:00
Pedro F. Giffuni
c2f0581e43 ext2fs: minor typo.
Obtained from:	Dragonfly
MFC after:	3 days
2020-10-06 21:31:04 +00:00
Rick Macklem
9f669985b2 Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one
successful RPC.

Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would
loop until the copy "len" was completed.  The problem with doing this is
that it might take a considerable time to complete for a large "len".
By returning after a single successful Copy RPC that copied some of the
data, the application that did the copy_file_range(2) syscall will be
more responsive to signal delivery for large "len" copies.
2020-10-01 00:47:35 +00:00
Rick Macklem
ff45b9fc1a Bjorn reported a problem where the Linux NFSv4.1 client is
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.

For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).

For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.

Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.

Tested by:	bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
PR:		249567
Reported by:	bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
MFC after:	3 days
2020-09-26 23:05:38 +00:00
Alan Somers
a62772a78e fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".

However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.

Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.

PR:		247276
Reported by:	trapexit@spawn.link
Reviewed by:	cem
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26485
2020-09-24 16:27:53 +00:00
Mark Johnston
8e13d6dfb6 udf: Validate the full file entry length
Otherwise a corrupted file entry containing invalid extended attribute
lengths or allocation descriptor lengths can trigger an overflow when
the file entry is loaded.

admbug:		965
PR:		248613
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-09-22 17:05:01 +00:00
Rick Macklem
58dd2b52cb Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence().  This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.

This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.

MFC after:	1 week
2020-09-18 23:52:56 +00:00
Eric van Gyzen
f9cc8410e1 vm_ooffset_t is now unsigned
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by:	Coverity
Reviewed by:	kib markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26214
2020-09-18 16:48:08 +00:00
Konstantin Belousov
016b7c7e39 tmpfs: restore atime updates for reads from page cache.
Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated
atomically without locks or (locked) atomics.

tn_update_getattr() change also contains unrelated bug fix.

Reported by:	lwhsu
PR:	249362
Reviewed by:	markj (previous version)
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26451
2020-09-16 21:28:18 +00:00
Konstantin Belousov
23f9071466 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-09-16 21:24:34 +00:00
Rick Macklem
a5c55410b3 Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures.
The code in nfsrv_checksequence() would call SVC_RELEASE() with the mutex
held.  Normally this is ok, since all that happens is SVC_RELEASE()
decrements a reference count.  However, if the socket has just been shut
down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep
lock during destruction of the server side krpc structure.

This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.

MFC after:	1 week
2020-09-16 02:25:18 +00:00
Konstantin Belousov
081e36e760 Add tmpfs page cache read support.
Or it could be explained as lockless (for vnode lock) reads.  Reads
are performed from the node tn_obj object.  Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as far as ref_count > 0.

Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.

Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock.  Also, we can handle
holes in tmpfs node without additional efforts, and do not have
limitation of the transfer size.

Reviewed by:	markj
Discussed with and benchmarked by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:19:16 +00:00
Konstantin Belousov
4601f5f5ee Microoptimize tmpfs node ref/unref by using atomics.
Avoid tmpfs mount and node locks when ref count is greater than zero,
which is the case until node is being destroyed by unlink or unmount.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 22:13:21 +00:00
Konstantin Belousov
96474d2a3f Do not copy vp into f_data for DTYPE_VNODE files.
The pointer to vnode is already stored into f_vnode, so f_data can be
reused.  Fix all found users of f_data for DTYPE_VNODE.

Provide finit_vnode() helper to initialize file of DTYPE_VNODE type.

Reviewed by:	markj (previous version)
Discussed with:	freqlabs (openzfs chunk)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26346
2020-09-15 21:55:21 +00:00
Rick Macklem
2848d6d4de Fix a case where the NFSv4.0 server might crash if delegations are enabled.
asomers@ reported a crash on an NFSv4.0 server with a backtrace of:
kdb_backtrace
vpanic
panic
nfsrv_docallback
nfsrv_checkgetattr
nfsrvd_getattr
nfsrvd_dorpc
nfssvc_program
svc_run_internal
svc_thread_start
fork_exit
fork_trampoline
where the panic message was "docallb", which indicates that a callback
was attempted when the ClientID is unconfirmed.
This would not normally occur, but it is possible to have an unconfirmed
ClientID structure with delegation structure(s) chained off it if the
client were to issue a SetClientID with the same "id" but different
"verifier" after acquiring delegations on the previously confirmed ClientID.

The bug appears to be that nfsrv_checkgetattr() failed to check for
this uncommon case of an unconfirmed ClientID with a delegation structure
that no longer refers to a delegation the client knows about.

This patch adds a check for this case, handling it as if no delegation
exists, which is the case when the above occurs.
Although difficult to reproduce, this change should avoid the panic().

PR:		249127
Reported by:	asomers
Reviewed by:	asomers
MFC after:	1 week
Differential Revision:	https://reviews.freebbsd.org/D26342
2020-09-14 00:44:50 +00:00
Mateusz Guzik
c86d2ba8a5 tmpfs: drop spurious cache_purge in tmpfs_reclaim
vgone already performs it.
2020-09-04 19:30:15 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Rick Macklem
4cdbb07b3c Add a check to test for the case of the "tls" option being used with "udp".
The KERN_TLS only supports TCP, so use of the "tls" option with "udp" will
not work.  This patch adds a test for this case, so that the mount is not
attempted when both "tls" and "udp" are specified.
2020-09-01 01:10:16 +00:00
Eric van Gyzen
0bb426274e Fix nfsrvd_locku memory leak
Coverity detected memory leak fix.

Submitted by:	bret_ketchum@dell.com
Reported by:	Coverity
Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26231
2020-08-31 15:31:17 +00:00
Rick Macklem
6e4b6ff88f Add flags to enable NFS over TLS to the NFS client and server.
An Internet Draft titled "Towards Remote Procedure Call Encryption By Default"
(soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS
as a specific application case.
Various commits prepared the NFS code to use KERN_TLS, mainly enabling use
of ext_pgs mbufs for large RPC messages.
r364475 added TLS support to the kernel RPC.

This commit (which is the final one for kernel changes required to do
NFS over TLS) adds support for three export flags:
MNT_EXTLS - Requires a TLS connection.
MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid
            X.509 certificate during TLS handshake.
MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a
            valid X.509 certificate with "user@domain" in the otherName
            field of the SubjectAltName during TLS handshake.
Without these export options, clients are permitted, but not required, to
use TLS.

For the client, a new nmount(2) option called "tls" makes the client do
a STARTTLS Null RPC and TLS handshake for all TCP connections used for the
mount. The CLSET_TLS client control option is used to indicate to the kernel RPC
that this should be done.

Unless the above export flags or "tls" option is used, semantics should
not change for the NFS client nor server.

For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client }
or rpctlssd(8) daemon { for server } must be running.
2020-08-27 23:57:30 +00:00
Mateusz Guzik
4961e997a6 fuse: unbreak after r364814
Reported by:	kevans
2020-08-26 21:13:36 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik
39f8815070 cache: add cache_rename, a dedicated helper to use for renames
While here make both tmpfs and ufs use it.

No fuctional changes.
2020-08-20 10:05:46 +00:00
Pedro F. Giffuni
ef20a5b58c extfs: remove redundant little endian conversion.
The XTIME_TO_NSEC macro already calls the htole32(), so there is no need
to call it twice. This code does nothing on LE platforms and affects only
nanosecond and birthtime fields so it's difficult to notice on regular use.

Hinted by:	DragonFlyBSD (git ae503f8f6f4b9a413932ffd68be029f20c38cab4)

X-MFC with:	r361136
2020-08-20 05:08:49 +00:00
Mateusz Guzik
8f226f4c23 vfs: remove the always-curthread td argument from VOP_RECLAIM 2020-08-19 07:28:01 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Rick Macklem
808306dd0f Delete the unused "use_ext" argument to nfscl_reqstart().
This is a partial revert of r363210, since the "use_ext" argument added
by that commit is not actually useful.

This patch should not result in any semantics change.
2020-08-18 01:41:12 +00:00
Pedro F. Giffuni
19642a0cfb extfs: remove redundant little endian conversion.
The NSEC_TO_XTIME macro already calls the htole32(), so there is no need
to call it twice. This code does nothing on LE platforms and affects only
nanosecond and birthtime fields so it's difficult to notice on regular use.

X-MFC with:	r361136
2020-08-17 15:05:41 +00:00
Konstantin Belousov
685cb01a18 VMIO reads: enable for nullfs upper vnode if the lower vnode supports it.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 21:05:56 +00:00
Mateusz Guzik
1abe36567f tmpfs: use vget_prep/vget_finish instead of vget + vnode 2020-08-16 17:19:23 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Rick Macklem
90cf38f22e Fix a bug introduced by r363001 for the ext_pgs case.
r363001 added support for ext_pgs mbufs to nfsm_uiombuf().
By inspection, I noticed that "mlen" was not set non-zero and, as such, there
would be an iteration of the loop that did nothing.
This patch sets it.
This bug would have no effect on the system, since the ext_pgs mbuf code
is not yet enabled.
2020-08-12 04:35:49 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Rick Macklem
02511d2112 Add an argument to newnfs_connect() that indicates use TLS for the connection.
For NFSv4.0, the server creates a server->client TCP connection for callbacks.
If the client mount on the server is using TLS, enable TLS for this callback
TCP connection.
TLS connections from clients will not be supported until the kernel RPC
changes are committed.

Since this changes the internal ABI between the NFS kernel modules that
will require a version bump, delete newnfs_trimtrailing(), which is no
longer used.

Since LCL_TLSCB is not yet set, these changes should not have any semantic
affect at this time.
2020-08-11 00:26:45 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
Mateusz Guzik
ca423b858b devfs: bool -> int
Fixes buildworld after r364069
2020-08-10 11:46:39 +00:00
Mateusz Guzik
7b19bddac8 devfs: save on spurious relocking for devfs_populate
Tested by:	pho
2020-08-10 10:36:43 +00:00
Mateusz Guzik
f8935a96d1 devfs: use cheaper lockmgr entry points
Tested by:	pho
2020-08-10 10:36:10 +00:00
Mateusz Guzik
f9c13ab856 devfs: use vget_prep/vget_finish
Tested by:	pho
2020-08-10 10:35:47 +00:00
Mateusz Guzik
fc9fcee01a nullfs: add missing VOP_STAT handling
Tested by:	pho
2020-08-10 10:31:17 +00:00
Mateusz Guzik
9a14439f2f tmpfs: add VOP_STAT handler 2020-08-07 23:07:47 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Rick Macklem
cb889ce631 Add optional support for ext_pgs mbufs to the NFS server's read, readlink
and getxattr operations.

This patch optionally enables generation of read, readlink and getxattr replies
in ext_pgs mbufs.  Since neither of ND_EXTPG or ND_TLS are currently ever set,
there is no change in semantics at this time.
It also corrects the message in a couple of panic()s that should never occur.

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.
2020-07-31 23:35:49 +00:00
Rick Macklem
ea83d07e82 Add support for ext_pgs mbufs to nfsrvd_readdir() and nfsrvd_readdirplus().
This patch code that optionally (based on ND_TLS, never set yet) generates
readdir replies in ext_pgs mbufs.
To trim the list back, a new function that is ext_pgs aware called
nfsm_trimtrailing() replaces newnfs_trimtrailing().
newnfs_trimtrailing() is no longer used, but will be removed in a future
commit, since its removal does modify the internal kpi between the NFS
modules.

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.
2020-07-29 22:58:08 +00:00
Rick Macklem
194d870481 Fix the NFSv4 client so that it checks for support of TimeCreate before
trying to set it.

r362490 added support for setting of the TimeCreate (va_birthtime) attribute,
but it does so without checking to see if the server supports the attribute.
This could result in NFSERR_ATTRNOTSUPP error replies to the Setattr operation.
This patch adds code to check that the server supports TimeCreate before
attempting to do a Setattr of it to avoid these error returns.
2020-07-26 23:13:10 +00:00
Rick Macklem
2de592f6e1 Fix the NFS server so that it sets va_birthtime.
r362490 marked that the NFSv4 attribute TimeCreate (va_birthtime) is supported,
but it did not change the NFS server code to actually do it.
As such, errors could occur when unrolling a tarball onto an NFSv4 mounted
volume, since setting TimeCreate would fail with a NFSERR_ATTRNOTSUPP reply.

This patch fixes the server so that it does TimeCreate and also makes
sure that TimeCreate will not be set for a DS file for a pNFS server.

A separate commit will add a check to the NFSv4 client for support of
the TimeCreate attribute before attempting to set it, to avoid a problem
when mounting a server that does not support the attribute.
The failures will still occur for r362490 or later kernels that do not
have this patch, since they indicate support for the attribute, but do not
actually support the attribute.
2020-07-26 23:03:41 +00:00
Rick Macklem
18a48314ba Add support for ext_pgs mbufs to nfsrv_adj().
This patch uses a slightly different algorithm for nfsrv_adj()
since ext_pgs mbuf lists are not permitted to have m_len == 0 mbufs.
As such, the code now frees mbufs after the adjustment in the list instead
of setting their m_len field to 0.
Since mbuf(s) may be trimmed off the tail of the list, the function now
returns a pointer to the last mbuf in the list.  This saves the caller
from needing to use m_last() to find the last mbuf.
It also implies that it might return a nul list, which required a check for
that in nfsrvd_readlink().

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.
2020-07-26 02:42:09 +00:00