
453 Commits

Author SHA1 Message Date
Rick Macklem
4476c1def0 Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs
should be used.

For KERN_TLS (and possibly some other future network interface) the mbuf
list passed into sosend() must be ext_pgs mbufs. The krpc could simply
copy all the mbuf data into ext_pgs mbufs before calling sosend(), but
that would be inefficient for large RPC messages.
This patch adds an argument to nfscl_reqstart() to indicate that it should
fill the RPC message into ext_pgs mbufs.
It also adds fields to "struct nfsrv_descript" needed for building NFS RPC
messages in ext_pgs mbufs, along with new flags for this.

Since the argument is always "false", this commit should not result in any
semantic change. However, this commit prepares the code
for future commits that will add support for building of NFS RPC messages
in ext_pgs mbufs.
2020-06-26 03:11:54 +00:00
Alan Somers
eea79fde5a Remove vfs_statfs and vnode_mount macros from NFS
These macro definitions are no longer needed as the NFS OSX port is long
dead.  The vfs_statfs macro conflicts with the vfsops field of the same
name.

Submitted by:	shivank@
Reviewed by:	rmacklem
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2020)
Differential Revision:	https://reviews.freebsd.org/D25263
2020-06-17 16:20:19 +00:00
Alexander V. Chernikov
9d5df78e64 Fix NOINET6 build broken by r361575.
Reported by:	ci, hps
2020-05-28 09:52:28 +00:00
Alexander V. Chernikov
c74ce5cca3 Make NFS address selection use fib4_lookup().
fib4_lookup_nh_ represents the pre-epoch generation of the fib API,
providing fewer guarantees of pointer validity and requiring
on-stack data copying.
Switch the call to the new fib4_lookup(), allowing the old API to
eventually be deprecated.

Differential Revision:	https://reviews.freebsd.org/D24977
2020-05-28 07:35:07 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason for refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation makes it possible to stop refcounting and simplify the code.
Additionally, this change makes it possible to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in a more efficient way.

The change keeps the per-vnet rtzone uma zone. It adds an nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
The rest of the reference counting code will be removed in D24867.

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Ryan Moeller
b9cc3262bc nfs: Remove APPLESTATIC macro
It is no longer useful.

Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24811
2020-05-12 13:23:25 +00:00
Ryan Moeller
32033b3d30 Remove APPLEKEXT ifndefs
They are no longer useful.

Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D24752
2020-05-08 14:39:38 +00:00
Rick Macklem
5ecf33c6c4 Get rid of uio_XXX macros used for the Mac OS/X port.
The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX
left over from the port to Mac OS/X. Since that port is long forgotten,
replace the calls with the code generated by the FreeBSD macros for these
in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h
and I think makes the code more readable.

This patch should not result in any semantic change.
2020-04-28 02:11:02 +00:00
Rick Macklem
e4a458bb1b Remove Mac OS/X macros that did nothing for FreeBSD.
The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X
port. The first of these macros was a no-op for FreeBSD and the second
is no longer used.
This patch gets rid of them. It also deletes the "mbuf_t" typedef, which
is no longer used in the FreeBSD code, from nfskpiport.h.

This patch should not change semantics.
2020-04-25 02:18:59 +00:00
Rick Macklem
897d7d45ba Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant.
RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION
should first consist of a CreateSession operation using the extant ClientID.
If that fails, then a full recovery beginning with the ExchangeID operation
is to be done.
Without this patch, the FreeBSD client did not attempt the CreateSession
operation with the extant ClientID and went directly to a full recovery
beginning with ExchangeID. I have had this patch several years, but since
no extant NFSv4.n server required the CreateSession with extant ClientID,
I have never committed it.
I am committing it now, since I suspect some future NFSv4.n server will
require this and it should not negatively impact recovery for extant NFSv4.n
servers, since they should all return NFSERR_STATECLIENTID for this first
CreateSession.

The patched client has been tested for recovery against both the FreeBSD
and Linux NFSv4.n servers and no problems have been observed.

MFC after:	1 month
2020-04-22 21:00:14 +00:00
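The recovery order described in 897d7d45ba can be sketched as below. This is only an illustration under stated assumptions: the `recovery_ops` callback table, the `recover_from_badsession()` function, and the stub server are hypothetical names invented here, not the FreeBSD client's code, and the state reclaim that a full recovery also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RPC hooks; the real client issues these as NFSv4.n operations. */
struct recovery_ops {
	int (*create_session)(void *arg);	/* CreateSession operation */
	int (*exchange_id)(void *arg);		/* ExchangeID, start of full recovery */
	void *arg;
};

enum recovery_path { RECOVERED_SESSION_ONLY, RECOVERED_FULL, RECOVERY_FAILED };

/*
 * On NFSERR_BADSESSION, first try CreateSession with the extant ClientID.
 * Only if that fails, fall back to a full recovery beginning with ExchangeID.
 */
static enum recovery_path
recover_from_badsession(const struct recovery_ops *ops)
{
	assert(ops != NULL);
	if (ops->create_session(ops->arg) == 0)
		return (RECOVERED_SESSION_ONLY);
	if (ops->exchange_id(ops->arg) != 0)
		return (RECOVERY_FAILED);
	/* The new ClientID needs a session of its own. */
	if (ops->create_session(ops->arg) != 0)
		return (RECOVERY_FAILED);
	return (RECOVERED_FULL);
}

/* Hypothetical stub server: CreateSession fails until the ClientID is valid. */
struct stub_server {
	bool clientid_valid;
	int calls;
};

static int
stub_create_session(void *arg)
{
	struct stub_server *s = arg;

	s->calls++;
	return (s->clientid_valid ? 0 : -1);
}

static int
stub_exchange_id(void *arg)
{
	struct stub_server *s = arg;

	s->calls++;
	s->clientid_valid = true;
	return (0);
}
```

With the stub server starting with an invalid ClientID, the sketch takes the full path (CreateSession, ExchangeID, CreateSession); with a valid ClientID it stops after the first CreateSession.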
Rick Macklem
0bda1ddd33 Fix the NFSv4.2 extended attribute support for remove extended attribute.
I missed the "atomic" field of the RemoveExtendedAttribute operation's
reply when I implemented it. It worked between FreeBSD client and server,
since it was missed for both, but it did not conform to RFC 8276.
This patch adds the field for both client and server.

Thanks go to Frank for doing interoperability testing of the extended
attribute support against patches for Linux.

Submitted by:	Frank van der Linden <fllinden@amazon.com>
Reported by:	Frank van der Linden <fllinden@amazon.com>
2020-04-15 21:27:52 +00:00
Rick Macklem
fb8ed4c5f8 Fix the NFSv4.2 extended attribute support to handle 0 length attributes.
I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.

Submitted by:	Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by:	Frank van der Linden <fllinden@amazon.com>
2020-04-14 22:57:21 +00:00
Rick Macklem
e3e7c612f3 Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.
2020-04-11 23:37:58 +00:00
Rick Macklem
3133bbf7a4 Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.
This conversion will be committed one file at a time.
2020-04-10 22:42:14 +00:00
Rick Macklem
8de97f394e Remove the old NFS lock device driver that uses Giant.
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.

With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed in a future commit.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22933
2020-04-09 14:44:46 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
0ff51c98d1 Fix NFS client deadlock when read reports truncated node.
If the node attributes returned in the reply to a read RPC indicate
truncation, and the vnode happens to be exclusively locked,
the update of the node attributes would try to shrink the vnode size.  Since
some vnode pages were busied by the reading thread during the read,
vnode_pager_setsize() deadlocks waiting for the busy state owned by
the caller.

Use a thread-local flag to indicate that NFS read owns some (s)busy
page states and postpone the call to vnode_pager_setsize() until the
thread relinquishes the ownership.

Diagnosed by:	rlibby
Tested by:	pho, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-22 20:50:30 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Rick Macklem
05dcd5d2c8 Fix nfsmount() so that it will return NFSERR_MINORVERMISMATCH.
If nfsrpc_getdirpath() returns NFSERR_MINORVERMISMATCH, it would erroneously
get mapped to EIO. This was not particularly harmful, but would make it
hard for sysadmins to diagnose why an NFSv4 mount is failing.

mount_nfs.c still needs to be fixed so that it does not report
NFSERR_MINORVERMISMATCH as an unknown error 10021.

MFC after:	1 week
2019-12-25 01:15:38 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
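The idea behind 6fa079fc3f, resolving defaults once instead of walking the chain on every call, can be sketched as below. This is a simplified illustration: `struct opvec`, `opvec_flatten()`, and the sample operations are hypothetical, and the real vop_vector has many more slots plus the bypass handling shown in the loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-operation vector standing in for a vop vector. */
struct opvec {
	struct opvec *def;		/* fallback vector, like vop_default */
	int (*op_read)(void);
	int (*op_write)(void);
};

/*
 * Flatten once at registration time: copy every missing operation from
 * the chain of defaults so callers never have to walk the chain.
 */
static void
opvec_flatten(struct opvec *v)
{
	const struct opvec *d;

	assert(v != NULL);
	for (d = v->def; d != NULL; d = d->def) {
		if (v->op_read == NULL)
			v->op_read = d->op_read;
		if (v->op_write == NULL)
			v->op_write = d->op_write;
	}
}

/* Sample operations, used only to show which implementation gets picked. */
static int default_read(void) { return (1); }
static int default_write(void) { return (2); }
static int fs_write(void) { return (20); }
```

After flattening a vector that only provides `fs_write`, a call through `op_read` reaches `default_read` directly, with no per-call loop.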
Rick Macklem
f808cf7294 Silence some "might not be initialized" warnings for riscv64.
None of these cases were actually using the variable(s) uninitialized, but
I figured that silencing the warnings via initializing them made sense.

Some of these predated r355677.
2019-12-13 21:38:08 +00:00
Rick Macklem
bf6ac05aa3 Add some more initializations to quiet riscv build.
The one case in nfs_copy_file_range() was a legitimate case, although
it would probably never occur in practice.
2019-12-13 01:34:25 +00:00
Rick Macklem
c057a37818 Add support for NFSv4.2 to the NFS client and server.
This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
   - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
   - posix_fallocate()
   - intra server file range copying via the copy_file_range(2) syscall
     --> Avoiding data transfer over the wire to/from the NFS client.
   - lseek(SEEK_DATA/SEEK_HOLE)
   - Extended attribute syscalls for "user" namespace attributes as defined
     by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to be revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accommodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes:	yes
2019-12-12 23:22:55 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Rick Macklem
a95cd06e9a Delete an unused external declaration.
Since nfsv4_opflag is no longer used in nfs_clcomsubs.c, delete the
external declaration of it. Found during NFSv4.2 code merge.

MFC after:	2 weeks
2019-12-08 16:59:36 +00:00
Konstantin Belousov
9698d99230 In nfs_lock(), recheck vp->v_data after lock before accessing it.
We might race with reclaim, and then this is no longer a nfs vnode, in
which case we do not need to handle deferred vnode_pager_setsize()
either.

Reported by:	rk@ronald.org
PR:	 242184
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-11-29 13:55:56 +00:00
Jeff Roberson
67d0e29304 Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22116
2019-10-29 21:06:34 +00:00
Konstantin Belousov
c6ba06d86c Fix interface between nfsclient and vnode pager.
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked.  This ensures that page fault always can find the
backing page if the object size check succeeded.  Set VV_VMSIZEVNLOCK
flag on NFS nodes.

The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well.  Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.

Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP.  When the vnode is exclusively
locked, or when we can temporarily upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.

Tested by:	pho
Discussed with:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:17:38 +00:00
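The deferral scheme in c6ba06d86c can be sketched as below. This is a userland model under stated assumptions: `struct node`, `node_load_size()`, `node_lock_excl()`, and `NODE_SETSZSKIP` are hypothetical stand-ins for the NFS node, nfs_loadattrcache(), the nfsclient VOP_LOCK() implementation, and NVNSETSZSKIP, and the pager resize is modeled as a plain assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODE_SETSZSKIP	0x1	/* stands in for the NVNSETSZSKIP flag */

struct node {
	bool	 excl_locked;	/* is the vnode exclusively locked? */
	int	 flags;
	uint64_t attr_size;	/* size last reported by the server */
	uint64_t pager_size;	/* size known to the pager */
};

/* Attribute update: resize immediately only when exclusively locked. */
static void
node_load_size(struct node *n, uint64_t server_size)
{
	n->attr_size = server_size;
	if (n->excl_locked)
		n->pager_size = server_size;	/* models vnode_pager_setsize() */
	else
		n->flags |= NODE_SETSZSKIP;	/* defer until exclusively locked */
}

/* Lock path: once the lock is exclusive, perform any deferred resize. */
static void
node_lock_excl(struct node *n)
{
	assert(!n->excl_locked);
	n->excl_locked = true;
	if (n->flags & NODE_SETSZSKIP) {
		n->flags &= ~NODE_SETSZSKIP;
		n->pager_size = n->attr_size;
	}
}
```

The point of the design is that the pager size only ever changes while the exclusive lock is held, even though the new size may arrive at any time.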
Jeff Roberson
0012f373e4 (4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is
held.  The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by:    kib, markj
Tested by:      pho
Sponsored by:   Netflix, Intel
Differential Revision:        https://reviews.freebsd.org/D21594
2019-10-15 03:45:41 +00:00
Mateusz Guzik
d511f93e45 nfsclient: add root vnode caching
See r353150.

Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D21646
2019-10-06 22:17:29 +00:00
Rick Macklem
ee7201a725 Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros.
To be consistent with replacing the mtx_lock()/mtx_unlock() calls on
the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces
all mtx_assert() calls on these mutexes with macros as well.
This will simplify changing these locks to sx locks in a future commit.
However, this change may be delayed indefinitely, since it appears there
is a deadlock when vnode_pager_setsize() is called to shrink the size
and the NFS node lock is held.
There is no semantic change as a result of this commit.

Suggested by:	kib
MFC after:	1 week
2019-09-26 02:54:45 +00:00
Rick Macklem
b662b41e62 Replace all mtx_lock()/mtx_unlock() on the iod lock with macros.
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simplify the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.

I don't know when the future commit will happen and be MFC'd, so I have
set the MFC on this commit to one week so that it can be MFC'd at the same
time.

Suggested by:	kib
MFC after:	1 week
2019-09-24 23:38:10 +00:00
Rick Macklem
5d85e12f44 Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simplify making the change to an sx lock in a future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by:	kib
MFC after:	1 week
2019-09-24 01:58:54 +00:00
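Why routing every lock and unlock through macros eases a later change of lock type can be sketched in userland as below. The macro names NFSLOCKNODE()/NFSUNLOCKNODE() and the field n_mtx come from the commit message; mapping them onto a pthread mutex, the `nfsnode_sketch` structure, and `node_set_flag()` are assumptions made only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cut-down node; the kernel structure uses a struct mtx. */
struct nfsnode_sketch {
	pthread_mutex_t	n_mtx;
	int		n_flag;
};

/* Callers use only these macros, so the lock type is named in one place. */
#define NFSLOCKNODE(n)		pthread_mutex_lock(&(n)->n_mtx)
#define NFSUNLOCKNODE(n)	pthread_mutex_unlock(&(n)->n_mtx)

/* Example caller that never mentions the underlying lock primitive. */
static void
node_set_flag(struct nfsnode_sketch *np, int flag)
{
	assert(np != NULL);
	NFSLOCKNODE(np);
	np->n_flag |= flag;
	NFSUNLOCKNODE(np);
}
```

Switching to a different lock type would then mean editing the two macro definitions and the field declaration rather than every call site.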
Konstantin Belousov
6fd583583b Further refine r352393, only call vnode_pager_setsize() outside the
node lock when shrinking.

This is similar to r252528, applied to the above commit.

Apparently there is a race which makes it necessary to at least keep the
n_size and pager size consistent when extending.  The current suspect is
that iod threads perform vnode_pager_setsize() without taking the
vnode lock, which corrupts the file content.

Reported and tested by:	Masachika ISHIZUKA <ish@amail.plala.or.jp>
Discussed with:	rmacklem (related issues)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-17 18:41:39 +00:00
Konstantin Belousov
1246ee664b nfscl_loadattrcache: fix rest of the cases to not call
vnode_pager_setsize() under the node mutex.

r248567 moved some calls of vnode_pager_setsize() after the node lock
is unlocked, do the rest now.

Reported and tested by:	peterj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-09-16 13:26:27 +00:00
Conrad Meyer
a6935d085c Remove long-dead BUF_ASSERT_{,UN}HELD assertions
These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons.  They're totally dead code, so go ahead and yank them now.

No functional change.
2019-09-05 21:43:33 +00:00
Konstantin Belousov
6470c8d3db Rework v_object lifecycle for vnodes.
The current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it is prepared to handle
vm object destruction for a live vnode.  In practice, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier; as a result,
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM().  This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation.  Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21412
2019-08-29 07:50:25 +00:00
Rick Macklem
6aab442af9 Get rid of extraneous initialization.
Get rid of an extraneous initialization, mainly to keep a static analyser
happy. No semantic change.

PR:		238167
Submitted by:	Alexey Dokuchaev
2019-05-31 03:13:09 +00:00
Rick Macklem
26fd36b29d Clean up silly code case.
This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.

PR:		238167
2019-05-31 00:56:31 +00:00
Alan Somers
65417f5e27 Remove "struct ucred*" argument from vtruncbuf
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it.  Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20377
2019-05-24 20:27:50 +00:00
Konstantin Belousov
391918a3c1 Do not flush NFS node from NFS VOP_SET_TEXT().
The more appropriate place to do the flushing is VOP_OPEN().  This was
uncovered because VOP_SET_TEXT() is now called with the vnode's
vm_object rlocked, which is incompatible with the flush operations.

After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.

Sponsored by:	The FreeBSD Foundation
MFC after:	30 days
2019-05-06 08:49:43 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal.

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode's vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuits the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
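The sign convention that 78022527bb gives v_writecount can be modeled as below. This is a sketch only: `vnode_sketch` and the three helper functions are hypothetical, returning ETXTBSY on a conflict is an assumption of this example, and the vnode interlock that protects the real counter is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Sketch: > 0 counts writers, < 0 counts text (executable) references. */
struct vnode_sketch {
	int v_writecount;
};

/* Add a text reference; refuse if the file is open for writing. */
static int
set_text(struct vnode_sketch *vp)
{
	if (vp->v_writecount > 0)
		return (ETXTBSY);	/* assumed error value */
	vp->v_writecount--;
	return (0);
}

/* Drop a text reference, e.g. when the map entry is removed. */
static void
unset_text(struct vnode_sketch *vp)
{
	assert(vp->v_writecount < 0);
	vp->v_writecount++;
}

/* Add a writer; refuse while the file is in use as text. */
static int
add_writer(struct vnode_sketch *vp)
{
	if (vp->v_writecount < 0)
		return (ETXTBSY);	/* assumed error value */
	vp->v_writecount++;
	return (0);
}
```

Because one counter carries both meanings, "is text" (v_writecount <= -1) and "has writers" (v_writecount > 0) can never both hold, which is what the removed VV_TEXT flag used to enforce separately.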
Rick Macklem
eeb1f3ed51 Fix the NFSv4 client to safely find processes.
r340744 broke the NFSv4 client by replacing pfind_locked() with a
call to pfind(): pfind() acquires the sx lock for the pid hash, and
the NFSv4 client already holds a mutex when it makes the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall() to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().

Reviewed by:	kib, mjg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19887
2019-04-15 01:27:15 +00:00
Rodney W. Grimes
6c1c6ae537 Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
Edward Tomasz Napierala
2df8bd90c8 Drop unused 'p' argument to nfsv4_strtogid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:07:47 +00:00
Edward Tomasz Napierala
0658ac3943 Drop unused 'p' argument to nfsv4_strtouid().
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-12 15:02:52 +00:00
Simon J. Gerraty
f5fdf82d82 Add _PC_ACL_* to vop_stdpathconf
This avoids EINVAL from tmpfs etc.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D19512
2019-03-11 20:40:56 +00:00
Edward Tomasz Napierala
c9172fb4f1 Work around the "nfscl: bad open cnt on server" assertion
that can happen when rerooting into NFSv4 rootfs with kernel
built with INVARIANTS.

I've talked to rmacklem@ (back in 2017), and while the root cause
is still unknown, the case guarded by assertion (nfscl_doclose()
being called from VOP_INACTIVE) is believed to be safe, and the
whole thing seems to run just fine.

Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-02-19 12:45:37 +00:00
Gleb Smirnoff
756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting for how many
  pbufs we are going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed with respect to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00