freebsd-dev

Author	SHA1	Message	Date
Rick Macklem	ad6dc36520	nfscl: Use vfs.nfs.maxalloclen to limit Deallocate RPC RTT Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not allow a reply with partial completion. As such, the only way to limit the time the operation takes to provide a reasonable RPC RTT is to limit the size of the allocation/deallocation in the NFSv4.2 client. This patch uses the sysctl vfs.nfs.maxalloclen to set the limit on the size of the Deallocate operation. There is no way to know how long a server will take to do a deallocate operation, but 64Mbytes results in a reasonable RPC RTT for the slow hardware I test on. For an 8Gbyte deallocation, the elapsed time for doing it in 64Mbyte chunks was the same (within margin of variability) as the elapsed time taken for a single large deallocation operation for a FreeBSD server with a UFS file system.	2021-09-18 14:38:43 -07:00
Konstantin Belousov	197a4f29f3	buffer pager: allow get_blksize method to return error Reported and reviewed by: asomers Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31998	2021-09-17 20:29:55 +03:00
Rick Macklem	9ebe4b8c67	nfscl: Add vfs.nfs.maxalloclen to limit Allocate/Deallocate RPC RTT Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not allow a reply with partial completion. As such, the only way to limit the time the operation takes to provide a reasonable RPC RTT is to limit the size of the allocation/deallocation in the NFSv4.2 client. This patch adds a sysctl called vfs.nfs.maxalloclen to set the limit on the size of the Allocate operation. There is no way to know how long a server will take to do an allocate operation, but 64Mbytes results in a reasonable RPC RTT for the slow hardware I test on, so that is what the default value for vfs.nfs.maxalloclen is set to. For an 8Gbyte allocation, the elapsed time for doing it in 64Mbyte chunks was the same as the elapsed time taken for a single large allocation operation for a FreeBSD server with a UFS file system. MFC after: 2 weeks	2021-09-15 17:29:45 -07:00
Rick Macklem	55089ef4f8	nfscl: Make vfs.nfs.maxcopyrange larger by default As of commit `103b207536`, the NFSv4.2 server will limit the size of a Copy operation based upon a 1 second timeout. The Linux 5.2 kernel server also limits Copy operation size to 4Mbytes. As such, the NFSv4.2 client can attempt a large Copy without resulting in a long RPC RTT for these servers. This patch changes vfs.nfs.maxcopyrange to 64bits and sets the default to the maximum possible size of SSIZE_MAX, since a larger size makes the Copy operation more efficient and allows for copying to complete with fewer RPCs. The sysctl may need to be made smaller for other non-FreeBSD NFSv4.2 servers. MFC after: 2 weeks	2021-09-11 15:36:32 -07:00
Rick Macklem	08b9cc316a	nfscl: Add a VOP_DEALLOCATE() for the NFSv4.2 client This patch adds a VOP_DEALLOCATE() to the NFS client. For NFSv4.2 servers that support the Deallocate operation, it is used. Otherwise, it falls back on calling vop_stddeallocate(). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31640	2021-08-27 18:31:36 -07:00
Rick Macklem	3ad1e1c1ce	nfscl: Add a Lookup+Open RPC for NFSv4.1/4.2 This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2 NFS client, which can be used by nfs_lookup() so that a subsequent Open RPC is not required. It uses the cn_flags OPENREAD, OPENWRITE added by commit `c18c74a87c`. This reduced the number of RPCs by about 15% for a kernel build over NFS. For now, use of Lookup+Open is only done when the "oneopenown" mount option is used. It may be possible for Lookup+Open to be used for non-oneopenown NFSv4.1/4.2 mounts, but that will require extensive further testing to determine if it works. While here, I've added the changes to the nfscommon module that are needed to implement the Deallocate NFSv4.2 operation. This avoids needing another cycle of changes to the internal KAPI between the NFS modules. This commit has changed the internal KAPI between the NFS modules and, as such, all need to be rebuilt from sources. I have not bumped __FreeBSD_version, since it was bumped a few days ago.	2021-08-11 18:49:26 -07:00
Rick Macklem	efea1bc1fd	nfscl: Cache an open stateid for the "oneopenown" mount option For NFSv4.1/4.2, if the "oneopenown" mount option is used, there is, at most, only one open stateid for each NFS vnode. When an open stateid for a file is acquired, set a pointer to the open structure in the NFS vnode. This pointer can be used to acquire the open stateid without searching the open linked list when the following is true: - No delegations have been issued for the file. Since delegations can outlive an NFS vnode for a file, use the global NFSMNTP_DELEGISSUED flag on the mount to determine this. - No lock stateid has been issued for the file. To determine this, a new NFS vnode flag called NMIGHTBELOCKED is set when a lock stateid is issued, which can then be tested. When this open structure pointer can be used, it avoids the need to acquire the NFSCLSTATELOCK() and searching the open structure list for an open. The NFSCLSTATELOCK() can be highly contended when there are a lot of opens issued for the NFSv4.1/4.2 mount. This patch only affects NFSv4.1/4.2 mounts when the "oneopenown" mount option is used. MFC after: 2 weeks	2021-07-28 15:48:27 -07:00
Rick Macklem	54ff3b3986	nfscl: Set correct lockowner for "oneopenown" mount option For NFSv4.1/4.2, the client may use either an open, lock or delegation stateid as the stateid argument for an I/O operation. RFC 5661 defines an order of preference of delegation, then lock and finally open stateid for the argument, although NFSv4.1/4.2 servers are expected to handle any stateid type. For the "oneopenown" mount option, the lock owner was not being correctly generated and, as such, the I/O operation would use an open stateid, even when a lock stateid existed. Although this did not and should not affect an NFSv4.1/4.2 server's behaviour, this patch makes the behaviour for "oneopenown" the same as when the mount option is not specified. Found during inspection of packet captures. No failure during testing against NFSv4.1/4.2 servers of the unpatched code occurred. MFC after: 2 weeks	2021-07-28 15:23:05 -07:00
Rick Macklem	7685f8344d	nfscl: Send stateid.seqid of 0 for NFSv4.1/4.2 mounts For NFSv4.1/4.2, the client may set the "seqid" field of the stateid to 0 in RPC requests. This indicates to the server that it should not check the "seqid" or return NFSERR_OLDSTATEID if the "seqid" value is not up to date w.r.t. Open/Lock operations on the stateid. This "seqid" is incremented by the NFSv4 server for each Open/OpenDowngrade/Lock/Locku operation done on the stateid. Since a failure return of NFSERR_OLDSTATEID is of no use to the client for I/O operations, it makes sense to set "seqid" to 0 for the stateid argument for I/O operations. This avoids server failure replies of NFSERR_OLDSTATEID, although I am not aware of any case where this failure occurs. This makes the FreeBSD NFSv4.1/4.2 client compatible with the Linux NFSv4.1/4.2 client. MFC after: 2 weeks	2021-07-19 17:35:39 -07:00
Mark Johnston	7a9bc8a82e	nfssvc: Zero the buffer copied out when NFSSVC_DUMPMNTOPTS is set Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-07-15 22:41:10 -04:00
Mark Johnston	44de1834b5	nfsclient: Avoid copying uninitialized bytes into statfs hst will be nul-terminated but the remaining space in the buffer is left uninitialized. Avoid copying the entire buffer to ensure that uninitialized bytes are not leaked via statfs(2). Reported by: KMSAN Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31167	2021-07-15 12:18:17 -04:00
Rick Macklem	7f5508fe78	nfscl: Avoid KASSERT() panic in cache_enter_time() Commit `844aa31c6d` added cache_enter_time_flags(), specifically so that the NFS client could specify that cache enter replace any stale entry for the same name. Doing so avoids a KASSERT() panic() in cache_enter_time(), as reported by the PR. This patch uses cache_enter_time_flags() for Readdirplus, to avoid the panic(), since it is impossible for the NFS client to know if another client (or a local process on the NFS server) has replaced a file with another file of the same name. This patch only affects NFS mounts that use the "rdirplus" mount option. There may be other places in the NFS client where this needs to be done, but no panic() has been observed during testing. PR: 257043 MFC after: 2 weeks	2021-07-14 13:33:37 -07:00
Rick Macklem	1e0a518d65	nfscl: Add a Linux compatible "nconnect" mount option Linux has had an "nconnect" NFS mount option for some time. It specifies that N (up to 16) TCP connections are to be created for a mount, instead of just one TCP connection. A discussion on freebsd-net@ indicated that this could improve client<-->server network bandwidth, if either the client or server have one of the following: - multiple network ports aggregated together with lagg/lacp. - a fast NIC that is using multiple queues It does result in using more IP port#s and might increase server peak load for a client. One difference from the Linux implementation is that this implementation uses the first TCP connection for all RPCs composed of small messages and uses the additional TCP connections for RPCs that normally have large messages (Read/Readdir/Write). The Linux implementation spreads all RPCs across all TCP connections in a round robin fashion, whereas this implementation spreads Read/Readdir/Write across the additional TCP connections in a round robin fashion. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30970	2021-07-08 17:39:04 -07:00
Rick Macklem	a145cf3f73	nfscl: Change the default minor version for NFSv4 mounts When NFSv4.1 support was added to the client, the implementation was still experimental and, as such, the default minor version was set to 0. Since the NFSv4.1 client implementation is now believed to be solid and the NFSv4.1/4.2 protocol is significantly better than NFSv4.0, I believe that NFSv4.1/4.2 should be used where possible. This patch changes the default minor version for NFSv4 to be the highest minor version supported by the NFSv4 server. If a specific minor version is desired, the "minorversion" mount option can be used to override this default. This is compatible with the Linux NFSv4 client behaviour. This was discussed on freebsd-current@ in mid-May 2021 under the subject "changing the default NFSv4 minor version" and the consensus seemed to be support for this change. It also appeared that changing this for FreeBSD 13.1 was not considered a POLA violation, so long as UPDATING and RELNOTES entries were made for it. MFC after: 2 weeks	2021-06-24 18:52:23 -07:00
Rick Macklem	aed98fa5ac	nfscl: Make NFSv4.0 client acquisition NFSv4.1/4.2 compatible When the NFSv4.0 client was implemented, acquisition of a clientid via SetClientID/SetClientIDConfirm was done upon the first Open, since that was when it was needed. NFSv4.1/4.2 acquires the clientid during mount (via ExchangeID/CreateSession), since the associated session is required during mount. This patch modifies the NFSv4.0 mount so that it acquires the clientid during mount. This simplifies the code and makes it easy to implement "find the highest minor version supported by the NFSv4 server", which will be done for the default minorversion in a future commit. The "start_renewthread" argument for nfscl_getcl() is replaced by "tryminvers", which will be used by the aforementioned future commit. MFC after: 2 weeks	2021-06-15 17:48:51 -07:00
Rick Macklem	5e5ca4c8fc	nfscl: Add a "has acquired a delegation" flag for delegations A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. For a common case where delegations are not being issued by the NFSv4 server, the code acquires the mutex lock for open/lock state, finds the delegation list empty and just unlocks the mutex and returns. This patch adds an NFS mount point flag that is set when a delegation is issued for the mount. Then the patched code checks for this flag before acquiring the open/lock mutex, avoiding the need to acquire the lock for the case where delegations are not being issued by the NFSv4 server. This change appears to be performance neutral for a small number of opens, but should reduce lock contention for a large number of opens for the common case where server is not issuing delegations. This commit should not affect the high level semantics of delegation handling. MFC after: 2 weeks	2021-06-09 08:00:43 -07:00
Rick Macklem	03c81af249	nfscl: Fix generation of va_fsid for a tree of NFSv4 server file systems Pre-r318997 the code looked like: if (vp->v_mount->mnt_stat.f_fsid.val[0] != (uint32_t)np->n_vattr.na_filesid[0]) vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0]; Doing this assignment got lost by r318997 and, as such, NFSv4 mounts of servers with trees of file systems on the server is broken, due to duplicate fileno values for the same st_dev/va_fsid. Although I could have re-introduced the assignment, since the value of na_filesid[0] is not guaranteed to be unique across the server file systems, I felt it was better to always do the hash for na_filesid[0,1]. Since dev_t (st_dev/va_fsid) is now 64bits, I switched to a 64bit hash. There is a slight chance of a hash conflict where 2 different na_filesid values map to same va_fsid, which will be documented in the BUGS section of the man page for mount_nfs(8). Using a table to keep track of mappings to catch conflicts would not easily scale to 10,000+ server file systems and, when the conflict occurs, it only results in fts(3) reporting a "directory cycle" under certain circumstances. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30660	2021-06-07 13:48:25 -07:00
Rick Macklem	96b40b8967	nfscl: Use hash lists to improve expected search performance for opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. Commit `3f7e14ad93` added a hash table of lists hashed on file handle for the opens. This patch uses the hash lists for searching for a matching open based on file handle instead of an exhaustive linear search of all opens. This change appears to be performance neutral for a small number of opens, but should improve expected performance for a large number of opens. This commit should not affect the high level semantics of open handling. MFC after: 2 weeks	2021-05-27 19:08:36 -07:00
Rick Macklem	724072ab1d	nfscl: Use hash lists to improve expected search performance for opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. Commit `3f7e14ad93` added a hash table of lists hashed on file handle for the opens. This patch uses the hash lists for searching for a matching open based on file handle instead of an exhaustive linear search of all opens. This change appears to be performance neutral for a small number of opens, but should improve expected performance for a large number of opens. This patch also moves any found match to the front of the hash list, to try and maintain the hash lists in recently used ordering (least recently used at the end of the list). This commit should not affect the high level semantics of open handling. MFC after: 2 weeks	2021-05-25 14:19:29 -07:00
Rick Macklem	3f7e14ad93	nfscl: Add hash lists for the NFSv4 opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. This patch adds a table of hash lists for the opens, hashed on file handle. This table will be used by future commits to search for an open based on file handle more efficiently. MFC after: 2 weeks	2021-05-22 14:53:56 -07:00
Rick Macklem	c28cb257dd	nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease The most difficult NFSv4 client recovery case happens when the lease has expired on the server. For NFSv4.0, the client will receive a NFSERR_EXPIRED reply from the server to indicate this has happened. For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such, the client will receive a NFSERR_BADSESSION reply when the lease has expired for these RPCs. The client will then call nfscl_recover() to handle the NFSERR_BADSESSION reply. However, for the expired lease case, the first reclaim Open will fail with NFSERR_NOGRACE. This patch recognizes this case and calls nfscl_expireclient() to handle the recovery from an expired lease. This patch only affects NFSv4.1/4.2 mounts when the lease expires on the server, due to a network partitioning that exceeds the lease duration or similar. MFC after: 2 weeks	2021-05-19 14:52:56 -07:00
Rick Macklem	cb07628d9e	nfscl: Delete unneeded redundant MODULE_DEPEND() calls There are two module declarations in the nfscl.ko module for "nfscl" and "nfs". Both of these declarations had MODULE_DEPEND() calls. This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid confusion with respect to what modules this module is dependent upon. The patch also adds comments explaining why there are two module declarations within the module. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30102	2021-05-10 17:34:29 -07:00
Rick Macklem	dd02d9d605	nfscl: Add support for va_birthtime to NFSv4 There is a NFSv4 file attribute called TimeCreate that can be used for va_birthtime. r362175 added some support for use of TimeCreate. This patch completes support of va_birthtime by adding support for setting this attribute to the server. It also enables the client to acquire and set the attribute for a NFSv4 server that supports the attribute. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30156	2021-05-07 17:30:56 -07:00
Mark Johnston	8bde6d15d1	nfsclient: Copy only initialized fields in nfs_getattr() When loading attributes from the cache, the NFS client is careful to copy only the fields that it initialized. After fetching attributes from the server, however, it would copy the entire vattr structure initialized from the RPC response, so uninitialized stack bytes would end up being copied to userspace. In particular, va_birthtime (v2 and v3) and va_gen (v3) had this problem. Use a common subroutine to copy fields provided by the NFS client, and ensure that we provide a dummy va_gen for the v3 case. Reviewed by: rmacklem Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30090	2021-05-04 08:53:57 -04:00
Rick Macklem	0755df1eee	nfscl: fix typo in a comment MFC after: 2 weeks	2021-05-03 18:29:27 -07:00
Rick Macklem	f6fec55fe3	nfscl: add check for NULL clp and forced dismounts to nfscl_delegreturnvp() Commit `aad780464f` added a function called nfscl_delegreturnvp() to return delegations during the NFS VOP_RECLAIM(). The function erroneously assumed that nm_clp would be non-NULL. It will be NULL for NFSv4.0 mounts until a regular file is opened. It will also be NULL during vflush() in nfs_unmount() for a forced dismount. This patch adds a check for clp == NULL to fix this. Also, since it makes no sense to call nfscl_delegreturnvp() during a forced dismount, the patch adds a check for that case and does not do the call during forced dismounts. PR: 255436 Reported by: ish@amail.plala.or.jp MFC after: 2 weeks	2021-04-27 17:30:16 -07:00
Rick Macklem	aad780464f	nfscl: return delegations in the NFS VOP_RECLAIM() After a vnode is recycled it can no longer be acquired via vfs_hash_get() and, as such, a delegation for the vnode cannot be recalled. In the unlikely event that a delegation still exists when the vnode is being recycled, return the delegation since it will no longer be recallable. Until you have this patch in your NFSv4 client, you should consider avoiding the use of delegations. MFC after: 2 weeks	2021-04-25 17:57:55 -07:00
Rick Macklem	02695ea890	nfscl: fix delegation recall when the file is not open Without this patch, if a NFSv4 server recalled a delegation when the file is not open, the renew thread would block in the NFS VOP_INACTIVE() trying to acquire the client state lock that it already holds. This patch fixes the problem by delaying the vrele() call until after the client state lock is released. This bug has been in the NFSv4 client for a long time, but since it only affects delegation when recalled due to another client opening the file, it got missed during previous testing. Until you have this patch in your client, you should avoid the use of delegations. MFC after: 2 weeks	2021-04-25 12:55:00 -07:00
Konstantin Belousov	8cca7b7f28	nfs client: depend on xdr Since `7763814fc9` nfsrpc_setclient() uses mem_alloc() that is macro around malloc(M_RPC). M_RPC is provided by xdr.ko. Reviewed by: rmacklem Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2021-04-13 18:04:43 +03:00
Rick Macklem	7763814fc9	nfsv4 client: do the BindConnectionToSession as required During a recent testing event, it was reported that the NFSv4.1/4.2 server erroneously bound the back channel to a new TCP connection. RFC5661 specifies that the fore channel is implicitly bound to a new TCP connection when an RPC with Sequence (almost any of them) is done on it. For the back channel to be bound to the new TCP connection, an explicit BindConnectionToSession must be done as the first RPC on the new connection. Since new TCP connections are created by the "reconnect" layer (sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional upcall done by the krpc whenever a new connection is created. The patch also adds the specific upcall function that does a BindConnectionToSession and configures the krpc to call it when required. This is necessary for correct interoperability with NFSv4.1/NFSv4.2 servers when the nfscbd daemon is running. If doing NFSv4.1/NFSv4.2 mounts without this patch, it is recommended that the nfscbd daemon not be running and that the "pnfs" mount option not be specified. PR: 254840 Comments by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29475	2021-04-11 14:34:57 -07:00
Rick Macklem	4e6c2a1ee9	nfsv4 client: factor loop contents out into a separate function Commit `fdc9b2d50f` replaced a couple of while loops with LIST_FOREACH() loops. This patch factors the body of that loop out into a separate function called nfscl_checkown(). This prepares the code for future changes to use a hash table of lists for open searches via file handle. This patch should not result in a semantics change. MFC after: 2 weeks	2021-04-01 15:36:37 -07:00
Rick Macklem	fdc9b2d50f	nfsv4 client: replace while loops with LIST_FOREACH() loops This patch replaces a couple of while() loops with LIST_FOREACH() loops. While here, declare a couple of variables "bool". I think LIST_FOREACH() is preferred and makes the code more readable. This also prepares the code for future changes to use a hash table of lists for open searches via file handle. This patch should not result in a semantics change. MFC after: 2 weeks	2021-03-29 14:14:51 -07:00
Rick Macklem	e61b29ab5d	nfsv4.1/4.2 client: fix handling of delegations for "oneopenown" mnt option If a delegation for a file has been acquired, the "oneopenown" option was ignored when the local open was issued. This could result in multiple openowners/opens for a file, that would be transferred to the server when the delegation was recalled. This would not be serious, but could result in more than one openowner. Since the Amazon/EFS does not issue delegations, this probably never occurs in practice. Spotted during code inspection. This small patch fixes the code so that it checks for "oneopenown" when doing client local opens on a delegation. MFC after: 2 weeks	2021-03-29 12:09:19 -07:00
Rick Macklem	82ee386c2a	nfsv4 client: fix forced dismount when sleeping in the renew thread During a recent NFSv4 testing event a test server caused a hang where "umount -N" failed. The renew thread was sleeping on "nfsv4lck" and the "umount" was sleeping, waiting for the renew thread to terminate. This is the second of two patches that are hoped to fix the renew thread so that it will terminate when "umount -N" is done on the mount. This patch adds a 5 second timeout on the msleep()s and checks for the forced dismount flag so that the renew thread will wake up and see the forced dismount flag. Normally a wakeup() will occur in less than 5 seconds, but if a premature return from msleep() does occur, it will simply loop around and msleep() again. The patch also adds the "mp" argument to nfsv4_lock() so that it will return when the forced dismount flag is set. While here, replace the nfsmsleep() wrapper that was used for portability with the actual msleep() call. MFC after: 2 weeks	2021-03-23 13:04:37 -07:00
Rick Macklem	fd232a21bb	nfsv4 pnfs client: fix updating of the layout stateid.seqid During a recent NFSv4 testing event a test server was replying NFSERR_OLDSTATEID for layout stateids presented to the server for LayoutReturn operations. Upon rereading RFC5661, it was apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not maintain the seqid field of the layout stateid correctly. This patch is believed to correct the problem. Tested against a FreeBSD pNFS server with diagnostics added to check the stateid's seqid did not indicate problems. Unfortunately, testing against this server will not happen in the near future, so the fix may not be correct yet. MFC after: 2 weeks	2021-03-18 12:20:25 -07:00
Gordon Bergling	5666643a95	Fix some common typos in comments - occured -> occurred - normaly -> normally - controling -> controlling - fileds -> fields - insterted -> inserted - outputing -> outputting MFC after: 1 week	2021-03-13 18:26:15 +01:00
Rick Macklem	c04199affe	nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS During a recent virtual NFSv4 testing event, a bug in the FreeBSD client was detected when doing I/O DS operations on a Flexible File Layout pNFS server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented instead of the ReadDS/WriteDS/CommitDS counts. This patch fixes this. Only the RPC counts reported by nfsstat(1) were affected by this bug, the I/O operations were performed correctly. MFC after: 2 weeks	2021-03-02 14:18:23 -08:00
Rick Macklem	94f2e42f5e	nfsclient: Fix the stripe unit size for a File Layout pNFS layout During a recent virtual NFSv4 testing event, a bug in the FreeBSD client was detected when doing a File Layout pNFS DS I/O operation. The size of the I/O operation was smaller than expected. The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util in the layout. I had misinterpreted RFC5661 and had shifted the value right by 6 bits. The correct interpretation is to use the value as presented (it is always an exact multiple of 64), clearing bits 0->5. This patch fixes this. Without the patch, I/O through the DSs work, but the I/O size is 1/64th of what is optimal. MFC after: 2 weeks	2021-03-01 12:49:32 -08:00
Rick Macklem	15bed8c46b	nfsclient: add nfs node locking around uses of n_direofoffset During code inspection I noticed that the n_direofoffset field of the NFS node was being manipulated without any lock being held to make it SMP safe. This patch adds locking of the NFS node's mutex around handling of n_direofoffset to make it SMP safe. I have not seen any failure that could be attributed to n_direofoffset being manipulated concurrently by multiple processors, but I think this is possible, since directories are read with shared vnode locking, plus locks only on individual buffer cache blocks. However, there have been as yet unexplained issues w.r.t reading large directories over NFS that could have conceivably been caused by concurrent manipulation of n_direofoffset. MFC after: 2 weeks	2021-02-28 14:53:54 -08:00
Rick Macklem	3e04ab36ba	nfsclient: add checks for a server returning the current directory Commit `3fe2c68ba2` dealt with a panic in cache_enter_time() where the vnode referred to the directory argument. It would also be possible to get these panics if a broken NFS server were to return the directory as a new object being created within the directory or in a Lookup reply. This patch adds checks to avoid the panics and logs messages to indicate that the server is broken for the file object creation cases. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28987	2021-02-28 14:15:32 -08:00
Rick Macklem	3fe2c68ba2	nfsclient: fix panic in cache_enter_time() Juraj Lutter (otis@) reported a panic "dvp != vp not true" in cache_enter_time() called from the NFS client's nfsrpc_readdirplus() function. This is specific to an NFSv3 mount with the "rdirplus" mount option. Unlike NFSv4, NFSv3 replies to ReaddirPlus include entries for the current directory. This trivial patch avoids doing a cache_enter_time() call for the current directory to avoid the panic. Reported by: otis Tested by: otis Reviewed by: mjg MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28969	2021-02-27 17:54:05 -08:00
Alexander V. Chernikov	605284b894	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28647	2021-02-15 22:33:12 +00:00
Konstantin Belousov	bd01a69f48	nfs_write(): do not call ncl_pager_setsize() after clearing TDP2_SBPAGES This might unnecessarily truncate the file, undoing the extension done by the write. Reported by: Yasuhiro Kimura <yasu@utahime.org> Reviewed by: rmacklem Tested by: rmacklem, Yasuhiro Kimura <yasu@utahime.org> MFC after: 6 days Sponsored by: The FreeBSD Foundation	2021-01-25 01:02:03 +02:00
Konstantin Belousov	aa8c1f8d84	nfs client: block vnode_pager_setsize() calls from nfscl_loadattrcache in nfs_write Otherwise the writing thread might wait on the sbusy state of pages which were busied by itself, similarly to nfs_read(). But also we need to clear the NVNSETSZSKIP flag possibly set by ncl_pager_setsize(), to not undo the extension done by the write. Reported by: bdrewery Reviewed by: rmacklem Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28306	2021-01-23 17:24:32 +02:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Rick Macklem	665b1365fe	Add a new "tlscertname" NFS mount option. When using NFS-over-TLS, an NFS client can optionally provide an X.509 certificate to the server during the TLS handshake. For some situations, such as different NFS servers or different certificates being mapped to different user credentials on the NFS server, there may be a need for different mounts to provide different certificates. This new mount option called "tlscertname" may be used to specify a non-default certificate be provided. This alternate certificate will be stored in /etc/rpc.tlsclntd in a file with a name based on what is provided by this mount option.	2020-12-23 13:42:55 -08:00
Mateusz Guzik	ab21ed17ed	vfs: drop the de facto curthread argument from VOP_INACTIVE	2020-10-20 07:19:03 +00:00
Rick Macklem	9f669985b2	Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one successful RPC. Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would loop until the copy "len" was completed. The problem with doing this is that it might take a considerable time to complete for a large "len". By returning after a single successful Copy RPC that copied some of the data, the application that did the copy_file_range(2) syscall will be more responsive to signal delivery for large "len" copies.	2020-10-01 00:47:35 +00:00
Mateusz Guzik	586ee69f09	fs: clean up empty lines in .c and .h files	2020-09-01 21:18:40 +00:00
Rick Macklem	4cdbb07b3c	Add a check to test for the case of the "tls" option being used with "udp". The KERN_TLS only supports TCP, so use of the "tls" option with "udp" will not work. This patch adds a test for this case, so that the mount is not attempted when both "tls" and "udp" are specified.	2020-09-01 01:10:16 +00:00
Rick Macklem	6e4b6ff88f	Add flags to enable NFS over TLS to the NFS client and server. An Internet Draft titled "Towards Remote Procedure Call Encryption By Default" (soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS as a specific application case. Various commits prepared the NFS code to use KERN_TLS, mainly enabling use of ext_pgs mbufs for large RPC messages. r364475 added TLS support to the kernel RPC. This commit (which is the final one for kernel changes required to do NFS over TLS) adds support for three export flags: MNT_EXTLS - Requires a TLS connection. MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid X.509 certificate during TLS handshake. MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a valid X.509 certificate with "user@domain" in the otherName field of the SubjectAltName during TLS handshake. Without these export options, clients are permitted, but not required, to use TLS. For the client, a new nmount(2) option called "tls" makes the client do a STARTTLS Null RPC and TLS handshake for all TCP connections used for the mount. The CLSET_TLS client control option is used to indicate to the kernel RPC that this should be done. Unless the above export flags or "tls" option is used, semantics should not change for the NFS client nor server. For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client } or rpctlssd(8) daemon { for server } must be running.	2020-08-27 23:57:30 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Rick Macklem	808306dd0f	Delete the unused "use_ext" argument to nfscl_reqstart(). This is a partial revert of r363210, since the "use_ext" argument added by that commit is not actually useful. This patch should not result in any semantics change.	2020-08-18 01:41:12 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Rick Macklem	90cf38f22e	Fix a bug introduced by r363001 for the ext_pgs case. r363001 added support for ext_pgs mbufs to nfsm_uiombuf(). By inspection, I noticed that "mlen" was not set non-zero and, as such, there would be an iteration of the loop that did nothing. This patch sets it. This bug would have no effect on the system, since the ext_pgs mbuf code is not yet enabled.	2020-08-12 04:35:49 +00:00
Rick Macklem	02511d2112	Add an argument to newnfs_connect() that indicates use TLS for the connection. For NFSv4.0, the server creates a server->client TCP connection for callbacks. If the client mount on the server is using TLS, enable TLS for this callback TCP connection. TLS connections from clients will not be supported until the kernel RPC changes are committed. Since this changes the internal ABI between the NFS kernel modules that will require a version bump, delete newnfs_trimtrailing(), which is no longer used. Since LCL_TLSCB is not yet set, these changes should not have any semantic effect at this time.	2020-08-11 00:26:45 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Rick Macklem	cfaafa7908	Add support for ext_pgs mbufs to nfsm_uiombuflist() and nfsm_split(). This patch uses a slightly different algorithm for nfsm_uiombuflist() for the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to the current location that mbuf data can be filled into. This avoids use of mtod(mp, char *) + mp->m_len to calculate the location, since this does not work for ext_pgs mbufs and I think it makes the algorithm more readable. This change should not result in semantic changes for the non-ext_pgs case. The patch also deletes some unneeded code. It also adds support for anonymous page ext_pgs mbufs to nfsm_split(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. At this time for this case, use of ext_pgs mbufs cannot be enabled, since ktls_encrypt() replaces the unencrypted data with encrypted data in place. Until such time as this can be enabled, there should be no semantic change. Also, note that this code is only used by the NFS client for a mirrored pNFS server.	2020-07-24 23:17:09 +00:00
Rick Macklem	9516bcdfb4	Modify writing to mirrored pNFS DSs to prepare for use of ext_pgs mbufs. This patch modifies writing to mirrored pNFS DSs slightly so that there is only one m_copym() call for a mirrored pair instead of two of them. This call replaces the custom nfsm_copym() call, which is no longer needed and deleted by this patch. The patch does introduce a new nfsm_split() function that only calls m_split() for the non-ext_pgs case. The semantics of nfsm_uiombuflist() is changed to include code that nul pads the generated mbuf list. This was done by nfsm_copym() prior to this patch. The main reason for this change is that it allows the data to be a list of ext_pgs mbufs, since the m_copym() is for the entire mbuf list. This support will be added in a future commit. This patch only affects writing to mirrored flexible file layout pNFS servers.	2020-07-22 23:33:37 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Alexander V. Chernikov	725871230d	Temporarily revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Rick Macklem	7477442fdd	Fix the pNFS flexible file layout client for servers with small write size. The code in nfscl_dofflayout() loops when a flexible file layout server provides a small write data limit (no extant server is known to do this). If/when it looped, it erroneously reused the "drpc" argument for the mirror worker thread, corrupting it. This patch fixes the problem by only using the calling thread after the first loop iteration. Found during testing by simulating a server with a small write size. Since no extant pNFS server is known to provide a small write size, this fix is not needed in practice at this time. MFC after: 2 weeks	2020-07-15 01:26:28 +00:00
Rick Macklem	6722f6e577	Minor code cleanup that removes "nd->nd_bpos = mcp;" in both if and else. The statement "nd->nd_bpos = mcp;" was in both the if and else. Correct, but potentially confusing. This patch fixes this. There should be no semantics change caused by this commit.	2020-07-13 01:28:45 +00:00
Rick Macklem	3eaf03766e	Add support for ext_pgs mbufs to nfsm_uiombuf(). This patch uses a slightly different algorithm for the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to the current location that mbuf data can be filled into. This avoids use of mtod(mp, char *) + mp->m_len to calculate the location, since this does not work for ext_pgs mbufs and I think it makes the algorithm more readable. This change should not result in semantic changes for the non-ext_pgs case. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-08 02:28:08 +00:00
Rick Macklem	4476c1def0	Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs should be used. For KERN_TLS (and possibly some other future network interface) the mbuf list passed into sosend() must be ext_pgs mbufs. The krpc could simply copy all the mbuf data into ext_pgs mbufs before calling sosend(), but that would be inefficient for large RPC messages. This patch adds an argument to nfscl_reqstart() to indicate that it should fill the RPC message into ext_pgs mbufs. It also adds fields to "struct nfsrv_descript" needed for building NFS RPC messages in ext_pgs mbufs, along with new flags for this. Since the argument is always "false", this commit should not result in any semantic change. However, this commit prepares the code for future commits that will add support for building of NFS RPC messages in ext_pgs mbufs.	2020-06-26 03:11:54 +00:00
Alan Somers	eea79fde5a	Remove vfs_statfs and vnode_mount macros from NFS These macro definitions are no longer needed as the NFS OSX port is long dead. The vfs_statfs macro conflicts with the vfsops field of the same name. Submitted by: shivank@ Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: Google, Inc. (GSoC 2020) Differential Revision: https://reviews.freebsd.org/D25263	2020-06-17 16:20:19 +00:00
Alexander V. Chernikov	9d5df78e64	Fix NOINET6 build broken by r361575. Reported by: ci, hps	2020-05-28 09:52:28 +00:00
Alexander V. Chernikov	c74ce5cca3	Make NFS address selection use fib4_lookup(). fib4_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Switch call to use new fib4_lookup(), allowing to eventually deprecate old api. Differential Revision: https://reviews.freebsd.org/D24977	2020-05-28 07:35:07 +00:00
Alexander V. Chernikov	2bbab0af6d	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason of refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows to stop refcounting and simplify the code. Additionally, this change allows to reimplement rib_lookup_info(), which is used by some of the customers to get the matching prefix along with nexthops, in more efficient way. The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. Rest of the reference counting code will be removed in the D24867 . Differential Revision: https://reviews.freebsd.org/D24866	2020-05-23 10:21:02 +00:00
Ryan Moeller	b9cc3262bc	nfs: Remove APPLESTATIC macro It is no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24811	2020-05-12 13:23:25 +00:00
Ryan Moeller	32033b3d30	Remove APPLEKEXT ifndefs They are no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24752	2020-05-08 14:39:38 +00:00
Rick Macklem	5ecf33c6c4	Get rid of uio_XXX macros used for the Mac OS/X port. The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX left over from the port to Mac OS/X. Since that port is long forgotten, replace the calls with the code generated by the FreeBSD macros for these in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h and I think makes the code more readable. This patch should not result in any semantic change.	2020-04-28 02:11:02 +00:00
Rick Macklem	e4a458bb1b	Remove Mac OS/X macros that did nothing for FreeBSD. The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X port. The first of these macros was a no-op for FreeBSD and the second is no longer used. This patch gets rid of them. It also deletes the "mbuf_t" typedef which is no longer used in the FreeBSD code from nfskpiport.h This patch should not change semantics.	2020-04-25 02:18:59 +00:00
Rick Macklem	897d7d45ba	Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant. RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION should first consist of a CreateSession operation using the extant ClientID. If that fails, then a full recovery beginning with the ExchangeID operation is to be done. Without this patch, the FreeBSD client did not attempt the CreateSession operation with the extant ClientID and went directly to a full recovery beginning with ExchangeID. I have had this patch several years, but since no extant NFSv4.n server required the CreateSession with extant ClientID, I have never committed it. I am committing it now, since I suspect some future NFSv4.n server will require this and it should not negatively impact recovery for extant NFSv4.n servers, since they should all return NFSERR_STALECLIENTID for this first CreateSession. The patched client has been tested for recovery against both the FreeBSD and Linux NFSv4.n servers and no problems have been observed. MFC after: 1 month	2020-04-22 21:00:14 +00:00
Rick Macklem	0bda1ddd33	Fix the NFSv4.2 extended attribute support for remove extended attribute. I missed the "atomic" field of the RemoveExtendedAttribute operation's reply when I implemented it. It worked between FreeBSD client and server, since it was missed for both, but it did not conform to RFC 8276. This patch adds the field for both client and server. Thanks go to Frank for doing interoperability testing of the extended attribute support against patches for Linux. Submitted by: Frank van der Linden <fllinden@amazon.com> Reported by: Frank van der Linden <fllinden@amazon.com>	2020-04-15 21:27:52 +00:00
Rick Macklem	fb8ed4c5f8	Fix the NFSv4.2 extended attribute support to handle 0 length attributes. I did not realize that zero length attributes are allowed, but they are. This patch fixes the NFSv4.2 client and server to handle zero length extended attributes correctly. Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version) Reported by: Frank van der Linden <fllinden@amazon.com>	2020-04-14 22:57:21 +00:00
Rick Macklem	e3e7c612f3	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This is the final patch of this series and the macros should now be able to be deleted from the .h files in a future commit.	2020-04-11 23:37:58 +00:00
Rick Macklem	3133bbf7a4	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-10 22:42:14 +00:00
Rick Macklem	8de97f394e	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Konstantin Belousov	0ff51c98d1	Fix NFS client deadlock when read reports truncated node. If node attribute returned in the reply for read rpc indicate truncation, and it happens that the vnode is exclusively locked, update of the node attributes would try to shrink vnode size. Since during the read some vnode pages were busied by the reading thread, vnode_pager_setsize() deadlocks waiting for the busy state owned by the caller. Use a thread-local flag to indicate that NFS read owns some (s)busy pages states and postpone the call to vnode_pager_setsize() until the thread relinquishes the ownership. Diagnosed by: rlibby Tested by: pho, rlibby Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-22 20:50:30 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Rick Macklem	05dcd5d2c8	Fix nfsmount() so that it will return NFSERR_MINORVERMISMATCH. If nfsrpc_getdirpath() returns NFSERR_MINORVERMISMATCH, it would erroneously get mapped to EIO. This was not particularly harmful, but would make it hard for sysadmins to diagnose why an NFSv4 mount is failing. mount_nfs.c still needs to be fixed so that it does not report NFSERR_MINORVERMISMATCH as an unknown error 10021. MFC after: 1 week	2019-12-25 01:15:38 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Rick Macklem	f808cf7294	Silence some "might not be initialized" warnings for riscv64. None of these cases were actually using the variable(s) uninitialized, but I figured that silencing the warnings via initializing them made sense. Some of these predated r355677.	2019-12-13 21:38:08 +00:00
Rick Macklem	bf6ac05aa3	Add some more initializations to quiet riscv build. The one case in nfs_copy_file_range() was a legitimate case, although it would probably never occur in practice.	2019-12-13 01:34:25 +00:00
Rick Macklem	c057a37818	Add support for NFSv4.2 to the NFS client and server. This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes (RFC-8276) to the NFS client and server. NFSv4.2 is comprised of several optional features that can be supported in addition to NFSv4.1. This patch adds the following optional features: - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED) - posix_fallocate() - intra server file range copying via the copy_file_range(2) syscall --> Avoiding data transfer over the wire to/from the NFS client. - lseek(SEEK_DATA/SEEK_HOLE) - Extended attribute syscalls for "user" namespace attributes as defined by RFC-8276. Although this patch is fairly large, it should not affect support for the other versions of NFS. However it does add two new sysctls that allow a sysadmin to limit which minor versions of NFSv4 a server supports, allowing a sysadmin to disable NFSv4.2. Unfortunately, when the NFS stats structure was last revised, it was assumed that there would be no additional operations added beyond what was specified in RFC-7862. However RFC-8276 did add additional operations, forcing the NFS stats structure to be revised again. It now has extra unused entries in all arrays, so that future extensions to NFSv4.2 can be accommodated without revising this structure again. A future commit will update nfsstat(1) to report counts for the new NFSv4.2 specific operations/procedures. This patch affects the internal interface between the nfscommon, nfscl and nfsd modules and, as such, they all must be upgraded simultaneously. I will do a version bump (although arguably not needed), due to this. This code has survived a "make universe" but has not been built with a recent GCC. If you encounter build problems, please email me. Relnotes: yes	2019-12-12 23:22:55 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Rick Macklem	a95cd06e9a	Delete an unused external declaration. Since nfsv4_opflag is no longer used in nfs_clcomsubs.c, delete the external declaration of it. Found during NFSv4.2 code merge. MFC after: 2 weeks	2019-12-08 16:59:36 +00:00
Konstantin Belousov	9698d99230	In nfs_lock(), recheck vp->v_data after lock before accessing it. We might race with reclaim, and then this is no longer a nfs vnode, in which case we do not need to handle deferred vnode_pager_setsize() either. Reported by: rk@ronald.org PR: 242184 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-11-29 13:55:56 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Konstantin Belousov	c6ba06d86c	Fix interface between nfsclient and vnode pager. Make the nfsclient always call vnode_pager_setsize() with the vnode exclusively locked. This ensures that page fault always can find the backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK flag on NFS nodes. The main offender breaking the interface in nfsclient is nfs_loadattrcache(), which is used whenever server responded with updated attributes, which can happen on non-changing operations as well. Also, iod threads only have buffers locked (and even that is LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC response. Instead of immediately calling vnode_pager_setsize() if server response indicated changed file size, but the vnode is not exclusively locked, set a new node flag NVNSETSZSKIP. When the vnode is exclusively locked, or when we can temporarily upgrade the lock to exclusive, call vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation. Tested by: pho Discussed with: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:17:38 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Mateusz Guzik	d511f93e45	nfsclient: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:17:29 +00:00
Rick Macklem	ee7201a725	Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros. To be consistent with replacing the mtx_lock()/mtx_unlock() calls on the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces all mtx_assert() calls on these mutexes with macros as well. This will simplify changing these locks to sx locks in a future commit. However, this change may be delayed indefinitely, since it appears there is a deadlock when vnode_pager_setsize() is called to shrink the size and the NFS node lock is held. There is no semantic change as a result of this commit. Suggested by: kib MFC after: 1 week	2019-09-26 02:54:45 +00:00
Rick Macklem	b662b41e62	Replace all mtx_lock()/mtx_unlock() on the iod lock with macros. Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called and the iod lock is held when the NFS node lock is acquired, the iod mutex will need to be changed to an sx lock as well. To simplify the future commit that changes both the NFS node lock and iod lock to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the iod lock with macros. There is no semantic change as a result of this commit. I don't know when the future commit will happen and be MFC'd, so I have set the MFC on this commit to one week so that it can be MFC'd at the same time. Suggested by: kib MFC after: 1 week	2019-09-24 23:38:10 +00:00
Rick Macklem	5d85e12f44	Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros. For a long time, some places in the NFS code have locked/unlocked the NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas others have simply used mtx_lock()/mtx_unlock(). Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock with the macros to simplify making the change to an sx lock in a future commit. There is no semantic change as a result of this commit. I am not sure if the change to an sx lock will be MFC'd soon, so I put an MFC of 1 week on this commit so that it could be MFC'd with that commit. Suggested by: kib MFC after: 1 week	2019-09-24 01:58:54 +00:00
Konstantin Belousov	6fd583583b	Further refine r352393, only call vnode_pager_setsize() outside the node lock when shrinking. This is similar to r252528, applied to the above commit. Apparently there is a race which makes necessary at least to keep the n_size and pager size consistent when extending. Current suspect is that iod threads perform vnode_pager_setsize() without taking the vnode lock, which corrupts the file content. Reported and tested by: Masachika ISHIZUKA <ish@amail.plala.or.jp> Discussed with: rmacklem (related issues) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-17 18:41:39 +00:00

568 Commits