freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	7fbeaf33b8	tmpfs: drop useless parent locking from tmpfs_dir_getdotdotdent The id field is immutable until the node gets freed.	2021-05-29 22:04:10 +00:00
Jason A. Harmening	6d3e78ad6c	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30218	2021-05-29 14:05:39 -07:00
Rick Macklem	96b40b8967	nfscl: Use hash lists to improve expected search performance for opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. Commit `3f7e14ad93` added a hash table of lists hashed on file handle for the opens. This patch uses the hash lists for searching for a matching open based on file handle instead of an exhaustive linear search of all opens. This change appears to be performance neutral for a small number of opens, but should improve expected performance for a large number of opens. This commit should not affect the high level semantics of open handling. MFC after: 2 weeks	2021-05-27 19:08:36 -07:00
Rick Macklem	724072ab1d	nfscl: Use hash lists to improve expected search performance for opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. Commit `3f7e14ad93` added a hash table of lists hashed on file handle for the opens. This patch uses the hash lists for searching for a matching open based on file handle instead of an exhaustive linear search of all opens. This change appears to be performance neutral for a small number of opens, but should improve expected performance for a large number of opens. This patch also moves any found match to the front of the hash list, to try and maintain the hash lists in recently used ordering (least recently used at the end of the list). This commit should not affect the high level semantics of open handling. MFC after: 2 weeks	2021-05-25 14:19:29 -07:00
Rick Macklem	3f7e14ad93	nfscl: Add hash lists for the NFSv4 opens A problem was reported via email, where a large (130000+) accumulation of NFSv4 opens on an NFSv4 mount caused significant lock contention on the mutex used to protect the client mount's open/lock state. Although the root cause for the accumulation of opens was not resolved, it is obvious that the NFSv4 client is not designed to handle 100000+ opens efficiently. When searching for an open, usually for a match by file handle, a linear search of all opens is done. This patch adds a table of hash lists for the opens, hashed on file handle. This table will be used by future commits to search for an open based on file handle more efficiently. MFC after: 2 weeks	2021-05-22 14:53:56 -07:00
Konstantin Belousov	f784da883f	Move mnt_maxsymlinklen into appropriate fs mount data structures Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-Note: struct mount layout Differential revision: https://reviews.freebsd.org/D30325	2021-05-22 15:16:09 +03:00
Konstantin Belousov	42881526d4	nullfs: dirty v_object must imply the need for inactivation Otherwise pages are cleaned some time later when the lower fs decides that it is time to do it. This mostly manifests itself as delayed mtime update, e.g. breaking make-like programs. Reported by: mav Tested by: mav, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-05-22 12:30:17 +03:00
Rick Macklem	d80a903a1c	nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open Commit `b3d4c70dc6` added support for CLAIM_DELEG_CUR_FH to Open. While doing this, I noticed that CLAIM_DELEG_PREV_FH support could be added the same way. Although I am not aware of any extant NFSv4.1/4.2 client that uses this claim type, it seems prudent to add support for this variant of Open to the NFSv4.1/4.2 server. This patch does not affect mounts from extant NFSv4.1/4.2 clients, as far as I know. MFC after: 2 weeks	2021-05-20 18:37:40 -07:00
Rick Macklem	c28cb257dd	nfscl: Fix NFSv4.1/4.2 mount recovery from an expired lease The most difficult NFSv4 client recovery case happens when the lease has expired on the server. For NFSv4.0, the client will receive a NFSERR_EXPIRED reply from the server to indicate this has happened. For NFSv4.1/4.2, most RPCs have a Sequence operation and, as such, the client will receive a NFSERR_BADSESSION reply when the lease has expired for these RPCs. The client will then call nfscl_recover() to handle the NFSERR_BADSESSION reply. However, for the expired lease case, the first reclaim Open will fail with NFSERR_NOGRACE. This patch recognizes this case and calls nfscl_expireclient() to handle the recovery from an expired lease. This patch only affects NFSv4.1/4.2 mounts when the lease expires on the server, due to a network partitioning that exceeds the lease duration or similar. MFC after: 2 weeks	2021-05-19 14:52:56 -07:00
Mateusz Guzik	4fe925b81e	fdescfs: allow shared locking of root vnode Eliminates fdescfs from lock profile when running poudriere.	2021-05-19 17:58:54 +00:00
Mateusz Guzik	43999a5cba	pseudofs: use vget_prep + vget_finish instead of vget + the interlock	2021-05-19 17:58:42 +00:00
Rick Macklem	fc0dc94029	nfsd: Reduce the callback timeout to 800msec Recent discussion on the nfsv4@ietf.org mailing list confirmed that an NFSv4 server should reply to an RPC in less than 1 second. If an NFSv4 RPC requires a delegation be recalled, the server will attempt a CB_RECALL callback. If the client is not responsive, the RPC reply will be delayed until the callback times out. Without this patch, the timeout is set to 4 seconds (set in ticks, but used as seconds), resulting in the RPC reply taking over 4 seconds. This patch redefines the constant as being in milliseconds and sets it to 800msec, to ensure the RPC reply is sent in less than 1 second. This patch only affects mounts from clients when delegations are enabled on the server and the client is unresponsive to callbacks. MFC after: 2 weeks	2021-05-18 16:17:58 -07:00
Rick Macklem	b3d4c70dc6	nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH variant of the Open operation when delegations are recalled and the client has a local open of the file. This patch adds support for this variant of Open to the NFSv4.1/4.2 server. This patch only affects mounts from Linux clients when delegations are enabled on the server. MFC after: 2 weeks	2021-05-18 15:53:54 -07:00
Rick Macklem	46269d66ed	NFSv4 server: Re-establish the delegation recall timeout Commit `7a606f280a` allowed the server to do retries of CB_RECALL callbacks every couple of seconds. This was needed to allow the Linux client to re-establish the back channel. However, that commit broke the delegation timeout check, such that the server would just keep retrying CB_RECALLs. If the client has crashed or been network partitioned from the server, this continues until the client TCP reconnects to the server and re-establishes the back channel. This patch modifies the code such that it still times out the delegation recall after some minutes, so that the server will allow the conflicting client request once the delegation times out. This patch only affects the NFSv4 server when delegations are enabled and a NFSv4 client that holds a delegation has crashed or been network partitioned from the server for at least several minutes when a delegation needs to be recalled. MFC after: 2 weeks	2021-05-16 16:40:01 -07:00
Mateusz Guzik	eec2e4ef7f	tmpfs: reimplement the mtime scan to use the lazy list Tested by: pho Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30065	2021-05-15 20:48:45 +00:00
Mateusz Guzik	128e25842e	vm: add another pager private flag Move OBJ_SHADOWLIST around to let pager flags be next to each other. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D30258	2021-05-15 20:47:29 +00:00
Konstantin Belousov	28bc23ab92	tmpfs: dynamically register tmpfs pager Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into tmpfs_subr.c. There is no longer any code to directly support tmpfs in sys/vm, most tmpfs knowledge is shared by non-anon swap object type implementation. The tmpfs-specific methods are provided by registered tmpfs pager, which inherits from the swap pager. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:13:34 +03:00
Konstantin Belousov	8b99833ac2	procfs_map: switch to using vm_object_kvme_type() to get the object type, and stop enumerating OBJT_XXX constants. This also properly provides the vnode pointer, if the object is backed by one. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30168	2021-05-13 20:10:35 +03:00
Rick Macklem	cb07628d9e	nfscl: Delete unneeded redundant MODULE_DEPEND() calls There are two module declarations in the nfscl.ko module for "nfscl" and "nfs". Both of these declarations had MODULE_DEPEND() calls. This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid confusion with respect to what modules this module is dependent upon. The patch also adds comments explaining why there are two module declarations within the module. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30102	2021-05-10 17:34:29 -07:00
Fedor Uporov	2a984c2b49	Make encode/decode extra time functions inline. Mentioned by: pfg MFC after: 2 weeks	2021-05-08 06:42:20 +03:00
Rick Macklem	dd02d9d605	nfscl: Add support for va_birthtime to NFSv4 There is a NFSv4 file attribute called TimeCreate that can be used for va_birthtime. r362175 added some support for use of TimeCreate. This patch completes support of va_birthtime by adding support for setting this attribute to the server. It also enables the client to acquire and set the attribute for a NFSv4 server that supports the attribute. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30156	2021-05-07 17:30:56 -07:00
Konstantin Belousov	4b8365d752	Add OBJT_SWAP_TMPFS pager This is the OBJT_SWAP pager, specialized for tmpfs. Right now, both the swap pager and generic VM code have to explicitly handle, in special ways, swap objects that are a tmpfs vnode's v_object. Replace (almost) all such places with proper methods. Since VM still needs a notion of the 'swap object', regardless of its use, add yet another type-classification flag OBJ_SWAP. Set it in vm_object_allocate() where other type-class flags are set. This change almost completely eliminates the knowledge of tmpfs from VM, and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D30070	2021-05-07 17:08:03 +03:00
Fedor Uporov	c40a160fd0	Make the inode extra time field update logic closer to Linux. Found using pjdfstest: pjdfstest/tests/utimensat/09.t Reviewed by: pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29933	2021-05-07 10:46:55 +03:00
Fedor Uporov	b3f4665639	Invalidate inode extents cache on truncation. The cache must be invalidated when inode space is removed, to avoid a situation where the extents cache returns an extent that no longer exists. Reviewed by: pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29931	2021-05-07 10:27:37 +03:00
Fedor Uporov	5679656e09	Improve extents verification logic. It is possible to walk through inode extents if the EXT2FS_PRINT_EXTENTS macro is defined. The extent header magic numbers and physical block ranges are checked during the extents walk. Reviewed by: pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29932	2021-05-07 10:27:28 +03:00
Fedor Uporov	1ed5f62d61	Add chr/blk devices support. The dev field is placed into the inode structure. The major/minor numbers are converted to/from the Linux-compatible format when on-disk inodes are written/read. Reviewed by: pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29930	2021-05-07 10:08:31 +03:00
Fedor Uporov	1484574843	Fix inode birthtime updating logic. The birthtime field of struct vattr was not checked for VNOVAL in ext2_setattr(), which produced incorrect inode birthtime values. Found using pjdfstest: pjdfstest/tests/utimensat/03.t Reviewed by: pfg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29929	2021-05-07 10:08:20 +03:00
Mark Johnston	8bde6d15d1	nfsclient: Copy only initialized fields in nfs_getattr() When loading attributes from the cache, the NFS client is careful to copy only the fields that it initialized. After fetching attributes from the server, however, it would copy the entire vattr structure initialized from the RPC response, so uninitialized stack bytes would end up being copied to userspace. In particular, va_birthtime (v2 and v3) and va_gen (v3) had this problem. Use a common subroutine to copy fields provided by the NFS client, and ensure that we provide a dummy va_gen for the v3 case. Reviewed by: rmacklem Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30090	2021-05-04 08:53:57 -04:00
Rick Macklem	0755df1eee	nfscl: fix typo in a comment MFC after: 2 weeks	2021-05-03 18:29:27 -07:00
Mark Johnston	243b324f96	devfs: Avoid comparison with an uninitialized var in devfs_fp_check() devvn_refthread() will initialize *devp only if it succeeds, so check for success before comparing with fp->f_data. Other devvn_refthread() callers are careful to do this. Reported by: KMSAN Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30068	2021-05-03 13:24:30 -04:00
Rick Macklem	f6fec55fe3	nfscl: add check for NULL clp and forced dismounts to nfscl_delegreturnvp() Commit `aad780464f` added a function called nfscl_delegreturnvp() to return delegations during the NFS VOP_RECLAIM(). The function erroneously assumed that nm_clp would be non-NULL. It will be NULL for NFSv4.0 mounts until a regular file is opened. It will also be NULL during vflush() in nfs_unmount() for a forced dismount. This patch adds a check for clp == NULL to fix this. Also, since it makes no sense to call nfscl_delegreturnvp() during a forced dismount, the patch adds a check for that case and does not do the call during forced dismounts. PR: 255436 Reported by: ish@amail.plala.or.jp MFC after: 2 weeks	2021-04-27 17:30:16 -07:00
Rick Macklem	f5ff282bc0	nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that have both Open and LayoutGet operations in them. If the pNFS server were to reply NFSERR_DELAY for one of these compounds, the retry after a delay cannot be handled by newnfs_request(), since there is a reference held on the open state for the Open operation in them. Fix this by adding these RPCs to the "don't do delay here" list in newnfs_request(). This patch is only needed if the mount is using pNFS (the "pnfs" mount option) and probably only matters if the MDS server is issuing delegations as well as pNFS layouts. Found by code inspection. MFC after: 2 weeks	2021-04-26 17:48:21 -07:00
Rick Macklem	8759773148	nfsd: fix the slot sequence# when a callback fails Commit `4281bfec36` patched the server so that the callback session slot would be freed for reuse when a callback attempt fails. However, this can often result in the sequence# for the session slot being advanced such that the client end will reply NFSERR_SEQMISORDERED. To avoid the NFSERR_SEQMISORDERED client reply, this patch negates the sequence# advance for the case where the callback has failed. The common case is a failed back channel, where the callback cannot be sent to the client, and not advancing the sequence# is correct for this case. For the uncommon case where the client's reply to the callback is lost, not advancing the sequence# will indicate to the client that the next callback is a retry and not a new callback. But, since the FreeBSD server always sets "csa_cachethis" false in the callback sequence operation, a retry and a new callback should be handled the same way by the client, so this should not matter. Until you have this patch in your NFSv4.1/4.2 server, you should consider avoiding the use of delegations. Even with this patch, interoperation with the Linux NFSv4.1/4.2 client in kernel versions prior to 5.3 can result in frequent 15-second delays if delegations are enabled. This occurs because, for kernels prior to 5.3, the Linux client does a TCP reconnect every time it sees multiple concurrent callbacks and then it takes 15 seconds to recover the back channel after doing so. MFC after: 2 weeks	2021-04-26 16:24:10 -07:00
Rick Macklem	aad780464f	nfscl: return delegations in the NFS VOP_RECLAIM() After a vnode is recycled it can no longer be acquired via vfs_hash_get() and, as such, a delegation for the vnode cannot be recalled. In the unlikely event that a delegation still exists when the vnode is being recycled, return the delegation since it will no longer be recallable. Until you have this patch in your NFSv4 client, you should consider avoiding the use of delegations. MFC after: 2 weeks	2021-04-25 17:57:55 -07:00
Rick Macklem	02695ea890	nfscl: fix delegation recall when the file is not open Without this patch, if a NFSv4 server recalled a delegation when the file is not open, the renew thread would block in the NFS VOP_INACTIVE() trying to acquire the client state lock that it already holds. This patch fixes the problem by delaying the vrele() call until after the client state lock is released. This bug has been in the NFSv4 client for a long time, but since it only affects delegations recalled due to another client opening the file, it was missed during previous testing. Until you have this patch in your client, you should avoid the use of delegations. MFC after: 2 weeks	2021-04-25 12:55:00 -07:00
Rick Macklem	4281bfec36	nfsd: fix session slot handling for failed callbacks When the NFSv4.1/4.2 server does a callback to a client on the back channel, it will use a session slot in the back channel session. If the back channel has failed, the callback will fail and, without this patch, the session slot will not be released. As more callbacks are attempted, all session slots can become busy and then the nfsd thread gets stuck waiting for a back channel session slot. This patch frees the session slot upon callback failure to avoid this problem. Without this patch, the problem can be avoided by leaving delegations disabled in the NFS server. MFC after: 2 weeks	2021-04-23 15:24:47 -07:00
Rick Macklem	78ffcb86d9	nfscommon: fix function name in comment MFC after: 2 weeks	2021-04-19 20:09:46 -07:00
Rick Macklem	5a89498d19	nfsd: fix stripe size reply for the File Layout pNFS server At a recent testing event I found out that I had misinterpreted RFC5661 where it describes the stripe size in the File Layout's nfl_util field. This patch fixes the pNFS File Layout server so that it returns the correct value to the NFSv4.1/4.2 pNFS enabled client. This affects almost no one, since pNFS server configurations are rare and the extant pNFS aware NFS clients seemed to function correctly despite the erroneous stripe size. It might be needed for correct behaviour if a recent Linux client mounts a FreeBSD pNFS server configuration that is using File Layout (non-mirrored configuration). MFC after: 2 weeks	2021-04-19 17:54:54 -07:00
Rick Macklem	34256484af	Revert "nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661" This reverts commit `9edaceca81`. It turns out that the Linux client intentionally does an NFSv4.1 RPC with only a Sequence operation in it and with "seqid + 1" for the slot. This is used to re-synchronize the slot's seqid and the client expects the NFS4ERR_SEQ_MISORDERED error reply. As such, revert the patch, so that the server remains RFC5661 compliant.	2021-04-15 14:08:40 -07:00
Konstantin Belousov	5edf7227ec	pseudofs: limit writes to 1M Noted and reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D29752	2021-04-14 10:23:21 +03:00
Konstantin Belousov	8cca7b7f28	nfs client: depend on xdr Since `7763814fc9` nfsrpc_setclient() uses mem_alloc() that is macro around malloc(M_RPC). M_RPC is provided by xdr.ko. Reviewed by: rmacklem Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2021-04-13 18:04:43 +03:00
Rick Macklem	9edaceca81	nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661 Recent testing of network partitioning a FreeBSD NFSv4.1 server from a Linux NFSv4.1 client identified problems with both the FreeBSD server and Linux client. Sometimes, after some Linux NFSv4.1/4.2 clients establish a new TCP connection, they will advance the sequence number for a session slot by 2 instead of 1. RFC5661 specifies that a server should reply NFS4ERR_SEQ_MISORDERED for this case. This might result in a system call error in the client and seems to disable future use of the slot by the client. Since advancing the sequence number by 2 seems harmless, allow this case if vfs.nfs.linuxseqsesshack is non-zero. Note that, if the order of RPCs is actually reversed, a subsequent RPC with a smaller sequence number value for the slot will be received. This will result in a NFS4ERR_SEQ_MISORDERED reply. This has not been observed during testing. Setting vfs.nfs.linuxseqsesshack to 0 will provide RFC5661 compliant behaviour. This fix affects the fairly rare case where a NFSv4 Linux client does a TCP reconnect and then apparently erroneously increments the sequence number for the session slot twice during the reconnect cycle. PR: 254816 MFC after: 2 weeks	2021-04-11 16:51:25 -07:00
Rick Macklem	7763814fc9	nfsv4 client: do the BindConnectionToSession as required During a recent testing event, it was reported that the NFSv4.1/4.2 server erroneously bound the back channel to a new TCP connection. RFC5661 specifies that the fore channel is implicitly bound to a new TCP connection when an RPC with Sequence (almost any of them) is done on it. For the back channel to be bound to the new TCP connection, an explicit BindConnectionToSession must be done as the first RPC on the new connection. Since new TCP connections are created by the "reconnect" layer (sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional upcall done by the krpc whenever a new connection is created. The patch also adds the specific upcall function that does a BindConnectionToSession and configures the krpc to call it when required. This is necessary for correct interoperability with NFSv4.1/NFSv4.2 servers when the nfscbd daemon is running. If doing NFSv4.1/NFSv4.2 mounts without this patch, it is recommended that the nfscbd daemon not be running and that the "pnfs" mount option not be specified. PR: 254840 Comments by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29475	2021-04-11 14:34:57 -07:00
Rick Macklem	22cefe3d83	nfsd: fix replies from session cache for multiple retries Recent testing of network partitioning a FreeBSD NFSv4.1 server from a Linux NFSv4.1 client identified problems with both the FreeBSD server and Linux client. Commit `05a39c2c1c` fixed replying with the cached reply in the session slot when the session slot sequence# is the same. However, the code uses up the cached reply and, as such, will fail for a subsequent retry of the RPC. A subsequent retry would be an extremely rare event, but this patch fixes this, so long as m_copym(..M_NOWAIT) does not fail, which should also be a rare event. This fix affects the exceedingly rare case where a NFSv4 client retries a non-idempotent RPC, such as a lock operation, multiple times. Note that retries only occur after the client has needed to create a new TCP connection, with a new TCP connection for each retry. MFC after: 2 weeks	2021-04-10 15:50:25 -07:00
Rick Macklem	05a39c2c1c	nfsd: fix replies from session cache for retried RPCs Recent testing of network partitioning a FreeBSD NFSv4.1 server from a Linux NFSv4.1 client identified problems with both the FreeBSD server and Linux client. The FreeBSD server failed to reply using the cached reply in the session slot when an RPC was retried on the session slot, as indicated by the same slot sequence#. This patch fixes this. It should also fix a similar failure for NFSv4.0 mounts, when the sequence# in the open/lock_owner requires a reply be done from an entry locked into the DRC. This fix affects the fairly rare case where a NFSv4 client retries a non-idempotent RPC, such as a lock operation. Note that retries only occur after the client has needed to create a new TCP connection. MFC after: 2 weeks	2021-04-08 14:04:22 -07:00
Rick Macklem	7a606f280a	nfsd: make the server repeat CB_RECALL every couple of seconds Commit `01ae8969a9` stopped the NFSv4.1/4.2 server from implicitly binding the back channel to a new TCP connection so that it conforms to RFC5661, for NFSv4.1/4.2. An effect of this for the Linux NFS client is that it will do a BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN set in a sequence reply. This will fix the back channel, but the first attempt at a callback like CB_RECALL will already have failed. Without this patch, a CB_RECALL will not be retried and that can result in a 5 minute delay until the delegation times out. This patch modifies the code so that it will retry the CB_RECALL every couple of seconds, often avoiding the 5 minute delay. This is not critical for correct behaviour, but avoids the 5 minute delay for the case where the Linux client re-binds the back channel via BindConnectionToSession. MFC after: 2 weeks	2021-04-04 18:15:54 -07:00
Rick Macklem	6f2addd838	nfsd: fix BindConnectionToSession so that it clears "cb path down" Commit `01ae8969a9` stopped the NFSv4.1/4.2 server from implicitly binding the back channel to a new TCP connection so that it conforms to RFC5661, for NFSv4.1/4.2. An effect of this for the Linux NFS client is that it will do a BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN set in a sequence reply. It will do this for every RPC reply until it no longer sees the flag. Without this patch, this will happen until the client does an Open, which will clear LCL_CBDOWN. This patch clears LCL_CBDOWN right away, so that NFSV4SEQ_CBPATHDOWN will no longer be sent to the client in Sequence replies and the Linux client will not repeat the BindConnectionToSession RPCs. This is not critical for correct behaviour, but reduces RPC overheads for cases where the Open will not be done for a while. MFC after: 2 weeks	2021-04-04 15:05:39 -07:00
Konstantin Belousov	76b1b5ce6d	nullfs: protect against user creating inconsistent state The VFS convention is that VOP_LOOKUP() methods do not need to handle ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs bypasses VOP_LOOKUP() to lower filesystem, and there, due to user actions, it is possible to get into situation where - upper vnode does not have VV_ROOT set - lower vnode is root - ISDOTDOT is requested User just needs to nullfs-mount non-root of some filesystem, and then move some directory under mount, out of mount, using lower filesystem. In this case, nullfs cannot do much, but we still should and can ensure internal kernel structures are consistent. Avoid ISDOTDOT lookup forwarding when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT. PR: 253593 Reported by: Gregor Koscak <elogin41@gmail.com> Test by: Patrick Sullivan <sulli00777@gmail.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-04-02 15:40:25 +03:00
Rick Macklem	4e6c2a1ee9	nfsv4 client: factor loop contents out into a separate function Commit `fdc9b2d50f` replaced a couple of while loops with LIST_FOREACH() loops. This patch factors the body of that loop out into a separate function called nfscl_checkown(). This prepares the code for future changes to use a hash table of lists for open searches via file handle. This patch should not result in a semantics change. MFC after: 2 weeks	2021-04-01 15:36:37 -07:00
Rick Macklem	01ae8969a9	nfsd: do not implicitly bind the back channel for NFSv4.1/4.2 mounts The NFSv4.1 (and 4.2 on 13) server incorrectly binds a new TCP connection to the back channel when first used by an RPC with a Sequence op in it (almost all of them). RFC5661 specifies that only the fore channel should be bound. This was done because early clients (including FreeBSD) did not do the required BindConnectionToSession RPC. Unfortunately, this breaks the Linux client when the "nconnects" mount option is used, since the server may do a callback on the incorrect TCP connection. This patch converts the server behaviour to that required by the RFC. It also makes the server test/indicate failure of the back channel more aggressively. Until this patch is applied to the server, the "nconnects" mount option is not recommended for a Linux NFSv4.1/4.2 client mount to the FreeBSD server. Reported by: bcodding@redhat.com Tested by: bcodding@redhat.com PR: 254560 MFC after: 1 week	2021-03-30 14:31:05 -07:00
Rick Macklem	fdc9b2d50f	nfsv4 client: replace while loops with LIST_FOREACH() loops This patch replaces a couple of while() loops with LIST_FOREACH() loops. While here, declare a couple of variables "bool". I think LIST_FOREACH() is preferred and makes the code more readable. This also prepares the code for future changes to use a hash table of lists for open searches via file handle. This patch should not result in a semantics change. MFC after: 2 weeks	2021-03-29 14:14:51 -07:00
Rick Macklem	e61b29ab5d	nfsv4.1/4.2 client: fix handling of delegations for "oneopenown" mnt option If a delegation for a file has been acquired, the "oneopenown" option was ignored when the local open was issued. This could result in multiple openowners/opens for a file, that would be transferred to the server when the delegation was recalled. This would not be serious, but could result in more than one openowner. Since the Amazon/EFS does not issue delegations, this probably never occurs in practice. Spotted during code inspection. This small patch fixes the code so that it checks for "oneopenown" when doing client local opens on a delegation. MFC after: 2 weeks	2021-03-29 12:09:19 -07:00
Rick Macklem	82ee386c2a	nfsv4 client: fix forced dismount when sleeping in the renew thread During a recent NFSv4 testing event a test server caused a hang where "umount -N" failed. The renew thread was sleeping on "nfsv4lck" and the "umount" was sleeping, waiting for the renew thread to terminate. This is the second of two patches that is hoped to fix the renew thread so that it will terminate when "umount -N" is done on the mount. This patch adds a 5-second timeout on the msleep()s and checks for the forced dismount flag so that the renew thread will wake up and see the forced dismount flag. Normally a wakeup() will occur in less than 5 seconds, but if a premature return from msleep() does occur, it will simply loop around and msleep() again. The patch also adds the "mp" argument to nfsv4_lock() so that it will return when the forced dismount flag is set. While here, replace the nfsmsleep() wrapper that was used for portability with the actual msleep() call. MFC after: 2 weeks	2021-03-23 13:04:37 -07:00
Alan Somers	9c5aac8f2e	fusefs: fix a dead store in fuse_vnop_advlock kevans actually caught this in the original review and I fixed it, but then I committed an older copy of the branch. Whoops. Reported by: kevans MFC after: 13 days MFC with: `929acdb19a` Differential Revision: https://reviews.freebsd.org/D29031	2021-03-19 19:38:57 -06:00
Rick Macklem	5f742d3879	nfsv4 client: fix forced dismount when sleeping on nfsv4lck During a recent NFSv4 testing event a test server caused a hang where "umount -N" failed. The renew thread was sleeping on "nfsv4lck" and the "umount" was sleeping, waiting for the renew thread to terminate. This is the first of two patches that is hoped to fix the renew thread so that it will terminate when "umount -N" is done on the mount. nfsv4_lock() checks for forced dismount, but only after it wakes up from msleep(). Without this patch, a wakeup() call was required. This patch adds a 1-second timeout on the msleep(), so that it will wake up and see the forced dismount flag. Normally a wakeup() will occur in less than 1 second, but if a premature return from msleep() does occur, it will simply loop around and msleep() again. While here, replace the nfsmsleep() wrapper that was used for portability with the actual msleep() call and make the same change for nfsv4_getref(). MFC after: 2 weeks	2021-03-19 14:09:33 -07:00
Alan Somers	929acdb19a	fusefs: fix two bugs regarding fcntl file locks 1) F_SETLKW (blocking) operations would be sent to the FUSE server as F_SETLK (non-blocking). 2) Release operations, F_SETLK with lk_type = F_UNLCK, would simply return EINVAL. PR: 253500 Reported by: John Millikin <jmillikin@gmail.com> MFC after: 2 weeks	2021-03-18 17:09:10 -06:00
Rick Macklem	fd232a21bb	nfsv4 pnfs client: fix updating of the layout stateid.seqid During a recent NFSv4 testing event a test server was replying NFSERR_OLDSTATEID for layout stateids presented to the server for LayoutReturn operations. Upon rereading RFC5661, it was apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not maintain the seqid field of the layout stateid correctly. This patch is believed to correct the problem. Tested against a FreeBSD pNFS server with diagnostics added to check the stateid's seqid did not indicate problems. Unfortunately, testing against this server will not happen in the near future, so the fix may not be correct yet. MFC after: 2 weeks	2021-03-18 12:20:25 -07:00
Gordon Bergling	5666643a95	Fix some common typos in comments - occured -> occurred - normaly -> normally - controling -> controlling - fileds -> fields - insterted -> inserted - outputing -> outputting MFC after: 1 week	2021-03-13 18:26:15 +01:00
Konstantin Belousov	16dea83410	null_vput_pair(): release use reference on dvp earlier We might own the last use reference, and then vrele() at the end would need to take the dvp vnode lock to inactivate, which causes deadlock with vp. We cannot vrele() dvp from start since this might unlock ldvp. Handle it by holding the vnode and dropping use ref after lowerfs VOP_VPUT_PAIR() ended. This effectively requires unlocking the vp vnode after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true unconditionally. This opens more opportunities for vp to be reclaimed, if lvp is still alive we reinstantiate vp with null_nodeget(). Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178	2021-03-12 13:31:08 +02:00
Rick Macklem	c04199affe	nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS During a recent virtual NFSv4 testing event, a bug in the FreeBSD client was detected when doing I/O DS operations on a Flexible File Layout pNFS server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented instead of the ReadDS/WriteDS/CommitDS counts. This patch fixes this. Only the RPC counts reported by nfsstat(1) were affected by this bug, the I/O operations were performed correctly. MFC after: 2 weeks	2021-03-02 14:18:23 -08:00
Rick Macklem	94f2e42f5e	nfsclient: Fix the stripe unit size for a File Layout pNFS layout During a recent virtual NFSv4 testing event, a bug in the FreeBSD client was detected when doing a File Layout pNFS DS I/O operation. The size of the I/O operation was smaller than expected. The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util in the layout. I had misinterpreted RFC5661 and had shifted the value right by 6 bits. The correct interpretation is to use the value as presented (it is always an exact multiple of 64), clearing bits 0->5. This patch fixes this. Without the patch, I/O through the DSs work, but the I/O size is 1/64th of what is optimal. MFC after: 2 weeks	2021-03-01 12:49:32 -08:00
Rick Macklem	15bed8c46b	nfsclient: add nfs node locking around uses of n_direofoffset During code inspection I noticed that the n_direofoffset field of the NFS node was being manipulated without any lock being held to make it SMP safe. This patch adds locking of the NFS node's mutex around handling of n_direofoffset to make it SMP safe. I have not seen any failure that could be attributed to n_direofoffset being manipulated concurrently by multiple processors, but I think this is possible, since directories are read with shared vnode locking, plus locks only on individual buffer cache blocks. However, there have been as yet unexplained issues w.r.t reading large directories over NFS that could have conceivably been caused by concurrent manipulation of n_direofoffset. MFC after: 2 weeks	2021-02-28 14:53:54 -08:00
Rick Macklem	3e04ab36ba	nfsclient: add checks for a server returning the current directory Commit `3fe2c68ba2` dealt with a panic in cache_enter_time() where the vnode referred to the directory argument. It would also be possible to get these panics if a broken NFS server were to return the directory as a new object being created within the directory or in a Lookup reply. This patch adds checks to avoid the panics and logs messages to indicate that the server is broken for the file object creation cases. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28987	2021-02-28 14:15:32 -08:00
Alexander Motin	d01032736c	Fix diroffdiroff, probably copy/paste bug. Too long name looks bad in `vmstat -m`. MFC after: 1 week	2021-02-28 09:08:31 -05:00
Rick Macklem	3fe2c68ba2	nfsclient: fix panic in cache_enter_time() Juraj Lutter (otis@) reported a panic "dvp != vp not true" in cache_enter_time() called from the NFS client's nfsrpc_readdirplus() function. This is specific to an NFSv3 mount with the "rdirplus" mount option. Unlike NFSv4, NFSv3 replies to ReaddirPlus include entries for the current directory. This trivial patch avoids doing a cache_enter_time() call for the current directory to avoid the panic. Reported by: otis Tested by: otis Reviewed by: mjg MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28969	2021-02-27 17:54:05 -08:00
Ryan Libby	d7671ad8d6	Close races in vm object chain traversal for unlock We were unlocking the vm object before reading the backing_object field. In the meantime, the object could be freed and reused. This could cause us to go off the rails in the object chain traversal, failing to unlock the rest of the objects in the original chain and corrupting the lock state of the victim chain. Reviewed by: bdrewery, kib, markj, vangyzen MFC after: 3 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D28926	2021-02-25 12:11:19 -08:00
Alex Richardson	ba2cfa80e1	Fix makefs bootstrap after `d485c77f20` The makefs msdosfs code includes fs/msdosfs/denode.h which directly uses struct buf from <sys/buf.h> rather than the makefs struct m_buf. To work around this problem provide a local denode.h that includes ffs/buf.h and defines buf as an alias for m_buf. Reviewed By: kib, emaste Differential Revision: https://reviews.freebsd.org/D28835	2021-02-22 17:55:45 +00:00
Konstantin Belousov	8b7239681e	ext2fs: clear write cluster tracking on truncation Reviewed by: fsu, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28679	2021-02-21 11:38:21 +02:00
Konstantin Belousov	2bfd8992c7	vnode: move write cluster support data to inodes. The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679	2021-02-21 11:38:21 +02:00
Konstantin Belousov	d485c77f20	Remove #define _KERNEL hacks from libprocstat Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in userspace, assuming that the consumer has an idea what it is for. Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h, sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the same caveat. Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h being unusable in userspace, where it overrode struct buf with its own definition. Instead, provide struct m_buf and struct m_vnode and adapt code to use local variants. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D28679	2021-02-21 11:38:21 +02:00
Alexander V. Chernikov	605284b894	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28647	2021-02-15 22:33:12 +00:00
Alan Somers	71befc3506	fusefs: set d_off during VOP_READDIR This allows d_off to be used with lseek to position the file so that getdirentries(2) will return the next entry. It is not used by readdir(3). PR: 253411 Reported by: John Millikin <jmillikin@gmail.com> Reviewed by: cem MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28605	2021-02-12 21:50:52 -07:00
Konstantin Belousov	4a21bcb241	nfsserver: use VOP_VPUT_PAIR(). Apply VOP_VPUT_PAIR() to the end of vnode operations after the VOP_MKNOD(), VOP_MKDIR(), VOP_LINK(), VOP_SYMLINK(), VOP_CREATE(). Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:21 +02:00
Konstantin Belousov	e4aaf35ab5	nullfs: provide special bypass for VOP_VPUT_PAIR Generic bypass cannot understand the rules of liveness for the VOP. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Konstantin Belousov	ee965dfa64	vn_open(): If the vnode is reclaimed during open(2), do not return error. Most future operations on the returned file descriptor will fail anyway, and applications should be ready to handle those failures. Not forcing it to understand the transient failure mode on open, which is implementation-specific, should make us less special without loss of reporting of errors. Suggested by: chs Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2021-02-12 03:02:20 +02:00
Mateusz Guzik	3bc17248d3	devfs: fix use count leak when using TIOCSCTTY by matching devfs_ctty_ref Fixes: `3b44443626` ("devfs: rework si_usecount to track opens")	2021-02-09 01:54:21 +00:00
Edward Tomasz Napierala	b8073b3c74	msdosfs: fix vnode leak with msdosfs_rename() This could happen when failing due to disappearing source file. Reviewed By: kib Tested by: pho Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27338	2021-01-31 21:37:44 +00:00
Edward Tomasz Napierala	cb69621249	msdosfs: fix double unlock if the source file disappears We would unlock fvp here, only to unlock it again below, just before "bad". Reviewed By: kib Tested by: pho Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27339	2021-01-31 21:35:34 +00:00
Alex Richardson	1d15bceae6	tmpfs: implement pathconf(_PC_SYMLINK_MAX) This fixes one of the sys/audit tests when running them on tmpfs. Reviewed By: delphij, kib Differential Revision: https://reviews.freebsd.org/D28387	2021-01-29 09:30:25 +00:00
Kyle Evans	0f919ed4ae	tmpfs: push VEXEC check into tmpfs_lookup() vfs_cache_lookup() has already done the appropriate VEXEC check, therefore we must not re-check in VOP_CACHEDLOOKUP. This fixes O_SEARCH semantics on tmpfs and removes a redundant descent into VOP_ACCESS() in the common case. Reported-by: arichardson (via CheriBSD Jenkins CI) Reviewed-by: kib MFC-after: 3 days Differential Revision: https://reviews.freebsd.org/D28401	2021-01-28 19:25:11 -06:00
Mateusz Guzik	c09f799271	tmpfs: drop acq fence now that vn_load_v_data_smr has consume semantics	2021-01-25 22:40:15 +00:00
Mateusz Guzik	cc96f92a57	atomic: make atomic_store_ptr type-aware	2021-01-25 22:40:15 +00:00
Alex Richardson	8d55837dc1	queue.h: Add {SLIST,STAILQ,LIST,TAILQ}_END() We provide these for compat with other queue.h headers since some software assumes it exists (e.g. the libevent contrib code), but we are not encouraging their use (NULL should be used instead). This fixes the following warning (which should arguably be an error since it results in a function call to an undefined function): .../contrib/libevent/buffer.c:495:16: warning: implicit declaration of function 'LIST_END' is invalid in C99 [-Wimplicit-function-declaration] cbent != LIST_END(&buffer->callbacks); ^ .../contrib/libevent/buffer.c:495:13: warning: comparison between pointer and integer ('struct evbuffer_cb_entry *' and 'int') [-Wpointer-integer-compare] cbent != LIST_END(&buffer->callbacks); ~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reviewed By: jhb Differential Revision: https://reviews.freebsd.org/D27151	2021-01-25 15:09:35 +00:00
Konstantin Belousov	bd01a69f48	nfs_write(): do not call ncl_pager_setsize() after clearing TDP2_SBPAGES This might unnecessarily truncate the file, undoing the extension done by the write. Reported by: Yasuhiro Kimura <yasu@utahime.org> Reviewed by: rmacklem Tested by: rmacklem, Yasuhiro Kimura <yasu@utahime.org> MFC after: 6 days Sponsored by: The FreeBSD Foundation	2021-01-25 01:02:03 +02:00
Konstantin Belousov	aa8c1f8d84	nfs client: block vnode_pager_setsize() calls from nfscl_loadattrcache in nfs_write Otherwise writing thread might wait on sbusy state of the pages which were busied by itself, similarly to nfs_read(). But also we need to clear NVNSETSZKSIP flag possibly set by ncl_pager_setsize(), to not undo extension done by write. Reported by: bdrewery Reviewed by: rmacklem Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28306	2021-01-23 17:24:32 +02:00
Mateusz Guzik	618029af50	tmpfs: add support for lockless symlink lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488	2021-01-23 15:04:43 +00:00
Mateusz Guzik	739ecbcf1c	cache: add symlink support to lockless lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488	2021-01-23 15:04:43 +00:00
Konstantin Belousov	2d1e4220eb	tmpfs_reclaim: detach unlinked node on dereferencing. Otherwise it is dereferenced one extra time at unmount, if it survives long enough. One way to hold the reference on such node is to keep it open. tmpfs_vptocnp() now needs to account for the possibility that unlocked node was removed from the list. Reported by: danfe Tested by: danfe, pho MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-01-14 14:51:37 +02:00
Konstantin Belousov	685265ecfb	tmpfs_reclaim: style MFC after: 3 days Sponsored by: The FreeBSD Foundation	2021-01-14 14:43:13 +02:00
Mateusz Guzik	6b3a9a0f3d	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)	2021-01-12 13:16:10 +00:00
Rick Macklem	148a227bf8	nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of the arguments for the M_EXTPG case. Suggested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28053	2021-01-10 13:50:15 -08:00
Konstantin Belousov	ac2576b9f7	tmpfs open: assert that there is no double-init of f_data. Sponsored by: The FreeBSD Foundation	2021-01-10 04:48:36 +02:00
Konstantin Belousov	9f200bc47b	tmpfs_free_tmp(): explicitly assert that tmp is locked Although TMPFS_UNLOCK() is done later in both paths, unlocking a mutex that is not locked produces a different failure mode. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-01-10 04:48:29 +02:00
Konstantin Belousov	42bebbda9e	tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c This malloc type is only used in this file. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-01-10 04:44:55 +02:00
Alan Somers	17a82e6af8	Fix vnode locking bug in fuse_vnop_copy_file_range MFC-With: `92bbfe1f0d` Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27938	2021-01-03 11:16:20 -07:00
Mark Johnston	90f580b954	Ensure that dirent's d_off field is initialized We have the d_off field in struct dirent for providing the seek offset of the next directory entry. Several filesystems were not initializing the field, which ends up being copied out to userland. Reported by: Syed Faraz Abrar <faraz@elttam.com> Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27792	2021-01-03 11:50:31 -05:00
Alan Somers	34477e25c1	fusefs: only check vnode locks with DEBUG_VFS_LOCKS MFC-With: `37df9d3bba` Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27939	2021-01-03 09:19:00 -07:00
Alan Somers	542711e520	Fix a vnode locking bug in fuse_vnop_advlock. Must lock the vnode before accessing the fufh table. Also, check for invalid parameters earlier. Bug introduced by r346170. MFC after: 2 weeks Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27936	2021-01-03 09:16:23 -07:00
Mateusz Guzik	3e506a67bb	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793	2021-01-03 06:50:06 +00:00
Konstantin Belousov	51a9b978e7	nfs server: improve use of the VFS KPI In particular, do not assume that vn_start_write() returns the same mp as it was passed in, or never returns error. Also be more accurate to return NULL vp and mp when an error occurred, to catch wrong control flow easier. Stop checking for NULL mp before calling vn_finished_write(), NULL mp is handled transparently by the function. Reviewed by: rmacklem Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27881	2021-01-02 20:17:12 +02:00
Rick Macklem	dc78533a52	nfsd: fix NFSv4.0 seqid handling for ERELOOKUP Commit `774a36851e` fixed the NFS server so that it could handle ERELOOKUP returns from VOP calls by redoing the operation/RPC. However, for NFSv4.0, redoing an Open would increment the open_owner's seqid multiple times, breaking the protocol. This patch sets a new flag called ND_ERELOOKUP on the RPC when a redo is in progress. Then the code that increments the seqid avoids the seqid increment/check when the flag is set, since it indicates this has already been done for the Open.	2021-01-01 14:21:51 -08:00
Rick Macklem	774a36851e	nfsd: fix NFS server for ERELOOKUP r367672 modified UFS such that certain VOPs, such as VOP_CREATE() will intermittently return ERELOOKUP. When this happens, the entire system call, or NFS operation in the case of the NFS server, must be redone. This patch adds that support to the NFS server by rolling back the state of the NFS request arguments and NFS reply arguments mbuf lists to the condition they were in before the operation and then redoing the operation. Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27875	2021-01-01 13:55:51 -08:00
Alan Somers	92bbfe1f0d	fusefs: implement FUSE_COPY_FILE_RANGE. This updates the FUSE protocol to 7.28, though most of the new features are optional and are not yet implemented. MFC after: 2 weeks Relnotes: yes Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27818	2021-01-01 10:18:23 -07:00
Mateusz Guzik	d71965127f	tmpfs: use VNPASS when asserting on a vnode in tmpfs_read_pgcache	2021-01-01 03:23:01 +00:00
Alan Somers	37df9d3bba	fusefs: update FUSE protocol to 7.24 and implement FUSE_LSEEK FUSE_LSEEK reports holes on fuse file systems, and is used for example by bsdtar. MFC after: 2 weeks Relnotes: yes Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27804	2020-12-31 08:51:47 -07:00
Edward Tomasz Napierala	4ddb3cc597	devfs(4): defer freeing until we drop devmtx ("cdev") Before r332974 the old code would sometimes cause a rare lock order reversal against pagequeue, which looked roughly like this: witness_checkorder() __mtx_lock_flags() vm_page_alloc() uma_small_alloc() keg_alloc_slab() keg_fetch_slab() zone_fetch_slab() zone_import() zone_alloc_bucket() uma_zalloc_arg() bucket_alloc() uma_zfree_arg() free() devfs_metoo() devfs_populate_loop() devfs_populate() devfs_rioctl() VOP_IOCTL_APV() VOP_IOCTL() vn_ioctl() fo_ioctl() kern_ioctl() sys_ioctl() Since r332974 the original problem no longer exists, but it still makes sense to move things out of the (often congested) lock. Reviewed By: kib, markj Sponsored by: NetApp, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D27334	2020-12-29 13:47:36 +00:00
Alan Somers	4f4111d2c5	fusefs: delete some dead code The original fusefs GSoC project seems to have envisioned exchanging two types of messages with FUSE servers. Perhaps vectored and non-vectored? But in practice only one type has ever been used. Delete the other type. Reviewed by: cem Differential Revision: https://reviews.freebsd.org/D27770	2020-12-28 19:05:35 +00:00
Mark Johnston	599f904463	msdosfs: Fix a leak of dirent padding bytes This was missed in r340856 / commit `6d2e2df764`. Three bytes from the kernel stack may be leaked when reading directory entries. Reported by: Syed Faraz Abrar <faraz@elttam.com> MFC after: 3 days Sponsored by: The FreeBSD Foundation	2020-12-27 17:01:44 -05:00
Rick Macklem	665b1365fe	Add a new "tlscertname" NFS mount option. When using NFS-over-TLS, an NFS client can optionally provide an X.509 certificate to the server during the TLS handshake. For some situations, such as different NFS servers or different certificates being mapped to different user credentials on the NFS server, there may be a need for different mounts to provide different certificates. This new mount option called "tlscertname" may be used to specify a non-default certificate be provided. This alternate certificate will be stored in /etc/rpc.tlsclntd in a file with a name based on what is provided by this mount option.	2020-12-23 13:42:55 -08:00
Brooks Davis	52e63ec2f1	VFS_QUOTACTL: Remove needless casts of arg The argument is a void * so there's no need to cast it to caddr_t. Update documentation to match function declaration. Reviewed by: freqlabs Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D27093	2020-12-17 21:58:10 +00:00
Kirk McKusick	645027c89d	In ext2fs, BA_CLRBUF is used in ext2_balloc() not UFS_BALLOC(). Noted by: kib MFC after: 3 days Sponsored by: Netflix	2020-12-08 00:49:31 +00:00
Kirk McKusick	bb3c01ec79	Document the BA_CLRBUF flag used in ufs and ext2fs filesystems. Suggested by: kib MFC after: 3 days Sponsored by: Netflix	2020-12-06 20:50:21 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can also be adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save a significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to the desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except the place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get a private MAXPHYS-like constant; their conversion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, were either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Konstantin Belousov	f7af6e5e54	nullfs: provide custom bypass for VOP_READ_PGCACHE(). Normal bypass expects locked vnode, which is not true for VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the upper vnode interlock, which is also taken by null_reclaim() when setting v_data to NULL. Reported and tested by: pho Reviewed by: markj, mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27327	2020-11-26 18:16:32 +00:00
Konstantin Belousov	6936779347	msdosfs: suspend around unmount or remount rw->ro. This also eliminates unsafe use of VFS_SYNC(MNT_WAIT). Requested by: mckusick Discussed with: imp Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D27269	2020-11-20 15:19:30 +00:00
Konstantin Belousov	1b3cb4dc04	msdosfs: Add trivial support for suspension. Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27269	2020-11-20 12:31:02 +00:00
Conrad Meyer	c1c4d0e9a8	msdosfs(5): Fix debug-only format string No functional change; MSDOSFS_DEBUG isn't a real build option, so this isn't covered by LINT kernels.	2020-11-18 20:20:03 +00:00
Alan Somers	ac8c4a61af	nfs: Mark unused statistics variable as reserved FreeBSD's NFS exporter has long exported some unused statistics fields. Revision r366992 removed them from nfsstat. This revision renames those fields in the kernel's exported structures to make it clear to other consumers that they are unused. Reported by: emaste Reviewed by: emaste Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D27258	2020-11-18 04:35:49 +00:00
Conrad Meyer	85078b8573	Split out cwd/root/jail, cmask state from filedesc table No functional change intended. Tracking these structures separately for each proc enables future work to correctly emulate clone(2) in linux(4). __FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof. Reviewed by: kib Discussed with: markj, mjg Differential Revision: https://reviews.freebsd.org/D27037	2020-11-17 21:14:13 +00:00
Edward Tomasz Napierala	e3b1c847a4	Make it possible to mount a fuse filesystem, such as squashfuse, from a Linux binary. Should come in handy for AppImages. Reviewed by: asomers MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26959	2020-11-09 08:53:15 +00:00
Mateusz Guzik	f24aa01f9d	tmpfs: reorder struct tmpfs_node to shrink it by 8 bytes The reduction (232 -> 224 bytes) allows UMA to fit one more item (17 -> 18) per slab as reported in vm.uma.TMPFS_node.keg.ipers.	2020-11-05 11:24:45 +00:00
Conrad Meyer	20172854ab	Add sbuf streaming mode to pseudofs(9), use in linprocfs(5) Add a pseudofs node flag 'PFS_AUTODRAIN', which automatically emits sbuf contents to the caller when the sbuf buffer fills. This is only permissible if the corresponding PFS node fill function can sleep whenever it appends to the sbuf. linprocfs' /proc/self/maps node happens to meet this requirement. Streaming out the file as it is composed avoids truncating the output and also avoids preallocating a very large buffer. Reviewed by: markj; earlier version: emaste, kib, trasz Differential Revision: https://reviews.freebsd.org/D27047	2020-11-05 06:48:51 +00:00
Mateusz Guzik	7c58c37ebb	tmpfs: change tmpfs dirent zone into a malloc type It is 64 bytes.	2020-10-30 14:07:25 +00:00
Mateusz Guzik	4bfebc8d2c	cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename	2020-10-30 10:46:35 +00:00
Edward Tomasz Napierala	e3c51151a0	Make it possible to mount nullfs(5) using plain mount(8) instead of mount_nullfs(8). Obviously you'd need to force mount(8) to not call mount_nullfs(8) to make use of it. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26934	2020-10-29 15:28:15 +00:00
Edward Tomasz Napierala	bce7ee9d41	Drop "All rights reserved" from all my stuff. This includes Foundation copyrights, approved by emaste@. It does not include files which carry other people's copyrights; if you're one of those people, feel free to make similar change. Reviewed by: emaste, imp, gbe (manpages) Differential Revision: https://reviews.freebsd.org/D26980	2020-10-28 13:46:11 +00:00
Mateusz Guzik	25fb30bd9a	vfs: drop spurious cache_purge on rmdir The removed directory gets cache_purged which is sufficient to remove any entries related to the parent. Note only tmpfs, ufs and zfs are patched.	2020-10-23 15:50:49 +00:00
Hans Petter Selasky	a71074e0af	Fix for loading cuse.ko via rc.d . Make sure we declare the cuse(3) module by name and not only by the version information, so that "kldstat -q -m cuse" works. Found by: Goran Mekic <meka@tilda.center> MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-10-23 08:44:53 +00:00
Mateusz Guzik	ab21ed17ed	vfs: drop the de facto curthread argument from VOP_INACTIVE	2020-10-20 07:19:03 +00:00
Mateusz Guzik	8ecd87a3e7	vfs: drop spurious cred argument from VOP_VPTOCNP	2020-10-20 07:18:27 +00:00
Konstantin Belousov	6b56b0ca93	nullfs: ensure correct lock is taken after bypass. If lower VOP relocked the lower vnode, it is possible that nullfs vnode was reclaimed meantime. In this case nullfs vnode no longer shares lock with lower vnode, which breaks locking protocol. Check for the condition and acquire nullfs vnode lock if detected. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-10-19 19:23:22 +00:00
Edward Tomasz Napierala	ce764cbd1c	Bump pseudofs size limit from 128kB to 1MB. The old limit could result in process' memory maps being truncated. PR: 237883 Submitted by: dchagin MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20575	2020-10-16 09:58:10 +00:00
Mateusz Guzik	eb88fed446	cache: fix vexec panic when racing against vgone Use of dead_vnodeops would result in a panic instead of returning the intended EOPNOTSUPP error. While here make sure to abort, not just try to return a partial result. The former allows the regular lookup to restart from scratch, while the latter makes it stuck with an unusable vnode. Reported by: kevans	2020-10-09 19:10:00 +00:00
Pedro F. Giffuni	c2f0581e43	ext2fs: minor typo. Obtained from: Dragonfly MFC after: 3 days	2020-10-06 21:31:04 +00:00
Rick Macklem	9f669985b2	Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one successful RPC. Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would loop until the copy "len" was completed. The problem with doing this is that it might take a considerable time to complete for a large "len". By returning after a single successful Copy RPC that copied some of the data, the application that did the copy_file_range(2) syscall will be more responsive to signal delivery for large "len" copies.	2020-10-01 00:47:35 +00:00
Rick Macklem	ff45b9fc1a	Bjorn reported a problem where the Linux NFSv4.1 client is using an open_to_lock_owner4 when that lock_owner4 has already been created by a previous open_to_lock_owner4. This caused the NFS server to reply NFSERR_INVAL. For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify what error to return). For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661, but the NFSv4.1 server can handle this case without error. This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple uses of the same lock_owner in open_to_lock_owner so that it now correctly interoperates with the Linux NFS client. It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID. Thanks go to Bjorn for diagnosing this and testing the patch. He also provided a program that I could use to reproduce the problem. Tested by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer) PR: 249567 Reported by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer) MFC after: 3 days	2020-09-26 23:05:38 +00:00
Alan Somers	a62772a78e	fusefs: fix mmap'd writes in direct_io mode If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that instructs the kernel to bypass the page cache for that file. This feature is also known by libfuse's name: "direct_io". However, when accessing a file via mmap, there is no possible way to bypass the cache completely. This change fixes a deadlock that would happen when an mmap'd write tried to invalidate a portion of the cache, wrongly assuming that a write couldn't possibly come from cache if direct_io were set. Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set. But allowing it is less likely to cause user complaints, and is more in keeping with the spirit of open(2), where O_DIRECT instructs the kernel to "reduce", not "eliminate" cache effects. PR: 247276 Reported by: trapexit@spawn.link Reviewed by: cem MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D26485	2020-09-24 16:27:53 +00:00
Mark Johnston	8e13d6dfb6	udf: Validate the full file entry length Otherwise a corrupted file entry containing invalid extended attribute lengths or allocation descriptor lengths can trigger an overflow when the file entry is loaded. admbug: 965 PR: 248613 Reported by: C Turt <ecturt@gmail.com> MFC after: 3 days Sponsored by: The FreeBSD Foundation	2020-09-22 17:05:01 +00:00
Rick Macklem	58dd2b52cb	Fix a LOR between the NFS server and server side krpc. Recent testing of the NFS-over-TLS code found a LOR between the mutex lock used for sessions and the sleep lock used for server side krpc socket structures in nfsrv_checksequence(). This was fixed by r365789. A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called while mutexes are held. This patch applies a fix similar to r365789, moving the SVC_RELEASE() call down to after the mutexes are released. This patch fixes the problem by moving the SVC_RELEASE() call in nfsrv_checksequence() down a few lines to below where the mutex is released. MFC after: 1 week	2020-09-18 23:52:56 +00:00
Eric van Gyzen	f9cc8410e1	vm_ooffset_t is now unsigned vm_ooffset_t is now unsigned. Remove some tests for negative values, or make other adjustments accordingly. Reported by: Coverity Reviewed by: kib markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26214	2020-09-18 16:48:08 +00:00
Konstantin Belousov	016b7c7e39	tmpfs: restore atime updates for reads from page cache. Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated atomically without locks or (locked) atomics. tn_update_getattr() change also contains unrelated bug fix. Reported by: lwhsu PR: 249362 Reviewed by: markj (previous version) Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26451	2020-09-16 21:28:18 +00:00
Konstantin Belousov	23f9071466	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2020-09-16 21:24:34 +00:00
Rick Macklem	a5c55410b3	Fix a LOR between the NFS server and server side krpc. Recent testing of the NFS-over-TLS code found a LOR between the mutex lock used for sessions and the sleep lock used for server side krpc socket structures. The code in nfsrv_checksequence() would call SVC_RELEASE() with the mutex held. Normally this is ok, since all that happens is SVC_RELEASE() decrements a reference count. However, if the socket has just been shut down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep lock during destruction of the server side krpc structure. This patch fixes the problem by moving the SVC_RELEASE() call in nfsrv_checksequence() down a few lines to below where the mutex is released. MFC after: 1 week	2020-09-16 02:25:18 +00:00
Konstantin Belousov	081e36e760	Add tmpfs page cache read support. Or it could be explained as lockless (for vnode lock) reads. Reads are performed from the node tn_obj object. Tmpfs regular vnode object lifecycle is significantly different from the normal OBJT_VNODE: it is alive as long as ref_count > 0. Ensure liveness of the tmpfs VREG node and consequently v_object inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open(). Provide custom tmpfs fo_close() method on file, to ensure that close is paired with open. Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks. It is quite cheap in code size sense to support page-ins for read for tmpfs even if we do not own tmpfs vnode lock. Also, we can handle holes in tmpfs node without additional effort, and do not have limitation of the transfer size. Reviewed by: markj Discussed with and benchmarked by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:19:16 +00:00
Konstantin Belousov	4601f5f5ee	Microoptimize tmpfs node ref/unref by using atomics. Avoid tmpfs mount and node locks when ref count is greater than zero, which is the case until node is being destroyed by unlink or unmount. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 22:13:21 +00:00
Konstantin Belousov	96474d2a3f	Do not copy vp into f_data for DTYPE_VNODE files. The pointer to vnode is already stored into f_vnode, so f_data can be reused. Fix all found users of f_data for DTYPE_VNODE. Provide finit_vnode() helper to initialize file of DTYPE_VNODE type. Reviewed by: markj (previous version) Discussed with: freqlabs (openzfs chunk) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346	2020-09-15 21:55:21 +00:00
Rick Macklem	2848d6d4de	Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Reported by: asomers Reviewed by: asomers MFC after: 1 week Differential Revision: https://reviews.freebbsd.org/D26342	2020-09-14 00:44:50 +00:00
Mateusz Guzik	c86d2ba8a5	tmpfs: drop spurious cache_purge in tmpfs_reclaim vgone already performs it.	2020-09-04 19:30:15 +00:00
Mateusz Guzik	586ee69f09	fs: clean up empty lines in .c and .h files	2020-09-01 21:18:40 +00:00
Rick Macklem	4cdbb07b3c	Add a check to test for the case of the "tls" option being used with "udp". The KERN_TLS only supports TCP, so use of the "tls" option with "udp" will not work. This patch adds a test for this case, so that the mount is not attempted when both "tls" and "udp" are specified.	2020-09-01 01:10:16 +00:00
Eric van Gyzen	0bb426274e	Fix nfsrvd_locku memory leak Coverity detected memory leak fix. Submitted by: bret_ketchum@dell.com Reported by: Coverity Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26231	2020-08-31 15:31:17 +00:00
Rick Macklem	6e4b6ff88f	Add flags to enable NFS over TLS to the NFS client and server. An Internet Draft titled "Towards Remote Procedure Call Encryption By Default" (soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS as a specific application case. Various commits prepared the NFS code to use KERN_TLS, mainly enabling use of ext_pgs mbufs for large RPC messages. r364475 added TLS support to the kernel RPC. This commit (which is the final one for kernel changes required to do NFS over TLS) adds support for three export flags: MNT_EXTLS - Requires a TLS connection. MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid X.509 certificate during TLS handshake. MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a valid X.509 certificate with "user@domain" in the otherName field of the SubjectAltName during TLS handshake. Without these export options, clients are permitted, but not required, to use TLS. For the client, a new nmount(2) option called "tls" makes the client do a STARTTLS Null RPC and TLS handshake for all TCP connections used for the mount. The CLSET_TLS client control option is used to indicate to the kernel RPC that this should be done. Unless the above export flags or "tls" option is used, semantics should not change for the NFS client nor server. For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client } or rpctlssd(8) daemon { for server } must be running.	2020-08-27 23:57:30 +00:00
Mateusz Guzik	4961e997a6	fuse: unbreak after r364814 Reported by: kevans	2020-08-26 21:13:36 +00:00
Mateusz Guzik	feabaaf995	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho	2020-08-24 08:57:02 +00:00
Mateusz Guzik	39f8815070	cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No functional changes.	2020-08-20 10:05:46 +00:00
Pedro F. Giffuni	ef20a5b58c	extfs: remove redundant little endian conversion. The XTIME_TO_NSEC macro already calls the htole32(), so there is no need to call it twice. This code does nothing on LE platforms and affects only nanosecond and birthtime fields so it's difficult to notice on regular use. Hinted by: DragonFlyBSD (git ae503f8f6f4b9a413932ffd68be029f20c38cab4) X-MFC with: r361136	2020-08-20 05:08:49 +00:00
Mateusz Guzik	8f226f4c23	vfs: remove the always-curthread td argument from VOP_RECLAIM	2020-08-19 07:28:01 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Rick Macklem	808306dd0f	Delete the unused "use_ext" argument to nfscl_reqstart(). This is a partial revert of r363210, since the "use_ext" argument added by that commit is not actually useful. This patch should not result in any semantics change.	2020-08-18 01:41:12 +00:00
Pedro F. Giffuni	19642a0cfb	extfs: remove redundant little endian conversion. The NSEC_TO_XTIME macro already calls the htole32(), so there is no need to call it twice. This code does nothing on LE platforms and affects only nanosecond and birthtime fields so it's difficult to notice on regular use. X-MFC with: r361136	2020-08-17 15:05:41 +00:00
Konstantin Belousov	685cb01a18	VMIO reads: enable for nullfs upper vnode if the lower vnode supports it. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 21:05:56 +00:00
Mateusz Guzik	1abe36567f	tmpfs: use vget_prep/vget_finish instead of vget + vnode	2020-08-16 17:19:23 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Rick Macklem	90cf38f22e	Fix a bug introduced by r363001 for the ext_pgs case. r363001 added support for ext_pgs mbufs to nfsm_uiombuf(). By inspection, I noticed that "mlen" was not set non-zero and, as such, there would be an iteration of the loop that did nothing. This patch sets it. This bug would have no effect on the system, since the ext_pgs mbuf code is not yet enabled.	2020-08-12 04:35:49 +00:00
Conrad Meyer	0ac9e27ba9	devfs: Abstract locking assertions The conversion was largely mechanical: sed(1) with: -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g' -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g' The definitions of these abstractions in fs/devfs/devfs_int.h are the only non-mechanical change. No functional change.	2020-08-12 00:32:31 +00:00
Mateusz Guzik	3b44443626	devfs: rework si_usecount to track opens This removes a lot of special casing from the VFS layer. Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D25612	2020-08-11 14:27:57 +00:00
Rick Macklem	02511d2112	Add an argument to newnfs_connect() that indicates to use TLS for the connection. For NFSv4.0, the server creates a server->client TCP connection for callbacks. If the client mount on the server is using TLS, enable TLS for this callback TCP connection. TLS connections from clients will not be supported until the kernel RPC changes are committed. Since this changes the internal ABI between the NFS kernel modules that will require a version bump, delete newnfs_trimtrailing(), which is no longer used. Since LCL_TLSCB is not yet set, these changes should not have any semantic effect at this time.	2020-08-11 00:26:45 +00:00
Mateusz Guzik	03337743db	vfs: clean MNTK_FPLOOKUP if MNT_UNION is set Elides checking it during lookup.	2020-08-10 11:51:21 +00:00
Mateusz Guzik	ca423b858b	devfs: bool -> int Fixes buildworld after r364069	2020-08-10 11:46:39 +00:00
Mateusz Guzik	7b19bddac8	devfs: save on spurious relocking for devfs_populate Tested by: pho	2020-08-10 10:36:43 +00:00
Mateusz Guzik	f8935a96d1	devfs: use cheaper lockmgr entry points Tested by: pho	2020-08-10 10:36:10 +00:00
Mateusz Guzik	f9c13ab856	devfs: use vget_prep/vget_finish Tested by: pho	2020-08-10 10:35:47 +00:00
Mateusz Guzik	fc9fcee01a	nullfs: add missing VOP_STAT handling Tested by: pho	2020-08-10 10:31:17 +00:00
Mateusz Guzik	9a14439f2f	tmpfs: add VOP_STAT handler	2020-08-07 23:07:47 +00:00
Mateusz Guzik	d292b1940c	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.	2020-08-05 09:27:03 +00:00
Rick Macklem	cb889ce631	Add optional support for ext_pgs mbufs to the NFS server's read, readlink and getxattr operations. This patch optionally enables generation of read, readlink and getxattr replies in ext_pgs mbufs. Since neither of ND_EXTPG or ND_TLS are currently ever set, there is no change in semantics at this time. It also corrects the message in a couple of panic()s that should never occur. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.	2020-07-31 23:35:49 +00:00
Rick Macklem	ea83d07e82	Add support for ext_pgs mbufs to nfsrvd_readdir() and nfsrvd_readdirplus(). This patch adds code that optionally (based on ND_TLS, never set yet) generates readdir replies in ext_pgs mbufs. To trim the list back, a new function that is ext_pgs aware called nfsm_trimtrailing() replaces newnfs_trimtrailing(). newnfs_trimtrailing() is no longer used, but will be removed in a future commit, since its removal does modify the internal kpi between the NFS modules. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.	2020-07-29 22:58:08 +00:00
Rick Macklem	194d870481	Fix the NFSv4 client so that it checks for support of TimeCreate before trying to set it. r362490 added support for setting of the TimeCreate (va_birthtime) attribute, but it does so without checking to see if the server supports the attribute. This could result in NFSERR_ATTRNOTSUPP error replies to the Setattr operation. This patch adds code to check that the server supports TimeCreate before attempting to do a Setattr of it to avoid these error returns.	2020-07-26 23:13:10 +00:00
Rick Macklem	2de592f6e1	Fix the NFS server so that it sets va_birthtime. r362490 marked that the NFSv4 attribute TimeCreate (va_birthtime) is supported, but it did not change the NFS server code to actually do it. As such, errors could occur when unrolling a tarball onto an NFSv4 mounted volume, since setting TimeCreate would fail with a NFSERR_ATTRNOTSUPP reply. This patch fixes the server so that it does TimeCreate and also makes sure that TimeCreate will not be set for a DS file for a pNFS server. A separate commit will add a check to the NFSv4 client for support of the TimeCreate attribute before attempting to set it, to avoid a problem when mounting a server that does not support the attribute. The failures will still occur for r362490 or later kernels that do not have this patch, since they indicate support for the attribute, but do not actually support the attribute.	2020-07-26 23:03:41 +00:00
Rick Macklem	18a48314ba	Add support for ext_pgs mbufs to nfsrv_adj(). This patch uses a slightly different algorithm for nfsrv_adj() since ext_pgs mbuf lists are not permitted to have m_len == 0 mbufs. As such, the code now frees mbufs after the adjustment in the list instead of setting their m_len field to 0. Since mbuf(s) may be trimmed off the tail of the list, the function now returns a pointer to the last mbuf in the list. This saves the caller from needing to use m_last() to find the last mbuf. It also implies that it might return a nul list, which required a check for that in nfsrvd_readlink(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.	2020-07-26 02:42:09 +00:00
Mateusz Guzik	172ffe702c	tmpfs: add support for lockless lookup Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25580	2020-07-25 10:38:44 +00:00
Rick Macklem	cfaafa7908	Add support for ext_pgs mbufs to nfsm_uiombuflist() and nfsm_split(). This patch uses a slightly different algorithm for nfsm_uiombuflist() for the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to the current location that mbuf data can be filled into. This avoids use of mtod(mp, char *) + mp->m_len to calculate the location, since this does not work for ext_pgs mbufs and I think it makes the algorithm more readable. This change should not result in semantic changes for the non-ext_pgs case. The patch also deletes some unneeded code. It also adds support for anonymous page ext_pgs mbufs to nfsm_split(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. At this time for this case, use of ext_pgs mbufs cannot be enabled, since ktls_encrypt() replaces the unencrypted data with encrypted data in place. Until such time as this can be enabled, there should be no semantic change. Also, note that this code is only used by the NFS client for a mirrored pNFS server.	2020-07-24 23:17:09 +00:00
Mark Johnston	cbef26ed16	cuse: Stop checking for failures from malloc(M_WAITOK). PR: 240545 Submitted by: Andrew Reiter <arr@watson.org> Reviewed by: hselasky MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25765	2020-07-23 14:03:37 +00:00
Rick Macklem	9516bcdfb4	Modify writing to mirrored pNFS DSs to prepare for use of ext_pgs mbufs. This patch modifies writing to mirrored pNFS DSs slightly so that there is only one m_copym() call for a mirrored pair instead of two of them. This call replaces the custom nfsm_copym() call, which is no longer needed and deleted by this patch. The patch does introduce a new nfsm_split() function that only calls m_split() for the non-ext_pgs case. The semantics of nfsm_uiombuflist() are changed to include code that nul pads the generated mbuf list. This was done by nfsm_copym() prior to this patch. The main reason for this change is that it allows the data to be a list of ext_pgs mbufs, since the m_copym() is for the entire mbuf list. This support will be added in a future commit. This patch only affects writing to mirrored flexible file layout pNFS servers.	2020-07-22 23:33:37 +00:00
Alexander V. Chernikov	e1c05fd290	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-21 19:56:13 +00:00
Mark Johnston	39bc40e3d2	ext2fs: Stop checking for failures from malloc(M_WAITOK). PR: 240545 Submitted by: Andrew Reiter <arr@watson.org> Reviewed by: fsu MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25707	2020-07-20 14:28:26 +00:00
Alexander V. Chernikov	725871230d	Temporarily revert r363319 to unbreak the build. Reported by: CI Pointy hat to: melifaro	2020-07-19 10:53:15 +00:00
Alexander V. Chernikov	8cee15d9e4	Transition from rtrequest1_fib() to rib_action(). Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib, in6_rtrequest, rtrequest_fib> and their uses and switch to rib_action(). This is part of the new routing KPI. Submitted by: Neel Chauhan <neel AT neelc DOT org> Differential Revision: https://reviews.freebsd.org/D25546	2020-07-19 09:29:27 +00:00
Rick Macklem	7477442fdd	Fix the pNFS flexible file layout client for servers with small write size. The code in nfscl_dofflayout() loops when a flexible file layout server provides a small write data limit (no extant server is known to do this). If/when it looped, it erroneously reused the "drpc" argument for the mirror worker thread, corrupting it. This patch fixes the problem by only using the calling thread after the first loop iteration. Found during testing by simulating a server with a small write size. Since no extant pNFS server is known to provide a small write size, this fix is not needed in practice at this time. MFC after: 2 weeks	2020-07-15 01:26:28 +00:00
Rick Macklem	6722f6e577	Minor code cleanup that removes "nd->nd_bpos = mcp;" in both if and else. The statement "nd->nd_bpos = mcp;" was in both the if and else. Correct, but potentially confusing. This patch fixes this. There should be no semantics change caused by this commit.	2020-07-13 01:28:45 +00:00
Rick Macklem	3eaf03766e	Add support for ext_pgs mbufs to nfsm_uiombuf(). This patch uses a slightly different algorithm for the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to the current location that mbuf data can be filled into. This avoids use of mtod(mp, char *) + mp->m_len to calculate the location, since this does not work for ext_pgs mbufs and I think it makes the algorithm more readable. This change should not result in semantic changes for the non-ext_pgs case. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-08 02:28:08 +00:00
Rick Macklem	022346fa62	Add support for ext_pgs mbufs to nfsrvd_rephead(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-07 00:42:23 +00:00
Rick Macklem	34fc29e0c9	Add support for ext_pgs mbufs to nfsm_strtom(). Also, add a new function nfsm_add_ext_pgs() which will either add a page or add a new ext_pgs mbuf with a page to the mbuf list. Used by nfsm_strtom(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-05 21:55:16 +00:00
Mateusz Guzik	11c345b18f	devfs: fix a vnode use-after-free in devfs_ioctl The vnode to be replaced was read with a shared lock, meaning 2 racing threads can find the same one. While here clean it up a little bit.	2020-07-04 06:27:28 +00:00
Rick Macklem	dccb580624	Add support for ext_pgs mbufs to nfscl_reqstart() and nfsm_set(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-04 03:28:13 +00:00
Rick Macklem	606007409c	Fix build breakage caused by r362903. Only pmap.h is needed now, but vm_page.h and vm_pageout.h are needed later, so put them in now. Pointy hat goes on me.	2020-07-03 05:21:05 +00:00
Rick Macklem	2da1527844	Add support for ext_pgs mbufs to nfsm_build(). This is the first of a series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Since ND_EXTPG is never set yet, there is no semantic change at this time.	2020-07-03 01:19:29 +00:00
Rick Macklem	4476c1def0	Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs should be used. For KERN_TLS (and possibly some other future network interface) the mbuf list passed into sosend() must be ext_pgs mbufs. The krpc could simply copy all the mbuf data into ext_pgs mbufs before calling sosend(), but that would be inefficient for large RPC messages. This patch adds an argument to nfscl_reqstart() to indicate that it should fill the RPC message into ext_pgs mbufs. It also adds fields to "struct nfsrv_descript" needed for building NFS RPC messages in ext_pgs mbufs, along with new flags for this. Since the argument is always "false", this commit should not result in any semantic change. However, this commit prepares the code for future commits that will add support for building of NFS RPC messages in ext_pgs mbufs.	2020-06-26 03:11:54 +00:00
Mark Johnston	84242cf68a	Call swap_pager_freespace() from vm_object_page_remove(). All vm_object_page_remove() callers, except linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when removing a range of pages from an object. The LinuxKPI case appears to be an unintentional omission that could result in leaked swap blocks, so unconditionally free swap space in vm_object_page_remove() to protect against similar bugs in the future. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25329	2020-06-25 15:21:21 +00:00
Doug Rabson	c07782e10e	Add some missing parts for supporting va_birthtime. Reviewed by: rmacklem	2020-06-22 08:23:16 +00:00
Thomas Munro	f270658873	vfs: track sequential reads and writes separately For software like PostgreSQL and SQLite that sometimes reads sequentially while also writing sequentially some distance behind with interleaved syscalls on the same fd, performance is better on UFS if we do sequential access heuristics separately for reads and writes. Patch originally by Andrew Gierth in 2008, updated and proposed by me with his permission. Reviewed by: mjg, kib, tmunro Approved by: mjg (mentor) Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk> Differential Revision: https://reviews.freebsd.org/D25024	2020-06-21 08:51:24 +00:00
Alan Somers	eea79fde5a	Remove vfs_statfs and vnode_mount macros from NFS These macro definitions are no longer needed as the NFS OSX port is long dead. The vfs_statfs macro conflicts with the vfsops field of the same name. Submitted by: shivank@ Reviewed by: rmacklem MFC after: 2 weeks Sponsored by: Google, Inc. (GSoC 2020) Differential Revision: https://reviews.freebsd.org/D25263	2020-06-17 16:20:19 +00:00
Doug Rabson	3900c11481	Add support for the timecreate attribute This maps to the va_birthtime VFS attribute.	2020-06-14 11:41:57 +00:00
Rick Macklem	1f7104d720	Fix export_args ex_flags field so that it is 64 bits, the same as mnt_flags. Since mnt_flags was upgraded to 64 bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32 bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32 bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64 bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088	2020-06-14 00:10:18 +00:00
Ryan Moeller	693d10a291	tmpfs: Preserve alignment of struct fid fields On 64-bit platforms, the two short fields in `struct tmpfs_fid` are padded to the 64-bit alignment of the long field. This pushes the offsets of the subsequent fields by 4 bytes and makes `struct tmpfs_fid` bigger than `struct fid`. `tmpfs_vptofh()` casts a `struct fid *` to `struct tmpfs_fid *`, causing 4 bytes of adjacent memory to be overwritten when the struct fields are set. Through several layers of indirection and embedded structs, the adjacent memory for one particular call to `tmpfs_vptofh()` happens to be the stack canary for `nfsrvd_compound()`. Half of the canary ends up being clobbered, going unnoticed until eventually the stack check fails when `nfsrvd_compound()` returns and a panic is triggered. Instead of duplicating fields of `struct fid` in `struct tmpfs_fid`, narrow the struct to cover only the unique fields for tmpfs and assert at compile time that the struct fits in the allotted space. This way we don't have to replicate the offsets of `struct fid` fields, we just use them directly. Reviewed by: kib, mav, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25077	2020-06-03 09:38:51 +00:00
Alexander V. Chernikov	9d5df78e64	Fix NOINET6 build broken by r361575. Reported by: ci, hps	2020-05-28 09:52:28 +00:00
Alexander V. Chernikov	c74ce5cca3	Make NFS address selection use fib4_lookup(). fib4_lookup_nh_* represents the pre-epoch generation of the fib API, providing weaker guarantees of pointer validity and requiring on-stack data copying. Switch the call to the new fib4_lookup(), so that the old API can eventually be deprecated. Differential Revision: https://reviews.freebsd.org/D24977	2020-05-28 07:35:07 +00:00
Conrad Meyer	852c303b61	copystr(9): Move to deprecate (attempt #2 ) This reapplies logical r360944 and r360946 (reverting r360955), with fixed copystr() stand-in replacement macro. Eventually the goal is to convert consumers and kill the macro, but for a first step it helps if the macro is correct. Prior commit message: Unlike the other copy*() functions, it does not serve to copy from one address space to another or protect against potential faults. It's just an older incarnation of the now-more-common strlcpy(). Add a coccinelle script to tools/ which can be used to mechanically convert existing instances where replacement with strlcpy is trivial. In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the code was further refactored manually to simplify. Replace the declaration of copystr() in systm.h with a small macro wrapper around strlcpy (with correction from brooks@ -- thanks). Remove N redundant MI implementations of copystr. For MIPS, this entailed inlining the assembler copystr into the only consumer, copyinstr, and making the latter a leaf function. Reviewed by: jhb (earlier version) Discussed with: brooks (thanks!) Differential Revision: https://reviews.freebsd.org/D24672	2020-05-25 16:40:48 +00:00
Alexander V. Chernikov	2bbab0af6d	Use epoch(9) for rtentries to simplify control plane operations. Currently the only reason for refcounting rtentries is the need to report the rtable operation details immediately after the execution. Delaying rtentry reclamation allows refcounting to be dropped and simplifies the code. Additionally, this change allows rib_lookup_info(), which some consumers use to get the matching prefix along with nexthops, to be reimplemented more efficiently. The change keeps the per-vnet rtzone uma zone. It adds an nh_vnet field to nhop_priv to be able to reliably set curvnet even during vnet teardown. The rest of the reference counting code will be removed in D24867. Differential Revision: https://reviews.freebsd.org/D24866	2020-05-23 10:21:02 +00:00
Alan Somers	bfcb817bcd	Fix issues with FUSE_ACCESS when default_permissions is disabled This patch fixes two issues relating to FUSE_ACCESS when the default_permissions mount option is disabled: * VOP_ACCESS() calls with VADMIN set should never be sent to a fuse server in the form of FUSE_ACCESS operations. The FUSE protocol has no equivalent of VADMIN, so we must evaluate such things kernel-side, regardless of the default_permissions setting. * The FUSE protocol only requires FUSE_ACCESS to be sent for two purposes: for the access(2) syscall and to check directory permissions for searchability during lookup. FreeBSD sends it much more frequently, due to differences between our VFS and Linux's, for which FUSE was designed. But this patch does eliminate several cases not required by the FUSE protocol: * for any FUSE_XATTR operation when creating a new file * when deleting a file * when setting timestamps, such as by utimensat(2). * Additionally, when default_permissions is disabled, this patch removes one FUSE_GETATTR operation when deleting a file. PR: 245689 Reported by: MooseFS FreeBSD Team <freebsd@moosefs.pro> Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24777	2020-05-22 18:11:17 +00:00
Alan Somers	7096c29e5b	Disable nullfs cacheing on top of fusefs Nullfs cacheing can keep a large number of vnodes active. That results in more active FUSE file handles, causing some FUSE servers to use extra resources. Disable nullfs cacheing for fusefs, just like we already do for NFSv4. PR: 245688 Reported by: MooseFS FreeBSD Team <freebsd@moosefs.pro> MFC after: 2 weeks	2020-05-22 18:03:14 +00:00
Ryan Moeller	245bfd34da	Deduplicate fsid comparisons Comparing fsid_t objects requires internal knowledge of the fsid structure and yet this is duplicated across a number of places in the code. Simplify by creating a fsidcmp function (macro). Reviewed by: mjg, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24749	2020-05-21 01:55:35 +00:00
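Here is a minimal sketch of the fsidcmp idea. It assumes an fsid made of two 32-bit words. The names demo_fsid_t, demo_fsid_equal_open_coded() and demo_fsidcmp() are hypothetical, and the committed fsidcmp may be defined differently.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: a filesystem ID made of two 32-bit words (no padding). */
typedef struct demo_fsid {
	int32_t val[2];
} demo_fsid_t;

/* Open-coded comparison: every caller must know the internal layout. */
static int
demo_fsid_equal_open_coded(const demo_fsid_t *a, const demo_fsid_t *b)
{
	return (a->val[0] == b->val[0] && a->val[1] == b->val[1]);
}

/*
 * Centralized comparison: returns 0 when equal, nonzero otherwise, like
 * memcmp(). Comparing raw bytes is safe here because the struct has no
 * padding.
 */
#define demo_fsidcmp(a, b)	memcmp((a), (b), sizeof(demo_fsid_t))
```

With the macro, callers stop repeating the per-field comparison. If the fsid layout changes, only one definition needs to be updated.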
Rick Macklem	3d7650f04c	Add a function nfsm_set() to initialize "struct nfsrv_descript" for building mbuf lists. This function is currently trivial, but that will change when support for building NFS messages in ext_pgs mbufs is added. Adding support for ext_pgs mbufs is needed for KERN_TLS, which will be used to implement nfs-over-tls.	2020-05-18 00:07:45 +00:00
Fedor Uporov	cd3acfe7f3	Add BE architectures support. Author of most initial version: pfg (https://reviews.freebsd.org/D23259) Reviewed by: pfg MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D24685	2020-05-17 14:52:54 +00:00
Fedor Uporov	4bd6d63dc5	Restrict the max runp and runb return values in case of extents mapping. This restriction already present in case of indirect mapping, do the same in case of extents. PR: 246182 Reported by: Teran McKinney MFC after: 2 weeks	2020-05-17 14:10:46 +00:00
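This kind of restriction can be sketched as a clamp on the run length. The model is hypothetical, not the ext2fs code: it assumes the limit is the number of blocks that fit in one maximum-size I/O, minus the starting block. demo_clamp_run() and its parameters are illustrative names.

```c
/*
 * Clamp a contiguous-run count (blocks after the requested one) so that a
 * clustered I/O never exceeds iosize_max bytes. Assumes bsize > 0 and
 * iosize_max >= bsize.
 */
static long
demo_clamp_run(long run, long iosize_max, long bsize)
{
	/* -1 because the first block is not counted as part of the run. */
	long maxrun = iosize_max / bsize - 1;

	return (run > maxrun ? maxrun : run);
}
```

With a 128 KiB maximum I/O size and 4 KiB blocks, a reported run of 100 blocks is clamped to 31. A run of 5 is returned as-is.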
Fedor Uporov	86e2d48bf9	Fix incorrect inode link count check in case of rename. The check was incorrect because the directory inode link count has a minimum value of 2 after the dir_nlink extfs feature was introduced.	2020-05-17 14:03:13 +00:00
Fedor Uporov	ec81c9cc06	Add inode bitmap tail initialization. Make ext2fs compatible with changes introduced in e2fsprogs v1.45.2. Now the tail of inode bitmap is filled with 0xff pattern explicitly during bitmap initialization phase to avoid e2fsck error like: "Padding at end of inode bitmap is not set."	2020-05-17 14:00:54 +00:00
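This sketch shows tail initialization for a bitmap whose valid bits stop short of the end of its block. It assumes LSB-first bit numbering within each byte, as setbit() uses, and takes the sizes as parameters. demo_pad_bitmap_tail() is a hypothetical helper, not the ext2fs function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Set every bit at or beyond nbits (the number of valid inodes in the
 * group) to 1, up to the end of a bitmap that is size bytes long.
 */
static void
demo_pad_bitmap_tail(uint8_t *bitmap, size_t size, size_t nbits)
{
	size_t i;

	/* Finish the partially used byte one bit at a time. */
	for (i = nbits; i < size * 8 && (i % 8) != 0; i++)
		bitmap[i / 8] |= (uint8_t)(1u << (i % 8));

	/* Fill the remaining whole bytes with the 0xff pattern. */
	if (i < size * 8)
		memset(bitmap + i / 8, 0xff, size - i / 8);
}
```

For example, with a 4-byte bitmap and 12 valid bits, byte 1 becomes 0xf0 and bytes 2 and 3 become 0xff. The valid bits are left untouched.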
John Baldwin	07a34ce381	Remove unused header for DES. The NFS port doesn't use any of the DES functions.	2020-05-13 18:35:02 +00:00
Ryan Moeller	b9cc3262bc	nfs: Remove APPLESTATIC macro It is no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24811	2020-05-12 13:23:25 +00:00
Conrad Meyer	051fc58cb3	Revert r360944 and r360946 until reported issues can be resolved Reported by: cy	2020-05-12 04:34:26 +00:00
Conrad Meyer	580744621f	copystr(9): Move to deprecate [2/2] Unlike the other copy*() functions, it does not serve to copy from one address space to another or protect against potential faults. It's just an older incarnation of the now-more-common strlcpy(). Add a coccinelle script to tools/ which can be used to mechanically convert existing instances where replacement with strlcpy is trivial. In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the code was further refactored manually to simplify. Replace the declaration of copystr() in systm.h with a small macro wrapper around strlcpy. Remove N redundant MI implementations of copystr. For MIPS, this entailed inlining the assembler copystr into the only consumer, copyinstr, and making the latter a leaf function. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D24672	2020-05-11 22:57:21 +00:00
Alan Somers	9d4e48aebf	fusefs: better dtrace probes for asynchronous invalidation operations MFC after: 2 weeks	2020-05-08 22:26:52 +00:00
Ryan Moeller	32033b3d30	Remove APPLEKEXT ifndefs They are no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24752	2020-05-08 14:39:38 +00:00
Rick Macklem	04d6c514b0	Delete unused function newnfs_trimleading. The NFS function called newnfs_trimleading() has not been used by the code in a long time. To give you a clue, it still had a K&R style function declaration. Delete it, since it is just cruft, as a part of the NFS mbuf handling cleanup in preparation for adding ext_pgs mbuf support. The ext_pgs mbuf support for the build/send side is needed by nfs-over-tls.	2020-05-06 00:44:03 +00:00
Rick Macklem	3973ef1dfc	Revert r360514, to avoid unnecessary churn of the sources. r360514 prepared the NFS code for changes to handle ext_pgs mbufs on the receive side. However, at this time, KERN_TLS does not pass ext_pgs mbufs up through soreceive(). As such, at this time, only the send/build side of the NFS mbuf code needs to handle ext_pgs mbufs. Revert r360514 since the rather extensive changes required for receive side ext_pgs mbufs are not yet needed. This avoids unnecessary churn of the sources.	2020-05-05 00:58:03 +00:00
Rick Macklem	0c9cd5cacd	Factor some code out of nfsm_dissct() into separate functions. Factoring some of the code in nfsm_dissct() out into separate functions allows these functions to be used elsewhere in the NFS mbuf handling code. Other uses of these functions will be done in future commits. It also makes it easier to add support for ext_pgs mbufs, which is needed for nfs-over-tls under development in base/projects/nfs-over-tls. Although the algorithm in nfsm_dissct() is somewhat re-written by this patch, the semantics of nfsm_dissct() should not have changed.	2020-05-01 00:36:14 +00:00
Rick Macklem	5ecf33c6c4	Get rid of uio_XXX macros used for the Mac OS/X port. The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX left over from the port to Mac OS/X. Since that port is long forgotten, replace the calls with the code generated by the FreeBSD macros for these in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h and I think makes the code more readable. This patch should not result in any semantic change.	2020-04-28 02:11:02 +00:00
Mark Johnston	9b22722423	Call pipeselwakeup() after toggling PIPE_EOF. This ensures that pipe_poll() and the pipe kqueue filters observe PIPE_EOF and set EV_EOF accordingly. As a result an extra call to knote() after setting PIPE_EOF is unnecessary. Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24528	2020-04-27 15:59:07 +00:00
Rick Macklem	e4a458bb1b	Remove Mac OS/X macros that did nothing for FreeBSD. The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X port. The first of these macros was a no-op for FreeBSD and the second is no longer used. This patch gets rid of them. It also deletes the "mbuf_t" typedef which is no longer used in the FreeBSD code from nfskpiport.h This patch should not change semantics.	2020-04-25 02:18:59 +00:00
Rick Macklem	897d7d45ba	Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant. RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION should first consist of a CreateSession operation using the extant ClientID. If that fails, then a full recovery beginning with the ExchangeID operation is to be done. Without this patch, the FreeBSD client did not attempt the CreateSession operation with the extant ClientID and went directly to a full recovery beginning with ExchangeID. I have had this patch for several years, but since no extant NFSv4.n server required the CreateSession with extant ClientID, I have never committed it. I am committing it now, since I suspect some future NFSv4.n server will require this and it should not negatively impact recovery for extant NFSv4.n servers, since they should all return NFSERR_STALECLIENTID for this first CreateSession. The patched client has been tested for recovery against both the FreeBSD and Linux NFSv4.n servers and no problems have been observed. MFC after: 1 month	2020-04-22 21:00:14 +00:00
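The recovery order described above can be written as a small state transition function. This is a hypothetical model for illustration: the demo_step values and demo_next_step() are not part of the FreeBSD client. The sketch ignores retries and specific error codes, and treats any failure of the first CreateSession as the signal to fall back.

```c
/* Hypothetical steps in NFSv4.1 session recovery, for illustration only. */
enum demo_step {
	DEMO_CREATESESSION_EXTANT,	/* CreateSession with the existing ClientID */
	DEMO_EXCHANGEID,		/* full recovery: obtain a new ClientID */
	DEMO_CREATESESSION_NEW,		/* CreateSession with the new ClientID */
	DEMO_DONE,
	DEMO_FAILED
};

/* Given the step just attempted and whether it succeeded, pick the next. */
static enum demo_step
demo_next_step(enum demo_step cur, int ok)
{
	switch (cur) {
	case DEMO_CREATESESSION_EXTANT:
		/* Try the cheap path first; fall back to full recovery. */
		return (ok ? DEMO_DONE : DEMO_EXCHANGEID);
	case DEMO_EXCHANGEID:
		return (ok ? DEMO_CREATESESSION_NEW : DEMO_FAILED);
	case DEMO_CREATESESSION_NEW:
		return (ok ? DEMO_DONE : DEMO_FAILED);
	default:
		return (cur);	/* terminal states stay put */
	}
}
```

A server that rejects the first CreateSession simply sends the client down the full ExchangeID path. This matches the commit's reasoning that the change should not hurt recovery against existing servers.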
Edward Tomasz Napierala	d499502db7	Silence down a warning which should really be a debug message. MFC after: 2 weeks Sponsored by: DARPA	2020-04-21 13:57:51 +00:00
Rick Macklem	ae070589d3	Replace all instances of the typedef mbuf_t with "struct mbuf *". The typedef mbuf_t was used for the Mac OS/X port of the code long ago. Since this port is no longer used and the use of mbuf_t obscures what the code does (and is not consistent with style(9)), it is no longer needed. This patch replaces all instances of mbuf_t with "struct mbuf *", so that it is no longer used. This patch should not result in any semantic change.	2020-04-17 21:17:51 +00:00
Rick Macklem	82164bdd76	Add a sanity check for nes_numsecflavor to the NFS server. Ryan Moeller reported crashes in the NFS server that appear to be caused by stack corruption in nfsrvd_compound(). It appears that the stack got corrupted just after an NFSv4.1 Lookup that crosses a server mount point. Although it is just a "theory" at this point, the most obvious way the stack could get corrupted would be if nfsvno_checkexp() somehow acquires an export with a bogus nes_numsecflavor value. This would cause the copying of the secflavors to run off the end of the array, which is allocated on the stack below where the corruption occurs. This sanity check is simple to do and would stop the stack corruption if the theory is correct. Otherwise, doing the sanity check seems to be a reasonable safety belt to add to the code. Reported by: freqlabs MFC after: 2 weeks	2020-04-17 02:21:46 +00:00
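The kind of bounds check described above can be sketched as follows. DEMO_MAXSECFLAVORS and demo_copy_secflavors() are hypothetical names, and the capacity of 5 is an assumption for the example. The point is that the count is validated before anything is written into the fixed-size array.

```c
#define DEMO_MAXSECFLAVORS	5	/* assumed capacity of the caller's array */

/*
 * Copy numsecflavor entries from src into the fixed-size dst array, but
 * only if the count is within bounds. Returns 0 on success, or -1 if the
 * export entry looks corrupt and nothing was copied.
 */
static int
demo_copy_secflavors(int dst[DEMO_MAXSECFLAVORS], const int *src,
    int numsecflavor)
{
	int i;

	if (numsecflavor < 0 || numsecflavor > DEMO_MAXSECFLAVORS)
		return (-1);
	for (i = 0; i < numsecflavor; i++)
		dst[i] = src[i];
	return (0);
}
```

Rejecting a bogus count up front means a corrupt export entry yields an error rather than writes past the end of a stack array.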
Rick Macklem	0bda1ddd33	Fix the NFSv4.2 extended attribute support for remove extended attribute. I missed the "atomic" field of the RemoveExtendedAttribute operation's reply when I implemented it. It worked between FreeBSD client and server, since it was missed for both, but it did not conform to RFC 8276. This patch adds the field for both client and server. Thanks go to Frank for doing interoperability testing of the extended attribute support against patches for Linux. Submitted by: Frank van der Linden <fllinden@amazon.com> Reported by: Frank van der Linden <fllinden@amazon.com>	2020-04-15 21:27:52 +00:00
Rick Macklem	fb8ed4c5f8	Fix the NFSv4.2 extended attribute support to handle 0 length attributes. I did not realize that zero length attributes are allowed, but they are. This patch fixes the NFSv4.2 client and server to handle zero length extended attributes correctly. Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version) Reported by: Frank van der Linden <fllinder@amazon.com>	2020-04-14 22:57:21 +00:00
Rick Macklem	9897e357de	Re-organize the NFS file handle affinity code for the NFS server. The file handle affinity code was configured to be used by both the old and new NFS servers. This no longer makes sense, since there is only one NFS server. This patch copies a majority of the code in sys/nfs/nfs_fha.c and sys/nfs/nfs_fha.h into sys/fs/nfsserver/nfs_fha_new.c and sys/fs/nfsserver/nfs_fha_new.h, so that the files in sys/nfs can be deleted. The code is simplified by deleting the function callback pointers used to call functions in either the old or new NFS server and they were replaced by calls to the functions. As well as a cleanup, this re-organization simplifies the changes required for handling of external page mbufs, which is required for KERN_TLS. This patch should not result in a semantic change to file handle affinity.	2020-04-14 00:01:26 +00:00
Rick Macklem	66ea9219a2	Delete the mbuf macros that were used for the Mac OS/X port. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since r359757, r359780, r359785, r359810 and r359811 have removed all uses of these macros, this patch deletes the macros from the .h files. My eventual goal is deleting nfskpiport.h, but that will take some more editing to replace uses of the remaining macros.	2020-04-13 00:07:37 +00:00
Rick Macklem	e3e7c612f3	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This is the final patch of this series and the macros should now be able to be deleted from the .h files in a future commit.	2020-04-11 23:37:58 +00:00
Rick Macklem	9f6624d317	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change.	2020-04-11 20:57:15 +00:00
Rick Macklem	3133bbf7a4	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-10 22:42:14 +00:00
Rick Macklem	28e8046b2e	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-10 21:25:35 +00:00
Rick Macklem	c948a17a52	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-09 23:11:19 +00:00
Rick Macklem	8de97f394e	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
Rick Macklem	b0b7d978b6	Fix an interoperability issue w.r.t. the Linux client and the NFSv4 server. Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client was able to open and write to a file when the file's permissions were not set to allow the owner write access. Since NFS servers check file permissions on every write RPC, it is standard practice to allow the owner of the file to do writes, regardless of file permissions. This provides POSIX like behaviour, since POSIX only checks permissions upon open(2). The traditional way NFS clients handle this is to check access via the Access operation/RPC and use that to determine if an open(2) on the client is allowed. It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a POSIX open) operation to fail with NFSERR_ACCES if the file is not being created and file permissions do not allow owner access, unlike NFSv3. Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this behaviour, this patch changes the FreeBSD NFSv4 server to do the same. A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the NFSv4 server to its previous behaviour. Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct behaviour with the access check for file owner in Open enabled, it is enabled by default. Reported by: luoqi.chen@gmail.com MFC after: 2 weeks	2020-04-08 01:12:54 +00:00
Rick Macklem	76fd19b0a2	Fix noisy NFSv4 server printf. Peter reported that his dmesg was getting cluttered with nfsrv_cache_session: no session messages when he rebooted his NFS server and they did not seem useful. He was correct, in that these messages are "normal" and expected when NFSv4.1 or NFSv4.2 are mounted and the server is rebooted. This patch silences the printf() during the grace period after a reboot. It also adds the client IP address to the printf(), so that the message is more useful if/when it occurs. If this happens outside of the server's grace period, it does indicate something is not working correctly. Instead of adding yet another nd_XXX argument, the arguments for nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *". Reported by: pen@lysator.liu.se MFC after: 2 weeks	2020-04-06 23:21:39 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() provides a superset of the functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Hans Petter Selasky	98029019b6	Fine grain locking inside the cuse(3) kernel module. Implement one mutex per cuse(3) server instance which also cover the clients belonging to the given server instance. This should significantly reduce the mutex congestion inside the cuse(3) kernel module when multiple servers are in use. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-03-30 18:25:43 +00:00
Alan Somers	9338f18965	fusefs: add a dtrace probe that fires after mounting is complete This probe is useful for showing the protocol options negotiated with a FUSE server. MFC after: 2 weeks	2020-03-30 14:03:35 +00:00
Mark Johnston	355b3b7fd7	Simplify td_ucred handling in newnfs_connect(). No functional change intended. MFC after: 1 week	2020-03-26 15:02:56 +00:00
John Baldwin	8d8a74e69e	Mark procfs-based process debugging as deprecated for FreeBSD 13. Attempting to use ioctls on /proc/<pid>/mem to control a process will trigger warnings on the console. The <sys/pioctl.h> include file will also now emit a compile-time warning when used from userland. Reviewed by: emaste MFC after: 1 week Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23822	2020-03-17 18:44:03 +00:00

4707 Commits