freebsd-skq

Author	SHA1	Message	Date
Rick Macklem	3973ef1dfc	Revert r360514, to avoid unnecessary churn of the sources. r360514 prepared the NFS code for changes to handle ext_pgs mbufs on the receive side. However, at this time, KERN_TLS does not pass ext_pgs mbufs up through soreceive(). As such, at this time, only the send/build side of the NFS mbuf code needs to handle ext_pgs mbufs. Revert r360514 since the rather extensive changes required for receive side ext_pgs mbufs are not yet needed. This avoids unnecessary churn of the sources.	2020-05-05 00:58:03 +00:00
Rick Macklem	0c9cd5cacd	Factor some code out of nfsm_dissct() into separate functions. Factoring some of the code in nfsm_dissct() out into separate functions allows these functions to be used elsewhere in the NFS mbuf handling code. Other uses of these functions will be done in future commits. It also makes it easier to add support for ext_pgs mbufs, which is needed for nfs-over-tls under development in base/projects/nfs-over-tls. Although the algorithm in nfsm_dissct() is somewhat re-written by this patch, the semantics of nfsm_dissct() should not have changed.	2020-05-01 00:36:14 +00:00
Rick Macklem	5ecf33c6c4	Get rid of uio_XXX macros used for the Mac OS/X port. The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX left over from the port to Mac OS/X. Since that port is long forgotten, replace the calls with the code generated by the FreeBSD macros for these in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h and I think makes the code more readable. This patch should not result in any semantic change.	2020-04-28 02:11:02 +00:00
Mark Johnston	9b22722423	Call pipeselwakeup() after toggling PIPE_EOF. This ensures that pipe_poll() and the pipe kqueue filters observe PIPE_EOF and set EV_EOF accordingly. As a result an extra call to knote() after setting PIPE_EOF is unnecessary. Submitted by: Jan Kokemüller <jan.kokemueller@gmail.com> MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24528	2020-04-27 15:59:07 +00:00
Rick Macklem	e4a458bb1b	Remove Mac OS/X macros that did nothing for FreeBSD. The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X port. The first of these macros was a no-op for FreeBSD and the second is no longer used. This patch gets rid of them. It also deletes the "mbuf_t" typedef which is no longer used in the FreeBSD code from nfskpiport.h This patch should not change semantics.	2020-04-25 02:18:59 +00:00
Rick Macklem	897d7d45ba	Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant. RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION should first consist of a CreateSession operation using the extant ClientID. If that fails, then a full recovery beginning with the ExchangeID operation is to be done. Without this patch, the FreeBSD client did not attempt the CreateSession operation with the extant ClientID and went directly to a full recovery beginning with ExchangeID. I have had this patch for several years, but since no extant NFSv4.n server required the CreateSession with extant ClientID, I have never committed it. I am committing it now, since I suspect some future NFSv4.n server will require this and it should not negatively impact recovery for extant NFSv4.n servers, since they should all return NFSERR_STALECLIENTID for this first CreateSession. The patched client has been tested for recovery against both the FreeBSD and Linux NFSv4.n servers and no problems have been observed. MFC after: 1 month	2020-04-22 21:00:14 +00:00
Edward Tomasz Napierala	d499502db7	Silence down a warning which should really be a debug message. MFC after: 2 weeks Sponsored by: DARPA	2020-04-21 13:57:51 +00:00
Rick Macklem	ae070589d3	Replace all instances of the typedef mbuf_t with "struct mbuf ". The typedef mbuf_t was used for the Mac OS/X port of the code long ago. Since this port is no longer used and the use of mbuf_t obscures what the code does (and is not consistent with style(9)), it is no longer needed. This patch replaces all instances of mbuf_t with "struct mbuf ", so that it is no longer used. This patch should not result in any semantic change.	2020-04-17 21:17:51 +00:00
Rick Macklem	82164bdd76	Add a sanity check for nes_numsecflavor to the NFS server. Ryan Moeller reported crashes in the NFS server that appear to be caused by stack corruption in nfsrv_compound(). It appears that the stack got corrupted just after a NFSv4.1 Lookup that crosses a server mount point. Although it is just a "theory" at this point, the most obvious way the stack could get corrupted would be if nfsvno_checkexp() somehow acquires an export with a bogus nes_numsecflavor value. This would cause the copying of the secflavors to run off the end of the array, which is allocated on the stack below where the corruption occurs. This sanity check is simple to do and would stop the stack corruption if the theory is correct. Otherwise, doing the sanity check seems to be a reasonable safety belt to add to the code. Reported by: freqlabs MFC after: 2 weeks	2020-04-17 02:21:46 +00:00
Rick Macklem	0bda1ddd33	Fix the NFSv4.2 extended attribute support for remove extended attribute. I missed the "atomic" field of the RemoveExtendedAttribute operation's reply when I implemented it. It worked between FreeBSD client and server, since it was missed for both, but it did not conform to RFC 8276. This patch adds the field for both client and server. Thanks go to Frank for doing interoperability testing of the extended attribute support against patches for Linux. Submitted by: Frank van der Linden <fllinden@amazon.com> Reported by: Frank van der Linden <fllinden@amazon.com>	2020-04-15 21:27:52 +00:00
Rick Macklem	fb8ed4c5f8	Fix the NFSv4.2 extended attribute support to handle 0 length attributes. I did not realize that zero length attributes are allowed, but they are. This patch fixes the NFSv4.2 client and server to handle zero length extended attributes correctly. Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version) Reported by: Frank van der Linden <fllinden@amazon.com>	2020-04-14 22:57:21 +00:00
Rick Macklem	9897e357de	Re-organize the NFS file handle affinity code for the NFS server. The file handle affinity code was configured to be used by both the old and new NFS servers. This no longer makes sense, since there is only one NFS server. This patch copies a majority of the code in sys/nfs/nfs_fha.c and sys/nfs/nfs_fha.h into sys/fs/nfsserver/nfs_fha_new.c and sys/fs/nfsserver/nfs_fha_new.h, so that the files in sys/nfs can be deleted. The code is simplified by deleting the function callback pointers used to call functions in either the old or new NFS server and they were replaced by calls to the functions. As well as a cleanup, this re-organization simplifies the changes required for handling of external page mbufs, which is required for KERN_TLS. This patch should not result in a semantic change to file handle affinity.	2020-04-14 00:01:26 +00:00
Rick Macklem	66ea9219a2	Delete the mbuf macros that were used for the Mac OS/X port. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since r359757, r359780, r359785, r359810, r359811 have removed all uses of these macros, this patch deletes the macros from the .h files. My eventual goal is deleting nfskpiport.h, but that will take some more editing to replace uses of the remaining macros.	2020-04-13 00:07:37 +00:00
Rick Macklem	e3e7c612f3	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This is the final patch of this series and the macros should now be able to be deleted from the .h files in a future commit.	2020-04-11 23:37:58 +00:00
Rick Macklem	9f6624d317	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change.	2020-04-11 20:57:15 +00:00
Rick Macklem	3133bbf7a4	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-10 22:42:14 +00:00
Rick Macklem	28e8046b2e	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-10 21:25:35 +00:00
Rick Macklem	c948a17a52	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This conversion will be committed one file at a time.	2020-04-09 23:11:19 +00:00
Rick Macklem	8de97f394e	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD 7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed in a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933	2020-04-09 14:44:46 +00:00
Rick Macklem	b0b7d978b6	Fix an interoperability issue w.r.t. the Linux client and the NFSv4 server. Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client was able to open and write to a file when the file's permissions were not set to allow the owner write access. Since NFS servers check file permissions on every write RPC, it is standard practice to allow the owner of the file to do writes, regardless of file permissions. This provides POSIX like behaviour, since POSIX only checks permissions upon open(2). The traditional way NFS clients handle this is to check access via the Access operation/RPC and use that to determine if an open(2) on the client is allowed. It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a POSIX open) operation to fail with NFSERR_ACCES if the file is not being created and file permissions do not allow owner access, unlike NFSv3. Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this behaviour, this patch changes the FreeBSD NFSv4 server to do the same. A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the NFSv4 server to its previous behaviour. Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct behaviour with the access check for file owner in Open enabled, it is enabled by default. Reported by: luoqi.chen@gmail.com MFC after: 2 weeks	2020-04-08 01:12:54 +00:00
Rick Macklem	76fd19b0a2	Fix noisy NFSv4 server printf. Peter reported that his dmesg was getting cluttered with nfsrv_cache_session: no session messages when he rebooted his NFS server and they did not seem useful. He was correct, in that these messages are "normal" and expected when NFSv4.1 or NFSv4.2 are mounted and the server is rebooted. This patch silences the printf() during the grace period after a reboot. It also adds the client IP address to the printf(), so that the message is more useful if/when it occurs. If this happens outside of the server's grace period, it does indicate something is not working correctly. Instead of adding yet another nd_XXX argument, the arguments for nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *". Reported by: pen@lysator.liu.se MFC after: 2 weeks	2020-04-06 23:21:39 +00:00
John Baldwin	59838c1a19	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837	2020-04-01 19:22:09 +00:00
Hans Petter Selasky	98029019b6	Fine grain locking inside the cuse(3) kernel module. Implement one mutex per cuse(3) server instance which also cover the clients belonging to the given server instance. This should significantly reduce the mutex congestion inside the cuse(3) kernel module when multiple servers are in use. MFC after: 1 week Sponsored by: Mellanox Technologies	2020-03-30 18:25:43 +00:00
Alan Somers	9338f18965	fusefs: add a dtrace probe that fires after mounting is complete This probe is useful for showing the protocol options negotiated with a FUSE server. MFC after: 2 weeks	2020-03-30 14:03:35 +00:00
Mark Johnston	355b3b7fd7	Simplify td_ucred handling in newnfs_connect(). No functional change intended. MFC after: 1 week	2020-03-26 15:02:56 +00:00
John Baldwin	8d8a74e69e	Mark procfs-based process debugging as deprecated for FreeBSD 13. Attempting to use ioctls on /proc/<pid>/mem to control a process will trigger warnings on the console. The <sys/pioctl.h> include file will also now emit a compile-time warning when used from userland. Reviewed by: emaste MFC after: 1 week Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23822	2020-03-17 18:44:03 +00:00
Edward Tomasz Napierala	fb48a42f03	Make autofs(5) timeout messages include affected process name and PID. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2020-03-16 16:17:58 +00:00
Alan Somers	b0ecfb42d1	fusefs: avoid cache corruption with buggy fuse servers The FUSE protocol allows the client (kernel) to cache a file's size, if the server (userspace daemon) allows it. A well-behaved daemon obviously should not change a file's size while a client has it cached. But a buggy daemon might. If the kernel ever detects that that has happened, then it should invalidate the entire cache for that file. Previously, we would not only cache stale data, but in the case of a file extension while we had the size cached, we accidentally extended the cache with zeros. PR: 244178 Reported by: Ben RUBSON <ben.rubson@gmx.com> Reviewed by: cem MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D24012	2020-03-11 04:29:45 +00:00
Konstantin Belousov	c6d3d601c9	Preallocate pipe buffers on pipe creation. Return ENOMEM if one of the buffers cannot be created even with the minimal size. This should avoid subsequent spurious ENOMEM errors from write(2) when a buffer cannot be allocated on the fly, after we reported that the pipe was created successfully. Reported by: Keno Fischer <keno@juliacomputing.com> Reviewed by: markj (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23993	2020-03-09 21:55:26 +00:00
Alan Somers	d970778e6f	fusefs: fix fsync for files with multiple open handles We were reusing a structure for multiple operations, but failing to reinitialize one member. The result is that a server that cares about FUSE file handle IDs would see one correct FUSE_FSYNC operation, and one with the FHID unset. PR: 244431 Reported by: Agata <chogata@gmail.com> MFC after: 2 weeks	2020-03-09 01:57:21 +00:00
Chuck Silvers	f15ccf8836	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787	2020-03-06 18:41:37 +00:00
Mateusz Guzik	625adeaccd	nullfs: don't pre lock exclusive in nullfs_root Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23955	2020-03-04 19:52:00 +00:00
Pawel Biernacki	7029da5c36	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
Pawel Biernacki	d3d10ed299	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (10 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23629	2020-02-24 10:37:56 +00:00
Pawel Biernacki	ef06a80cdb	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (8 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23627	2020-02-24 10:33:51 +00:00
Konstantin Belousov	0ff51c98d1	Fix NFS client deadlock when read reports truncated node. If the node attributes returned in the reply for a read RPC indicate truncation, and it happens that the vnode is exclusively locked, the update of the node attributes would try to shrink the vnode size. Since during the read some vnode pages were busied by the reading thread, vnode_pager_setsize() deadlocks waiting for the busy state owned by the caller. Use a thread-local flag to indicate that the NFS read owns some (s)busy page states and postpone the call to vnode_pager_setsize() until the thread relinquishes the ownership. Diagnosed by: rlibby Tested by: pho, rlibby Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-22 20:50:30 +00:00
Fedor Uporov	3767ed5b11	Add a EXT2FS-specific implementation for lseek(SEEK_DATA). The lseek(SEEK_DATA) optimization logic could be simply borrowed from ufs side. See, https://reviews.freebsd.org/D19599. Reviewed by: pfg MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23605	2020-02-18 16:39:57 +00:00
Pawel Biernacki	e0d69c5a88	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Reviewed by: kib, trasz Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D23640	2020-02-15 18:48:38 +00:00
Mateusz Guzik	074ad60a4c	vfs: make write suspension mandatory At the time opt-in was introduced, adding yourself as a writer was serializing across the mount point. Nowadays it is fully per-cpu, the only impact being a small single-threaded hit on top of what's there right now. The vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT, which is done regardless. Should someone want to microoptimize this single-threaded they can coalesce looking the mount up with adding a write to it.	2020-02-15 13:00:39 +00:00
Konstantin Belousov	c1e84733ac	tmpfs: add nomtime mount option, which disables tracking mtime updates due to writes through the shared mapped areas backed by tmpfs files. This removes periodic scans which downgrades rw mapped pages to ro to note the writes. Suggested by: mjg Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23432	2020-02-04 19:05:58 +00:00
Konstantin Belousov	b66352b787	tmpfs_mount update: simplify, cache the value of VFS_TO_TMPFS() calculation. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-02-04 18:52:25 +00:00
Mateusz Guzik	2abdae33b1	tmpfs: inline tmpfs_update It was generated to be just a jumping off point to tmpfs_itimes. While here provide a dedicated variant for getattr since we normally don't expect to need the update from that caller.	2020-02-03 17:06:21 +00:00
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Kyle Evans	6a5abb1ee5	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247	2020-02-02 16:34:57 +00:00
Kyle Evans	bd11e674ec	pseudofs: don't do VEXEC check in VOP_CACHEDLOOKUP VOP_CACHEDLOOKUP should assume that the appropriate VEXEC check has been done in the caller (vfs_cache_lookup), so it does not belong here.	2020-02-02 15:36:12 +00:00
Mateusz Guzik	10a15df653	vfs: remove the never set VDESC_VPP_WILLRELE flag	2020-02-02 09:35:48 +00:00
Mateusz Guzik	45757984f8	vfs: consistently use size_t for buflen around VOP_VPTOCNP	2020-02-01 20:34:43 +00:00
Konstantin Belousov	dc1d2cc648	Fix a bug in r357199. Around a generic call to null_nodeget(), there is nothing that would prevent the unmount of the nullfs mp until we process to the insmntque1() point. Calculate the VV_ROOT flag after insmntque1() to not access mp->mnt_data before we have an exclusively locked vnode from this mount point on the mp vnode list. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-01-30 19:34:37 +00:00
Mateusz Guzik	3cfabd81a1	vfs: remove the never set VDESC_NOMAP_VPP flag	2020-01-30 08:56:22 +00:00
Konstantin Belousov	5fc9e11c42	Save lower root vnode in nullfs mnt data instead of upper. Nullfs needs to know the root vnode of the lower fs during the operation. Currently it caches the upper vnode of it, which is also the root of the nullfs mount. On unmount, nullfs calls vflush() with rootrefs == 1, and aborts non-forced unmount if there are any more vnodes instantiated during vflush(). This means that the reference to the root vnode after failed non-forced unmount could be lost and nullm_rootvp points to the freed memory. Fix it by storing the reference for lower vnode instead, which is kept intact during vflush(). nullfs_root() now instantiates the upper vnode of lower root. Care about VV_ROOT flag in null_nodeget(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-01-28 11:29:06 +00:00
Alex Richardson	162ae9c834	Allow bootstrapping makefs on older FreeBSD hosts and Linux/macOS In order to do so we need to install the msdosfs headers to the bootstrap sysroot and avoid includes of kernel headers that may not exist on every host (e.g. sys/lockmgr.h). This change should allow bootstrapping of makefs on FreeBSD 11+ as well as Linux and macOS. We also have to avoid using the IO_SYNC macro since that may not be available. In makefs it is only used to switch between calling bwrite() and bdwrite() which both call the same function. Therefore we can simply always call bwrite(). For our CheriBSD builds we always bootstrap makefs by setting LOCAL_XTOOL_DIRS='lib/libnetbsd usr.sbin/makefs' and use the makefs binary from the build tree to create a bootable disk image. Reviewed By: brooks Differential Revision: https://reviews.freebsd.org/D23201	2020-01-27 12:02:41 +00:00
Rick Macklem	60a09a94cf	Fix a crash in the NFSv4 server. The PR reported a crash that occurred when a file was removed while client(s) were actively doing lock operations on it. Since nfsvno_getvp() will return NULL when the file does not exist, the bug was obvious and easy to fix via this patch. It is a little surprising that this wasn't found sooner, but I guess the above case rarely occurs. Tested by: iron.udjin@gmail.com PR: 242768 Reported by: iron.udjin@gmail.com MFC after: 2 weeks	2020-01-26 17:59:05 +00:00
Jeff Roberson	d6e13f3b4d	Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033	2020-01-19 23:47:32 +00:00
Mateusz Guzik	d3cc535474	vfs: provide F_ISUNIONSTACK as a kludge for libc Prior to introduction of this op libc's readdir would call fstatfs(2), in effect unnecessarily copying kilobytes of data just to check fs name and a mount flag. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D23162	2020-01-17 14:42:25 +00:00
Mateusz Guzik	e2fa68513e	unionfs: use MNTK_NOMSYNC	2020-01-16 22:45:08 +00:00
Mateusz Guzik	2a829749d3	tmpfs: add missing CTLFLAG_MPSAFE annotation	2020-01-15 01:32:11 +00:00
Mateusz Guzik	7493134e08	nfs: add missing CTLFLAG_MPSAFE annotations	2020-01-15 01:31:57 +00:00
Mateusz Guzik	388820fbef	fusefs: add missing CTLFLAG_MPSAFE annotation	2020-01-15 01:31:28 +00:00
Eric van Gyzen	cf64777f50	Add missing comma in nfsv4_errstr Reported by: Coverity CID: 1412243 Sponsored by: Dell EMC Isilon	2020-01-13 21:49:27 +00:00
Mateusz Guzik	cc3593fbd9	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997	2020-01-13 02:37:25 +00:00
Mateusz Guzik	57083d2576	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995	2020-01-13 02:34:02 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Mateusz Guzik	4a20fe31c3	unionfs: fix up VOP_UNLOCK use after flags stopped being supported For the most part the code was passing the LK_RELEASE flag. The 2 cases which did not pass it now use the VOP_UNLOCK_FLAGS macro. This fixes a panic when stacking unionfs on top of e.g., tmpfs when debug is enabled. Note there are latent bugs which prevent unionfs from working with debug regardless of this change. PR: 243064 Reported by: Mason Loring Bliss	2020-01-03 22:12:25 +00:00
Mateusz Guzik	d2203b48a5	msdos: vgone unconstructed vnode before vputing it Otherwise someone else may race to start using it. Race window was opened by r351748 ("vfs: implement usecount implying holdcnt"). Noted by: kib	2020-01-01 22:50:23 +00:00
Mateusz Guzik	f342b91c76	msdosfs: add a missing MNT_VNODE_FOREACH_ALL_ABORT to msdosfs_sync	2020-01-01 22:47:00 +00:00
Mark Johnston	9f5632e6c8	Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885	2019-12-28 19:04:00 +00:00
Rick Macklem	57fd4aa7e4	Change NFSv4.1 and NFSv4.2 error strings to start with lower case letter. r356084 added error strings for NFSv4.1 and NFSv4.2, with the first character capitalized. Since the other error strings were not capitalized and these strings would usually be imbedded in an error, I decided to make the first characters lower cased. No real effect but more consistent.	2019-12-26 21:06:34 +00:00
Rick Macklem	8f2940cec7	Add NFSv4.1 and NFSv4.2 errors to nfsv4_errstr.h. nfsv4_errstr.h only had strings for NFSv4.0 errors. This patch adds the errors for NFSv4.1 and NFSv4.2. At this time, this file is not used by any sources in the tree, so the change is not significant. I do plan on using nfsv4_errstr.h in a future patch to mount_nfs.c. Since I am doing this patch so that "minor version mismatch" will be recognized, I made that string less abbreviated.	2019-12-25 22:25:30 +00:00
Rick Macklem	05dcd5d2c8	Fix nfsmount() so that it will return NFSERR_MINORVERMISMATCH. If nfsrpc_getdirpath() returns NFSERR_MINORVERMISMATCH, it would erroneously get mapped to EIO. This was not particularly harmful, but would make it hard for sysadmins to diagnose why an NFSv4 mount is failing. mount_nfs.c still needs to be fixed so that it does not report NFSERR_MINORVERMISMATCH as an unknown error 10021. MFC after: 1 week	2019-12-25 01:15:38 +00:00
Doug Moore	f9f4c60aa7	Including <sys/tmpfs.h> into non-kernel software leads to a compilation error because, without _KERNEL defined, the macro TMPFS_VALIDATE_DIR is invoked, but never defined. User-level software that includes sys/tmpfs.h must define _KERNEL to make the definition of TMPFS_VALIDATE_DIR visible. This change puts all the inline functions that, directly or indirectly, invoke MPASS into the scope of the _KERNEL block, allowing many user-space includers of <sys/tmpfs.h> to stop defining _KERNEL. Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22874	2019-12-19 16:39:52 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Jeff Roberson	a808177864	Add a deferred free mechanism for freeing swap space that does not require an exclusive object lock. Previously swap space was freed on a best effort basis when a page that had valid swap was dirtied, thus invalidating the swap copy. This may be done inconsistently and requires the object lock which is not always convenient. Instead, track when swap space is present. The first dirty is responsible for deleting space or setting PGA_SWAP_FREE which will trigger background scans to free the swap space. Simplify the locking in vm_fault_dirty() now that we can reliably identify the first dirty. Discussed with: alc, kib, markj Differential Revision: https://reviews.freebsd.org/D22654	2019-12-15 03:15:06 +00:00
Rick Macklem	f808cf7294	Silence some "might not be initialized" warnings for riscv64. None of these cases were actually using the variable(s) uninitialized, but I figured that silencing the warnings via initializing them made sense. Some of these predated r355677.	2019-12-13 21:38:08 +00:00
Rick Macklem	bf6ac05aa3	Add some more initializations to quiet riscv build. The one case in nfs_copy_file_range() was a legitimate case, although it would probably never occur in practice.	2019-12-13 01:34:25 +00:00
Rick Macklem	95bf2e523b	Fix the build for MAC not defined and a couple of might not be initialized. r355677 broke the build for the not MAC defined case and a couple of might not be initialized warnings were generated for riscv. Others seem to be erroneous. Hopefully there won't be too many more build errors. Pointy hat goes on me.	2019-12-13 00:45:14 +00:00
Rick Macklem	c057a37818	Add support for NFSv4.2 to the NFS client and server. This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes (RFC-8276) to the NFS client and server. NFSv4.2 is comprised of several optional features that can be supported in addition to NFSv4.1. This patch adds the following optional features: - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED) - posix_fallocate() - intra server file range copying via the copy_file_range(2) syscall --> Avoiding data transfer over the wire to/from the NFS client. - lseek(SEEK_DATA/SEEK_HOLE) - Extended attribute syscalls for "user" namespace attributes as defined by RFC-8276. Although this patch is fairly large, it should not affect support for the other versions of NFS. However it does add two new sysctls that allow a sysadmin to limit which minor versions of NFSv4 a server supports, allowing a sysadmin to disable NFSv4.2. Unfortunately, when the NFS stats structure was last revised, it was assumed that there would be no additional operations added beyond what was specified in RFC-7862. However RFC-8276 did add additional operations, forcing the NFS stats structure to be revised again. It now has extra unused entries in all arrays, so that future extensions to NFSv4.2 can be accommodated without revising this structure again. A future commit will update nfsstat(1) to report counts for the new NFSv4.2 specific operations/procedures. This patch affects the internal interface between the nfscommon, nfscl and nfsd modules and, as such, they all must be upgraded simultaneously. I will do a version bump (although arguably not needed), due to this. This code has survived a "make universe" but has not been built with a recent GCC. If you encounter build problems, please email me. Relnotes: yes	2019-12-12 23:22:55 +00:00
Mateusz Guzik	c8b29d1212	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockmgr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665	2019-12-11 23:11:21 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Rick Macklem	a95cd06e9a	Delete an unused external declaration. Since nfsv4_opflag is no longer used in nfs_clcomsubs.c, delete the external declaration of it. Found during NFSv4.2 code merge. MFC after: 2 weeks	2019-12-08 16:59:36 +00:00
Rick Macklem	8e1906f700	Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply. When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate a status result for the first operation in the compound. Without this patch, this will result in a bogus EBADXDR error return. Returning EBADXDR is relatively harmless, but a correct reply of NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct minor version to use for a File Layout DS now that there can be NFSv4.2 DS servers. mount_nfs.c still needs to be fixed for this, although how the mount fails is only useful to help sysadmins isolate why a mount fails. Found during testing of the NFSv4.2 client and server. MFC after: 2 weeks	2019-12-08 00:06:00 +00:00
Rick Macklem	238da71f91	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-07 23:13:51 +00:00
Rick Macklem	394dae30b1	Set the XATTRSUPPORT attribute bit for NFSv4.2, always cleared for now. Since r355472 added code which clears the XATTRSUPPORT bit for non-NFSv4.2 mounts, it is now safe to set it. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys. This commit completes updates to nfsproto.h required by the NFSv4.2.	2019-12-07 01:10:38 +00:00
Rick Macklem	2096ce0339	Add a couple of definitions for NFSv4.2 and update macros to use them. This patch adds code to macros to clear attribute bits not supported by NFSv4.2. For now, these bits are never set anyhow, but this prepares the code for the addition of NFSv4.2 support in a future commit. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-06 23:51:11 +00:00
Rick Macklem	8f9259a550	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-06 01:53:02 +00:00
Mateusz Guzik	1e0006e49c	nullfs: locklessly check for entries in null_hashget During random sampling over poudriere -j 104 over 10% of calls returned NULL.	2019-12-05 13:41:22 +00:00
Konstantin Belousov	a51c8071a3	Stop using per-mount tmpfs zones. Requested and reviewed by: jeff Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22643	2019-12-05 00:03:17 +00:00
Rick Macklem	348d9b567e	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-04 23:24:40 +00:00
Mateusz Guzik	ba08feecbf	tmpfs: use proper macros for permission values in tmpfs_access While here group them in one var to prevent overly long lines. Perhaps a general macro of the same sort should be introduced. Requested by: kib	2019-12-01 00:34:49 +00:00
Kyle Evans	1b50b999f9	tty: implement TIOCNOTTY Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell. I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden. Reviewed by: bcr (manpages), kib Differential Revision: https://reviews.freebsd.org/D22572	2019-11-30 20:10:50 +00:00
Mateusz Guzik	a02cab334c	devfs: introduce a per-dev lock to protect ->si_devsw This allows bumping threadcount without taking the global devmtx lock. In particular this eliminates contention on said lock while using bhyve with multiple vms. Reviewed by: kib Tested by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22548	2019-11-30 16:46:19 +00:00
Mateusz Guzik	0f4b850e85	tmpfs: add fast path to tmpfs_access for common case lookup VEXEC consists of vast majority of all calls and almost all targets have at least 0111.	2019-11-30 16:41:47 +00:00
Konstantin Belousov	9698d99230	In nfs_lock(), recheck vp->v_data after lock before accessing it. We might race with reclaim, and then this is no longer a nfs vnode, in which case we do not need to handle deferred vnode_pager_setsize() either. Reported by: rk@ronald.org PR: 242184 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-11-29 13:55:56 +00:00
Rick Macklem	e1cda5eea6	Fix two races while handling nfsuserd daemon start/stop. A crash was reported where the nr_client field was NULL during an upcall to the nfsuserd daemon. Since nr_client == NULL only occurs when the nfsuserd daemon is being shut down, it appeared to be caused by a race between doing an upcall and the daemon shutting down. By inspection two races were identified: 1 - The nfsrv_nfsuserd variable is used to indicate whether or not the daemon is running. However it did not handle the intermediate phase where the daemon is starting or stopping. This was fixed by making nfsrv_nfsuserd tri-state and having the functions that are called during start/stop to obey the intermediate state. 2 - nfsrv_nfsuserd was checked to see that the daemon was running at the beginning of an upcall, but nothing prevented the daemon from being shut down while an upcall was still in progress. This race probably caused the crash. The patch fixes this by adding a count of upcalls in progress and having the shut down function delay until this count goes to zero before getting rid of nr_client and related data used by an upcall. Tested by: avg (Panzura QA) Reported by: avg Reviewed by: avg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22377	2019-11-28 23:34:23 +00:00
Konstantin Belousov	dbe257d253	tmpfs: resolve deadlock between rename and unmount. Top-level kern_renameat() increases the writecount on the mount point, which, together with tmpfs unmount suspending the mount, already ensures that unmount cannot proceed while rename unlocks and relocks all operated vnodes. Remove vfs_busy() call from tmpfs_rename() which was done while holding a vnode lock, creating the deadlock. The only intent of the busy operation seems to be the prevention of unmount, which is already ensured. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-24 19:06:38 +00:00
Rick Macklem	14eff785e8	Fix the pNFS server's reporting of SpaceUsed (va_bytes). The pNFS server currently reports SpaceUsed (va_bytes) for the metadata file. This is not correct, since the metadata file is always empty and, as such, va_bytes is just the allocation for the empty file. This patch adds va_bytes to the list of attributes acquired from the DS for a file, so that it includes the allocated data size and is updated when the file is written. For files created on a pNFS server before this patch is applied, the va_bytes value is estimated by rounding va_size up to a multiple of BLKDEV_IOSIZE. Once the file is written after this patch has been applied to the metadata server, the va_bytes returned for the file will be correct. This patch only affects a pNFS metadata server. Found during testing of the NFSv4.2 pNFS server for the Allocate operation. (Not yet in head/current.) MFC after: 2 weeks	2019-11-22 00:22:55 +00:00
Mateusz Guzik	1fccb43c39	vfs: change si_usecount management to count used vnodes Currently si_usecount is effectively a sum of usecounts from all associated vnodes. This is maintained by special-casing for VCHR every time usecount is modified. Apart from complicating the code a little bit, it has a scalability impact since it forces a read from a cacheline shared with said count. There are no consumers of the feature in the ports tree. In head there are only 2: revoke and devfs_close. Both can get away with a weaker requirement than the exact usecount, namely just the count of active vnodes. Changing the meaning to the latter means we only need to modify it on 0<->1 transitions, avoiding the check plenty of times (and entirely in something like vrefact). Reviewed by: kib, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22202	2019-11-20 12:05:59 +00:00
Jeff Roberson	639676877b	Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates redundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119	2019-11-19 23:19:43 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Mateusz Guzik	2e310f6f72	pseudofs: hashed vncache The vast majority of uses of the cache are just checking if there is an entry present on process exit (and evicting it if so). Both checking and eviction process are very expensive and put the lock protecting it high up on the profile during poudriere -j 104. Convert the linked list into a hash. This allows to almost always avoid taking the lock in the first place (and consequently almost removes it from the profile). Note only one lock is preserved as a split did not meaningfully impact contention. Should the cache be used for something it will still run into contention issues. The code needs a rewrite, but should someone want to tidy it up further the following can be done: 1) per-chain locks (or at least an array) 2) hashing by something else than just pid Sponsored by: The FreeBSD Foundation	2019-10-22 22:52:53 +00:00
Konstantin Belousov	c6ba06d86c	Fix interface between nfsclient and vnode pager. Make the nfsclient always call vnode_pager_setsize() with the vnode exclusively locked. This ensures that page fault always can find the backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK flag on NFS nodes. The main offender breaking the interface in nfsclient is nfs_loadattrcache(), which is used whenever server responded with updated attributes, which can happen on non-changing operations as well. Also, iod threads only have buffers locked (and even that is LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC response. Instead of immediately calling vnode_pager_setsize() if server response indicated changed file size, but the vnode is not exclusively locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively locked, or when we can temporary upgrade the lock to exclusive, call vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation. Tested by: pho Discussed with: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:17:38 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Jeff Roberson	63e9755548	(1/6) Replace busy checks with acquires where it is trivial to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548	2019-10-15 03:35:11 +00:00
Mateusz Guzik	9c04e4c01e	tmpfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:41 +00:00
Mateusz Guzik	48c426f226	pseudofs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:25 +00:00
Mateusz Guzik	be4cd6912f	nullfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:04 +00:00
Mateusz Guzik	4c9ba39aea	devfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:41:47 +00:00
Konstantin Belousov	e3fdd051f9	devfs_vptocnp(): correct the component name when node is not at top. Node' cdp.si_name is the full path as provided by make_dev(9), it should not be returned by VOP_VPTOCNP() when only the last component is requested. Use the dirent entry instead. With this note, handling of VDIR and VCHR nodes only differs in handling of root vnode, which simplifies and unifies the logic. Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-11 18:41:24 +00:00
Konstantin Belousov	53fcc6c960	Plug the rest of undef behavior places that were missed in r337456. There are three more places in msdosfs_fat.c which might shift one into the sign bit. While there, fix formatting of KASSERTs. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-11 18:37:02 +00:00
Doug Moore	2288078c5e	Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. In case the implementation ever changes from using a chain of next pointers, then changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882	2019-10-08 07:14:21 +00:00
Mateusz Guzik	d511f93e45	nfsclient: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:17:29 +00:00
Mateusz Guzik	7682d0be2b	tmpfs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:17:11 +00:00
Mateusz Guzik	559ac49d41	devfs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:16:55 +00:00
Mateusz Guzik	dfa8dae493	devfs: plug redundant bwillwrite avoidance vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905	2019-10-05 17:44:33 +00:00
Konstantin Belousov	c5dac63c15	tmpfs_readdir(): unlock the locked node. During readdir() we guarantee that the tn_dir.tn_parent does not go away, but it might be replaced by a parallel rename. Read tn_parent only once, then use the cached value. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-03 19:55:05 +00:00
Konstantin Belousov	f7e69a6fa0	tmpfs_rename: style. Reformat multi-line comments to follow style. Also fix some typos. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-03 19:51:56 +00:00
Konstantin Belousov	d60ac9d561	Remove unnecessary vm/vm_page.h and vm/vm_pager.h includes from tmpfs/tmpfs_vnodes.c. Submitted by: ota@j.email.ne.jp MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21881	2019-10-03 08:25:09 +00:00
Rick Macklem	ee7201a725	Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros. To be consistent with replacing the mtx_lock()/mtx_unlock() calls on the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces all mtx_assert() calls on these mutexes with macros as well. This will simplify changing these locks to sx locks in a future commit. However, this change may be delayed indefinitely, since it appears there is a deadlock when vnode_pager_setsize() is called to shrink the size and the NFS node lock is held. There is no semantic change as a result of this commit. Suggested by: kib MFC after: 1 week	2019-09-26 02:54:45 +00:00
Rick Macklem	b662b41e62	Replace all mtx_lock()/mtx_unlock() on the iod lock with macros. Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called and the iod lock is held when the NFS node lock is acquired, the iod mutex will need to be changed to an sx lock as well. To simplify the future commit that changes both the NFS node lock and iod lock to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the iod lock with macros. There is no semantic change as a result of this commit. I don't know when the future commit will happen and be MFC'd, so I have set the MFC on this commit to one week so that it can be MFC'd at the same time. Suggested by: kib MFC after: 1 week	2019-09-24 23:38:10 +00:00
Rick Macklem	5d85e12f44	Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros. For a long time, some places in the NFS code have locked/unlocked the NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas others have simply used mtx_lock()/mtx_unlock(). Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock with the macros to simplify making the change to an sx lock in a future commit. There is no semantic change as a result of this commit. I am not sure if the change to an sx lock will be MFC'd soon, so I put an MFC of 1 week on this commit so that it could be MFC'd with that commit. Suggested by: kib MFC after: 1 week	2019-09-24 01:58:54 +00:00
Kyle Evans	5fdac75222	msdosfs: do not deget unlinked denodes When a file is unlinked, the denode is not reclaimed until the last reference is dropped, but the directory entry is immediately up for reuse. This is a problem later when createde goes to grab a denode for the newly created entry -- we search the hash and find a dead denode, then return that without even bumping the reference count and the data later gets truncated when the last reference to the unlinked file is dropped. This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy will do a sequence incredibly roughly like this: open("/mnt/foo", ...) => fd 3 mmap() unlink("/mnt/foo") open("/mnt/foo", ...) => fd 4 write(4, ...) close(4) close(3) and the resulting file would be truncated, but the write succeeded, as long as a reference to the unlinked file had not been closed. Some archaeology indicates that this bug has likely existed since msdosfs was converted to use vfs_hash instead of a home rolled hash implementation in r143570. Prior to that point, the hashget implementation would do a refcnt check while searching and explicitly only return a denode with de_refcnt != 0. vfs_hash did not yet have the callback that it does today, so this slipped away and did not come back when it later grew that functionality. The comment indicating that we want to skip these denodes has been updated to reflect where this is actually done. My repo-diving session seems to indicate that the refcnt check was likely never actually below the comment, to be pedantic, but instead a detail wrapped up in the hashget implementation since the beginning of its inclusion into FreeBSD. This bug was the cause behind the issue addressed in r352557. Reported by: jhibbits Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21731	2019-09-20 20:47:10 +00:00
Konstantin Belousov	6fd583583b	Further refine r352393, only call vnode_pager_setsize() outside the node lock when shrinking. This is similar to r252528, applied to the above commit. Apparently there is a race which makes necessary at least to keep the n_size and pager size consistent when extending. Current suspect is that iod threads perform vnode_pager_setsize() without taking the vnode lock, which corrupts the file content. Reported and tested by: Masachika ISHIZUKA <ish@amail.plala.or.jp> Discussed with: rmacklem (related issues) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-17 18:41:39 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Alan Somers	320c848ff6	Fix an off-by-one error from r351961 That revision addressed a Coverity CID that could lead to a buffer overflow, but it had an off-by-one error in the buffer size check. Reported by: Coverity Coverity CID: 1405530 MFC after: 3 days MFC-With: 351961 Sponsored by: The FreeBSD Foundation	2019-09-16 16:41:01 +00:00
Alan Somers	42767f76af	fusefs: fix some minor issues with fuse_vnode_setparent * When unparenting a vnode, actually clear the flag. AFAIK this is basically a no-op because we only unparent a vnode when reclaiming it or when unlinking. * There's no need to call fuse_vnode_setparent during reclaim, because we're about to free the vnode data anyway. Reviewed by: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21630	2019-09-16 14:51:49 +00:00
Konstantin Belousov	1246ee664b	nfscl_loadattrcache: fix rest of the cases to not call vnode_pager_setsize() under the node mutex. r248567 moved some calls of vnode_pager_setsize() after the node lock is unlocked, do the rest now. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-16 13:26:27 +00:00
Edward Tomasz Napierala	cf38985293	Make pseudofs(9) create directory entries in order, instead of the reverse. This fixes Linux sysctl(8) binary - it assumes the first two directory entries are always "." and "..". There might be other Linux apps affected by this. NB it might be a good idea to rewrite it using queue(3). Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21550	2019-09-14 19:16:07 +00:00
Conrad Meyer	aaa3852435	buf: Add B_INVALONERR flag to discard data Setting the B_INVALONERR flag before a synchronous write causes the buf cache to forcibly invalidate contents if the write fails (BIO_ERROR). This is intended to be used to allow layers above the buffer cache to make more informed decisions about when discarding dirty buffers without successful write is acceptable. As a proof of concept, use in msdosfs to handle failures to mark the on-disk 'dirty' bit during rw mount or ro->rw update. Extending this to other filesystems is left as future work. PR: 210316 Reviewed by: kib (with objections) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21539	2019-09-11 21:24:14 +00:00
Alan Somers	6c0c362075	fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode When communicating with a FUSE server that implements version 7.8 (or older) of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter than normal. The protocol version check wasn't applied universally, leading to an extra 16 bytes being sent to such servers. The extra bytes were allocated and bzero()d, so there was no information disclosure. Reviewed by: emaste MFC after: 3 days MFC-With: r350665 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21557	2019-09-11 19:29:40 +00:00
Mark Johnston	fee2a2fa39	Change synchronization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). 
In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Ed Maste	4f0372f8cb	msdosfsmount.h: fix ifdef comment	2019-09-09 18:35:17 +00:00
Alan Somers	16f8783452	Coverity fixes in fusefs(5) CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap. It could potentially have resulted in VOP_BMAP reporting too many consecutive blocks. CID 1404364 is much worse. It was an array access by an untrusted, user-provided variable. It could potentially have resulted in a malicious file system crashing the kernel or worse. Reported by: Coverity Reviewed by: emaste MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21466	2019-09-06 19:40:11 +00:00
Conrad Meyer	454409a372	msdosfs: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:08:10 +00:00
Conrad Meyer	dadc0f97d0	cd9660: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:07:36 +00:00
Conrad Meyer	fe8b34563d	ext2fs: Remove redundant brelse() after r294954 Coccinelle: @ rule1 @ identifier __error; @@ ... int __error; ... @ rule2 depends on rule1 @ identifier rule1.__error; identifier __bp; @@ __error = ( bread \| bread_gb \| breadn \| breadn_flags ) (..., &__bp); if ( ( __error \| __error != 0 ) ) { ... - brelse(__bp); ... } No functional change.	2019-09-06 08:07:12 +00:00
Rick Macklem	4ce21f37fd	Delete the unused "nd" argument for nfsrv_proxyds(). The "nd" argument for nfsrv_proxyds() is no longer used by the function. This patch deletes it. This allows a subsequent patch to delete the "nd" argument from nfsvno_getattr(), since its only use of "nd" was to pass it to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect.	2019-09-05 22:25:19 +00:00
Conrad Meyer	a6935d085c	Remove long-dead BUF_ASSERT_{,UN}HELD assertions These were fully neutered in r177676 (2008), but not removed at the time for unclear reasons. They're totally dead code, so go ahead and yank them now. No functional change.	2019-09-05 21:43:33 +00:00
Conrad Meyer	f80cbeb292	msdosfs: Drop an unneeded brelse in bread error condition After r294954, it is an invariant that bread returns non-NULL bp if and only if the routine succeeded. On error, it handles any buffer cleanup internally. So the brelse(NULL) here was just redundant. No functional change. Discussed with: kib (extracted from a larger differential)	2019-09-05 21:30:52 +00:00
Rick Macklem	2e67077700	Delete the unused "nd" argument for nfsrv_checkdsattr(). The "nd" argument for nfsrv_checkdsattr() is no longer used by the function. This patch deletes it. This allows subsequent patches to delete the "nd" argument from nfsrv_proxyds(), since its only use of "nd" was to pass it to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(), which passes "nd" to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect. Found by inspection while working on the NFSv4.2 server.	2019-09-04 22:37:28 +00:00
Kyle Evans	f99c5e8d28	pseudofs: make readdir work without a pid again Specifically, the following was broken: $ mount -t procfs procfs /proc $ ls -l /proc r351741 reworked readdir slightly to avoid pfs_node/pidhash LOR, but inadvertently regressed pid == NO_PID; new pfs_lookup_proc() fails for the obvious reasons, and later pfs_visible_proc doesn't capture the pid == NO_PID -> return 1 aspect of pfs_visible. We can in fact skip this whole block if we're operating on a directory w/ NO_PID, as it's always visible. Reported by: trasz Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D21518	2019-09-04 14:20:39 +00:00
Mateusz Guzik	f5791174df	pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable) Sponsored by: The FreeBSD Foundation	2019-09-03 12:54:51 +00:00
Ed Maste	840aca2880	makefs: share msdosfsmount.h between kernel msdosfs and makefs Sponsored by: The FreeBSD Foundation	2019-09-01 16:55:33 +00:00
Mateusz Guzik	e0f4540a2a	nullfs: reduce areas protected by vnode interlock in null_lock Similarly to the other routine stop taking the interlock for the lower vnode. The interlock for nullfs vnode is still taken to ensure stability of ->v_data. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21480	2019-09-01 02:52:00 +00:00
Mateusz Guzik	13c73428dc	nullfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation	2019-08-30 00:30:03 +00:00
Mark Johnston	9222b82368	Remove unused VM page locking macros. They were orphaned by r292373. Reviewed by: asomers MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21469	2019-08-29 22:13:15 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Mateusz Guzik	a89cd2a4bd	tmpfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:35:23 +00:00
Mateusz Guzik	1e2f0ceb2f	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for lookup. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesystems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:34:24 +00:00
Alan Somers	5e63333052	fusefs: Fix some bugs regarding the size of the LISTXATTR list * A small error in r338152 led to the returned size always being exactly eight bytes too large. * The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the caller does not provide enough space, then the server should return ERANGE rather than return a truncated list. That's true even though in FUSE's case the kernel doesn't provide space to the client at all; it simply requests a maximum size for the list. We previously weren't handling the case where the server returns ERANGE even though the kernel requested as much size as the server had told us it needs; that can happen due to a race. * We also need to ensure that a pathological server that always returns ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an infinite loop in the kernel. As of this commit, it will instead cause an infinite loop that exits and enters the kernel on each iteration, allowing signals to be processed. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21287	2019-08-28 04:19:37 +00:00
Mateusz Guzik	4840711516	unionfs: stop passing LK_INTERLOCK to VOP_UNLOCK This is part of the preparation to remove flags argument from VOP_UNLOCK. Also has a side effect of fixing stacking on top of nullfs broken by r351472. Reported by: cy Sponsored by: The FreeBSD Foundation	2019-08-27 20:51:17 +00:00
Mateusz Guzik	33d46a3cef	nullfs: reduce areas protected by vnode interlock Some places only take the interlock to hold the vnode, which was a requirement before they started being manipulated with atomics. Use the newly introduced vholdnz to bump the count. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:13:15 +00:00

4383 Commits