freebsd-skq

Author	SHA1	Message	Date
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Rick Macklem	a95cd06e9a	Delete an unused external declaration. Since nfsv4_opflag is no longer used in nfs_clcomsubs.c, delete the external declaration of it. Found during NFSv4.2 code merge. MFC after: 2 weeks	2019-12-08 16:59:36 +00:00
Rick Macklem	8e1906f700	Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply. When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate a status result for the first operation in the compound. Without this patch, this will result in a bogus EBADXDR error return. Returning EBADXDR is relatively harmless, but a correct reply of NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct minor version to use for a File Layout DS now that there can be NFSv4.2 DS servers. mount_nfs.c still needs to be fixed for this, although how the mount fails is only useful to help sysadmins isolate why a mount fails. Found during testing of the NFSv4.2 client and server. MFC after: 2 weeks	2019-12-08 00:06:00 +00:00
Rick Macklem	238da71f91	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-07 23:13:51 +00:00
Rick Macklem	394dae30b1	Set the XATTRSUPPORT attribute bit for NFSv4.2, always cleared for now. Since r355472 added code which clears the XATTRSUPPORT bit for non-NFSv4.2 mounts, it is now safe to set it. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys. This commit completes updates to nfsproto.h required by the NFSv4.2.	2019-12-07 01:10:38 +00:00
Rick Macklem	2096ce0339	Add a couple of definitions for NFSv4.2 and update macros to use them. This patch adds code to macros to clear attribute bits not supported by NFSv4.2. For now, these bits are never set anyhow, but this prepares the code for the addition of NFSv4.2 support in a future commit. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-06 23:51:11 +00:00
Rick Macklem	8f9259a550	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-06 01:53:02 +00:00
Mateusz Guzik	1e0006e49c	nullfs: locklessly check for entries in null_hashget During random sampling over poudriere -j 104 over 10% of calls returned NULL.	2019-12-05 13:41:22 +00:00
Konstantin Belousov	a51c8071a3	Stop using per-mount tmpfs zones. Requested and reviewed by: jeff Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22643	2019-12-05 00:03:17 +00:00
Rick Macklem	348d9b567e	Add some definitions for NFSv4.2 which will be used by subsequent commits. This is a preliminary commit of NFSv4.2 definitions that will be used by subsequent commits which adds NFSv4.2 support to the NFS client and server. There will be a series of these preliminary commits that will prepare for a major commit of the NFSv4.2 client/server changes currently found in subversion under projects/nfsv42/sys.	2019-12-04 23:24:40 +00:00
Mateusz Guzik	ba08feecbf	tmpfs: use proper macros for permission values in tmpfs_access While here group them in one var to prevent overy long lines. Perhaps a general macro of the same sort should be introduced. Requested by: kib	2019-12-01 00:34:49 +00:00
Kyle Evans	1b50b999f9	tty: implement TIOCNOTTY Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell. I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden. Reviewed by: bcr (manpages), kib Differential Revision: https://reviews.freebsd.org/D22572	2019-11-30 20:10:50 +00:00
Mateusz Guzik	a02cab334c	devfs: introduce a per-dev lock to protect ->si_devsw This allows bumping threadcount without taking the global devmtx lock. In particular this eliminates contention on said lock while using bhyve with multiple vms. Reviewed by: kib Tested by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22548	2019-11-30 16:46:19 +00:00
Mateusz Guzik	0f4b850e85	tmpfs: add fast path to tmpfs_access for common case lookup VEXEC consists of vast majority of all calls and almost all targets have at least 0111.	2019-11-30 16:41:47 +00:00
Konstantin Belousov	9698d99230	In nfs_lock(), recheck vp->v_data after lock before accessing it. We might race with reclaim, and then this is no longer a nfs vnode, in which case we do not need to handle deferred vnode_pager_setsize() either. Reported by: rk@ronald.org PR: 242184 Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-11-29 13:55:56 +00:00
Rick Macklem	e1cda5eea6	Fix two races while handling nfsuserd daemon start/stop. A crash was reported where the nr_client field was NULL during an upcall to the nfsuserd daemon. Since nr_client == NULL only occurs when the nfsuserd daemon is being shut down, it appeared to be caused by a race between doing an upcall and the daemon shutting down. By inspection two races were identified: 1 - The nfsrv_nfsuserd variable is used to indicate whether or not the daemon is running. However it did not handle the intermediate phase where the daemon is starting or stopping. This was fixed by making nfsrv_nfsuserd tri-state and having the functions that are called during start/stop to obey the intermediate state. 2 - nfsrv_nfsuserd was checked to see that the daemon was running at the beginning of an upcall, but nothing prevented the daemon from being shut down while an upcall was still in progress. This race probably caused the crash. The patch fixes this by adding a count of upcalls in progress and having the shut down function delay until this count goes to zero before getting rid of nr_client and related data used by an upcall. Tested by: avg (Panzura QA) Reported by: avg Reviewed by: avg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D22377	2019-11-28 23:34:23 +00:00
Konstantin Belousov	dbe257d253	tmpfs: resolve deadlock between rename and unmount. Top-level kern_renameat() increases the writecount on the mount point, which, together with tmpfs unmount suspending the mount, already ensures that unmount cannot proceed while rename unlocks and relocks all operated vnodes. Remove vfs_busy() call from tmpfs_rename() which was done while holding a vnode lock, creating the deadlock. The only intent of the busy operation seems to be the prevention of unmount, which is already ensured. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-11-24 19:06:38 +00:00
Rick Macklem	14eff785e8	Fix the pNFS server's reporting of SpaceUsed (va_bytes). The pNFS server currently reports SpaceUsed (va_bytes) for the metadata file. This in not correct, since the metadata file is always empty and, as such, va_bytes is just the allocation for the empty file. This patch adds va_bytes to the list of attributes acquired from the DS for a file, so that it includes the allocated data size and is updated when the file is written. For files created on a pNFS server before this patch is applied, the va_bytes value is estimated by rounding va_size up to a multiple of BLKDEV_IOSIZE. Once the file is written after this patch has been applied to the metadata server, the va_bytes returned for the file will be correct. This patch only affects a pNFS metadata server. Found during testing of the NFSv4.2 pNFS server for the Allocate operation. (Not yet in head/current.) MFC after: 2 weeks	2019-11-22 00:22:55 +00:00
Mateusz Guzik	1fccb43c39	vfs: change si_usecount management to count used vnodes Currently si_usecount is effectively a sum of usecounts from all associated vnodes. This is maintained by special-casing for VCHR every time usecount is modified. Apart from complicating the code a little bit, it has a scalability impact since it forces a read from a cacheline shared with said count. There are no consumers of the feature in the ports tree. In head there are only 2: revoke and devfs_close. Both can get away with a weaker requirement than the exact usecount, namely just the count of active vnodes. Changing the meaning to the latter means we only need to modify it on 0<->1 transitions, avoiding the check plenty of times (and entirely in something like vrefact). Reviewed by: kib, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22202	2019-11-20 12:05:59 +00:00
Jeff Roberson	639676877b	Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates reudundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers and so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT. Reviewed by: alc, dougm, kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D22119	2019-11-19 23:19:43 +00:00
Jeff Roberson	67d0e29304	Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116	2019-10-29 21:06:34 +00:00
Mateusz Guzik	2e310f6f72	pseudofs: hashed vncache Vast majority of uses the cache are just checking if there is an entry present on process exit (and evicting it if so). Both checking and eviction process are very expensive and put the lock protecting it high up on the profile during poudriere -j 104. Convert the linked list into a hash. This allows to almost always avoid taking the lock in the first place (and consequently almost removes it from the profile). Note only one lock is preserved as a split did not meaningfully impact contention. Should the cache be used for something it will still run into contention issues. The code needs a rewrite, but should someone want to tidy it up further the following can be done: 1) per-chain locks (or at least an array) 2) hashing by something else than just pid Sponsored by: The FreeBSD Foundation	2019-10-22 22:52:53 +00:00
Konstantin Belousov	c6ba06d86c	Fix interface between nfsclient and vnode pager. Make the nfsclient always call vnode_pager_setsize() with the vnode exclusively locked. This ensures that page fault always can find the backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK flag on NFS nodes. The main offender breaking the interface in nfsclient is nfs_loadattrcache(), which is used whenever server responded with updated attributes, which can happen on non-changing operations as well. Also, iod threads only have buffers locked (and even that is LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC response. Instead of immediately calling vnode_pager_setsize() if server response indicated changed file size, but the vnode is not exclusively locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively locked, or when we can temporary upgrade the lock to exclusive, call vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation. Tested by: pho Discussed with: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883	2019-10-22 16:17:38 +00:00
Jeff Roberson	0012f373e4	(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594	2019-10-15 03:45:41 +00:00
Jeff Roberson	63e9755548	(1/6) Replace busy checks with acquires where it is trival to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548	2019-10-15 03:35:11 +00:00
Mateusz Guzik	9c04e4c01e	tmpfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:41 +00:00
Mateusz Guzik	48c426f226	pseudofs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:25 +00:00
Mateusz Guzik	be4cd6912f	nullfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:04 +00:00
Mateusz Guzik	4c9ba39aea	devfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:41:47 +00:00
Konstantin Belousov	e3fdd051f9	devfs_vptocnp(): correct the component name when node is not at top. Node' cdp.si_name is the full path as provided by make_dev(9), it should not be returned by VOP_VPTOCNP() when only the last component is requested. Use the dirent entry instead. With this note, handling of VDIR and VCHR nodes only differs in handling of root vnode, which simplifies and unifies the logic. Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-11 18:41:24 +00:00
Konstantin Belousov	53fcc6c960	Plug the rest of undef behavior places that were missed in r337456. There are three more places in msdosfs_fat.c which might shift one into the sign bit. While there, fix formatting of KASSERTs. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-11 18:37:02 +00:00
Doug Moore	2288078c5e	Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. In case the implementation ever changes from using a chain of next pointers, then changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882	2019-10-08 07:14:21 +00:00
Mateusz Guzik	d511f93e45	nfsclient: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:17:29 +00:00
Mateusz Guzik	7682d0be2b	tmpfs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:17:11 +00:00
Mateusz Guzik	559ac49d41	devfs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646	2019-10-06 22:16:55 +00:00
Mateusz Guzik	dfa8dae493	devfs: plug redundant bwillwrite avoidance vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905	2019-10-05 17:44:33 +00:00
Konstantin Belousov	c5dac63c15	tmpfs_readdir(): unlock the locked node. During readdir() we guarantee that the tn_dir.tn_parent does not go away, but it might be replaced by a parallel rename. Read tn_parent only once, then use the cached value. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-03 19:55:05 +00:00
Konstantin Belousov	f7e69a6fa0	tmpfs_rename: style. Reformat multi-line comments to follow style. Also fix some typos. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-10-03 19:51:56 +00:00
Konstantin Belousov	d60ac9d561	Remove unnecessary vm/vm_page.h and vm/vm_pager.h includes from tmpfs/tmpfs_vnodes.c. Submitted by: ota@j.email.ne.jp MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21881	2019-10-03 08:25:09 +00:00
Rick Macklem	ee7201a725	Replace all mtx_assert() calls for n_mtx and ncl_iod_mutex with macros. To be consistent with replacing the mtx_lock()/mtx_unlock() calls on the NFS node mutex (n_mtx) and ncl_iod_mutex, this patch replaces all mtx_assert() calls on these mutexes with macros as well. This will simplify changing these locks to sx locks in a future commit. However, this change may be delayed indefinitely, since it appears there is a deadlock when vnode_pager_setsize() is called to shrink the size and the NFS node lock is held. There is no semantic change as a result of this commit. Suggested by: kib MFC after: 1 week	2019-09-26 02:54:45 +00:00
Rick Macklem	b662b41e62	Replace all mtx_lock()/mtx_unlock() on the iod lock with macros. Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called and the iod lock is held when the NFS node lock is acquired, the iod mutex will need to be changed to an sx lock as well. To simply the future commit that changes both the NFS node lock and iod lock to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the iod lock with macros. There is no semantic change as a result of this commit. I don't know when the future commit will happen and be MFC'd, so I have set the MFC on this commit to one week so that it can be MFC'd at the same time. Suggested by: kib MFC after: 1 week	2019-09-24 23:38:10 +00:00
Rick Macklem	5d85e12f44	Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros. For a long time, some places in the NFS code have locked/unlocked the NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas others have simply used mtx_lock()/mtx_unlock(). Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock with the macros to simply making the change to an sx lock in future commit. There is no semantic change as a result of this commit. I am not sure if the change to an sx lock will be MFC'd soon, so I put an MFC of 1 week on this commit so that it could be MFC'd with that commit. Suggested by: kib MFC after: 1 week	2019-09-24 01:58:54 +00:00
Kyle Evans	5fdac75222	msdosfs: do not deget unlinked denodes When a file is unlinked, the denode is not reclaimed until the last reference is dropped, but the directory entry is immediately up for reuse. This is a problem later when createde goes to grab a denode for the newly created entry -- we search the hash and find a dead denode, then return that without even bumping the reference count and the data later gets truncated when the the last reference to the unlinked file is dropped. This manifested itself as a broken in-place strip(1) on msdosfs. elfcopy will do a sequence incredibly roughly like this: open("/mnt/foo", ...) => fd 3 mmap() unlink("/mnt/foo") open("/mnt/foo", ...) => fd 4 write(4, ...) close(4) close(3) and the resulting file would be truncated, but the write succeeded, as long as a reference to the unlinked file had not been closed. Some archaeology indicates that this bug has likely existed since msdosfs was converted to use vfs_hash instead of a home rolled hash implementation in r143570. Prior to that point, the hashget implementation would do a refcnt check while searching and explicitly only return a denode with de_refcnt != 0. vfs_hash did not yet have the callback that it does today, so this slipped away and did not come back when it later grew that functionality. The comment indicating that we want to skip these denodes has been updated to reflect where this is actually done. My repo-diving session seems to indicate that the refcnt check was likely never actually below the comment, to be pedantic, but instead a detail wrapped up in the hashget implementation since the beginning of its inclusion into FreeBSD. This bug was the cause behind the issue addressed in r352557. Reported by: jhibbits Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21731	2019-09-20 20:47:10 +00:00
Konstantin Belousov	6fd583583b	Further refine r352393, only call vnode_pager_setsize() outside the node lock when shrinking. This is similar to r252528, applied to the above commit. Apparently there is a race which makes necessary at least to keep the n_size and pager size consistent when extending. Current suspect is that iod threads perform vnode_pager_setsize() without taking the vnode lock, which corrupts the file content. Reported and tested by: Masachika ISHIZUKA <ish@amail.plala.or.jp> Discussed with: rmacklem (related issues) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-17 18:41:39 +00:00
Mateusz Guzik	4cace859c2	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637	2019-09-16 21:37:47 +00:00
Alan Somers	320c848ff6	Fix an off-by-one error from r351961 That revision addressed a Coverity CID that could lead to a buffer overflow, but it had an off-by-one error in the buffer size check. Reported by: Coverity Coverity CID: 1405530 MFC after: 3 days MFC-With: 351961 Sponsored by: The FreeBSD Foundation	2019-09-16 16:41:01 +00:00
Alan Somers	42767f76af	fusefs: fix some minor issues with fuse_vnode_setparent * When unparenting a vnode, actually clear the flag. AFAIK this is basically a no-op because we only unparent a vnode when reclaiming it or when unlinking. * There's no need to call fuse_vnode_setparent during reclaim, because we're about to free the vnode data anyway. Reviewed by: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21630	2019-09-16 14:51:49 +00:00
Konstantin Belousov	1246ee664b	nfscl_loadattrcache: fix rest of the cases to not call vnode_pager_setsize() under the node mutex. r248567 moved some calls of vnode_pager_setsize() after the node lock is unlocked, do the rest now. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-09-16 13:26:27 +00:00
Edward Tomasz Napierala	cf38985293	Make pseudofs(9) create directory entries in order, instead of the reverse. This fixes Linux sysctl(8) binary - it assumes the first two directory entries are always "." and "..". There might be other Linux apps affected by this. NB it might be a good idea to rewrite it using queue(3). Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21550	2019-09-14 19:16:07 +00:00
Conrad Meyer	aaa3852435	buf: Add B_INVALONERR flag to discard data Setting the B_INVALONERR flag before a synchronous write causes the buf cache to forcibly invalidate contents if the write fails (BIO_ERROR). This is intended to be used to allow layers above the buffer cache to make more informed decisions about when discarding dirty buffers without successful write is acceptable. As a proof of concept, use in msdosfs to handle failures to mark the on-disk 'dirty' bit during rw mount or ro->rw update. Extending this to other filesystems is left as future work. PR: 210316 Reviewed by: kib (with objections) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21539	2019-09-11 21:24:14 +00:00
Alan Somers	6c0c362075	fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode When communicating with a FUSE server that implements version 7.8 (or older) of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter than normal. The protocol version check wasn't applied universally, leading to an extra 16 bytes being sent to such servers. The extra bytes were allocated and bzero()d, so there was no information disclosure. Reviewed by: emaste MFC after: 3 days MFC-With: r350665 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21557	2019-09-11 19:29:40 +00:00
Mark Johnston	fee2a2fa39	Change synchonization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficent as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accomodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Ed Maste	4f0372f8cb	msdosfsmount.h: fix ifdef comment	2019-09-09 18:35:17 +00:00
Alan Somers	16f8783452	Coverity fixes in fusefs(5) CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap. It could potentially have resulted in VOP_BMAP reporting too many consecutive blocks. CID 1404364 is much worse. It was an array access by an untrusted, user-provided variable. It could potentially have resulted in a malicious file system crashing the kernel or worse. Reported by: Coverity Reviewed by: emaste MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21466	2019-09-06 19:40:11 +00:00
Conrad Meyer	454409a372	msdosfs: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:08:10 +00:00
Conrad Meyer	dadc0f97d0	cd9660: Remove redundant brelse() after r294954 Same automation. No functional change.	2019-09-06 08:07:36 +00:00
Conrad Meyer	fe8b34563d	ext2fs: Remove redundant brelse() after r294954 Coccinelle: @ rule1 @ identifier __error; @@ ... int __error; ... @ rule2 depends on rule1 @ identifier rule1.__error; identifier __bp; @@ __error = ( bread \| bread_gb \| breadn \| breadn_flags ) (..., &__bp); if ( ( __error \| __error != 0 ) ) { ... - brelse(__bp); ... } No functional change.	2019-09-06 08:07:12 +00:00
Rick Macklem	4ce21f37fd	Delete the unused "nd" argument for nfsrv_proxyds(). The "nd" argument for nfsrv_proxyds() is no longer used by the function. This patch deletes it. This allows a subsequent patch to delete the "nd" argument from nfsvno_getattr(), since it's only use of "nd" was to pass it to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect.	2019-09-05 22:25:19 +00:00
Conrad Meyer	a6935d085c	Remove long-dead BUF_ASSERT_{,UN}HELD assertions These were fully neutered in r177676 (2008), but not removed at the time for unclear reasons. They're totally dead code, so go ahead and yank them now. No functional change.	2019-09-05 21:43:33 +00:00
Conrad Meyer	f80cbeb292	msdosfs: Drop an unneeded brelse in bread error condition After r294954, it is an invariant that bread returns non-NULL bp if and only if the routine succeeded. On error, it handles any buffer cleanup internally. So the brelse(NULL) here was just redundant. No functional change. Discussed with: kib (extracted from a larger differential)	2019-09-05 21:30:52 +00:00
Rick Macklem	2e67077700	Delete the unused "nd" argument for nfsrv_checkdsattr(). The "nd" argument for nfsrv_checkdsattr() is no longer used by the function. This patch deletes it. This allows subsequent patches to delete the "nd" argument from nfsrv_proxyds(), since it's only use of "nd" was to pass it to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(), which passes "nd" to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect. Found by inspection while working on the NFSv4.2 server.	2019-09-04 22:37:28 +00:00
Kyle Evans	f99c5e8d28	pseudofs: make readdir work without a pid again Specifically, the following was broken: $ mount -t procfs procfs /proc $ ls -l /proc r351741 reworked readdir slightly to avoid pfs_node/pidhash LOR, but inadvertently regressed pid == NO_PID; new pfs_lookup_proc() fails for the obvious reasons, and later pfs_visible_proc doesn't capture the pid == NO_PID -> return 1 aspect of pfs_visible. We can infact skip this whole block if we're operating on a directory w/ NO_PID, as it's always visible. Reported by: trasz Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D21518	2019-09-04 14:20:39 +00:00
Mateusz Guzik	f5791174df	pseudofs: fix a LOR pfs_node vs pidhash (sleepable after non-sleepable) Sponsored by: The FreeBSD Foundation	2019-09-03 12:54:51 +00:00
Ed Maste	840aca2880	makefs: share msdosfsmount.h between kernel msdosfs and makefs Sponsored by: The FreeBSD Foundation	2019-09-01 16:55:33 +00:00
Mateusz Guzik	e0f4540a2a	nullfs: reduce areas protected by vnode interlock in null_lock Similarly to the other routine stop taking the interlock for the lower vnode. The interlock for nullfs vnode is still taken to ensure stability of ->v_data. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21480	2019-09-01 02:52:00 +00:00
Mateusz Guzik	13c73428dc	nullfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation	2019-08-30 00:30:03 +00:00
Mark Johnston	9222b82368	Remove unused VM page locking macros. They were orphaned by r292373. Reviewed by: asomers MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21469	2019-08-29 22:13:15 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Mateusz Guzik	a89cd2a4bd	tmpfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:35:23 +00:00
Mateusz Guzik	1e2f0ceb2f	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for looukp. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesytems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:34:24 +00:00
Alan Somers	5e63333052	fusefs: Fix some bugs regarding the size of the LISTXATTR list * A small error in r338152 let to the returned size always being exactly eight bytes too large. * The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the caller does not provide enough space, then the server should return ERANGE rather than return a truncated list. That's true even though in FUSE's case the kernel doesn't provide space to the client at all; it simply requests a maximum size for the list. We previously weren't handling the case where the server returns ERANGE even though the kernel requested as much size as the server had told us it needs; that can happen due to a race. * We also need to ensure that a pathological server that always returns ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an infinite loop in the kernel. As of this commit, it will instead cause an infinite loop that exits and enters the kernel on each iteration, allowing signals to be processed. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21287	2019-08-28 04:19:37 +00:00
Mateusz Guzik	4840711516	unionfs: stop passing LK_INTERLOCK to VOP_UNLOCK This is part of the preparation to remove flags argument from VOP_UNLOCK. Also has a side effect of fixing stacking on top of nullfs broken by r351472. Reported by: cy Sponsored by: The FreeBSD Foundation	2019-08-27 20:51:17 +00:00
Mateusz Guzik	33d46a3cef	nullfs: reduce areas protected by vnode interlock Some places only take the interlock to hold the vnode, which was a requiremnt before they started being manipulated with atomics. Use the newly introduced vholdnz to bump the count. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:13:15 +00:00
Ed Maste	b257feb247	msdosfs_fat: reduce diffs with NetBSD and makefs Use pointer arithmetic (as now done in makefs, and in NetBSD) instead of taking the address of array element. No functional change, but this makes it easier to compare different versions of this file. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21365	2019-08-22 16:06:52 +00:00
Mateusz Guzik	81f666e79d	nullfs: lock the vnode with LK_SHARED in null_vptocnp null_nodeget which follows almost always finds the target vnode in the hash, avoiding insmntque1 altogether. Should it be needed, it already checks if the lock needs to be upgraded. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20244	2019-08-21 23:24:40 +00:00
Ed Maste	476b0ab758	makefs: share denode.h between kernel msdosfs and makefs There is no need to duplicate this file when it can be trivially shared (just exposing sections previously under #ifdef _KERNEL). MFC with: r351273 Differential Revision: The FreeBSD Foundation	2019-08-21 19:07:13 +00:00
Ed Maste	51e79affa3	makefs: share fat.h between kernel msdosfs and makefs There is no reason to duplicate this file when it can be trivially shared (just exposing one section previously under #ifdef _KERNEL). Reviewed by: imp, cem MFC with: r351273 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21346	2019-08-21 02:21:40 +00:00
Konstantin Belousov	de4e1aeb21	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:36:11 +00:00
Alan Somers	3a79e8e772	fusefs: don't send the namespace during listextattr The FUSE_LISTXATTR operation always returns the full list of a file's extended attributes, in all namespaces. There's no way to filter the list server-side. However, currently FreeBSD's fusefs driver sends a namespace string with the FUSE_LISTXATTR request. That behavior was probably copied from fuse_vnop_getextattr, which has an attribute name argument. It's been there ever since extended attribute support was added in r324620. This commit removes it. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21280	2019-08-16 05:06:54 +00:00
Alan Somers	bf50749773	fusefs: Fix the size of fuse_getattr_in In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased. However, the fusefs driver is currently not sending the additional fields. In our implementation, the additional fields are always zero, so I there haven't been any test failures until now. But fusefs-lkl requires the request's length to be correct. Fix this bug, and also enhance the test suite to catch similar bugs. PR: 239830 MFC after: 2 weeks MFC-With: 350665 Sponsored by: The FreeBSD Foundation	2019-08-14 20:45:00 +00:00
Alan Somers	0b4275accb	fusefs: merge from projects/fuse2 This commit imports the new fusefs driver. It raises the protocol level from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and adds many new features. New features include: * Optional kernel-side permissions checks (-o default_permissions) * Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK * Allow interrupting FUSE operations * Support named pipes and unix-domain sockets in fusefs file systems * Forward UTIME_NOW during utimensat(2) to the daemon * kqueue support for /dev/fuse * Allow updating mounts with "mount -u" * Allow exporting fusefs file systems over NFS * Server-initiated invalidation of the name cache or data cache * Respect RLIMIT_FSIZE * Try to support servers as old as protocol 7.4 Performance enhancements include: * Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags * Cache file attributes * Cache lookup entries, both positive and negative * Server-selectable cache modes: writethrough, writeback, or uncached * Write clustering * Readahead * Use counter(9) for statistical reporting PR: 199934 216391 233783 234581 235773 235774 235775 PR: 236226 236231 236236 236291 236329 236381 236405 PR: 236327 236466 236472 236473 236474 236530 236557 PR: 236560 236844 237052 237181 237588 238565 Reviewed by: bcr (man pages) Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit review on project branch) MFC after: 3 weeks Relnotes: yes Sponsored by: The FreeBSD Foundation Pull Request: https://reviews.freebsd.org/D21110	2019-08-07 00:38:26 +00:00
Alan Somers	427d205cb5	fusefs: remove superfluous counter_u64_zero Reported by: glebius Sponsored by: The FreeBSD Foundation	2019-08-06 00:50:25 +00:00
Konstantin Belousov	30d49d536b	Try to decrease the number of bugs in unionfs after the VV_TEXT flag removal. - Provide unionfs_add_writecount() which passes the writecount to the lower or upper vnode as appropriate. - In unionfs VOP_RECLAIM() implementation, annulate unionfs writecounts from upper or lower vnode. It is not clear that it is always correct to remove the all references from either lower or upper vnode, but we currently do not track which vnode get how many refs anyway. Reported and tested by: t_uemura@macome.co.jp MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-08-01 14:40:37 +00:00
Alan Somers	508abc9494	fusefs: fix the build after r350446 fuse needs to include an additional header after r350446 Sponsored by: The FreeBSD Foundation	2019-07-31 21:48:35 +00:00
Alan Somers	58df81b339	MFHead @350426 Sponsored by: The FreeBSD Foundation	2019-07-30 04:17:36 +00:00
Mark Johnston	918988576c	Avoid relying on header pollution from sys/refcount.h. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-29 20:26:01 +00:00
Alan Somers	669a092af1	fusefs: fix panic when writing with O_DIRECT and using writeback cache When a fusefs file system is mounted using the writeback cache, the cache may still be bypassed by opening a file with O_DIRECT. When writing with O_DIRECT, the cache must be invalidated for the affected portion of the file. Fix some panics caused by inadvertently invalidating too much. Sponsored by: The FreeBSD Foundation	2019-07-28 15:17:32 +00:00
Alan Somers	a63915c2d7	MFHead @r350386 Sponsored by: The FreeBSD Foundation	2019-07-28 04:02:22 +00:00
Alan Somers	ed74f781c9	fusefs: add a intr/nointr mount option FUSE file systems can optionally support interrupting outstanding operations. However, the file system does not identify to the kernel at mount time whether it's capable of doing that. Instead it signals its noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it receives. That's a problem for reliable signal delivery, because the kernel must choose which thread should get a signal before it knows whether the FUSE server can handle interrupts. The problem is even worse because the FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT operations. Fix the signal delivery logic by making interruptibility an opt-in mount option. This will require a corresponding change to libfuse, but not to most file systems that link to libfuse. Bump __FreeBSD_version due to the new mount option. Sponsored by: The FreeBSD Foundation	2019-07-18 17:55:13 +00:00
Alan Somers	f05962453e	fusefs: fix another semi-infinite loop bug regarding signal handling fticket_wait_answer would spin if it received an unhandled signal whose default disposition is to terminate. The reason is because msleep(9) would return EINTR even for a masked signal. One reason is when the thread is stopped, which happens for example during sigexit(). Fix this bug by returning immediately if fticket_wait_answer ever gets interrupted a second time, for any reason. Sponsored by: The FreeBSD Foundation	2019-07-18 15:30:00 +00:00
Alan Somers	d26d63a4af	fusefs: multiple interruptility improvements 1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be masked, anyway. 2) Fix an infinite loop bug. If a process received both a maskable signal lower than 9 (like SIGINT) and then received SIGKILL, fticket_wait_answer would spin. msleep would immediately return EINTR, but cursig would return SIGINT, so the sleep would get retried. Fix it by explicitly checking whether SIGKILL has been received. 3) Abandon the sig_isfatal optimization introduced by r346357. That optimization would cause fticket_wait_answer to return immediately, without waiting for a response from the server, if the process were going to exit anyway. However, it's vulnerable to a race: 1) fatal signal is received while fticket_wait_answer is sleeping. 2) fticket_wait_answer sends the FUSE_INTERRUPT operation. 3) fticket_wait_answer determines that the signal was fatal and returns without waiting for a response. 4) Another thread changes the signal to non-fatal. 5) The first thread returns to userspace. Instead of exiting, the process continues. 6) The application receives EINTR, wrongly believes that the operation was successfully interrupted, and restarts it. This could cause problems for non-idempotent operations like FUSE_RENAME. Reported by: kib (the race part) Sponsored by: The FreeBSD Foundation	2019-07-17 22:45:43 +00:00
Alan Somers	07e86257e6	fusefs: fix the build with some NODEBUG kernels systm.h needs to be included before counter.h Sponsored by: The FreeBSD Foundation	2019-07-13 21:41:12 +00:00
Alan Somers	97b0512b23	projects/fuse2: build fixes * Fix the kernel build with gcc by removing a redundant extern declaration * In the tests, fix a printf format specifier that assumed LP64 Sponsored by: The FreeBSD Foundation	2019-07-13 14:42:09 +00:00
Fedor Uporov	6ce04e595a	Add additional check for 'blocks per group' and 'fragments per group' superblock fields. These fields will not be equal only in case if bigalloc filesystem feature is turned on. This feature is not supported for now. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-27-EXT2-12: Denial of Service in openat-0 (vm_fault_hold/ext2_clusteracct) MFC after: 2 weeks	2019-07-07 08:58:02 +00:00
Fedor Uporov	c008656263	Remove ufs fragments logic. The ext2fs fragments are different from ufs fragments. In case of ext2fs the fragment should be equal or more then block size. The values more than block size are used only in case of bigalloc feature, which is does not supported for now. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-22-EXT2-9: Denial of service in ftruncate-0 (ext2_balloc) MFC after: 2 weeks	2019-07-07 08:56:13 +00:00
Fedor Uporov	590517d05a	Remove unneeded mount point unlock call. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-11-EXT2-6: Denial Of Service in write-1 (ext2_balloc) MFC after: 2 weeks	2019-07-07 08:53:52 +00:00
Alan Somers	7e1f5432f4	fusefs: don't leak memory of unsent operations on unmount Sponsored by: The FreeBSD Foundation	2019-06-28 18:48:02 +00:00
Alan Somers	8aafc8c389	[skip ci] update copyright headers in fusefs files Sponsored by: The FreeBSD Foundation	2019-06-28 04:18:10 +00:00
Alan Somers	7f49ce7a0b	MFHead @349476 Sponsored by: The FreeBSD Foundation	2019-06-27 23:50:54 +00:00
Alan Somers	c1afff113c	fusefs: fix a memory leak regarding FUSE_INTERRUPT We were leaking the fuse ticket if the original operation completed before the daemon received the INTERRUPT operation. Fixing this was easier than I expected. Sponsored by: The FreeBSD Foundation	2019-06-27 22:24:56 +00:00
Alan Somers	435ecf40bb	fusefs: recycle vnodes after their last unlink Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd linger around until unmount or the vnlru reclaimed them. This commit essentially actives and inlines the old reclaim_revoked sysctl, and fixes some issues dealing with the attribute cache and multiply linked files. Sponsored by: The FreeBSD Foundation	2019-06-27 20:18:12 +00:00
Alan Somers	38c8634635	fusefs: counter(9) variables should not be statically initialized Reported by: rpokala Sponsored by: The FreeBSD Foundation	2019-06-27 17:59:15 +00:00
Alan Somers	560a55d094	fusefs: convert statistical sysctls to use counter(9) counter(9) is more performant than using atomic instructions to update sysctls that just report statistics to userland. Sponsored by: The FreeBSD Foundation	2019-06-27 16:30:25 +00:00
Alan Somers	caeea8b4cc	fusefs: fix some memory leaks Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE. There are still leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the server is legally allowed to never respond to a FUSE_INTERRUPT operation. Sponsored by: The FreeBSD Foundation	2019-06-27 00:00:48 +00:00
Alan Somers	f8ebf1cd7e	fusefs: implement protocol 7.23's FUSE_WRITEBACK_CACHE option As of protocol 7.23, fuse file systems can specify their cache behavior on a per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in fuse_init_out.flags, then they'll get the writeback cache. If not, then they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every FUSE_OPEN response, then they'll get no cache at all. The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use protocol 7.23 or later. However, it's retained for older servers, especially for those running in jails that lack access to the new protocol. This commit also fixes two other minor test bugs: * WriteCluster:SetUp was using an uninitialized variable. * Read.direct_io_pread wasn't verifying that the cache was actually bypassed. Sponsored by: The FreeBSD Foundation	2019-06-26 17:32:31 +00:00
Alan Somers	205696a17d	fusefs: delete some unused mount options The fusefs kernel module allegedly supported no_attrcache, no_readahed, no_datacache, no_namecache, and no_mmap mount options, but the mount_fusefs binary never did. So there was no way to ever activate these options. Delete them. Some of them have alternatives: no_attrcache: set the attr_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR responses. no_readahed: set max_readahead to 0 in the FUSE_INIT response. no_datacache: set the vfs.fusefs.data_cache_mode sysctl to 0, or (coming soon) set the attr_valid time to 0 and set FUSE_AUTO_INVAL_DATA in the FUSE_INIT response. no_namecache: set entry_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR responses. Sponsored by: The FreeBSD Foundation	2019-06-26 15:15:24 +00:00
Alan Somers	fef464546c	fusefs: implement the "time_gran" feature. If a server supports a timestamp granularity other than 1ns, it can tell the client this as of protocol 7.23. The client will use that granularity when updating its cached timestamps during write. This way the timestamps won't appear to change following flush. Sponsored by: The FreeBSD Foundation	2019-06-26 02:09:22 +00:00
Alan Somers	0a8fe2d369	fusefs: set ctime during FUSE_SETATTR following a write As of r349396 the kernel will internally update the mtime and ctime of files on write. It will also flush the mtime should a SETATTR happen before the data cache gets flushed. Now it will flush the ctime too, if the server is using protocol 7.23 or higher. This is the only case in which the kernel will explicitly set a file's ctime, since neither utimensat(2) nor any other user interfaces allow it. Sponsored by: The FreeBSD Foundation	2019-06-26 00:03:37 +00:00
Alan Somers	788af9538a	fusefs: automatically update mtime and ctime on write Writing should implicitly update a file's mtime and ctime. For fuse, the server is supposed to do that. But the client needs to do it too, because the FUSE_WRITE response does not include time attributes, and it's not desirable to issue a GETATTR after every WRITE. When using the writeback cache, there's another hitch: the kernel should ignore the mtime and ctime fields in any GETATTR response for files with a dirty write cache. Sponsored by: The FreeBSD Foundation	2019-06-25 23:40:18 +00:00
Alan Somers	0d3a88d76c	fusefs: writes should update the file size, even when data_cache_mode=0 Writes that extend a file should update the file's size. r344185 restricted that behavior for fusefs to only happen when the data cache was enabled. That probably made sense at the time because the attribute cache wasn't fully baked yet. Now that it is, we should always update the cached file size during write. Sponsored by: The FreeBSD Foundation	2019-06-25 18:36:11 +00:00
Alan Somers	b9e2019755	fusefs: rewrite vop_getpages and vop_putpages Use the standard facilities for getpages and putpages instead of bespoke implementations that don't work well with the writeback cache. This has several corollaries: * Change the way we handle short reads _again_. vfs_bio_getpages doesn't provide any way to handle unexpected short reads. Plus, I found some more lock-order problems. So now when the short read is detected we'll just clear the vnode's attribute cache, forcing the file size to be requeried the next time it's needed. VOP_GETPAGES doesn't have any way to indicate a short read to the "caller", so we just bzero the rest of the page whenever a short read happens. * Change the way we decide when to set the FUSE_WRITE_CACHE bit. We now set it for clustered writes even when the writeback cache is not in use. Sponsored by: The FreeBSD Foundation	2019-06-25 17:24:43 +00:00
Hans Petter Selasky	43a9329e1b	Free all allocated unit IDs in cuse(3) after the client character devices have been destroyed to avoid creating character devices with identical name. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:46:01 +00:00
Hans Petter Selasky	c7ffaed92e	Fix for deadlock situation in cuse(3) The final server unref should be done by the server thread to prevent deadlock in the client cdevpriv destructor, which cannot destroy itself. MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-25 11:42:53 +00:00
Warner Losh	e5500f1efa	Replay r349334 by markj accidentally reverted by r349352 Remove a lingering use of splbio(). The buffer must be locked by the caller. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-06-25 06:14:00 +00:00
Warner Losh	f5a95d9a07	Remove NAND and NANDFS support NANDFS has been broken for years. Remove it. The NAND drivers that remain are for ancient parts that are no longer relevant. They are polled, have terrible performance and just for ancient arm hardware. NAND parts have evolved significantly from this early work and little to none of it would be relevant should someone need to update to support raw nand. This code has been off by default for years and has violated the vnode protocol leading to panics since it was committed. Numerous posts to arch@ and other locations have found no actual users for this software. Relnotes: Yes No Objection From: arch@ Differential Revision: https://reviews.freebsd.org/D20745	2019-06-25 04:50:09 +00:00
Alan Somers	1734e205f3	fusefs: refine the short read fix from r349332 b_fsprivate1 needs to be initialized even for write operations, probably because a buffer can be used to read, write, and read again with the final read serviced by cache. Sponsored by: The FreeBSD Foundation	2019-06-24 20:08:28 +00:00
Mark Johnston	673c1c2944	Remove a lingering use of splbio(). The buffer must be locked by the caller. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-06-24 19:19:37 +00:00
Alan Somers	17575bad85	fusefs: improve the short read fix from r349279 VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend can't rely on bp->b_resid > 0 indicating a short read. And adjusting bp->b_count after a short read seems to cause some sort of resource leak. Instead, store the shortfall in the bp->b_fsprivate1 field. Sponsored by: The FreeBSD Foundation	2019-06-24 17:05:31 +00:00
Alan Somers	44f654fdc5	fusefs: fix corruption on short reads caused by r349279 Even if a short read is caused by EOF, it's still necessary to bzero the remaining buffer, because that buffer could become valid as a result of a future ftruncate or pwrite operation. Reported by: fsx Sponsored by: The FreeBSD Foundation	2019-06-21 23:29:29 +00:00
Alan Somers	aef22f2d75	fusefs: correctly handle short reads A fuse server may return a short read for three reasons: * The file is opened with FOPEN_DIRECT_IO. In this case, the short read should be returned directly to userland. We already handled this case correctly. * The file was truncated server-side, and the read hit EOF. In this case, the kernel should update the file size. Fixed in the case of VOP_READ. Fixing this for VOP_GETPAGES is TODO. * The file is opened in writeback mode, there are dirty buffers past what the server thinks is the file's EOF, and the read hit what the server thinks is the file's EOF. In this case, the client is trying to read a hole, and should zero-fill it. We already handled this case, and I added a test for it. Sponsored by: The FreeBSD Foundation	2019-06-21 21:44:31 +00:00
Alan Somers	87ff949a7b	fusefs: raise protocol level to 7.23 None of the new features are implemented yet. This commit just adds the new protocol definitions and adds backwards-compatibility code for pre 7.23 servers. Sponsored by: The FreeBSD Foundation	2019-06-21 04:57:23 +00:00
Alan Somers	8f9b3ba718	fusefs: use standard integer types in fuse_kernel.h This is a merge of Linux revision 4c82456eeb4da081dd63dc69e91aa6deabd29e03. No functional change. Sponsored by: The FreeBSD Foundation	2019-06-21 03:17:27 +00:00
Alan Somers	b160acd1c0	fusefs: raise the protocol level to 7.21 Jumping from protocol 7.15 to 7.21 adds several new features. While they're all potentially useful, they're also all optional, and I'm not implementing any right now because my highest priority lies in a later version. Sponsored by: The FreeBSD Foundation	2019-06-21 03:04:56 +00:00
Alan Somers	ecb489158c	fusefs: diff reduction of fuse_kernel.h vs the upstream version fuse_kernel.h is based on Linux's fuse.h. In r349250 I modified fuse_kernel.h by generating a diff of two versions of Linux's fuse.h and applying it to our tree. patch succeeded, but it put one chunk in the wrong location. This commit fixes that. No functional changes. Sponsored by: The FreeBSD Foundation	2019-06-21 02:55:43 +00:00
Alan Somers	7cbb8e8a06	fusefs: raise protocol level to 7.15 This protocol level adds two new features: the ability for the server to store or retrieve data into/from the client's cache. But the messages aren't defined soundly since they identify the file only by its inode, without the generation number. So it's possible for them to modify the wrong file's cache. Also, I don't know of any file systems in ports that use these messages. So I'm not implementing them. I did add a (disabled) test for the store message, however. Sponsored by: The FreeBSD Foundation	2019-06-20 23:32:25 +00:00
Alan Somers	bb23d43901	fusefs: trivially raise protocol level to 7.14 The only new feature is splice(2) support on /dev/fuse, which FreeBSD can't support. Sponsored by: The FreeBSD Foundation	2019-06-20 23:12:19 +00:00
Alan Somers	38b06f8ac4	fcntl: fix overflow when setting F_READAHEAD VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field. However, fcntl allows you to set the seqcount in bytes to any nonnegative 31-bit value. The result can be a 16-bit overflow, which will be sign-extended in functions like ffs_read. Fix this by sanitizing the argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather than INT16_MAX. Also, fifos have overloaded the f_seqcount field for a completely different purpose ever since r238936. Formalize that by using a union type. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20710	2019-06-20 23:07:20 +00:00
Alan Somers	192a918194	fusefs: attempt to support servers as old as protocol 7.4 Previously we allowed servers as old as 7.1 to connect (there never was a 7.0). However, we wrongly assumed a few things about protocols older than 7.8. This commit attempts to support servers as old as 7.4 but no older. I added no new tests because I'm not sure there actually _are_ any servers this old in the wild. Sponsored by: The FreeBSD Foundation	2019-06-20 22:21:42 +00:00
Alan Somers	2ffddc5ee9	fusefs: raise protocol level to 7.13 This protocol version adds one new feature: the ability for the server to set the maximum number of background requests and a "congestion threshold" with ill-defined properties. I don't know of any fuse file systems in ports that use this feature, so I'm not implementing it. Sponsored by: The FreeBSD Foundation	2019-06-20 21:29:28 +00:00
Alan Somers	a1c9f4ad0d	fusefs: implement VOP_BMAP If the fuse daemon supports FUSE_BMAP, then use that for the block mapping. Otherwise, use the same technique used by vop_stdbmap. Report large values for runp and runb in order to maximize read clustering and minimize upcalls, even if we don't know the true layout. The major result of this change is that sequential reads to FUSE files will now usually happen 128KB at a time instead of 64KB. Sponsored by: The FreeBSD Foundation	2019-06-20 17:08:21 +00:00
Alan Somers	e532a99901	MFHead @349234 Sponsored by: The FreeBSD Foundation	2019-06-20 15:56:08 +00:00
Alan Somers	84879e46c2	fusefs: multiple fixes related to the write cache * Don't always write the last page synchronously. That's not actually required. It was probably just masking another bug that I fixed later, possibly in r349021. * Enable the NotifyWriteback tests now that Writeback cache is working. * Add a test to ensure that the write cache isn't flushed synchronously when in writeback mode. Sponsored by: The FreeBSD Foundation	2019-06-17 23:34:11 +00:00
Alan Somers	402b609c80	fusefs: use cluster_read for more readahead fusefs will now use cluster_read. This allows readahead of more than one cache block. However, it won't yet actually cluster the reads because that requires VOP_BMAP, which fusefs does not yet implement. Sponsored by: The FreeBSD Foundation	2019-06-17 22:01:23 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Alan Somers	d569012f45	fusefs: implement non-clustered readahead fusefs will now read ahead at most one cache block at a time (usually 64 KB). Clustered reads are still TODO. Individual file systems may disable read ahead by setting fuse_init_out.max_readahead=0 during initialization. Sponsored by: The FreeBSD Foundation	2019-06-17 16:56:51 +00:00
Alan Somers	b5aaf286ea	fusefs: fix the "write-through" of write-through cacheing Our fusefs(5) module supports three cache modes: uncached, write-through, and write-back. However, the write-through mode (which is the default) has never actually worked as its name suggests. Rather, it's always been more like "write-around". It wrote directly, bypassing the cache. The cache would only be populated by a subsequent read of the same data. This commit fixes that problem. Now the write-through mode works as one would expect: write(2) immediately adds data to the cache and then blocks while the daemon processes the write operation. A side effect of this change is that non-cache-block-aligned writes will now incur a read-modify-write cycle of the cache block. The old behavior (bypassing write cache entirely) can still be achieved by opening a file with O_DIRECT. PR: 237588 Sponsored by: The FreeBSD Foundation	2019-06-14 19:47:48 +00:00
Alan Somers	8eecd9ce05	fusefs: enable write clustering Enable write clustering in fusefs whenever cache mode is set to writeback and the "async" mount option is used. With default values for MAXPHYS, DFLTPHYS, and the fuse max_write mount parameter, that means sequential writes will now be written 128KB at a time instead of 64KB. Also, add a regression test for PR 238565, a panic during unmount that probably affects UFS, ext2, and msdosfs as well as fusefs. PR: 238565 Sponsored by: The FreeBSD Foundation	2019-06-14 18:14:51 +00:00
Alan Somers	dff3a6b410	fusefs: fix a bug with WriteBack cacheing An errant vfs_bio_clrbuf snuck in in r348931. Surprisingly, it doesn't have any effect most of the time. But under some circumstances it cause the buffer to behave in a write-only fashion. Sponsored by: The FreeBSD Foundation	2019-06-13 19:07:03 +00:00
Alan Somers	93c0c1d4ce	fusefs: fix a page fault with writeback cacheing When truncating a file downward through a dirty buffer, it's neccessary to update the buffer's b->dirtyend. Sponsored by: The FreeBSD Foundation	2019-06-11 23:46:31 +00:00
Alan Somers	a87e0831ab	fusefs: WIP fixing writeback cacheing The current "writeback" cache mode, selected by the vfs.fusefs.data_cache_mode sysctl, doesn't do writeback cacheing at all. It merely goes through the motions of using buf(9), but then writes every buffer synchronously. This commit: * Enables delayed writes when the sysctl is set to writeback cacheing * Fixes a cache-coherency problem when extending a file whose last page has just been written. * Removes the "sync" mount option, which had been set unconditionally. * Adjusts some SDT probes * Adds several new tests that mimic what fsx does but with more control and without a real file system. As I discover failures with fsx, I add regression tests to this file. * Adds a test that ensures we can append to a file without reading any data from it. This change is still incomplete. Clustered writing is not yet supported, and there are frequent "panic: vm_fault_hold: fault on nofault entry" panics that I need to fix. Sponsored by: The FreeBSD Foundation	2019-06-11 16:32:33 +00:00
Alan Somers	ddc51e453e	fusefs: remove some stuff that was copy/pasted from nfsclient fusefs's I/O methods were originally copy/pasted from nfsclient. This commit removes some irrelevant parts, like stuff involving B_NEEDCOMMIT. Sponsored by: The FreeBSD Foundation	2019-06-06 20:35:41 +00:00
Alan Somers	0269ae4c19	MFHead @348740 Sponsored by: The FreeBSD Foundation	2019-06-06 16:20:50 +00:00
Alan Somers	011bca9948	fusefs: simplify fuse_write_biobackend. No functional change. Sponsored by: The FreeBSD Foundation	2019-06-05 20:18:56 +00:00
Konstantin Belousov	3c93d22758	Manually clear text references on reclaim for nullfs and tmpfs. Both filesystems do no use vnode_pager_dealloc() which would handle this case otherwise. Nullfs because vnode vm_object handle never points to nullfs vnode. Tmpfs because its vm_object is never vnode object at all. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-06-05 20:16:25 +00:00
Alan Somers	a639731ba9	fusefs: respect RLIMIT_FSIZE Sponsored by: The FreeBSD Foundation	2019-06-03 23:24:07 +00:00
Alan Somers	6ff7f297f8	fusefs: don't require FUSE_EXPORT_SUPPORT for async invalidation In r348560 I thought that FUSE_EXPORT_SUPPORT was required for cases where the node to be invalidated (or the parent of the entry to be invalidated) wasn't cached. But I realize now that that's not the case. During entry invalidation, if the parent isn't in the vfs hash table, then it must've been reclaimed. And since fuse_vnop_reclaim does a cache_purge, that means the entry to be invalidated has already been removed from the namecache. And during inode invalidation, if the inode to be invalidated isn't in the vfs hash table, then it too must've been reclaimed. In that case it will have no buffer cache to invalidate. Sponsored by: The FreeBSD Foundation	2019-06-03 20:45:32 +00:00
Alan Somers	eae1ae132c	fusefs: support asynchronous cache invalidation Protocol 7.12 adds a way for the server to notify the client that it should invalidate an inode's data cache and/or attributes. This commit implements that mechanism. Unlike Linux's implementation, ours requires that the file system also supports FUSE_EXPORT_SUPPORT (NFS-style lookups). Otherwise the invalidation operation will return EINVAL. Sponsored by: The FreeBSD Foundation	2019-06-03 17:34:01 +00:00
Alan Somers	c2d70d6e6f	fusefs: support name cache invalidation Protocol 7.12 adds a way for the server to notify the client that it should invalidate an entry from its name cache. This commit implements that mechanism. Sponsored by: The FreeBSD Foundation	2019-06-01 00:11:19 +00:00
Alan Somers	0d2bf48996	fusefs: check the vnode cache when looking up files for the NFS server FUSE allows entries to be cached for a limited amount of time. fusefs's vnop_lookup method already implements that using the timeout functionality of cache_lookup/cache_enter_time. However, lookups for the NFS server go through a separate path: vfs_vget. That path can't use the same timeout functionality because cache_lookup/cache_enter_time only work on pathnames, whereas vfs_vget works by inode number. This commit adds entry timeout information to the fuse vnode structure, and checks it during vfs_vget. This allows the NFS server to take advantage of cached entries. It's also the same path that FUSE's asynchronous cache invalidation operations will use. Sponsored by: The FreeBSD Foundation	2019-05-31 21:22:58 +00:00
Rick Macklem	6aab442af9	Get rid of extraneous initialization. Get rid of an extraneous initialization, mainly to keep a static analyser happy. No semantic change. PR: 238167 Submitted by: Alexey Dokuchaev	2019-05-31 03:13:09 +00:00

1 2 3 4 5 ...

4306 Commits