freebsd-nq

Author	SHA1	Message	Date
Mateusz Guzik	f1fa1ba3d0	Fix up various vnode-related asserts which did not dump the used vnode	2020-02-03 14:25:32 +00:00
Mateusz Guzik	10a15df653	vfs: remove the never set VDESC_VPP_WILLRELE flag	2020-02-02 09:35:48 +00:00
Konstantin Belousov	dc1d2cc648	Fix a bug in r357199. Around a generic call to null_nodeget(), there is nothing that would prevent the unmount of the nullfs mp until we proceed to the insmntque1() point. Calculate the VV_ROOT flag after insmntque1() to not access mp->mnt_data before we have an exclusively locked vnode from this mount point on the mp vnode list. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-01-30 19:34:37 +00:00
Mateusz Guzik	3cfabd81a1	vfs: remove the never set VDESC_NOMAP_VPP flag	2020-01-30 08:56:22 +00:00
Konstantin Belousov	5fc9e11c42	Save lower root vnode in nullfs mnt data instead of upper. Nullfs needs to know the root vnode of the lower fs during the operation. Currently it caches the upper vnode of it, which is also the root of the nullfs mount. On unmount, nullfs calls vflush() with rootrefs == 1, and aborts non-forced unmount if there are any more vnodes instantiated during vflush(). This means that the reference to the root vnode after failed non-forced unmount could be lost and nullm_rootvp points to the freed memory. Fix it by storing the reference for lower vnode instead, which is kept intact during vflush(). nullfs_root() now instantiates the upper vnode of lower root. Care about VV_ROOT flag in null_nodeget(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2020-01-28 11:29:06 +00:00
Mateusz Guzik	b249ce48ea	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427	2020-01-03 22:29:58 +00:00
Mateusz Guzik	6fa079fc3f	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738	2019-12-16 00:06:22 +00:00
Mateusz Guzik	abd80ddb94	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715	2019-12-08 21:30:04 +00:00
Mateusz Guzik	1e0006e49c	nullfs: locklessly check for entries in null_hashget During random sampling over poudriere -j 104 over 10% of calls returned NULL.	2019-12-05 13:41:22 +00:00
Mateusz Guzik	be4cd6912f	nullfs: use MNTK_NOMSYNC Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009	2019-10-13 15:42:04 +00:00
Mateusz Guzik	e0f4540a2a	nullfs: reduce areas protected by vnode interlock in null_lock Similarly to the other routine stop taking the interlock for the lower vnode. The interlock for nullfs vnode is still taken to ensure stability of ->v_data. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21480	2019-09-01 02:52:00 +00:00
Mateusz Guzik	13c73428dc	nullfs: use VOP_NEED_INACTIVE Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation	2019-08-30 00:30:03 +00:00
Mateusz Guzik	1e2f0ceb2f	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for lookup. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesystems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:34:24 +00:00
Mateusz Guzik	33d46a3cef	nullfs: reduce areas protected by vnode interlock Some places only take the interlock to hold the vnode, which was a requirement before they started being manipulated with atomics. Use the newly introduced vholdnz to bump the count. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:13:15 +00:00
Mateusz Guzik	81f666e79d	nullfs: lock the vnode with LK_SHARED in null_vptocnp null_nodeget which follows almost always finds the target vnode in the hash, avoiding insmntque1 altogether. Should it be needed, it already checks if the lock needs to be upgraded. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20244	2019-08-21 23:24:40 +00:00
Konstantin Belousov	3c93d22758	Manually clear text references on reclaim for nullfs and tmpfs. Both filesystems do not use vnode_pager_dealloc() which would handle this case otherwise. Nullfs because vnode vm_object handle never points to nullfs vnode. Tmpfs because its vm_object is never vnode object at all. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-06-05 20:16:25 +00:00
Konstantin Belousov	78022527bb	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
Konstantin Belousov	7ae3486e6d	nullfs: fix unmounts when filesystem is active. If vflush() did not completely flush the mount vnodes queue, either retry for forced unmounts, or give up for non-forced. This situation can occur when new vnodes are instantiated while vflush() worked. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-03-21 13:30:48 +00:00
Konstantin Belousov	b9662886ef	In null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call. The vp vnode is unlocked during the execution of the VOP method and can be reclaimed, zeroing vp->v_data. Caching allows to use the correct mount point. Reported and tested by: pho PR: 235549 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:20:18 +00:00
Konstantin Belousov	25728e8411	Before using VTONULL(), check that the covered vnode belongs to nullfs. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:17:31 +00:00
Konstantin Belousov	930cc2dbef	Some style for nullfs_mount(). Also use bool type for isvnunlocked. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-02-08 08:15:29 +00:00
Jamie Gritton	0e5c6bd436	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681	2018-05-04 20:54:27 +00:00
Edward Tomasz Napierala	ed5cdcb6c3	Make nullfs properly report MNT_AUTOMOUNTED set on the nullfs mount itself, instead of copying from the underlying filesystem. PR: 224851 Reported by: Jamie Landeg-Jones <jamie at dyslexicfish.net> Tested by: Jamie Landeg-Jones <jamie at dyslexicfish.net> MFC after: 2 weeks	2018-01-10 17:51:02 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Konstantin Belousov	2f304845e2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are undesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
Konstantin Belousov	abc1515601	NFSv4 client tracks opens, and the track records are only dropped when the vnode is inactivated. This contradicts with the nullfs caching which keeps upper vnode around, as consequence keeping the use reference to lower vnode. Add a filesystem flag to request nullfs to not cache when mounted over that filesystem, and set the flag for nfs v4 mounts. Reported by: asomers Reviewed by: rmacklem Tested by: asomers, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-27 09:20:58 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Edward Tomasz Napierala	e583d99909	Change the getnewvnode(9) tag for nullfs from "null" to "nullfs". It's more consistent, and besides, the "null" alone looks weird. MFC after: 1 month	2016-09-15 13:57:37 +00:00
Mateusz Guzik	6a3e46059a	nullfs: plug vnode ref leak in null_vptocnp The lower vnode is already referenced and nodeget is supposed to consume the reference. Thus the extra vref call was causing a leak. Reported by: pho Reviewed by: kib MFC after: 1 week	2016-09-09 10:40:55 +00:00
Mateusz Guzik	2740551545	nullfs: stop special-casing directories in null_vptocnp The previous code was forcing an expensive walk in vop_stdvptocnp, which was causing performance issues on highly contended zfs. No objections: kib MFC after: 2 weeks	2016-09-06 21:22:03 +00:00
Pedro F. Giffuni	b3a15ddd5b	sys/fs: spelling fixes in comments. No functional change.	2016-04-29 20:51:24 +00:00
Konstantin Belousov	f36aa2b792	Pass MNTK_NO_IOPF and MNTK_UNMAPPED_BUFS flags from the lower filesystem to the nullfs mount. MNTK_NO_IOPF must be present on the nullfs struct mount so that struct file fo_read and fo_write fops operate in the mode requested by the lower mount. MNTK_UNMAPPED_BUFS allows VOP_GETPAGES() to use unmapped buffers. It does not matter for VOP_GETPAGES() calls from vm_fault() since handle of the vm_object always points to the lower vnode. But it may be useful for other situations where VOP_GETPAGES() is used. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-04 17:24:28 +00:00
Konstantin Belousov	830cd4b810	After nullfs rmdir operation, reclaim the directory vnode which was unlinked. Otherwise the vnode stays cached, causing leak. This is similar to r292961 for regular files. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-17 19:43:03 +00:00
Konstantin Belousov	6f73b583d9	Force nullfs vnode reclaim after unlinking, to potentially unlink lower vnode. Otherwise, reference to the lower vnode from the upper one prevents final unlink. PR: 178238 Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-30 19:49:22 +00:00
Mark Johnston	5f34e93c58	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937	2015-07-05 22:37:33 +00:00
Rick Macklem	dda11d4ab9	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks	2015-04-15 20:16:31 +00:00
Mateusz Guzik	cd29b292b8	Convert nullfs hash lock from a mutex to an rwlock.	2014-12-30 21:41:35 +00:00
Mateusz Guzik	4fce16e4c9	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks	2014-10-20 18:00:50 +00:00
Konstantin Belousov	effc6a3593	VOP_LOOKUP() may relock the directory vnode for some reasons. Since nullfs vnode shares vnode lock with lower vnode, this allows the reclamation of nullfs directory vnode in null_lookup(). In this situation, VOP must return ENOENT. More, since after the reclamation, the locks of nullfs directory vnode and lower vnode are no longer shared, the relock of the ldvp does not restore the correct locking state of dvp, and leaks ldvp lock. Correct this by unlocking ldvp and locking dvp. Use cached value of dvp->v_mount. Reported by: bdrewery Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-08-08 11:39:05 +00:00
Konstantin Belousov	0ebe0000b6	Assert that nullfs vnode has VV_ROOT set whenever lower vnode has. Assert that dotdot lookup on the root vnode is not performed. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-28 14:20:31 +00:00
Konstantin Belousov	289dd6dd7c	Fix typo. MFC after: 3 days	2014-07-24 23:14:03 +00:00
Konstantin Belousov	65589a29f4	Check for the cross-device cross-link attempt in the VFS, instead of forcing filesystem VOP_LINK() methods to repeat the code. In tmpfs_link(), remove redundant check for the type of the source, already done by VFS. Note that NFS server already performs this check before calling VOP_LINK(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-07-16 14:04:46 +00:00
Dag-Erling Smørgrav	1a05c762b9	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
Konstantin Belousov	18a8d3d7f8	The tvp vnode on rename is usually unlinked. Drop the cached null vnode for tvp to allow the free of the lower vnode, if needed. PR: kern/180236 Tested by: smh Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-04 19:01:18 +00:00
Konstantin Belousov	74c7ff1a0e	Do not leak the NULLV_NOUNLOCK flag from the nullfs_unlink_lowervp(), for the case when the nullfs vnode is not reclaimed. Otherwise, later reclamation would not unlock the lower vnode. Reported by: antoine Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-21 11:31:56 +00:00
Konstantin Belousov	0fc6daa72d	- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The null_hashget() obtains the reference on the nullfs vnode, which must be dropped. - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode. - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching. Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-05-11 11:17:44 +00:00
Jilles Tjoelker	6d6a91c50f	nullfs: Improve f_flags in statfs(). Include some flags of the nullfs mount itself: MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW. This allows userland code calling statfs() or fstatfs() to see these flags. In particular, this allows opendir() to detect that a -t nullfs -o union mount needs deduplication (otherwise at least . and .. are returned twice) and allows rtld to detect a -t nullfs -o noexec mount as noexec. Turn off the MNT_ROOTFS flag from the underlying filesystem because the nullfs mount is definitely not the root filesystem. Reviewed by: kib MFC after: 1 week	2013-03-02 12:42:23 +00:00
Konstantin Belousov	e8f966eeb8	Remove the filtering of the acceptable mount options for nullfs, added in r245004. Although the report was for noatime option which is non-functional for the nullfs, other standard options like nosuid or noexec are useful with it. Reported by: Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au> MFC after: 3 days	2013-01-16 05:32:49 +00:00
Konstantin Belousov	603f963e56	The current default size of the nullfs hash table used to lookup the existing nullfs vnode by the lower vnode is only 16 slots. Since the default mode for the nullfs is to cache the vnodes, hash has extremely huge chains. Size the nullfs hashtbl based on the current value of desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a given vnode. Pointy hat to: kib Diagnosed and reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days	2013-01-14 05:44:47 +00:00

276 Commits