Commit Graph

976 Commits

Author SHA1 Message Date
Konstantin Belousov
f7b71c8a5b Issue NOTE_EXTEND when a directory entry is added to or removed from
the monitored directory as the result of a rename(2) operation.  Renames
that stay within the directory are not reported.

Submitted by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-02 13:18:17 +00:00
Konstantin Belousov
bd2ead6b2e Fix reporting of NOTE_LINK when directory link count changes due to
a rename removing or adding a subdirectory entry.

Discussed with and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
NetBSD PR:	48958 (http://gnats.netbsd.org/48958)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-05-02 13:13:32 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Pedro F. Giffuni
55e0987aea sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
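
A minimal sketch of the pattern behind the entry above: the definition shown is the conventional <sys/param.h> round-up division macro, and the sector-count helper is purely illustrative.

#include <sys/param.h>		/* provides howmany() */

#ifndef howmany
#define	howmany(x, y)	(((x) + ((y) - 1)) / (y))	/* round-up division */
#endif

/* Example: how many 512-byte sectors are needed to hold len bytes. */
static __inline int
sectors_for(int len)
{

	return (howmany(len, 512));
}
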
Konstantin Belousov
0791e0c0e7 Provide more correct sizing of the KVA consumed by a vnode, used by
the virtvnodes calculation.  Include the size of the fs-specific v_data as
that of the NFS nclnode, inline; the NFS nclnode is bigger than either the
ZFS znode or the UFS inode.  Include the size of namecache_ts and a short
cache path element, multiplied by the name cache population factor, again
inline.

Inline defines are used to avoid pollution of the vnode.h with the
subsystem-private objects.  Insignificant unsynchronized changes of
the definitions are fine; we do not care about that precision, and,
e.g., ZFS consumes much malloc'ed memory per vnode for reasons
unaccounted for in the formula.

Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10.

The measures reduce vnode cache pressure on kmem and bring the vnode
cache memory use below some apparent thresholds that were exceeded by
r291244 due to more robust vnode reuse.

Reported and tested by:	marius (i386, previous version)
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-24 15:15:46 +00:00
Konstantin Belousov
fa48f413ef In bnoreuselist(), check both ends of the specified logical block
number range.

This effectively skips indirect and extdata blocks on the buffer
queue.  Since their logical block numbers are negative, bnoreuselist()
could loop infinitely.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:39:57 +00:00
Mark Johnston
793c381706 Add vrefl(), a locked variant of vref(9).
This API has no in-tree consumers at the moment but is useful to at least
one out-of-tree consumer, and naturally complements existing vnode refcount
functions (vholdl(9), vdropl(9)).

Obtained from:	kib (sys/ portion)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4947
Differential Revision:	https://reviews.freebsd.org/D4953
2016-01-18 22:21:46 +00:00
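
A hedged sketch of how the locked variant above might be used; the wrapper name is hypothetical, and the assumption is that vref(9) acquires the vnode interlock itself while vrefl() expects it already held.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/vnode.h>

/* Hypothetical caller that already holds the vnode interlock. */
static void
take_ref_locked(struct vnode *vp)
{

	ASSERT_VI_LOCKED(vp, "take_ref_locked");
	vrefl(vp);		/* take a use reference; interlock stays held */
}
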
Konstantin Belousov
1041e09089 Two fixes for excessive iterations after r292326.
Advance the logical block number to the lblkno of the found block plus
one, instead of incrementing the block number which was used for
lookup.  This change skips sparsely populated buffer ranges, similar
to r292325, instead of doing useless lookups.

Do not restart the bnoreuselist() from the start of the range if
buffer lock cannot be obtained without sleep.  Only retry lookup and
lock for the same queue and same logical block number.

Reported by:	benno
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-01-05 14:48:40 +00:00
Konstantin Belousov
106ebb761a Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a
buffer for each block number in the range with gbincore(), look up the
next instantiated buffer with the logical block number which is
greater than or equal to the next lblkno.  This significantly speeds up
the iteration over sparsely populated ranges.

Move the iteration into new helper bnoreuselist(), which is structured
similarly to flushbuflist().

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:48:37 +00:00
Konstantin Belousov
8549b4b9fe Simplify the loop step in flushbuflist() and make it independent of
the type stability of the buffers' memory.  Instead of memoizing a
pointer to the next buffer and validating it, remember the next
logical block number in the bo list and re-lookup.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:39:51 +00:00
Kirk McKusick
d9ea698c75 We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).

Reported by: Rick Macklem
Fix from:    Mateusz Guzik
PR:          204949
2015-12-04 03:54:18 +00:00
Kirk McKusick
003a7c2b01 We need to zero out the union of pointers in a freed vnode structure.
PR:        204949
Fix from:  Mateusz Guzik
Tested by: Jason Unovitch
2015-12-03 02:04:22 +00:00
Kirk McKusick
41d4f10391 As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free.  These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.

As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.

Reviewed by: kib
Tested by:   Peter Holm
2015-11-29 21:42:26 +00:00
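
An illustrative sketch of the uma_init/uma_fini pattern the entry above describes, with hypothetical names (my_obj, my_obj_init, my_obj_fini) rather than the real vnode_zone code: the expensive setup runs only when items enter or leave the zone, not on every allocation.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <vm/uma.h>

struct my_obj {
	struct mtx	o_lock;
	/* ... other fields ... */
};

static int
my_obj_init(void *mem, int size, int flags)
{
	struct my_obj *o = mem;

	bzero(o, size);
	mtx_init(&o->o_lock, "my_obj", NULL, MTX_DEF);
	return (0);		/* runs when the item first enters the zone */
}

static void
my_obj_fini(void *mem, int size)
{
	struct my_obj *o = mem;

	mtx_destroy(&o->o_lock);	/* runs only when the zone shrinks */
}

static uma_zone_t my_zone;

static void
my_zone_setup(void)
{

	my_zone = uma_zcreate("my_obj", sizeof(struct my_obj),
	    NULL, NULL, my_obj_init, my_obj_fini, UMA_ALIGN_PTR, 0);
}
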
Konstantin Belousov
f186a80d8a Remove VI_AGE vnode iflag, it is unused.
Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2015-11-27 01:45:40 +00:00
Konstantin Belousov
b3162b45e1 Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list.  Note
that dirty page write-out in vinactive() is asynchronous.

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-11-27 01:16:35 +00:00
Konstantin Belousov
547831b6fd Rework the vnode cache recycling to meet free and unused vnodes
targets.  See the comment above wantfreevnodes variable for the
description of the algorithm.

The vfs.vlru_alloc_cache_src sysctl is removed.  New code frees
namecache sources as the last chance to satisfy the highest watermark,
instead of selecting the source vnodes randomly. This provides good
enough behaviour to keep vn_fullpath() working in most situations.
The filesystem layout with deep trees, where the removed knob was
required, is thus handled automatically.

Submitted by:	bde
Discussed with:	mckusick
Tested by:	pho
MFC after:	1 month
2015-11-24 09:45:36 +00:00
Gleb Smirnoff
09c837b897 Remove remnants of the old NFS from vnode pager.
Reviewed by:	kib
Sponsored by:	Netflix
2015-11-20 23:52:27 +00:00
Mark Johnston
0a805de6f3 Remove a check for a condition that is always false due to a preceding
KASSERT that was added in r144704.
2015-09-26 22:26:55 +00:00
Mark Johnston
d925c2e800 Fix argument ordering in vn_printf().
MFC after:	3 days
2015-09-26 22:16:54 +00:00
Conrad Meyer
55d33667ee kevent(2): Note DOOMED vnodes with NOTE_REVOKE
In poll mode, check for and wake VBAD vnodes.  (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)

Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation.  (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3675
2015-09-15 20:22:30 +00:00
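
An illustrative userland use of the vnode filter touched above: register a file with EVFILT_VNODE and wait for NOTE_REVOKE, which per the entry is now also delivered when the monitored vnode is reclaimed. Error handling is deliberately minimal.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>

static int
wait_for_revoke(const char *path)
{
	struct kevent kev;
	int fd, kq;

	if ((fd = open(path, O_RDONLY)) == -1)
		err(1, "open");
	if ((kq = kqueue()) == -1)
		err(1, "kqueue");
	EV_SET(&kev, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_REVOKE, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");
	if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)	/* block for the event */
		err(1, "kevent wait");
	return ((kev.fflags & NOTE_REVOKE) != 0);
}
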
Kirk McKusick
17518b1a2b Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.

Reviewed by: kib
Tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after:   2 weeks
2015-09-06 05:50:51 +00:00
Edward Tomasz Napierala
c9ba65040f Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
the MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
Edward Tomasz Napierala
6e572e084b After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not at EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
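
A short sketch of how the new flag would be requested from userland, assuming it is passed in the fflags field like the other EVFILT_READ notes; with it set, a regular file at EOF should still be reported readable, matching poll(2)'s POLLRDNORM as described above.

#include <sys/types.h>
#include <sys/event.h>
#include <err.h>

/* Register fd for read readiness with poll(2)-compatible semantics. */
static void
register_poll_read(int kq, int fd)
{
	struct kevent kev;

	EV_SET(&kev, fd, EVFILT_READ, EV_ADD, NOTE_FILE_POLL, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent");
}
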
Edward Tomasz Napierala
57a73b26e0 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
Mateusz Guzik
752fc07d33 vfs: implement v_holdcnt/v_usecount manipulation using atomic ops
Transitions 0->1 and 1->0 of either counter (which decide, e.g., whether to put
the vnode on the free list) are still guarded by the vnode interlock.

Reviewed by:	kib (earlier version)
Tested by:	pho
2015-07-16 13:57:05 +00:00
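
An illustrative sketch of the counting scheme described above, not the actual vfs_subr.c code: the common case bumps the counter with a compare-and-swap loop and takes no lock, while a zero count makes the caller fall back to the interlock for the 0->1 transition.

#include <sys/types.h>
#include <machine/atomic.h>

/*
 * Try to take a reference without locking.  Returns 1 on success;
 * returns 0 when the count is zero, in which case the caller must
 * take the vnode interlock and do the 0->1 transition under it.
 */
static int
ref_acquire_if_active(volatile u_int *count)
{
	u_int old;

	for (;;) {
		old = *count;
		if (old == 0)
			return (0);
		if (atomic_cmpset_int(count, old, old + 1))
			return (1);
	}
}
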
Mateusz Guzik
c634b75204 vfs: always clear VI_OWEINACT in consumers bumping v_usecount
Previously vputx would detect the condition and clear the flag.

With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all relevant places.

Reviewed by:	kib
2015-07-11 16:28:55 +00:00
Mateusz Guzik
2d1ca3cdff vfs: move si_usecount manipulation to dedicated functions
Reviewed by:	kib
2015-07-11 16:28:12 +00:00
Konstantin Belousov
cf88021ab1 Do not allow creation of the dirty buffers for the dead buffer
objects, i.e. for buffer objects which vnode was reclaimed.  Buffer
cache cannot write such buffers.  Return the error and discard the
buffer immediately on write attempt.

BO_DEAD is now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks.  Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.

Reported and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-11 11:21:56 +00:00
Mark Johnston
8bbd1f25b1 Remove a stale descriptive comment for gbincore().
The splay trees referenced in the comment were converted to
path-compressed tries in r250551.

MFC after:	3 days
2015-07-05 22:44:41 +00:00
John-Mark Gurney
1977bd233a zero this struct as it depends upon it...
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D2890
2015-06-23 18:40:20 +00:00
Konstantin Belousov
1eabd96728 vfs_msync(), called from syncer vnode fsync VOP, only iterates over
the active vnode list for the given mount point, with the assumption
that vnodes with dirty pages are active.  This is enforced by
vinactive() doing vm_object_page_clean() pass over the vnode pages.

The issue is, if vinactive() cannot be called during vput() due to the
vnode being only shared-locked, we might end up with a vnode that has dirty
pages on the free list.  Such a vnode is invisible to the syncer, and its
pages are only cleaned on vnode reactivation.  In other words,
the race results in the broken guarantee that user data, written
through the mmap(2), is written to the disk not later than in 30
seconds after the write.

Fix this by keeping a vnode that is freed but still owes inactivation
on the active list.  When the syncer loop finds such a vnode, it is
deactivated and cleaned by the final vput() call.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-17 04:46:58 +00:00
Konstantin Belousov
780dca1b1e Right now, dounmount() is called with an unreferenced mount point.
Nothing stops a parallel unmount from succeeding before the given call to
dounmount() checks and locks the covered vnode.  Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced.  dounmount()
consumes the reference on return, regardless of a successful or
erroneous result.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:22:50 +00:00
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
Konstantin Belousov
08189ed667 The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs; special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
Enji Cooper
c514f051b7 Add the mnt_lockref field to the ddb(4) 'show mount' command
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D1688
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-02-17 09:31:58 +00:00
John Baldwin
1b76e0b732 Add two new counters for vnode life cycle events:
- vfs.recycles counts the number of vnodes forcefully recycled to avoid
  exceeding kern.maxvnodes.
- vfs.vnodes_created counts the number of vnodes created by successful
  calls to getnewvnode().

Differential Revision:	https://reviews.freebsd.org/D1671
Reviewed by:	kib
MFC after:	1 week
2015-02-14 17:02:51 +00:00
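
A small userland sketch for observing the new counters; the sysctl names come from the entry above, while treating them as 64-bit values is an assumption.

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t created, recycles;
	size_t len;

	len = sizeof(created);
	if (sysctlbyname("vfs.vnodes_created", &created, &len, NULL, 0) == -1)
		return (1);
	len = sizeof(recycles);
	if (sysctlbyname("vfs.recycles", &recycles, &len, NULL, 0) == -1)
		return (1);
	printf("vnodes created: %ju, recycled: %ju\n",
	    (uintmax_t)created, (uintmax_t)recycles);
	return (0);
}
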
John Baldwin
a77a12340a Change the default VFS timestamp precision from seconds to microseconds.
Discussed on:	arch@
MFC after:	2 weeks
2015-01-25 19:56:45 +00:00
Konstantin Belousov
ea117d1735 The vinactive() call in vgonel() may start writes for the dirty pages,
creating delayed write buffers belonging to the reclaimed vnode.  Put
the buffer cleanup code after inactivation.

Add asserts that ensure that buffer queues are empty and add BO_DEAD
flag for bufobj to check that no buffers are added after the cleanup.
BO_DEAD is only used by INVARIANTS-enabled kernels.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-13 16:02:37 +00:00
Konstantin Belousov
a77c72f5ae Apply chunk forgotten in r275620. Remove local variable for real.
CID:	1257462
Sponsored by:	The FreeBSD Foundation
2014-12-09 09:36:28 +00:00
Konstantin Belousov
a25100c539 Add functions syncer_suspend() and syncer_resume(), which are supposed
to be called before suspension and after resume, respectively.  The
syncer_suspend() ensures that all filesystems' dirty data and metadata
are saved to the permanent storage, and stops kernel threads which
might modify filesystems.  The syncer_resume() restores stopped
threads.

For now, only syncer is stopped.  This is needed, because each sync
loop causes superblock updates for UFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:48:57 +00:00
Mateusz Guzik
32e7f8e4d5 Don't take devmtx unnecessarily in vn_isdisk.
MFC after:	1 week
2014-10-15 05:17:36 +00:00
Will Andrews
9832a24d27 In the syncer, drop the sync mutex while patting the watchdog.
Some watchdog drivers (like ipmi) need to sleep while patting the watchdog.
See sys/dev/ipmi/ipmi.c:ipmi_wd_event(), which calls malloc(M_WAITOK).

Submitted by:	asomers
MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	637548 on 2012/10/04
2014-10-01 15:32:28 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount paths.
It could be claimed that two things were reasonably protected by
Giant.  One is the vfsconf list links, which are converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
Konstantin Belousov
634012b917 Remove one-time-use macros which check for the vnode lifecycle.  Moreover,
some parts of the checks are in fact redundant in the surrounding
code, and it is clearer what the conditions are by direct testing
of the flags.  Two of the three macros were only used in assertions.

In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself.  Do not call vholdl() to do
the increment as well; this allows making the assertions in
vholdl()/vhold() stricter.

In v_incr_usecount(), call vholdl() before incrementing other ref
counters.  The change is a no-op, but it makes it less surprising to see
the vnode state in the debugger if interrupted inside v_incr_usecount().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-29 16:42:34 +00:00
Alexander Motin
781c93d405 Implement simple direct-mapped cache for popular filesystem identifiers to
avoid congestion on global mountlist_mtx mutex in vfs_busyfs(), while
traversing through the list of mount points.

This change significantly improves NFS server scalability, since it had
to do this translation for every request, and the global lock becomes quite
congested.

This code is optimized for a relatively small number of mount points.
On systems with hundreds of active mount points this simple cache may have
many collisions.  But the original traversal code in that case should also
behave much worse, so we are not losing much.

Reviewed by:	attilio
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-12 12:43:48 +00:00
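
An illustrative sketch of a direct-mapped cache along the lines described above; the names, the slot hash, and the table size are hypothetical, and a hit is only a hint that the caller still validates and busies under the proper lock.

#include <sys/param.h>
#include <sys/mount.h>

#define	FSID_CACHE_SIZE	256		/* must be a power of two */
static struct mount *fsid_cache[FSID_CACHE_SIZE];

static __inline u_int
fsid_slot(const fsid_t *fsid)
{

	return ((fsid->val[0] ^ fsid->val[1]) & (FSID_CACHE_SIZE - 1));
}

static struct mount *
fsid_cache_lookup(const fsid_t *fsid)
{
	struct mount *mp;

	mp = fsid_cache[fsid_slot(fsid)];
	if (mp != NULL &&
	    mp->mnt_stat.f_fsid.val[0] == fsid->val[0] &&
	    mp->mnt_stat.f_fsid.val[1] == fsid->val[1])
		return (mp);	/* hint only; caller re-checks and busies */
	return (NULL);
}

static void
fsid_cache_store(struct mount *mp)
{

	fsid_cache[fsid_slot(&mp->mnt_stat.f_fsid)] = mp;
}
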
Alexander Motin
4f655310bf Remove unneeded mountlist_mtx acquisition from sync_fsync().
All struct mount fields accessed by sync_fsync() are protected by MNT_MTX.
2014-06-11 12:56:49 +00:00
Alexander Motin
3345d73ca8 Remove extra branching from r267232.
MFC after:	2 weeks
2014-06-08 19:01:37 +00:00
Alexander Motin
590d636321 Use atomics to modify numvnodes variable.
This allows mostly avoiding lock usage in getnewvnode_[drop_]reserve(),
which reduces the number of global vnode_free_list_mtx mutex acquisitions
from 4 to 2 per NFS request on ZFS, improving SMP scalability.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-08 15:38:40 +00:00
Benjamin Kaduk
bf09eca2cb Check for mismatched vref()/vdrop()
Assert that the hold count has not fallen below the use count, a situation
that would only happen when a vref() (or similar) is erroneously paired
with a vdrop().  This situation has not been observed in the wild, but
could be helpful for someone implementing a new filesystem.

Reviewed by:	kib
Approved by:	hrs (mentor)
2014-05-21 03:11:27 +00:00
Bryan Drewery
44f1c91610 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff, the struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version in case some out-of-tree module/code relies on
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
Konstantin Belousov
2c1531e746 Do not flush buffers when the v_object of the passed vnode does not
really belong to it.  Such vnodes, with pointers to other vnodes'
v_objects, are typically instantiated by the bypass filesystems.
Invalidating the mappings of the other vnode's pages, and the pages
themselves, is wrong, since reclamation of the upper vnode does not
imply that the lower vnode is reclaimed too.

One of the consequences of the improper reclamation was destruction of
the wired mappings of the lower vnode pages, triggering miscellaneous
assertions in the VM system.

Reported by:    John Marshall <john.marshall@riverwillow.com.au>
Tested by:      John Marshall <john.marshall@riverwillow.com.au>, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (gjb)
2013-10-09 18:43:29 +00:00
Konstantin Belousov
d6498b153e When printing the vnode information from ddb, print the lengths of the
dirty and clean buffer queues.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (gjb)
2013-10-01 20:18:33 +00:00
Konstantin Belousov
fe39412e99 For vunref(), try to upgrade the vnode lock if the function was called
with the vnode shared-locked.  If upgrade succeeded, the inactivation
can be done immediately, instead of being postponed.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (glebius)
2013-09-29 18:07:14 +00:00
Konstantin Belousov
27884e3bd1 Acquire a hold reference on the vnode when a knote is instantiated.
Otherwise, the knote keeps a pointer to a vnode which could become invalid
at any time.

Reported by:	many
Tested by:	Patrick Lamaiziere <patfbsd@davenulle.org>
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (marius)
2013-09-26 13:14:51 +00:00
Pawel Jakub Dawidek
4593c0ad6b In r114945 the line 'nmp = TAILQ_NEXT(mp, mnt_list);' was duplicated.
Instead of just removing the duplicate, convert the loop to TAILQ_FOREACH().
2013-08-17 14:13:45 +00:00
Konstantin Belousov
2c38cc792e When creation of the v_pollinfo raced and our instance of vpollinfo
must be destroyed, knlist_clear() and seldrain() calls could be
avoided, since the vpollinfo was not used.  Moreover, the knlist_clear()
calling protocol requires the knlist to be locked, which is not true at the
call site.

Split the destruction into the helper destroy_vpollinfo_free(), and
call it when raced, instead of destroy_vpollinfo().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:   3 days
2013-07-28 06:59:29 +00:00
Konstantin Belousov
a8b0523ae7 Clear the vnode knotes before destroying vpollinfo.
Reported and tested by:	Patrick Lamaiziere <patfbsd@davenulle.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-17 10:56:21 +00:00
Konstantin Belousov
d39116f5d5 Be more generous when donating the current thread's time to the owner of
the vnode lock while iterating over the free vnode list.  Instead of
yielding, pause for 1 tick.  The change is reported to help in some
virtualized environments.

Submitted by:	Roger Pau Monné <roger.pau@citrix.com>
Discussed with:	jilles
Tested by:	pho
MFC after:	2 weeks
2013-06-03 17:36:43 +00:00
Jeff Roberson
22a722605d - Convert the bufobj lock to rwlock.
- Use a shared bufobj lock in getblk() and inmem().
 - Convert softdep's lk to rwlock to match the bufobj lock.
 - Move INFREECNT to b_flags and protect it with the buf lock.
 - Remove unnecessary locking around bremfree() and BKGRDINPROG.

Sponsored by:	EMC / Isilon Storage Division
Discussed with:	mckusick, kib, mdf
2013-05-31 00:43:41 +00:00
Jeff Roberson
f2cc1285c2 - Add a new general purpose path-compressed radix trie which can be used
with any structure containing a uint64_t index.  The tree code
   auto-generates type safe wrappers.
 - Eliminate the buf splay and replace it with pctrie.  This is not only
   significantly faster with large files but also allows for the possibility
   of shared locking.

Reviewed by:    alc, attilio
Sponsored by:   EMC / Isilon Storage Division
2013-05-12 04:05:01 +00:00
Konstantin Belousov
0fc6daa72d - Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
  be dropped.

- Fix a wart which existed from the introduction of the nullfs
  caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
  It should be innocent, but now it is also formally safe.  Inform the
  nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
  nullfs inode.

- Add a callback to the upper filesystems for the lower vnode
  unlinking. When inactivating a nullfs vnode, check if the lower
  vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
  on the lower vnode, and reclaim upper vnode if so.  This allows
  nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
  excessive caching.

Reported by:	Göran Löwkrantz <goran.lowkrantz@ismobile.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-05-11 11:17:44 +00:00
Marcel Moolenaar
e63091ea6c Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE
locks. To support this, VNODE locks are created with the LK_IS_VNODE
flag. This flag is propagated down using the LO_IS_VNODE flag.

Note that WITNESS still records the LOR. Only the printing and the
optional entering into the kernel debugger is bypassed with the
WITNESS_NO_VNODE option.
2013-05-09 16:28:18 +00:00
Matthew D Fleming
770b41b349 Add missing vdrop() in error case.
Submitted by:	Fahad (mohd.fahadullah@isilon.com)
MFC after:	1 week
2013-05-04 18:38:16 +00:00
Rick Macklem
64fa8df6e0 Allow the vnode to be unlocked for the weird case of
LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a
usecount on a vnode during NFSv4 recovery from an
expired lease.

Reported and tested by:	pho
MFC after:	2 weeks
2013-04-16 14:22:16 +00:00
Jeff Roberson
26089666b6 Prepare to replace the buf splay with a trie:
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
   No consumers need to find them there and it complicates the tree.
   These flags are all FFS specific and could be moved out of the buf
   cache.
 - Use pbgetvp() and pbrelvp() to associate the background and journal
   bufs with the vp.  Not only is this much cheaper it makes more sense
   for these transient bufs.
 - Fix the assertions in pbget* and pbrel*.  It's not safe to check list
   pointers which were never initialized.  Use the BX flags instead.  We
   also check B_PAGING in reassignbuf() so this should cover all cases.

Discussed with:	kib, mckusick, attilio
Sponsored by:	EMC / Isilon Storage Division
2013-04-06 22:21:23 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock.  This will enable further
optimizations in the future, where the vm_object lock will be held
in read mode most of the time the page-cache-resident pool of pages
is accessed for reading purposes.

The change is mostly mechanical, but a few notes are in order:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between the FreeBSD and Solaris
  versions must be avoided.
  For this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit.  Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Konstantin Belousov
10b4bb0b33 Add a trivial comment to record the proper commit log for r245407:
Set the v_hash for a new vnode in the getnewvnode() to the value
calculated based on the vnode structure address.  Filesystems using
vfs_hash_insert() override the v_hash using the standard formula of
(inode_number + mnt_hashseed).  For other filesystems, the
initialization allows the vfs_hash_index() to provide useful hash too.

Suggested, reviewed and tested by:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:52:23 +00:00
Konstantin Belousov
a41df84820 diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 7c243b6..0bdaf36 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -279,6 +279,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 #define VSHOULDFREE(vp) (!((vp)->v_iflag & VI_FREE) && !(vp)->v_holdcnt)
 #define VSHOULDBUSY(vp) (((vp)->v_iflag & VI_FREE) && (vp)->v_holdcnt)

+static int vnsz2log;

 /*
  * Initialize the vnode management data structures.
@@ -293,6 +294,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 static void
 vntblinit(void *dummy __unused)
 {
+	u_int i;
 	int physvnodes, virtvnodes;

 	/*
@@ -332,6 +334,9 @@ vntblinit(void *dummy __unused)
 	syncer_maxdelay = syncer_mask + 1;
 	mtx_init(&sync_mtx, "Syncer mtx", NULL, MTX_DEF);
 	cv_init(&sync_wakeup, "syncer");
+	for (i = 1; i <= sizeof(struct vnode); i <<= 1)
+		vnsz2log++;
+	vnsz2log--;
 }
 SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL);

@@ -1067,6 +1072,14 @@ alloc:
 	}
 	rangelock_init(&vp->v_rl);

+	/*
+	 * For the filesystems which do not use vfs_hash_insert(),
+	 * still initialize v_hash to have vfs_hash_index() useful.
+	 * E.g., nullfs uses vfs_hash_index() on the lower vnode for
+	 * its own hashing.
+	 */
+	vp->v_hash = (uintptr_t)vp >> vnsz2log;
+
 	*vpp = vp;
 	return (0);
 }
2013-01-14 05:42:54 +00:00
Attilio Rao
c92c859b7b Fixup r244240: mp_ncpus will be 1 also in the !SMP and smp_disabled=1
case. There is no point in optimizing the code further and using a TRUE
literal for a path that does heavyweight stuff anyway (like lock
acquisition), at the price of obfuscated code.

Use the appropriate check where necessary and remove a macro.

Sponsored by:	EMC / Isilon storage division
MFC after:	3 days
2012-12-26 15:20:32 +00:00
Attilio Rao
b1308d72c2 Fixup r218424: uio_yield() was scaling directly to userland priority.
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There is no evidence that consumers could bear such a change in
semantics, and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.

Tested by:	pho
Reviewed by:	kib, mdf
MFC after:	1 week
2012-12-21 13:14:12 +00:00
Konstantin Belousov
14df601e47 When mnt_vnode_next_active iterator cannot lock the next vnode and
yields, specify the user priority for the yield.  Otherwise, a
higher-priority (kernel) thread could fall into the priority-inversion
with the thread owning the mutex lock.

On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked, instead yield unconditionally.

Restructure the iteration initializer and the iterator to remove code
duplication.  Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.

Reported by:	hrs, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2012-12-15 02:04:46 +00:00
Konstantin Belousov
686ffcaceb Do not yield while owning a mutex.  The Giant reacquire in
kern_yield() is problematic then.

The owned mutex is the mount interlock, and it is in fact not needed
to guarantee the stability of the mount's list of active vnodes, so fix
the issue by only taking the mount interlock for MNT_REF and
MNT_REL operations.

While there, augment the unconditional yield by some amount of
spinning [1].

Reported and tested by:	pho
Reviewed by:	attilio
Submitted by:	attilio [1]
MFC after:	3 days
2012-12-10 20:44:09 +00:00
Konstantin Belousov
07840861b1 The vnode_free_list_mtx is required unconditionally when iterating
over the active list. The mount interlock is not enough to guarantee
the validity of the tailq link pointers. The __mnt_vnode_next_active()
and __mnt_vnode_first_active() active-list iterator helper functions
did not provide the necessary stability for the list, allowing the
iterators to pick up garbage.

This was uncovered after the r243599 made the active list iterators
non-nop.

Since a vnode interlock is ordered before the vnode_free_list_mtx, obtain
the vnode interlock in a non-blocking manner when under vnode_free_list_mtx,
and restart iteration after the yield if the lock attempt failed.

Assert that a vnode found on the list is active, and assert that the
helpers return the vnode with interlock owned.

Reported and tested by:	pho
MFC after:	1 week
2012-12-03 22:15:16 +00:00
David Xu
3da9ab75f4 Take first active vnode correctly.
Reviewed by:	kib
MFC after:	3 days
2012-11-27 06:07:58 +00:00
Andriy Gapon
6b991098a7 assert_vop_locked: make the assertion race-free and more efficient
this is really a minor improvement for the sake of correctness

MFC after:	6 days
2012-11-24 13:11:47 +00:00
Andriy Gapon
4f15bb6730 remove vop_lookup_pre and vop_lookup_post
Suggested by:	kib
MFC after:	5 days
2012-11-22 10:36:10 +00:00
Attilio Rao
973b795b64 insmntque() is always called with the lock held in exclusive mode,
then:
- assume the lock is held in exclusive mode and remove a moot check
  about the lock acquisition.
- in the destructor remove !MPSAFE specific chunk.

Reviewed by:	kib
MFC after:	2 weeks
2012-11-19 20:43:19 +00:00
Andriy Gapon
ab49c952d9 assert_vop_locked should treat LK_EXCLOTHER as the not locked case
... from the perspective of the current thread.

Spotted by:	mjg
Discussed with:	kib
MFC after:	18 days
2012-11-19 11:35:56 +00:00
Andriy Gapon
c496727c54 vnode_if: fix locking protocol description for lookup and cachedlookup
Also remove the checks from vop_lookup_pre and vop_lookup_post, which
are now completely redundant (before this change they were partially
redundant).

Discussed with:	kib
MFC after:	10 days
2012-11-19 11:32:56 +00:00
Attilio Rao
bc2258da88 Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.
2012-11-09 18:02:25 +00:00
Konstantin Belousov
76fd782cd9 A clarification to the behaviour of the active vnode list management
regarding the vnode page cleaning.

In collaboration with:	pho
MFC after:	1 week
2012-11-05 16:40:42 +00:00
Konstantin Belousov
90af57930c Add decoding of the missed MNT_KERN_ flags to ddb "show mount" command.
MFC after:	3 weeks
2012-11-04 13:33:13 +00:00
Konstantin Belousov
fb81941575 Add decoding of the missed VI_ and VV_ flags to ddb "show vnode" command.
MFC after:	3 days
2012-11-04 13:32:45 +00:00
Konstantin Belousov
df3161c7df Order the enumeration of the MNT_ flags to be the same as the order of
their definitions.

MFC after:	3 days
2012-11-04 13:31:41 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change, which does
not result in changes to the interface signatures.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Konstantin Belousov
9b233e2307 Add a KPI to allow to reserve some amount of space in the numvnodes
counter, without actually allocating the vnodes.  The intended use of
getnewvnode_reserve(9) is to reclaim enough free vnodes while the
code still does not hold any resources that might be needed during the
reclamation, and to consume the slack later for getnewvnode() calls
made from the innards. After the critical block is finished, the
caller shall free any reserve left, by getnewvnode_drop_reserve(9).

Reviewed by:	avg
Tested by:	pho
MFC after:	1 week
2012-10-14 19:43:37 +00:00
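
A hedged sketch of the reservation pattern described above; the count argument, the vop table, and the surrounding steps are assumptions for illustration, not a verified call sequence.

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

extern struct vop_vector hypothetical_vnodeops;	/* hypothetical fs vops */

static int
hypothetical_create_vnode(struct mount *mp, struct vnode **vpp)
{
	int error;

	getnewvnode_reserve(1);		/* reclaim before taking fs locks */
	/* ... acquire fs-specific locks and resources here ... */
	error = getnewvnode("hypothetical_fs", mp, &hypothetical_vnodeops, vpp);
	/* ... release fs-specific locks ... */
	getnewvnode_drop_reserve();	/* return any unused slack */
	return (error);
}
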
Attilio Rao
0a15e5d30d Remove all the checks on curthread != NULL with the exception of some MD
trap checks (eg. printtrap()).

Generally this check is not needed anymore, as there is no legitimate
case where curthread == NULL after the pcpu 0 area has been properly
initialized.

Reviewed by:	bde, jhb
MFC after:	1 week
2012-09-13 22:26:22 +00:00
Konstantin Belousov
bcd5bb8e57 Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation.  Typical use is for bypass mounts like
nullfs to get a notification about the lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For a filesystem without nullfs mounts over it, the added overhead
is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:17:15 +00:00
Konstantin Belousov
258f94423b Provide some compat32 shims for sysctl vfs.conflist. It is required
for getvfsbyname(3) operation when called from a 32-bit process, and
getvfsbyname(3) is used by recent bsdtar import.

Reported by:	many
Tested by:	David Naylor <naylor.b.david@gmail.com>
MFC after:	5 days
2012-08-22 20:05:34 +00:00
Andriy Gapon
7adc598a15 free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Konstantin Belousov
8f0e91308a Add a rangelock implementation, intended to be used to range-locking
the i/o regions of the vnode data space. The implementation is quite
simple-minded: it uses a list of lock requests, ordered by
arrival time. Each request may be for read or for write. The
implementation is fair FIFO.

MFC after:     2 months
2012-05-30 16:06:38 +00:00
Edward Tomasz Napierala
af6e6b87ad Remove unused thread argument to vrecycle().
Reviewed by:	kib
2012-04-23 14:10:34 +00:00
Edward Tomasz Napierala
c52fd858ae Remove unused thread argument from vtruncbuf().
Reviewed by:	kib
2012-04-23 13:21:28 +00:00
Kirk McKusick
dca5e0ec50 This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point to replace
MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync
routines.

The vfs_msync routine is run every 30 seconds for every writably
mounted filesystem. It ensures that any files mmap'ed from the
filesystem with modified pages have those pages queued to be
written back to the file from which they are mapped.

The ffs_lazy_sync and qsync routines are run every 30 seconds for
every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine
ensures that any files that have been accessed in the previous
30 seconds have had their access times queued for updating in the
filesystem. The qsync routine ensures that any files with modified
quotas have those quotas queued to be written back to their
associated quota file.

In a system configured with 250,000 vnodes, less than 1000 are
typically active at any point in time. Prior to this change all
250,000 vnodes would be locked and inspected twice every minute
by the syncer. For UFS/FFS filesystems they would be locked and
inspected six times every minute (twice by each of these three
routines since each of these routines does its own pass over the
vnodes associated with a mount point). With this change the syncer
now locks and inspects only the tiny set of vnodes that are active.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 07:00:28 +00:00
Kirk McKusick
f257ebbb2e This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 06:50:44 +00:00
Kirk McKusick
16165feec4 Delete a no longer useful VNASSERT missed during changes in 234400.
Suggested by: kib
2012-04-18 19:34:20 +00:00
Kirk McKusick
60005d66ab Fix a memory leak of M_VNODE_MARKER introduced in 234386.
Found by:  Peter Holm
2012-04-18 19:30:22 +00:00
Kirk McKusick
73305eb826 Drop export of vdestroy() function from kern/vfs_subr.c as it is
used only as a helper function in that file. Replace sole call to
vbusy() with inline code in vholdl(). Replace sole calls to vfree()
and vdestroy() with inline code in vdropl().

The Clang compiler already inlines these functions, so they do not
show up in a kernel backtrace which is confusing. Also you cannot
set their frame in kgdb which means that it is impossible to view
their local variables. So, while the produced code is unchanged,
the debugging should be easier.

Discussed with: kib
MFC after:      2 weeks
2012-04-17 21:46:59 +00:00
Kirk McKusick
71469bb38f Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if it is DOOMED or in other possible end-of-life scenarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-17 16:28:22 +00:00