Commit Graph

951 Commits

Author SHA1 Message Date
trasz
8869161bb2 Improve debugging printf. 2017-01-22 15:27:14 +00:00
mjg
e90da0df92 vfs: hide the getvnode NULL mp message behind DIAGNOSTIC
Since crossmp vnode changes the message was being printed on each boot.

Reported by:	trasz
Discussed with:	kib
2017-01-21 16:59:50 +00:00
mjg
9412199098 vfs: switch nodes_created, recycles_count and free_owe_inact to counter(9)
Reviewed by:	kib
2016-12-31 19:59:31 +00:00
mjg
f03b37f3e8 vfs: add vrefact, to be used when the vnode has to be already active
This allows blind increment of relevant counters which under contention
is cheaper than inc-not-zero loops at least on amd64.

Use it in some of the places which are guaranteed to see already active
vnodes.

Reviewed by:	kib (previous version)
2016-12-12 15:37:11 +00:00
markj
e06021a945 Launder VPO_NOSYNC pages upon vnode deactivation.
As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be
laundered. This behaviour has two problems:

1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be
   reclaimed, and this work may be unfairly deferred to the vnlru process
   or an unrelated application when the system is under vnode pressure.
2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of
   the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if
   the laundry thread needs to launder pages from an unreferenced such
   vnode, it will reactivate and deactivate the vnode with each laundering,
   potentially resulting in a large number of expensive scans.

Therefore, ensure that all dirty pages are laundered upon deactivation,
i.e., when all maps of the vnode are removed and all references are
released.

Reviewed by:	alc, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8641
2016-11-26 21:00:27 +00:00
mjg
af2e2494f1 vfs: clear the tmp free list flag before taking the free vnode list lock
Safe access is already guaranteed because of the mnt_listmx lock.
2016-10-08 13:36:59 +00:00
bdrewery
fc32cf9f50 vrefl: Assert that the interlock is held.
Sponsored by:	Dell EMC Isilon
MFC after:	2 weeks
2016-10-06 18:10:19 +00:00
bdrewery
f3e00c4570 Add vrecyclel() to vrecycle() a vnode with the interlock already held.
Obtained from:	OneFS
Sponsored by:	Dell EMC Isilon
MFC after:	2 weeks
2016-10-06 18:09:22 +00:00
bdrewery
2d8b77f87e Correct some comments after r294299.
Sponsored by:	Dell EMC Isilon
2016-10-04 21:44:20 +00:00
mjg
b06af67fb0 vfs: batch free vnodes in per-mnt lists
Previously free vnodes would always by directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.

syncer runs always return the batch regardless of its size.

While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.

Reviewed by:	kib
Tested by:	pho
2016-09-30 17:27:17 +00:00
mjg
6a50fe29a5 vfs: remove the __bo_vnode field from struct vnode
The pointer can be obtained using __containerof instead.

Reviewed by:	kib
2016-09-30 17:11:03 +00:00
emaste
00b67b15b9 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
trasz
cf980f6c7a Print vnode details when vnode locking assertion gets triggered.
MFC after:	1 month
2016-08-12 22:20:52 +00:00
trasz
255ed885fa Replace all remaining calls to vprint(9) with vn_printf(9), and remove
the old macro.

MFC after:	1 month
2016-08-10 16:12:31 +00:00
trasz
522c3665ca Remove unused - never actually implemented - vnode lock types
from vnode_if.src.

MFC after:	1 month
2016-08-04 13:45:18 +00:00
kib
9d45f8230f Fix grammar.
Submitted by:	alc
MFC after:	2 weeks
2016-07-11 17:04:22 +00:00
kib
c859b3f77d In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called,
if vnode is VMIO.  For VMIO vnodes, set BO_DEAD in vm_object_terminate().

The vnode_destroy_object(), when calling into vm_object_terminate(),
must be able to flush buffers.  BO_DEAD purpose is to quickly destroy
buffers on write when the underlying vnode is not operable any more
(one example is the devfs node after geom is gone).  Setting BO_DEAD
for reclaiming vnode before object is terminated is premature, and
results in unability to flush buffers with live SU dependencies from
vinvalbuf() in vm_object_terminate().

Reported by:	David Cross <dcrosstech@gmail.com>
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-07-11 14:19:09 +00:00
kib
c31bb3499e Remove racy assert. The thread which changes vnode usecount from 0 to 1
does it under the vnode interlock, but the interlock is not owned by the
asserting thread.  As result, we might read increased use counter but also
still see VI_OWEINACT.

In collaboration with: nwhitehorn
Hardware donated by: IBM LTC
Sponsored by:	The FreeBSD Foundation (kib)
Approved by:	re (gjb)
2016-07-03 01:56:48 +00:00
kib
06125ebef5 Fix typo. Note that atomic is still required even for interlocked case.
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
2016-06-20 15:45:50 +00:00
mjg
ed393257f0 vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels
This removes calls to empty functions like vop_lock_{pre/post} from
common vfs routines.

Approved by:	re (gjb)
2016-06-17 19:41:30 +00:00
kib
40488ecf84 Add VFS interface to flush specified amount of free vnodes belonging
to mount points with the given filesystem type, specified by mount
vfs_ops pointer.

Based on patch by:	mckusick
Reviewed by:	avg, mckusick
Tested by:	allanjude, madpilot
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-06-17 17:33:25 +00:00
trasz
661ae93171 Cosmetics - add missing space after ellipses in shutdown messages.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-31 15:27:33 +00:00
avg
594a717030 vfs_read_dirent: increment ncookies after adding a cookie
It seems that at present vfs_read_dirent() is used only with filesystems
that do not support cookies, so the bug never manifested itself.

MFC after:	1 week
2016-05-16 07:31:11 +00:00
kib
838f7d1304 Add EVFILT_VNODE open, read and close notifications.
While there, order EVFILT_VNODE notes descriptions alphabetically.

Based on submission, and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-03 15:17:43 +00:00
kib
9eb6f0fde4 Issue NOTE_EXTEND when a directory entry is added to or removed from
the monitored directory as the result of rename(2) operation.  The
renames staying in the directory are not reported.

Submitted by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-02 13:18:17 +00:00
kib
5bc5df1663 Fix reporting of NOTE_LINK when directory link count changes due to
rename removing or adding subdirectory entry.

Discussed with and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
NetBSD PR:	48958 (http://gnats.netbsd.org/48958)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-05-02 13:13:32 +00:00
pfg
28823d0656 sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
pfg
fc01419148 sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
kib
392fea70cc Provide more correct sizing of the KVA consumed by a vnode, used by
the virtvnodes calculation.  Include the size of fs-specific v_data as
the nfs nclnode inline, the NFS nclnode is bigger than either ZFS
znode or UFS inode.  Include the size of namecache_ts and short cache
path element, multiplied by the name cache population factor, again
inline.

Inline defines are used to avoid pollution of the vnode.h with the
subsystem-private objects.  Non-significant unsynchronized changes of
the definitions are fine, we do not care about that precision, and
e.g. ZFS consumes much malloced memory per vnode for reasons
unaccounted in the formula.

Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10.

The measures reduce vnode cache pressure on kmem and bring the vnode
cache memory use below some apparent thresholds that were exceeded by
r291244 due to more robust vnode reuse.

Reported and tested by:	marius (i386, previous version)
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-24 15:15:46 +00:00
kib
a5f88cf93f In bnoreuselist(), check both ends of the specified logical block
numbers range.

This effectively skips indirect and extdata blocks on the buffer
queue.  Since their logical block numbers are negative, bnoreuselist()
could loop infinitely.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:39:57 +00:00
markj
09fb369fc5 Add vrefl(), a locked variant of vref(9).
This API has no in-tree consumers at the moment but is useful to at least
one out-of-tree consumer, and naturally complements existing vnode refcount
functions (vholdl(9), vdropl(9)).

Obtained from:	kib (sys/ portion)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4947
Differential Revision:	https://reviews.freebsd.org/D4953
2016-01-18 22:21:46 +00:00
kib
8c46f725d5 Two fixes for excessive iterations after r292326.
Advance the logical block number to the lblkno of the found block plus
one, instead of incrementing the block number which was used for
lookup.  This change skips sparcely populated buffer ranges, similar
to r292325, instead of doing useless lookups.

Do not restart the bnoreuselist() from the start of the range if
buffer lock cannot be obtained without sleep.  Only retry lookup and
lock for the same queue and same logical block number.

Reported by:	benno
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-01-05 14:48:40 +00:00
kib
b5160b0280 Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a
buffer for each block number in the range with gbincore(), look up the
next instantiated buffer with the logical block number which is
greater or equal to the next lblkno.  This significantly speeds up the
iteration for sparce-populated range.

Move the iteration into new helper bnoreuselist(), which is structured
similarly to flushbuflist().

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:48:37 +00:00
kib
764a2409cb Simplify the loop step in the flushbuflist() and make it independed on
the type stability of the buffers memory.  Instead of memoizing
pointer to the next buffer and validating it, remember the next
logical block number in the bo list and re-lookup.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:39:51 +00:00
mckusick
1a9ecd3df9 We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).

Reported by; Rick Macklem
Fix from:    Mateusz Guzik
PR:          204949
2015-12-04 03:54:18 +00:00
mckusick
25671cd0d5 We need to zero out the union of pointers in a freed vnode structure.
PR:        204949
Fix from:  Mateusz Guzik
Tested by: Jason Unovitch
2015-12-03 02:04:22 +00:00
mckusick
6f0b4b3366 As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free.  These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.

As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.

Reviewed by: kib
Tested by:   Peter Holm
2015-11-29 21:42:26 +00:00
kib
d58541e227 Remove VI_AGE vnode iflag, it is unused.
Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2015-11-27 01:45:40 +00:00
kib
3cb64bd1ce Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list.  Note
that dirty page write-out in vinactive() is asynchronous.

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-11-27 01:16:35 +00:00
kib
896302b6a8 Rework the vnode cache recycling to meet free and unused vnodes
targets.  See the comment above wantfreevnodes variable for the
description of the algorithm.

The vfs.vlru_alloc_cache_src sysctl is removed.  New code frees
namecache sources as the last chance to satisfy the highest watermark,
instead of selecting the source vnodes randomly. This provides good
enough behaviour to keep vn_fullpath() working in most situations.
The filesystem layout with deep trees, where the removed knob was
required, is thus handled automatically.

Submitted by:	bde
Discussed with:	mckusick
Tested by:	pho
MFC after:	1 month
2015-11-24 09:45:36 +00:00
glebius
6460e3db4e Remove remnants of the old NFS from vnode pager.
Reviewed by:	kib
Sponsored by:	Netflix
2015-11-20 23:52:27 +00:00
markj
bb2ac24a51 Remove a check for a condition that is always false by a preceding KASSERT
that was added in r144704.
2015-09-26 22:26:55 +00:00
markj
3eefaf542b Fix argument ordering in vn_printf().
MFC after:	3 days
2015-09-26 22:16:54 +00:00
cem
9fe341127e kevent(2): Note DOOMED vnodes with NOTE_REVOKE
In poll mode, check for and wake VBAD vnodes.  (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)

Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation.  (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3675
2015-09-15 20:22:30 +00:00
mckusick
17357462a0 Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.

Reviewed by: kib
Tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after:   2 weeks
2015-09-06 05:50:51 +00:00
trasz
c1207ee9b8 Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
trasz
9e31188bdc After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
ed
4a54322f0b Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
trasz
b8165ab6a9 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
mjg
28fa5eedfe vfs: implement v_holdcnt/v_usecount manipulation using atomic ops
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.

Reviewed by:	kib (earlier version)
Tested by:	pho
2015-07-16 13:57:05 +00:00