Commit Graph

934 Commits

Author SHA1 Message Date
kib
c31bb3499e Remove racy assert. The thread which changes vnode usecount from 0 to 1
does it under the vnode interlock, but the interlock is not owned by the
asserting thread.  As result, we might read increased use counter but also
still see VI_OWEINACT.

In collaboration with: nwhitehorn
Hardware donated by: IBM LTC
Sponsored by:	The FreeBSD Foundation (kib)
Approved by:	re (gjb)
2016-07-03 01:56:48 +00:00
kib
06125ebef5 Fix typo. Note that atomic is still required even for interlocked case.
Sponsored by:	The FreeBSD Foundation
Approved by:	re (marius)
2016-06-20 15:45:50 +00:00
mjg
ed393257f0 vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels
This removes calls to empty functions like vop_lock_{pre/post} from
common vfs routines.

Approved by:	re (gjb)
2016-06-17 19:41:30 +00:00
kib
40488ecf84 Add VFS interface to flush specified amount of free vnodes belonging
to mount points with the given filesystem type, specified by mount
vfs_ops pointer.

Based on patch by:	mckusick
Reviewed by:	avg, mckusick
Tested by:	allanjude, madpilot
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-06-17 17:33:25 +00:00
trasz
661ae93171 Cosmetics - add missing space after ellipses in shutdown messages.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-31 15:27:33 +00:00
avg
594a717030 vfs_read_dirent: increment ncookies after adding a cookie
It seems that at present vfs_read_dirent() is used only with filesystems
that do not support cookies, so the bug never manifested itself.

MFC after:	1 week
2016-05-16 07:31:11 +00:00
kib
838f7d1304 Add EVFILT_VNODE open, read and close notifications.
While there, order EVFILT_VNODE notes descriptions alphabetically.

Based on submission, and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-03 15:17:43 +00:00
kib
9eb6f0fde4 Issue NOTE_EXTEND when a directory entry is added to or removed from
the monitored directory as the result of rename(2) operation.  The
renames staying in the directory are not reported.

Submitted by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-02 13:18:17 +00:00
kib
5bc5df1663 Fix reporting of NOTE_LINK when directory link count changes due to
rename removing or adding subdirectory entry.

Discussed with and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
NetBSD PR:	48958 (http://gnats.netbsd.org/48958)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-05-02 13:13:32 +00:00
pfg
28823d0656 sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
pfg
fc01419148 sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
kib
392fea70cc Provide more correct sizing of the KVA consumed by a vnode, used by
the virtvnodes calculation.  Include the size of fs-specific v_data as
the nfs nclnode inline, the NFS nclnode is bigger than either ZFS
znode or UFS inode.  Include the size of namecache_ts and short cache
path element, multiplied by the name cache population factor, again
inline.

Inline defines are used to avoid pollution of the vnode.h with the
subsystem-private objects.  Non-significant unsynchronized changes of
the definitions are fine, we do not care about that precision, and
e.g. ZFS consumes much malloced memory per vnode for reasons
unaccounted in the formula.

Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10.

The measures reduce vnode cache pressure on kmem and bring the vnode
cache memory use below some apparent thresholds that were exceeded by
r291244 due to more robust vnode reuse.

Reported and tested by:	marius (i386, previous version)
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-24 15:15:46 +00:00
kib
a5f88cf93f In bnoreuselist(), check both ends of the specified logical block
numbers range.

This effectively skips indirect and extdata blocks on the buffer
queue.  Since their logical block numbers are negative, bnoreuselist()
could loop infinitely.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-02-17 19:39:57 +00:00
markj
09fb369fc5 Add vrefl(), a locked variant of vref(9).
This API has no in-tree consumers at the moment but is useful to at least
one out-of-tree consumer, and naturally complements existing vnode refcount
functions (vholdl(9), vdropl(9)).

Obtained from:	kib (sys/ portion)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4947
Differential Revision:	https://reviews.freebsd.org/D4953
2016-01-18 22:21:46 +00:00
kib
8c46f725d5 Two fixes for excessive iterations after r292326.
Advance the logical block number to the lblkno of the found block plus
one, instead of incrementing the block number which was used for
lookup.  This change skips sparcely populated buffer ranges, similar
to r292325, instead of doing useless lookups.

Do not restart the bnoreuselist() from the start of the range if
buffer lock cannot be obtained without sleep.  Only retry lookup and
lock for the same queue and same logical block number.

Reported by:	benno
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-01-05 14:48:40 +00:00
kib
b5160b0280 Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a
buffer for each block number in the range with gbincore(), look up the
next instantiated buffer with the logical block number which is
greater or equal to the next lblkno.  This significantly speeds up the
iteration for sparce-populated range.

Move the iteration into new helper bnoreuselist(), which is structured
similarly to flushbuflist().

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:48:37 +00:00
kib
764a2409cb Simplify the loop step in the flushbuflist() and make it independed on
the type stability of the buffers memory.  Instead of memoizing
pointer to the next buffer and validating it, remember the next
logical block number in the bo list and re-lookup.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-12-16 08:39:51 +00:00
mckusick
1a9ecd3df9 We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).

Reported by; Rick Macklem
Fix from:    Mateusz Guzik
PR:          204949
2015-12-04 03:54:18 +00:00
mckusick
25671cd0d5 We need to zero out the union of pointers in a freed vnode structure.
PR:        204949
Fix from:  Mateusz Guzik
Tested by: Jason Unovitch
2015-12-03 02:04:22 +00:00
mckusick
6f0b4b3366 As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free.  These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.

As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.

Reviewed by: kib
Tested by:   Peter Holm
2015-11-29 21:42:26 +00:00
kib
d58541e227 Remove VI_AGE vnode iflag, it is unused.
Noted by:	bde
Sponsored by:	The FreeBSD Foundation
2015-11-27 01:45:40 +00:00
kib
3cb64bd1ce Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list.  Note
that dirty page write-out in vinactive() is asynchronous.

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-11-27 01:16:35 +00:00
kib
896302b6a8 Rework the vnode cache recycling to meet free and unused vnodes
targets.  See the comment above wantfreevnodes variable for the
description of the algorithm.

The vfs.vlru_alloc_cache_src sysctl is removed.  New code frees
namecache sources as the last chance to satisfy the highest watermark,
instead of selecting the source vnodes randomly. This provides good
enough behaviour to keep vn_fullpath() working in most situations.
The filesystem layout with deep trees, where the removed knob was
required, is thus handled automatically.

Submitted by:	bde
Discussed with:	mckusick
Tested by:	pho
MFC after:	1 month
2015-11-24 09:45:36 +00:00
glebius
6460e3db4e Remove remnants of the old NFS from vnode pager.
Reviewed by:	kib
Sponsored by:	Netflix
2015-11-20 23:52:27 +00:00
markj
bb2ac24a51 Remove a check for a condition that is always false by a preceding KASSERT
that was added in r144704.
2015-09-26 22:26:55 +00:00
markj
3eefaf542b Fix argument ordering in vn_printf().
MFC after:	3 days
2015-09-26 22:16:54 +00:00
cem
9fe341127e kevent(2): Note DOOMED vnodes with NOTE_REVOKE
In poll mode, check for and wake VBAD vnodes.  (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)

Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation.  (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)

Reviewed by:	kib
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3675
2015-09-15 20:22:30 +00:00
mckusick
17357462a0 Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.

Reviewed by: kib
Tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after:   2 weeks
2015-09-06 05:50:51 +00:00
trasz
c1207ee9b8 Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
trasz
9e31188bdc After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
ed
4a54322f0b Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
trasz
b8165ab6a9 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
mjg
28fa5eedfe vfs: implement v_holdcnt/v_usecount manipulation using atomic ops
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.

Reviewed by:	kib (earlier version)
Tested by:	pho
2015-07-16 13:57:05 +00:00
mjg
b3b0716b63 vfs: always clear VI_OWEINACT in consumers bumping v_usecount
Previously vputx would detect the condition and clear the flag.

With this change it is invalid to have both v_usecount > 0 and the flag
set. Assert the condition is met in all revlevant places.

Reviewed by:	kib
2015-07-11 16:28:55 +00:00
mjg
74d7b1e72e vfs: move si_usecount manipulation to dedicated functions
Reviewed by:	kib
2015-07-11 16:28:12 +00:00
kib
564e88d499 Do not allow creation of the dirty buffers for the dead buffer
objects, i.e. for buffer objects which vnode was reclaimed.  Buffer
cache cannot write such buffers.  Return the error and discard the
buffer immediately on write attempt.

BO_DIRTY now always set during vnode reclamation, since it is used not
only for the INVARIANTS checks.  Do allow placement of the clean
buffers on dead bufobj list, otherwise filesystems cannot use bufcache
at all after the devvp reclaim.

Reported and tested by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-11 11:21:56 +00:00
markj
f94d7a8603 Remove a stale descriptive comment for gbincore().
The splay trees referenced in the comment were converted to
path-compressed tries in r250551.

MFC after:	3 days
2015-07-05 22:44:41 +00:00
jmg
110a049369 zero this struct as it depends upon it...
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D2890
2015-06-23 18:40:20 +00:00
kib
becc575eec vfs_msync(), called from syncer vnode fsync VOP, only iterates over
the active vnode list for the given mount point, with the assumption
that vnodes with dirty pages are active.  This is enforced by
vinactive() doing vm_object_page_clean() pass over the vnode pages.

The issue is, if vinactive() cannot be called during vput() due to the
vnode being only shared-locked, we might end up with the dirty pages
for the vnode on the free list.  Such vnode is invisible to syncer,
and pages are only cleaned on the vnode reactivation.  In other words,
the race results in the broken guarantee that user data, written
through the mmap(2), is written to the disk not later than in 30
seconds after the write.

Fix this by keeping the vnode which is freed but still owing
inactivation, on the active list.  When syncer loops find such vnode,
it is deactivated and cleaned by the final vput() call.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-17 04:46:58 +00:00
kib
d77dbf3761 Right now, dounmount() is called with unreferenced mount point.
Nothing stops a parallel unmount to suceed before the given call to
dounmount() checks and locks the covered vnode.  Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced.  dounmount()
consumes the reference on return, regardless of the sucessfull or
erronous result.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-27 09:22:50 +00:00
rmacklem
ad77d0b1c1 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
kib
3bc9cbc06a The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
ngie
994a2af400 Add the mnt_lockref field to the ddb(4) 'show mount' command
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D1688
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-02-17 09:31:58 +00:00
jhb
4247c4fbb3 Add two new counters for vnode life cycle events:
- vfs.recycles counts the number of vnodes forcefully recycled to avoid
  exceeding kern.maxvnodes.
- vfs.vnodes_created counts the number of vnodes created by successful
  calls to getnewvnode().

Differential Revision:	https://reviews.freebsd.org/D1671
Reviewed by:	kib
MFC after:	1 week
2015-02-14 17:02:51 +00:00
jhb
71715e274d Change the default VFS timestamp precision from seconds to microseconds.
Discussed on:	arch@
MFC after:	2 weeks
2015-01-25 19:56:45 +00:00
kib
ca2d4fd9c1 The vinactive() call in vgonel() may start writes for the dirty pages,
creating delayed write buffers belonging to the reclaimed vnode.  Put
the buffer cleanup code after inactivation.

Add asserts that ensure that buffer queues are empty and add BO_DEAD
flag for bufobj to check that no buffers are added after the cleanup.
BO_DEAD is only used by INVARIANTS-enabled kernels.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-13 16:02:37 +00:00
kib
86353785aa Apply chunk forgotten in r275620. Remove local variable for real.
CID:	1257462
Sponsored by:	The FreeBSD Foundation
2014-12-09 09:36:28 +00:00
kib
1a8d4344d0 Add functions syncer_suspend() and syncer_resume(), which are supposed
to be called before suspension and after resume, correspondingly.  The
syncer_suspend() ensures that all filesystems dirty data and metadata
are saved to the permanent storage, and stops kernel threads which
might modify filesystems.  The syncer_resume() restores stopped
threads.

For now, only syncer is stopped.  This is needed, because each sync
loop causes superblock updates for UFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:48:57 +00:00
mjg
143261aa90 Don't take devmtx unnecessarily in vn_isdisk.
MFC after:	1 week
2014-10-15 05:17:36 +00:00
will
eba83cccb3 In the syncer, drop the sync mutex while patting the watchdog.
Some watchdog drivers (like ipmi) need to sleep while patting the watchdog.
See sys/dev/ipmi/ipmi.c:ipmi_wd_event(), which calls malloc(M_WAITOK).

Submitted by:	asomers
MFC after:	1 month
Sponsored by:	Spectra Logic
MFSpectraBSD:	637548 on 2012/10/04
2014-10-01 15:32:28 +00:00