Commit Graph

233 Commits

Author SHA1 Message Date
Mateusz Guzik
ac850e5a8d namecache: fix .. check broken after r324378
wtf by:	mjg
Diagnosed by:	avg
2017-11-01 08:40:04 +00:00
Mateusz Guzik
5644fffa25 namecache: ncnegfactor 16 -> 12
It is used on each new entry addition to decide whether to whack an existing
negative entry in order to prevent a blow out in size, but the parameter was
set years ago and never revisited.

Building with poudriere results in about 400 evictions per second which
unnecessarily grab entries from the hot list.

With the new parameter there are next to no evictions of the sort.
2017-11-01 06:45:41 +00:00
Mateusz Guzik
709939a7b7 namecache: factor out ~MAKEENTRY lookups from the common path
Lookups of the sort are rare compared to regular ones and succesfull ones
result in removing entries from the cache.

In the current code buckets are rlocked and a trylock dance is performed,
which can fail and cause a restart. Fixing it will require a little bit
of surgery and in order to keep the code maintaineable the 2 cases have
to split.

MFC after:	1 week
2017-10-06 23:05:55 +00:00
John Baldwin
c2dc6d5db1 Use UMA_ALIGNOF() for name cache UMA zones.
This fixes kernel crashes due to misaligned accesses to the 64-bit
time_t embedded in struct namecache_ts in MIPS n32 kernels.

MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:18:57 +00:00
Mateusz Guzik
0bbae6f364 namecache: clean up struct namecache_ts handling
namecache_ts differs from mere namecache by few fields placed mid struct.
The access to the last element (the name) is thus special-cased.

The standard solution is to put new fields at the very beginning anad
embedd the original struct. The pointer shuffled around points to the
embedded part. If needed, access to new fields can be gained through
__containerof.

MFC after:	1 week
2017-09-10 11:17:32 +00:00
Mateusz Guzik
dad74ce924 namecache: fold the unlock label into the only consumer
No functional changes.

MFC after:	1 week
2017-09-08 06:57:11 +00:00
Mateusz Guzik
da8f32a7f1 namecache: factor out dot lookup into a dedicated function
The intent is to move uncommon cases out of the way.

MFC after:	1 week
2017-09-08 06:51:33 +00:00
Mateusz Guzik
8066a14a3c cache: stop holding the ncneg_hot lock across purging
Only non-hot entries are purged so the lock is not needed in the first place.
This saves one lock/unlock pair.

MFC after:	1 week
2017-05-04 03:11:59 +00:00
Brooks Davis
a3b7d0fb60 Regen after r316594. 2017-04-06 23:40:51 +00:00
Mateusz Guzik
dfecf51dd0 cache: use vrefact for '.' lookups and refing the rdir in fullpath 2017-01-30 03:20:05 +00:00
Mateusz Guzik
17071ff298 cache: annotate with __read_mostly and __exclusive_cache_line
MFC after:	1 month
2017-01-27 14:56:36 +00:00
Mateusz Guzik
4938d86764 cache: sprinkle __predict_false 2016-12-29 16:35:49 +00:00
Mateusz Guzik
b37707533e cache: move shrink lock init to nchinit
This gets rid of unnecesary sysinit usage.

While here also rename the lock to be consistent with the rest.
2016-12-29 12:01:54 +00:00
Mateusz Guzik
0569bc9ca9 cache: depessimize hashing macros/inlines
All hash sizes are power-of-2, but the compiler does not know that for sure
and 'foo % size' forces doing a division.

Store the size - 1 and use 'foo & hash' instead which allows mere shift.
2016-12-29 08:41:25 +00:00
Mateusz Guzik
6dd9661b77 cache: drop the NULL check from VP2VNODELOCK
Now that negative entries are annotated with a dedicated flag, NULL vnodes
are no longer passed.
2016-12-29 08:34:50 +00:00
Mateusz Guzik
25e578de55 vfs: use vrefact in getcwd and fchdir 2016-12-12 19:16:35 +00:00
Mateusz Guzik
8b0e0c91e0 cache: ensure that the number of bucket locks does not exceed hash size
The size can be changed by side effect of modifying kern.maxvnodes.

Since numbucketlocks was not modified, setting a sufficiently low value
would give more locks than actual buckets, which would then lead to
corruption.

Force the number of buckets to be not smaller.

Note this should not matter for real world cases.

Reported and tested by:	pho
2016-11-23 19:50:12 +00:00
Mateusz Guzik
6ce45c6ac3 cache: plug a write-only variable in cache_negative_zap_one 2016-11-15 03:43:10 +00:00
Mateusz Guzik
317cac6d5a cache: fix a race between entry removal and demotion
The negative list shrinker can demote an entry with only hotlist + neglist
locks held. On the other hand entry removal possibly sets the NCF_DVDROP
without aformentioned locks held prior to detaching it from the respective
netlist., which can lose the update made by the shrinker.

Reported and tested by:	truckman
2016-11-15 03:38:05 +00:00
Konstantin Belousov
9bd4f0a2c6 vn_fullpath1() checked VV_ROOT and then unreferenced
vp->v_mount->mnt_vnodecovered unlocked.  This allowed unmount to race.
Lock vnode after we noticed the VV_ROOT flag.  See comments for
explanation why unlocked check for the flag is considered safe.

Reported and tested by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-07 10:55:56 +00:00
Mateusz Guzik
bb697a20d7 cache: fix up a corner case in r307650
If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.

Fix the problem by initializing to NULL on entry.

Reported by:	glebius
2016-10-20 19:55:50 +00:00
Mateusz Guzik
a45a1a25b8 cache: split negative entry LRU into multiple lists
This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.

Create N dedicated lists for new negative entries.

Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.

Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.

Reviewed by:	kib
2016-10-19 18:29:52 +00:00
Konstantin Belousov
f71d08566c Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

Reported and tested by:	jkim
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2016-10-07 11:38:28 +00:00
Mateusz Guzik
4876636eb7 cache: ignore purgevfs requests for filesystems with few vnodes
purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.

In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.

Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.

The default limit is the number of bucket locks.

Reviewed by:	kib
2016-10-03 00:02:32 +00:00
Mateusz Guzik
1d2541fd1a cache: get rid of the global lock
Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

Discussed with:		kib
Tested by:	pho
2016-09-23 04:45:11 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Mateusz Guzik
a27815330c cache: improve scalability by introducing bucket locks
An array of bucket locks is added.

All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.

This is an intermediate step towards removal of the global lock.

Reviewed by:	kib
Tested by:	pho
2016-09-10 16:29:53 +00:00
Mateusz Guzik
591df14528 cache: defer freeing entries until after the global lock is dropped
This also defers vdrop for held vnodes.

Glanced at by:	kib
2016-09-04 16:52:14 +00:00
Mateusz Guzik
31977b420a cache: manage negative entry list with a dedicated lock
Since negative entries are managed with a LRU list, a hit requires a
modificaton.

Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.

Provide a dedicated lock for use when the cache is only shared-locked.

Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:58:35 +00:00
Mateusz Guzik
b9042ae1bf cache: put all negative entry management code into dedicated functions
Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:55:15 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Konstantin Belousov
0791e0c0e7 Provide more correct sizing of the KVA consumed by a vnode, used by
the virtvnodes calculation.  Include the size of fs-specific v_data as
the nfs nclnode inline, the NFS nclnode is bigger than either ZFS
znode or UFS inode.  Include the size of namecache_ts and short cache
path element, multiplied by the name cache population factor, again
inline.

Inline defines are used to avoid pollution of the vnode.h with the
subsystem-private objects.  Non-significant unsynchronized changes of
the definitions are fine, we do not care about that precision, and
e.g. ZFS consumes much malloced memory per vnode for reasons
unaccounted in the formula.

Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10.

The measures reduce vnode cache pressure on kmem and bring the vnode
cache memory use below some apparent thresholds that were exceeded by
r291244 due to more robust vnode reuse.

Reported and tested by:	marius (i386, previous version)
Reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-02-24 15:15:46 +00:00
Mateusz Guzik
b0632ab432 cache: minor changes
1. vhold and zap immediately instead of postponing few lines later
2. increment numneg after new entry is added

No functional changes.

No objections:		kib
2016-01-21 01:09:39 +00:00
Mateusz Guzik
baa2bcf572 cache: perform . lockup without the namecache lock
Reviewed by:	kib
2016-01-21 01:07:05 +00:00
Mateusz Guzik
db709ecbcc cache: provide a helper for computing the hash
Reviewed by:	kib
2016-01-21 01:05:41 +00:00
Mateusz Guzik
76583fa294 cache: use counter(9) API to maintain statistics
Previously the code would just increment statistics while only holding a
shared lock, in effect losing updates.

Separate tracking for nchstats is removed as values can be obtained from
existing counters. Note that some fields are updated by external
consumers and are left unfixed. This should not be a serious issue as
this structure looks quite obsolete.

No strong objections: kib
2016-01-21 01:04:03 +00:00
Mateusz Guzik
6b53d1bc6f cache: ansify functions and fix some style issues
No functional changes.
2016-01-07 02:04:17 +00:00
Mark Johnston
3616095801 Fix style issues around existing SDT probes.
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
  at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
  set automatically by the SDT framework.

MFC after:	1 week
2015-12-16 23:39:27 +00:00
Andriy Gapon
2f2f522b5d save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE
SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters
where n is typically smaller than 5.

Perhaps SDT_PROBE should be made a private implementation detail.

MFC after:	20 days
2015-09-28 12:14:16 +00:00
Kirk McKusick
17518b1a2b Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.

Reviewed by: kib
Tested by:   Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after:   2 weeks
2015-09-06 05:50:51 +00:00
Mateusz Guzik
752fc07d33 vfs: implement v_holdcnt/v_usecount manipulation using atomic ops
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.

Reviewed by:	kib (earlier version)
Tested by:	pho
2015-07-16 13:57:05 +00:00
Edward Tomasz Napierala
6289b482ec Modify kern___getcwd() to take max pathlen limit as an additional
argument.  This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.

Differential Revision:	https://reviews.freebsd.org/D2335
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-21 13:55:24 +00:00
Kirk McKusick
f351915514 More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.

Reviewed by: kib@
Phabric:     D2265
2015-04-18 00:59:03 +00:00
Dmitry Chagin
9f7a06f27e Indeed, instead of hiding the kern___getcwd() bug by bogus cast
in r276564, change path type to char * (pathnames are always char *).
And remove bogus casts of malloc().
kern___getcwd() internally doesn't actually use or support u_char *
paths, except to copy them to a normal char * path.

These changes are not visible to libc as libc/gen/getcwd.c misdeclares
__getcwd() as taking a plain char * path.

While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as
we always have sysproto.h.

Pointed out by:	bde

MFC after:	1 week
2015-01-04 10:34:02 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Sergey Kandaurov
bcdd3bceb6 vn_path_to_global_path: update comment. 2014-08-03 07:59:19 +00:00
Konstantin Belousov
fe20047039 Fix accounting for the negative cache entries when reusing v_cache_dd.
Having ncneg diverge with the actual length of the ncneg tailq causes
NULL dereference.

Add assertion that an entry taken from ncneg queue is indeed negative.

Reported by and discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-27 17:09:59 +00:00
Andriy Gapon
d9fae5ab88 dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE
In its stead use the Solaris / illumos approach of emulating '-' (dash)
in probe names with '__' (two consecutive underscores).

Reviewed by:	markj
MFC after:	3 weeks
2013-11-26 08:46:27 +00:00
Attilio Rao
54366c0bd7 - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
  calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
  version of the releasing functions for mutex, rwlock and sxlock.
  Failing to do so skips the lockstat_probe_func invokation for
  unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
  kernel compiled without lock debugging options, potentially every
  consumer must be compiled including opt_kdtrace.h.
  Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
  dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
  is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested.  As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while.  Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by:	EMC / Isilon storage division
Discussed with:	rstone
[0] Reported by:	rstone
[1] Discussed with:	philip
2013-11-25 07:38:45 +00:00
Andriy Gapon
4633a4c379 namecache sdt: freebsd doesn't support structured characters yet
:-)

MFC after:	 7 days
2013-07-09 08:58:34 +00:00