Commit Graph

252 Commits

Author SHA1 Message Date
Mateusz Guzik
b088a4d6f9 cache: avoid excessive relocking on entry removal during lookup
Due to lock ordering issues (bucket lock held, vnode locks wanted) the code
starts with trylocking which in face of contention often fails. Prior to
the change it would loop back with a possible yield.

Instead note we know what locks are needed and can take them in the right
order, avoiding retries. Then we can safely re-lookup and see if the entry
we are looking for is still there.

On a 104-way box poudriere would result in constant retries during an 11h
run as seen in the vfs.cache.zap_and_exit_bucket_fail counter.

before: 408866592
after :         0

However, a new stat reports:
vfs.cache.zap_and_exit_bucket_relock_success: 32638

Note this is only a bandaid over current design issues.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2019-09-10 20:19:29 +00:00
Mateusz Guzik
a6cacb0dca cache: change the formula for calculating lock array sizes
It used to be mp_ncpus * 64, but this gives unnecessarily big values for small
machines and at the same time constraints bigger ones. In particular this helps
on a 104-way box for which the count is now doubled.

While here make cache_purgevfs less likely. Currently it is not efficient in
face of contention due to lock ordering issues. These are fixable but not worth
it at the moment.

Sponsored by:	The FreeBSD Foundation
2019-09-10 20:11:00 +00:00
Mateusz Guzik
1214618c05 cache: assorted cleanups
Sponsored by:	The FreeBSD Foundation
2019-09-10 20:08:24 +00:00
Mateusz Guzik
e3c3248cc7 vfs: implement usecount implying holdcnt
vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting
freed and usecount to denote it is actively used.

Previously all operations bumping usecount would also bump holdcnt, which is
not necessary. We can detect if usecount is already > 1 (in which case holdcnt
is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves
on atomic ops.

Reviewed by:	kib
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21471
2019-09-03 15:42:11 +00:00
Alan Somers
caaa7cee09 [skip ci] Fix the comment for cache_purge(9)
This is a merge of r348738 from projects/fuse2

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-07-22 21:03:52 +00:00
Conrad Meyer
daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Mateusz Guzik
8ba6c1391b cache: fix a brainfart in r347505
If bumping over the counter goes over the limit we have to decrement it back.

Previous code would only bump the counter after adding the entry (thus allowing
the cache to go over the limit).

Sponsored by:	The FreeBSD Foundation
2019-05-12 07:56:01 +00:00
Mateusz Guzik
5bf50787e6 cache: bump numcache on entry, while here fix lnumcache type
Sponsored by:	The FreeBSD Foundation
2019-05-12 06:59:22 +00:00
Mateusz Guzik
63ad3b65b0 cache: push sdt probes in cache_zap_locked to code doing the work
Avoids branching to check which probe to evaluate. Very same check was
being done later to do the actual work.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:39:30 +00:00
Alan Somers
691d4ab6f0 fix cache_lookup's documentation
cache_lookup's documentation got dislocated by r324378. Relocate and expand
it.

Reviewed by:	jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-04-10 13:02:33 +00:00
Mateusz Guzik
22443809ff cache: retire cache_enter compat schim
It was added over 6 years ago for binary compat. cache_enter macro remains
as it expands to cache_enter_time.

Sponsored by:	The FreeBSD Foundation
2018-11-29 09:32:59 +00:00
Bjoern A. Zeeb
7ffbcfe281 Sometimes it is helpful to get the path for a vnode.
Implement a ddb function walking the namecache to do this.

Reviewed by:		jhb, mjg
Inspired by:		gdb macro from jhb (old version)
Sponsored by:		iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14898
2018-06-20 08:34:29 +00:00
Matt Macy
e9b1074bc7 cache_lookup remove unused variable and initialize used 2018-05-19 04:08:11 +00:00
Mark Johnston
e1703ef5ae Plug a name cache lock leak.
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-01 22:51:02 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Mateusz Guzik
ce80021f4e namecache: bump numcache after dropping all locks
This makes no difference correctness-wise, but shortens total hold time.
2017-11-05 22:29:45 +00:00
Mateusz Guzik
119b826a62 namecache: wlock buckets in cache_lookup_nomakeentry
Since the case of an empty chain was already covered, it si very likely
that the existing entry is matching. Skipping readlocking saves on lock
upgrade.
2017-11-05 22:28:39 +00:00
Mateusz Guzik
ba324b5946 namecache: skip locking in cache_lookup_nomakeentry if there is no entry 2017-11-05 21:59:39 +00:00
Mateusz Guzik
a52058f013 namecache: skip locking in cache_purge_negative if there are no entries 2017-11-05 08:31:25 +00:00
Mateusz Guzik
ac850e5a8d namecache: fix .. check broken after r324378
wtf by:	mjg
Diagnosed by:	avg
2017-11-01 08:40:04 +00:00
Mateusz Guzik
5644fffa25 namecache: ncnegfactor 16 -> 12
It is used on each new entry addition to decide whether to whack an existing
negative entry in order to prevent a blow out in size, but the parameter was
set years ago and never revisited.

Building with poudriere results in about 400 evictions per second which
unnecessarily grab entries from the hot list.

With the new parameter there are next to no evictions of the sort.
2017-11-01 06:45:41 +00:00
Mateusz Guzik
709939a7b7 namecache: factor out ~MAKEENTRY lookups from the common path
Lookups of the sort are rare compared to regular ones and succesfull ones
result in removing entries from the cache.

In the current code buckets are rlocked and a trylock dance is performed,
which can fail and cause a restart. Fixing it will require a little bit
of surgery and in order to keep the code maintaineable the 2 cases have
to split.

MFC after:	1 week
2017-10-06 23:05:55 +00:00
John Baldwin
c2dc6d5db1 Use UMA_ALIGNOF() for name cache UMA zones.
This fixes kernel crashes due to misaligned accesses to the 64-bit
time_t embedded in struct namecache_ts in MIPS n32 kernels.

MFC after:	1 week
Sponsored by:	DARPA / AFRL
2017-09-27 23:18:57 +00:00
Mateusz Guzik
0bbae6f364 namecache: clean up struct namecache_ts handling
namecache_ts differs from mere namecache by few fields placed mid struct.
The access to the last element (the name) is thus special-cased.

The standard solution is to put new fields at the very beginning anad
embedd the original struct. The pointer shuffled around points to the
embedded part. If needed, access to new fields can be gained through
__containerof.

MFC after:	1 week
2017-09-10 11:17:32 +00:00
Mateusz Guzik
dad74ce924 namecache: fold the unlock label into the only consumer
No functional changes.

MFC after:	1 week
2017-09-08 06:57:11 +00:00
Mateusz Guzik
da8f32a7f1 namecache: factor out dot lookup into a dedicated function
The intent is to move uncommon cases out of the way.

MFC after:	1 week
2017-09-08 06:51:33 +00:00
Mateusz Guzik
8066a14a3c cache: stop holding the ncneg_hot lock across purging
Only non-hot entries are purged so the lock is not needed in the first place.
This saves one lock/unlock pair.

MFC after:	1 week
2017-05-04 03:11:59 +00:00
Brooks Davis
a3b7d0fb60 Regen after r316594. 2017-04-06 23:40:51 +00:00
Mateusz Guzik
dfecf51dd0 cache: use vrefact for '.' lookups and refing the rdir in fullpath 2017-01-30 03:20:05 +00:00
Mateusz Guzik
17071ff298 cache: annotate with __read_mostly and __exclusive_cache_line
MFC after:	1 month
2017-01-27 14:56:36 +00:00
Mateusz Guzik
4938d86764 cache: sprinkle __predict_false 2016-12-29 16:35:49 +00:00
Mateusz Guzik
b37707533e cache: move shrink lock init to nchinit
This gets rid of unnecesary sysinit usage.

While here also rename the lock to be consistent with the rest.
2016-12-29 12:01:54 +00:00
Mateusz Guzik
0569bc9ca9 cache: depessimize hashing macros/inlines
All hash sizes are power-of-2, but the compiler does not know that for sure
and 'foo % size' forces doing a division.

Store the size - 1 and use 'foo & hash' instead which allows mere shift.
2016-12-29 08:41:25 +00:00
Mateusz Guzik
6dd9661b77 cache: drop the NULL check from VP2VNODELOCK
Now that negative entries are annotated with a dedicated flag, NULL vnodes
are no longer passed.
2016-12-29 08:34:50 +00:00
Mateusz Guzik
25e578de55 vfs: use vrefact in getcwd and fchdir 2016-12-12 19:16:35 +00:00
Mateusz Guzik
8b0e0c91e0 cache: ensure that the number of bucket locks does not exceed hash size
The size can be changed by side effect of modifying kern.maxvnodes.

Since numbucketlocks was not modified, setting a sufficiently low value
would give more locks than actual buckets, which would then lead to
corruption.

Force the number of buckets to be not smaller.

Note this should not matter for real world cases.

Reported and tested by:	pho
2016-11-23 19:50:12 +00:00
Mateusz Guzik
6ce45c6ac3 cache: plug a write-only variable in cache_negative_zap_one 2016-11-15 03:43:10 +00:00
Mateusz Guzik
317cac6d5a cache: fix a race between entry removal and demotion
The negative list shrinker can demote an entry with only hotlist + neglist
locks held. On the other hand entry removal possibly sets the NCF_DVDROP
without aformentioned locks held prior to detaching it from the respective
netlist., which can lose the update made by the shrinker.

Reported and tested by:	truckman
2016-11-15 03:38:05 +00:00
Konstantin Belousov
9bd4f0a2c6 vn_fullpath1() checked VV_ROOT and then unreferenced
vp->v_mount->mnt_vnodecovered unlocked.  This allowed unmount to race.
Lock vnode after we noticed the VV_ROOT flag.  See comments for
explanation why unlocked check for the flag is considered safe.

Reported and tested by:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-11-07 10:55:56 +00:00
Mateusz Guzik
bb697a20d7 cache: fix up a corner case in r307650
If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.

Fix the problem by initializing to NULL on entry.

Reported by:	glebius
2016-10-20 19:55:50 +00:00
Mateusz Guzik
a45a1a25b8 cache: split negative entry LRU into multiple lists
This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.

Create N dedicated lists for new negative entries.

Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.

Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.

Reviewed by:	kib
2016-10-19 18:29:52 +00:00
Konstantin Belousov
f71d08566c Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

Reported and tested by:	jkim
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2016-10-07 11:38:28 +00:00
Mateusz Guzik
4876636eb7 cache: ignore purgevfs requests for filesystems with few vnodes
purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.

In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.

Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.

The default limit is the number of bucket locks.

Reviewed by:	kib
2016-10-03 00:02:32 +00:00
Mateusz Guzik
1d2541fd1a cache: get rid of the global lock
Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

Discussed with:		kib
Tested by:	pho
2016-09-23 04:45:11 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Mateusz Guzik
a27815330c cache: improve scalability by introducing bucket locks
An array of bucket locks is added.

All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.

This is an intermediate step towards removal of the global lock.

Reviewed by:	kib
Tested by:	pho
2016-09-10 16:29:53 +00:00
Mateusz Guzik
591df14528 cache: defer freeing entries until after the global lock is dropped
This also defers vdrop for held vnodes.

Glanced at by:	kib
2016-09-04 16:52:14 +00:00
Mateusz Guzik
31977b420a cache: manage negative entry list with a dedicated lock
Since negative entries are managed with a LRU list, a hit requires a
modificaton.

Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.

Provide a dedicated lock for use when the cache is only shared-locked.

Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:58:35 +00:00
Mateusz Guzik
b9042ae1bf cache: put all negative entry management code into dedicated functions
Reviewed by:	kib
MFC after:	1 week
2016-09-04 08:55:15 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00