Commit Graph

338 Commits

Author SHA1 Message Date
Mateusz Guzik
d3476daddc cache: factor dotdot lookup out of cache_lookup
Tested by:	pho
2020-08-26 12:49:39 +00:00
Mateusz Guzik
f9cdb0775e cache: remove leftover assert in vn_fullpath_any_smr
It is only valid when !slash_prefixed. For slash_prefixed the length
is properly accounted for later.

Reported by:	markj (syzkaller)
2020-08-24 18:23:58 +00:00
Mateusz Guzik
e35406c8f7 cache: lockless reverse lookup
This enables fully scalable operation for getcwd and significantly improves
realpath.

For example:
PATH_CUSTOM=/usr/src ./getcwd_processes -t 104
before:  1550851
after: 380135380

Tested by:	pho
2020-08-24 09:00:57 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note that VOP_VPTOCNP keeps getting it, as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik
f0696c5e4b cache: perform reverse lookup using v_cache_dd if possible
Tested by:	pho
2020-08-24 08:55:55 +00:00
Mateusz Guzik
ce575cd0e2 cache: populate v_cache_dd for non-VDIR entries
This makes v_cache_dd a bit of a misnomer; that may be addressed later.

Tested by:	pho
2020-08-24 08:55:04 +00:00
Mateusz Guzik
1e448a1558 cache: stronger vnode asserts in cache_enter_time 2020-08-22 16:58:34 +00:00
Mateusz Guzik
760a430bb3 vfs: add a work around for vp_crossmp bug to realpath
The actual bug is not yet addressed as it will get much easier after other
problems are addressed (most notably rename contract).

The only affected in-tree consumer is realpath. Everyone else happens to be
performing lookups within a mount point, which has the side effect of ni_dvp
being set to the mount point's root vnode in the worst case.

Reported by:	pho
2020-08-22 06:56:04 +00:00
Mateusz Guzik
17838b5869 cache: don't use cache_purge_negative when renaming
It needlessly scans (and evicts) unrelated entries. Instead, take
advantage of the passed componentname and perform a hash lookup
for the exact entry.

Sample data from a buildworld, with cache_purge_negative instrumented
to count both scanned and evicted entries on each call, is below.
At most one entry has to be evicted.

  evicted
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@                          19506
               1 |@@@@@                                    5820
               2 |@@@@@@                                   7751
               4 |@@@@@                                    6506
               8 |@@@@@                                    5996
              16 |@@@                                      4029
              32 |@                                        1489
              64 |                                         193
             128 |                                         109
             256 |                                         56
             512 |                                         16
            1024 |                                         7
            2048 |                                         3
            4096 |                                         1
            8192 |                                         1
           16384 |                                         0

  scanned
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@                                       2456
               1 |@                                        1496
               2 |@@                                       2728
               4 |@@@                                      4171
               8 |@@@@                                     5122
              16 |@@@@                                     5335
              32 |@@@@@                                    6279
              64 |@@@@                                     5671
             128 |@@@@                                     4558
             256 |@@                                       3123
             512 |@@                                       2790
            1024 |@@                                       2449
            2048 |@@                                       3021
            4096 |@                                        1398
            8192 |@                                        886
           16384 |                                         0
2020-08-20 10:06:50 +00:00
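A rough sketch of the targeted removal described above: hash the (directory, name) pair the same way cache_lookup does and drop at most one matching entry, instead of walking every negative list. It reuses the vfs_cache.c helpers cache_get_hash and NCHHASH, omits all locking, and uses a hypothetical cache_drop_entry() in place of the real removal routine; the chain iterator may also differ from the committed code.

  static void
  cache_remove_exact(struct vnode *dvp, struct componentname *cnp)
  {
          struct namecache *ncp;
          uint32_t hash;

          /* Same hash the regular lookup would compute for this name. */
          hash = cache_get_hash(cnp->cn_nameptr, cnp->cn_namelen, dvp);
          SLIST_FOREACH(ncp, NCHHASH(hash), nc_hash) {
                  if (ncp->nc_dvp == dvp &&
                      ncp->nc_nlen == cnp->cn_namelen &&
                      bcmp(ncp->nc_name, cnp->cn_nameptr, ncp->nc_nlen) == 0) {
                          cache_drop_entry(ncp);  /* hypothetical removal helper */
                          break;
                  }
          }
  }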
Mateusz Guzik
39f8815070 cache: add cache_rename, a dedicated helper to use for renames
While here make both tmpfs and ufs use it.

No functional changes.
2020-08-20 10:05:46 +00:00
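The committed cache_rename is not reproduced here; the sketch below is an approximation of what such a helper can centralize, assuming it builds on cache_remove_cnp from the entry just below. Treat the exact body as an assumption.

  static void
  cache_rename_sketch(struct vnode *fdvp __unused, struct vnode *fvp,
      struct vnode *tdvp, struct vnode *tvp,
      struct componentname *fcnp __unused, struct componentname *tcnp)
  {
          /* Drop all names pointing at the moved vnode. */
          cache_purge(fvp);
          if (tvp != NULL) {
                  /* The target existed and is being replaced. */
                  cache_purge(tvp);
          } else {
                  /* Remove a stale (possibly negative) entry for the
                   * target name, if one is cached. */
                  cache_remove_cnp(tdvp, tcnp);
          }
  }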
Mateusz Guzik
16be9f9956 cache: reimplement cache_lookup_nomakeentry as cache_remove_cnp
This in particular removes unused arguments.
2020-08-20 10:05:19 +00:00
Mateusz Guzik
6c55d6e030 cache: when adding an already existing entry assert on a complete match 2020-08-19 15:08:14 +00:00
Mateusz Guzik
7c75f14f5b cache: tidy up the comment above cache_prehash 2020-08-19 15:07:28 +00:00
Mateusz Guzik
3c5d2ed71f cache: add NOCAPCHECK to the list of supported flags for lockless lookup
It is de facto supported in that lockless lookup does not do any capability
checks.
2020-08-16 18:33:24 +00:00
Mateusz Guzik
8ab4becab0 vfs: use namei_zone for getcwd allocations
instead of malloc.

Note that this should probably be wrapped in a dedicated API; other
vn_getcwd callers did not get converted.
2020-08-16 18:21:21 +00:00
Mateusz Guzik
5e79447d60 cache: let SAVESTART passthrough
The flag is only passed for non-LOOKUP ops, and those fall back to the slowpath.
2020-08-10 12:28:56 +00:00
Mateusz Guzik
bb48255cf5 cache: resize struct namecache to a multiple of the alignment
For example, struct namecache on amd64 is 100 bytes, but it has to occupy
104. Use the extra bytes to support longer names.
2020-08-10 12:05:55 +00:00
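The arithmetic behind the resize, as a sketch: with the zone aligned to 8 bytes (see the alignment change further down this log), a 100-byte allocation occupies 104 bytes anyway, so the embedded name buffer may as well absorb the padding. roundup2() and CACHE_PATH_CUTOFF exist in the tree; CACHE_ZONE_ALIGNMENT is an illustrative name and the exact expression is an assumption.

  /* Assumed zone alignment for the example. */
  #define CACHE_ZONE_ALIGNMENT    8

  /* Round the small-entry size up to the alignment boundary; the
   * otherwise wasted padding bytes become extra room for nc_name. */
  #define CACHE_ZONE_SMALL_SIZE \
          roundup2(sizeof(struct namecache) + CACHE_PATH_CUTOFF + 1, \
              CACHE_ZONE_ALIGNMENT)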
Mateusz Guzik
8b62cebea7 cache: remove unused variables from cache_fplookup_parse 2020-08-10 11:51:56 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
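The mechanics are simple enough to sketch: filesystems advertise lockless-lookup support via MNTK_FPLOOKUP, so clearing that bit whenever MNT_UNION is in effect means the fast path never has to test for union mounts. Placement and locking (normally under the mount interlock) are illustrative here.

  /* Done wherever the mount flags are settled, under MNT_ILOCK(mp). */
  if ((mp->mnt_flag & MNT_UNION) != 0)
          mp->mnt_kern_flag &= ~MNTK_FPLOOKUP;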
Mateusz Guzik
c571b99545 cache: strlcpy -> memcpy 2020-08-10 10:40:14 +00:00
Mateusz Guzik
3ba0e51703 vfs: partially support file create/delete/rename in lockless lookup
Perform the lookup until the last 2 elements and fall back to the slowpath.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:35:18 +00:00
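A sketch of the cut-off described above: once a non-LOOKUP operation is down to its last two path components, the lockless walk reports a partial result so namei can finish under locks. cache_fpl_partial() and the LOOKUP nameiop are real; the remaining-components predicate is a hypothetical stand-in.

  /* Inside the lockless walk, per component (illustrative). */
  if (cnp->cn_nameiop != LOOKUP &&
      cache_fpl_last_two_components(ndp))       /* hypothetical predicate */
          return (cache_fpl_partial(fpl));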
Mateusz Guzik
21d5af2b30 vfs: drop the thread argument from vfs_fplookup_vexec
It is guaranteed to be curthread.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:34:22 +00:00
Mateusz Guzik
e910c93eea cache: add more predicts for failing conditions 2020-08-06 04:20:14 +00:00
Mateusz Guzik
95888901f7 cache: plug uninitialized variable use
CID:	1431128
2020-08-06 04:19:47 +00:00
Mateusz Guzik
e1b1971c05 cache: don't ignore size passed to nchinittbl 2020-08-05 09:38:02 +00:00
Mateusz Guzik
2b86f9d6d0 cache: convert the hash from LIST to SLIST
This reduces struct namecache by sizeof(void *).

The negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.

Note this adds cache_get_hash calls which can be eliminated.
2020-08-05 09:25:59 +00:00
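The trade-off mentioned above, in sys/queue.h terms: LIST entries carry a back pointer, so unlinking is O(1) at the cost of an extra pointer per namecache entry, while SLIST_REMOVE has to walk the hash chain to find the predecessor, which stays cheap as long as chains are short. Head and field names are illustrative.

  #include <sys/queue.h>

  /* Before: doubly linked hash chain, O(1) unlink. */
  LIST_REMOVE(ncp, nc_hash);

  /* After: singly linked chain saves sizeof(void *) per entry; removal
   * scans the chain headed by ncpp to find the predecessor. */
  SLIST_REMOVE(ncpp, ncp, namecache, nc_hash);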
Mateusz Guzik
cf8ac0de81 cache: reduce zone alignment to 8 bytes
It used to be the sizeof of the given struct, to accommodate 32-bit mips
doing 64-bit loads, but the same can be achieved by requiring just
64-bit alignment.

While here, reorder struct namecache so that the most commonly used
fields are closer together.
2020-08-05 09:24:38 +00:00
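In UMA terms the change amounts to passing a fixed 64-bit alignment mask instead of one derived from the structure size (uma_zcreate takes alignment as a mask, i.e. alignment minus one). Zone name, flags and the surrounding setup below are illustrative.

  cache_zone = uma_zcreate("namecache", sizeof(struct namecache),
      NULL, NULL, NULL, NULL,
      sizeof(uint64_t) - 1,     /* 8-byte alignment, enough for 64-bit loads */
      0);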
Mateusz Guzik
d61ce7ef50 cache: convert ncneghash into a macro
It is a read-only variable with a value known at compile time.
2020-08-05 09:24:00 +00:00
Mateusz Guzik
2840f07d4f cache: cleanup lockless entry point
- remove spurious bzero
- assert ni_lcf, it has to be set by namei by this point
2020-08-05 07:32:26 +00:00
Mateusz Guzik
8ccf01e0e2 cache: stop messing with cn_lkflags
See r363882.
2020-08-05 07:30:57 +00:00
Mateusz Guzik
27c4618df5 cache: stop messing with cn_flags
This removes flag setting/unsetting carried over from regular lookup.
Flags still get set for compatibility when falling back.

Note .. and . handling can get partially folded together.
2020-08-05 07:30:17 +00:00
Mateusz Guzik
db99ec5656 vfs: support lockless dotdot lookup
Tested by:	pho
2020-08-04 23:07:42 +00:00
Mateusz Guzik
b403aa126e cache: add NCF_WIP flag
This allows making half-constructed entries visible to the lockless lookup,
which can now distinguish between the "not yet fully constructed" and
"no longer valid" states.

This will be used for .. lookup.
2020-08-04 23:07:00 +00:00
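A sketch of the check this enables on the lockless path: the flag word is read once, and an entry that is still being constructed or already invalidated causes a bail-out. NCF_WIP is from the commit; the cache_fpl_* status helpers are assumed from the lockless lookup code and cache_ncp_invalid() is a hypothetical predicate for the "no longer valid" state.

  nc_flag = atomic_load_char(&ncp->nc_flag);
  if (__predict_false(nc_flag & NCF_WIP))
          return (cache_fpl_partial(fpl));        /* not fully constructed yet */
  if (__predict_false(cache_ncp_invalid(ncp)))    /* hypothetical predicate */
          return (cache_fpl_aborted(fpl));        /* entry no longer valid */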
Mateusz Guzik
6e10434c02 cache: add cache_purge_vgone
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.

Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.

Paired with the fact that doomed vnodes cannot receive entries, this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.

Reported by:	pho
2020-08-04 23:04:29 +00:00
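The synchronization boils down to observing the per-vnode namecache lock as free: a racing purge that just removed the last entry still holds that lock while it finishes with the vnode, so taking and dropping it closes the window. A minimal sketch, assuming the VP2VNODELOCK mapping from vfs_cache.c and a hypothetical emptiness check; the real code may use a lighter-weight wait.

  static void
  cache_purge_vgone_sketch(struct vnode *vp)
  {
          struct mtx *vlp;

          if (cache_vnode_has_entries(vp)) {      /* hypothetical check */
                  cache_purge(vp);
                  return;
          }
          /*
           * The vnode looks empty, but a concurrent purge may have just
           * removed the last entry and still be touching the vnode.
           * Observing its namecache lock as unheld rules that out.
           */
          vlp = VP2VNODELOCK(vp);
          mtx_lock(vlp);
          mtx_unlock(vlp);
  }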
Mateusz Guzik
1164f7a566 cache: factor away failed vexec handling 2020-08-04 19:55:26 +00:00
Mateusz Guzik
0439b00ea8 cache: assorted tidy ups 2020-08-04 19:55:00 +00:00
Mateusz Guzik
18bd02e2ce cache: factor away lockless dot lookup and add missing stat + sdt probe 2020-08-04 19:54:37 +00:00
Mateusz Guzik
17a66c7087 vfs: add vfs_op_thread_enter/exit _crit variants
and employ them in the namecache. Eliminates all spurious checks for preemption.
2020-08-04 19:54:10 +00:00
Mateusz Guzik
0311b05fec cache: add missing numcache decrement on insertion failure 2020-08-04 19:52:52 +00:00
Mateusz Guzik
7ad2f1105e vfs: store precomputed namecache hash in the vnode
This significantly speeds up path lookup. On Cascade Lake, doing access(2) on ufs
on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c, ops/s:
before: 2535298
after: 2797621

Over +10%.

The reversed order of computation here does not seem to matter for hash
distribution.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25921
2020-08-02 20:02:06 +00:00
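A sketch of the reversed computation: the vnode-pointer contribution to the hash is computed once and stored in the new v_nchash field, so each lookup only mixes in the name. v_nchash is named by the commit; the FNV mixing shown (which vfs_cache.c has traditionally used) and the init hook are assumptions.

  #include <sys/fnv_hash.h>

  /* Done once per vnode, e.g. when it is initialized (illustrative). */
  static void
  cache_vnode_hash_init(struct vnode *vp)
  {
          vp->v_nchash = fnv_32_buf(&vp, sizeof(vp), FNV1_32_INIT);
  }

  /* Per lookup, only the name remains to be hashed. */
  static uint32_t
  cache_get_hash(char *name, u_char len, struct vnode *dvp)
  {
          return (fnv_32_buf(name, len, dvp->v_nchash));
  }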
Mateusz Guzik
838984de32 vfs: move namecache initialisation into cache_vnode_init 2020-08-02 19:42:06 +00:00
Mateusz Guzik
8a7ec17095 cache: reshuffle struct cache_fpl and nameidata_saved
Shaves 16 bytes.
2020-08-01 06:35:18 +00:00
Mateusz Guzik
5a3944334c cache: mark climb_mount as __noinline 2020-08-01 06:34:18 +00:00
Mateusz Guzik
cb90ef2875 cache: drop the useless numchecks counter 2020-07-30 22:52:18 +00:00
Mateusz Guzik
404927357d vfs: add support for WANTPARENT and LOCKPARENT to lockless lookup
This makes the realpath syscall operational with the new lookup. Note that the
walk to obtain the full path name still takes locks.

Tested by:      pho
Differential Revision:	https://reviews.freebsd.org/D23917
2020-07-30 15:45:11 +00:00
Mateusz Guzik
8230d29357 vfs: support negative entry promotion in lockless lookup
Tested by:	pho
2020-07-30 15:44:10 +00:00
Mateusz Guzik
4057e3eaaa vfs: add NOMACCHECK and AUDITVNODE2 to lockless lookup
They are both nops since lockless lookup does not progress with either MAC or audit enabled.

Tested by:	pho
2020-07-30 15:43:16 +00:00
Mateusz Guzik
9dbd12fb52 vfs: add support for !LOCKLEAF to lockless lookup
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D23916
2020-07-25 10:40:38 +00:00
Mateusz Guzik
c42b77e694 vfs: lockless lookup
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.

Inner workings are explained in the comment above cache_fplookup.

Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.

Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.

Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total

Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before:   2165816
after:  151216530

Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25578
2020-07-25 10:37:15 +00:00
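The fallback behaviour reduces to a three-way outcome for the caller; the sketch below uses the cache_fpl status values from the lockless lookup code and hypothetical namei helpers for the locked continuation and restart, and it assumes cache_fplookup reports its status through an out parameter.

  enum cache_fpl_status status;
  struct pwd *pwd;
  int error;

  error = cache_fplookup(ndp, &status, &pwd);
  switch (status) {
  case CACHE_FPL_STATUS_HANDLED:
          return (error);                 /* resolved entirely without locks */
  case CACHE_FPL_STATUS_PARTIAL:
          /* A reference to the directory vnode was obtained; continue
           * the regular, locked lookup from that point. */
          return (namei_continue_locked(ndp));    /* hypothetical */
  default:
          /* Aborted: restart the regular lookup from scratch.  The path
           * has already been copied in and is not copied again. */
          return (namei_locked(ndp));             /* hypothetical */
  }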
Mateusz Guzik
29f3e5ea41 cache: make negative shrinker round robin on all lists every time
Previously it would check 4, 3, 2, 1 lists. In practice, by the time it
gets called all lists have some elements, so this does not result in
new evictions.

Nonetheless, the code is clearer.

Tested by:	pho
2020-07-14 21:19:33 +00:00
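The resulting shrinker shape is straightforward: one pass visits every negative list and asks each for a share of the eviction target, instead of probing a shrinking subset. List and helper names below are illustrative; howmany() is the stock sys/param.h macro.

  for (i = 0; i < numneglists; i++)
          evicted += cache_negative_shrink_list(&neglists[i],
              howmany(target, numneglists));      /* hypothetical helper */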