251 Commits

Author SHA1 Message Date
Konstantin Belousov
28cd3a673e O_RELATIVE_BENEATH: return ENOTCAPABLE instead of EINVAL for abs path
Requested and reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:40 +02:00
Konstantin Belousov
49c98a4bf3 nameicap_check_dotdot: trim tracker on check
The tracker should contain exactly the path from the starting directory
to the current lookup point. Otherwise we might not detect some cases of
dotdot escape. Consequently, if we are walking up the tree by dotdot
lookup, we must remove the entries below the directory walked to.

Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:35 +02:00
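The tracker rules from the last few commits can be modeled in a few lines of C. This is a simplified sketch, not the kernel code. Directories are small integer ids and `parent[]` maps each id to its parent. The names `tracker_add()`, `tracker_dotdot()`, `walk()` and `DOTDOT` are hypothetical. The model adds directories without duplicates. On `..` it requires the parent to be on the tracked path, then trims every entry below that parent, so the list stays exactly "start directory to current point". It does not reproduce the specific escape cases the commit refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXDEPTH 32
#define DOTDOT   (-1)   /* hypothetical marker for a ".." step */

/* Path from the start directory to the current lookup point. */
struct tracker {
    int    dirs[MAXDEPTH];
    size_t len;
};

/* Append `dir` unless it is already tracked; false only on overflow. */
static bool tracker_add(struct tracker *t, int dir)
{
    for (size_t i = 0; i < t->len; i++)
        if (t->dirs[i] == dir)
            return true;
    if (t->len == MAXDEPTH)
        return false;
    t->dirs[t->len++] = dir;
    return true;
}

/* ".." from `cur`: the parent must be on the tracked path, otherwise the
 * walk escaped the start directory.  Entries below the parent are trimmed. */
static bool tracker_dotdot(struct tracker *t, const int *parent, int cur,
    int *next)
{
    int up = parent[cur];

    for (size_t i = 0; i < t->len; i++) {
        if (t->dirs[i] == up) {
            t->len = i + 1;     /* drop everything below `up` */
            *next = up;
            return true;
        }
    }
    return false;
}

/* Walk `nsteps` steps (child ids or DOTDOT) from `start`.
 * Returns the final directory id, or -1 on escape/overflow. */
static int walk(const int *parent, int start, const int *steps, size_t nsteps)
{
    struct tracker t = { .len = 0 };
    int cur = start;

    tracker_add(&t, start);
    for (size_t i = 0; i < nsteps; i++) {
        if (steps[i] == DOTDOT) {
            if (!tracker_dotdot(&t, parent, cur, &cur))
                return -1;
        } else {
            cur = steps[i];
            if (!tracker_add(&t, cur))
                return -1;
        }
    }
    return cur;
}
```

As an example, take a tree where 0 is `/`, 1 is the start directory, 2 is `start/a` and 3 is `start/a/b`. The steps `a, ..` end back at the start directory. A bare `..` from the start directory is reported as an escape.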
Konstantin Belousov
e8a2862aa0 Add nameicap_cleanup_from(), to clean tracker list starting from some element
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:30 +02:00
Konstantin Belousov
2388ad7c29 nameicap_tracker_add: avoid duplicates in the tracker list
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:23 +02:00
Konstantin Belousov
59e7494281 Do not call nameicap_tracker_add() for dotdot case.
Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:21:14 +02:00
Konstantin Belousov
20e91ca36a open(2): Remove O_BENEATH and AT_BENEATH
with the reasoning that the flags did not work properly and were not
shipped in a release.

O_RESOLVE_BENEATH is kept as useful.

Reviewed by:	markj
Tested by:	arichardson, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D28907
2021-03-02 20:16:55 +02:00
Mateusz Guzik
739ecbcf1c cache: add symlink support to lockless lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00
Mateusz Guzik
70ba77706d vfs: extend vfs:namei:lookup:return probe with nameidata 2021-01-12 13:35:27 +00:00
Mateusz Guzik
cdb62ab74e vfs: add NDFREE_NOTHING and convert several NDFREE_PNBUF callers
See the comment above the routine for the reasoning.
2021-01-12 13:16:10 +00:00
Mateusz Guzik
002e18eb7f vfs: add FAILIFEXISTS flag
Both FreeBSD and Linux mkdir -p walk the tree up, ignoring any EEXIST on
the way, and both are used a lot when building their respective kernels.

This poses a problem because the spurious locking needlessly interferes
with concurrent operations, such as getdirentries, on the affected
directories.

Work around the problem by adding the FAILIFEXISTS flag. With lockless
lookup this avoids any work to begin with; there is no speedup for the
locked case, but perhaps this can be improved later on.

For simplicity, the only supported semantics are those used by mkdir.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D27789
2020-12-28 01:53:27 +00:00
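The userland pattern that motivates FAILIFEXISTS looks roughly like this. It is a sketch and not the mkdir(1) source; `mkdir_p()` is a hypothetical helper. Every prefix of the path is passed to mkdir(2), and EEXIST is ignored. When most prefixes already exist, each call still costs a full kernel lookup ending in EEXIST, which is the work the flag aims to cut short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create each prefix of `path`, treating EEXIST as success.
 * Unlike mkdir(1), it does not check that an existing final component
 * is a directory. */
static int mkdir_p(const char *path, mode_t mode)
{
    char buf[1024];
    size_t len = strlen(path);

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    if (len >= sizeof(buf)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);
    for (char *p = buf + 1; ; p++) {
        if (*p == '/' || *p == '\0') {
            char saved = *p;

            *p = '\0';          /* temporarily cut the path at this prefix */
            if (mkdir(buf, mode) == -1 && errno != EEXIST)
                return -1;
            if (saved == '\0')
                break;
            *p = saved;
        }
    }
    return 0;
}
```

Running the same call twice shows the pattern: the second call creates nothing and only collects EEXIST for every component.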
Mateusz Guzik
8fcfd0e222 vfs: add cleanup on error missed in r368375
Noted by:	jrtc27
2020-12-06 19:24:38 +00:00
Mateusz Guzik
60e2a0d9a4 vfs: factor buffer allocation/copyin out of namei 2020-12-06 04:59:24 +00:00
Edward Tomasz Napierala
9c8c797c1a Remove the 'wantparent' variable, unused since r145004.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D27193
2020-11-23 12:47:23 +00:00
Mateusz Guzik
2fbb45c601 vfs: change nt_zone into a malloc type
Elements are small in size and allocated for short periods.
2020-11-05 12:06:50 +00:00
Mateusz Guzik
62568e886a vfs: add NAMEI_DBG_HADSTARTDIR handling lost in rewrite
Noted by:	rpokala
2020-10-29 18:43:37 +00:00
Mateusz Guzik
eebc2e450f vfs: add NDREINIT to facilitate repeated namei calls
struct nameidata mixes caller arguments, internal state and output, which
can be quite error prone.

Recent addition of validating ni_resflags uncovered a caller which could
repeatedly call namei, effectively operating on partially populated state.

Add bare minimum validation that this does not happen. The real fix would
decouple the aforementioned state.

Reported by:	pho
Tested by:	pho (different variant)
2020-10-29 12:56:02 +00:00
Mateusz Guzik
d681c51d36 cache: add missing NIRES_ABS handling 2020-10-26 18:01:18 +00:00
Konstantin Belousov
4ea4966009 Do not allow to use O_BENEATH as an oracle.
Specifically, if lookup() returned any error and the topping directory
was not latched, which means that the (non-existent) path did not return
to the topping location, give ENOTCAPABLE priority over the lookup()
error.

PR:	249960
Reviewed by:	emaste, ngie
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26695
2020-10-08 22:31:11 +00:00
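The error-priority rule above fits in one function. This is a sketch with hypothetical names (`beneath_error()`, `ERR_NOTCAPABLE`). ENOTCAPABLE is FreeBSD-specific, so the sketch substitutes a placeholder value where the host lacks it. If the walk never came back to the topping directory, the caller always sees the capability error. As a result, ENOENT versus EACCES and similar errors cannot be used to probe paths outside the subtree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifdef ENOTCAPABLE
#define ERR_NOTCAPABLE ENOTCAPABLE
#else
#define ERR_NOTCAPABLE 1000   /* placeholder: ENOTCAPABLE is FreeBSD-only */
#endif

/* `lookup_error` is what the walk returned (0 on success); `latched` says
 * whether the walk ended back at/below the topping directory. */
static int beneath_error(int lookup_error, bool latched)
{
    if (!latched)
        return ERR_NOTCAPABLE;  /* hide the real error: no oracle */
    return lookup_error;
}
```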
Konstantin Belousov
1317da4349 Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.
It is like O_BENEATH, but disallows walking out of the subtree rooted
at the starting directory. O_BENEATH does not care whether the path walks
out, as long as it returns.

Requested by:	Dan Gohman <dev@sunfishcode.online>
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:48:12 +00:00
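A purely lexical model makes the difference between the two flags concrete. It is a sketch under loud assumptions. Paths are plain strings, and symlinks, mount points and permissions are ignored. `beneath_ok()` and the `MODE_*` names are hypothetical. In the model, O_BENEATH passes if the walk ends inside the start subtree, which approximates the latch. O_RESOLVE_BENEATH fails as soon as a `..` leaves the subtree. Absolute paths are refused for O_RESOLVE_BENEATH, in line with commit 28cd3a673e.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXCOMP 64

enum beneath_mode { MODE_BENEATH, MODE_RESOLVE_BENEATH };

/* Split `path` in place on '/'; returns component count or -1. */
static int split(char *path, char **comp, int max)
{
    int n = 0;

    for (char *tok = strtok(path, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (n == max)
            return -1;
        comp[n++] = tok;
    }
    return n;
}

/* `start`: absolute path of the starting directory; `path`: the path given
 * to openat().  Returns true if the modeled check allows the lookup. */
static bool beneath_ok(const char *start, const char *path,
    enum beneath_mode mode)
{
    char sbuf[256], pbuf[256];
    char *scomp[MAXCOMP], *pcomp[MAXCOMP], *cur[MAXCOMP];
    int ns, np, depth = 0;

    if (strlen(start) >= sizeof(sbuf) || strlen(path) >= sizeof(pbuf))
        return false;
    strcpy(sbuf, start);
    strcpy(pbuf, path);
    if ((ns = split(sbuf, scomp, MAXCOMP)) < 0 ||
        (np = split(pbuf, pcomp, MAXCOMP)) < 0)
        return false;

    if (path[0] == '/') {
        if (mode == MODE_RESOLVE_BENEATH)
            return false;               /* absolute paths refused */
    } else {
        for (int i = 0; i < ns; i++)
            cur[depth++] = scomp[i];    /* relative: begin at start dir */
    }
    for (int i = 0; i < np; i++) {
        if (strcmp(pcomp[i], ".") == 0)
            continue;
        if (strcmp(pcomp[i], "..") == 0) {
            if (depth > 0)
                depth--;
            /* RESOLVE_BENEATH: leaving the subtree fails immediately */
            if (mode == MODE_RESOLVE_BENEATH && depth < ns)
                return false;
            continue;
        }
        if (depth == MAXCOMP)
            return false;
        cur[depth++] = pcomp[i];
    }
    /* Both modes: the final location must lie inside the start subtree. */
    if (depth < ns)
        return false;
    for (int i = 0; i < ns; i++)
        if (strcmp(cur[i], scomp[i]) != 0)
            return false;
    return true;
}
```

With start directory `/a`, the path `../a/b` walks out and comes back, so O_BENEATH accepts it and O_RESOLVE_BENEATH rejects it. `/a/../a` is the O_BENEATH case from commit f9e46c9bf1.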
Konstantin Belousov
6a9c72d901 Change O_BENEATH to handle relative paths same as absolute.
Do not care whether the path walks out of the topping directory, as long as it returns.

Requested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:43:32 +00:00
Konstantin Belousov
07e7ad2b98 Only clear latch for BENEATH when we walk out of the startdir,
not unconditionally on any dotdot component.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:36:02 +00:00
Konstantin Belousov
c7de3d6f0b Add NIRES_STRICTREL.
Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup.  Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:06:20 +00:00
Konstantin Belousov
f9e46c9bf1 lookup: Track last lookup component if it is directory.
This makes open("/a/../a", O_BENEATH) with cwd == "/a" work.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 21:59:18 +00:00
Konstantin Belousov
44619a5e86 Improve comment above nameicap_check_dotdot().
Explain why tracker is needed at all.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 21:54:30 +00:00
Kirk McKusick
66ac5b2c5a Add a comment to clarify when and why cached names are deleted
during pathname lookup.

Reviewed by:  kib
MFC after:    3 days
Sponsored by: Netflix
2020-08-27 22:14:58 +00:00
Mateusz Guzik
f0d9c77e52 vfs: validate ndp state after the lookup
The intent is to remove known-to-be-nop NDFREE calls after many lookups.
2020-08-23 21:06:41 +00:00
Mateusz Guzik
4b5001196a vfs: convert nameiop into an enum
While here change the field size from long to int and move it into the
gap next to cn_flags.

Shrinks struct componentname from 64 to 56 bytes on amd64.
2020-08-23 21:05:39 +00:00
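The 8-byte saving comes from alignment padding. The structs below are loosely modeled on a componentname-like layout; the exact field list is an assumption, not the real header. On an LP64 ABI such as amd64, a lone 4-byte field between 8-byte fields leaves 4 bytes of padding. Shrinking the operation field from `long` to an int-sized enum and pairing it with the other 4-byte field removes that hole.

```c
#include <assert.h>
#include <stdint.h>

enum op { OP_LOOKUP, OP_CREATE, OP_DELETE, OP_RENAME };

struct before {
    long      nameiop;   /* 8 bytes on LP64 */
    uint64_t  flags;
    void     *thread;
    void     *cred;
    int       lkflags;   /* 4 bytes, then 4 bytes of padding */
    char     *pnbuf;
    char     *nameptr;
    long      namelen;
};                       /* 64 bytes on LP64 */

struct after {
    enum op   nameiop;   /* 4 bytes with common ABIs */
    int       lkflags;   /* shares the 8-byte slot: no padding */
    uint64_t  flags;
    void     *thread;
    void     *cred;
    char     *pnbuf;
    char     *nameptr;
    long      namelen;
};                       /* 56 bytes on LP64 */
```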
Mateusz Guzik
de0fcd3a44 vfs: assert that HASBUF is only set with SAVENAME or SAVESTART
as requested by the caller. The intent is to eradicate the mostly
spurious NDFREE_PNBUF calls.
2020-08-22 16:58:59 +00:00
Mateusz Guzik
760a430bb3 vfs: add a work around for vp_crossmp bug to realpath
The actual bug is not addressed yet, as fixing it will get much easier once
other problems are resolved (most notably the rename contract).

The only affected in-tree consumer is realpath. Everyone else happens to
perform lookups within a mount point, with the side effect that, in the
worst case, ni_dvp is set to the mount point's root vnode.

Reported by:	pho
2020-08-22 06:56:04 +00:00
Mateusz Guzik
494c0f2a83 vfs: mark HASBUF as an internal flag
There is no setter for cn_pnbuf.
2020-08-16 17:55:20 +00:00
Mateusz Guzik
b38ad2683a vfs: add missing pwd_drop on error in namei_setup
Reported by:	pho
2020-08-13 10:24:45 +00:00
Mateusz Guzik
2d0631dd08 vfs: stricter validation for flags passed to namei in cn_flags
namei de facto expects that the nameidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this by explicitly clearing some of them.

Tighten the interface instead.

While here, renumber the flags and denote the gap between the 2 variants.

Try to piggyback the renumbering on the just bumped __FreeBSD_version.
2020-08-11 01:34:40 +00:00
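The "tighten the interface" idea can be sketched as two disjoint flag ranges with a validity check. The flag names and values here are illustrative assumptions, not FreeBSD's namei.h. Caller-settable flags live in the low bits and lookup-internal flags in the high bits, with a gap between them. A caller passing an internal bit is rejected instead of being silently cleaned up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Caller-settable flags (hypothetical values). */
#define F_FOLLOW        0x00000001u
#define F_LOCKLEAF      0x00000002u
#define F_SAVENAME      0x00000004u
#define F_CALLER_MASK   0x0000ffffu

/* Internal flags: only the lookup code may set these. */
#define F_HASBUF        0x00010000u
#define F_ISDOTDOT      0x00020000u
#define F_INTERNAL_MASK 0xffff0000u

/* Reject, rather than silently clear, internal bits from a caller. */
static bool flags_valid(uint32_t flags)
{
    return (flags & F_INTERNAL_MASK) == 0;
}
```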
Mateusz Guzik
25e42ee217 vfs: drop the hello world stat probes from the vfs provider
Interested parties can get the same information by hooking on vop_stat.
2020-08-10 18:11:00 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means lookup is no longer expected to purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
158ab70c24 vfs: tidy up namei entry point
- predict for string copy errors
- reshuffle initialization of vars which are not needed
2020-08-05 07:33:39 +00:00
Mateusz Guzik
85cf316172 vfs: inline NDINIT_ALL
The routine takes more than 6 arguments, which on amd64 means some of
them have to be passed through the stack.
2020-08-01 06:33:38 +00:00
Mateusz Guzik
14576629bb vfs: convert ni_rightsneeded to a pointer
Shaves 8 bytes off struct nameidata on 64-bit platforms.
2020-08-01 06:33:11 +00:00
Mateusz Guzik
21c162605b vfs: make rights mandatory for NDINIT_ALL 2020-08-01 06:32:25 +00:00
Mateusz Guzik
b1f910e02c vfs: short-circuit the common case NDFREE calls
Almost all consumers use the NDF_ONLY_PNBUF macro, making them needlessly
branch a lot in the NDFREE routine. Also note that most of them should not
need to call any cleanup anyway, as they don't request HASBUF.
2020-07-30 15:47:41 +00:00
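The short-circuit amounts to an inline guard in front of an out-of-line cleanup routine. The sketch below uses hypothetical names (`struct nd`, `ND_HASBUF`, `nd_free_pnbuf()`, `nd_free_slow()`) and a call counter for illustration. Callers that never asked for a saved buffer skip the call and its internal branching entirely.

```c
#include <assert.h>
#include <stdlib.h>

#define ND_HASBUF 0x1u   /* hypothetical "buffer is owned" flag */

struct nd {
    unsigned  flags;
    char     *pnbuf;
};

static int slow_calls;   /* counts entries into the out-of-line routine */

/* General cleanup; stands in for a routine with many branches. */
static void nd_free_slow(struct nd *nd)
{
    slow_calls++;
    if (nd->flags & ND_HASBUF) {
        free(nd->pnbuf);
        nd->pnbuf = NULL;
        nd->flags &= ~ND_HASBUF;
    }
}

/* Inline fast path: do nothing unless a buffer is actually held. */
static inline void nd_free_pnbuf(struct nd *nd)
{
    if ((nd->flags & ND_HASBUF) != 0)
        nd_free_slow(nd);
}
```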
Mateusz Guzik
d3e63e8eb2 vfs: make sure startdir_used is always assigned to before use
CID:	1431070
2020-07-30 07:11:08 +00:00
Mateusz Guzik
c42b77e694 vfs: lockless lookup
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.

Inner workings are explained in the comment above cache_fplookup.

Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.

Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.

Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total

Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before:   2165816
after:  151216530

Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25578
2020-07-25 10:37:15 +00:00
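The "mismatched counters" fallback follows the general sequence-counter pattern, sketched here with C11 atomics. This is not FreeBSD's seqc API; `struct entry`, `entry_write()` and `entry_read()` are hypothetical. A writer makes the counter odd while it modifies data and even again when done. A lockless reader samples the counter, reads, then re-checks it. Any change, or an odd value, means the read may be torn, and the caller must retry or fall back to the locked path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct entry {
    atomic_uint seq;     /* even: stable, odd: write in progress */
    atomic_int  value;   /* stands in for data a lookup would read */
};

/* Single-writer update. */
static void entry_write(struct entry *e, int v)
{
    atomic_fetch_add_explicit(&e->seq, 1, memory_order_relaxed);  /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&e->value, v, memory_order_relaxed);
    atomic_fetch_add_explicit(&e->seq, 1, memory_order_release);  /* even */
}

/* Lockless read; false means "counters mismatched, use the slow path". */
static bool entry_read(struct entry *e, int *out)
{
    unsigned s1 = atomic_load_explicit(&e->seq, memory_order_acquire);

    if (s1 & 1u)
        return false;                       /* writer in progress */
    int v = atomic_load_explicit(&e->value, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    unsigned s2 = atomic_load_explicit(&e->seq, memory_order_relaxed);
    if (s1 != s2)
        return false;                       /* data changed under us */
    *out = v;
    return true;
}
```

A caller would try `entry_read()` first and take a lock and reference only when it returns false. This mirrors the commit's description of dropping from lockless to regular lookup.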
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interfere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
2f423bce54 vfs: stop taking additional refs on root vnode during lookup
They are spurious since introduction of struct pwd, which provides them
implicitly.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23885
2020-03-01 21:54:28 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. Given that path lookups are
significantly more frequent than chdir and chroot calls, this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where the jail vnode was never referenced,
meaning subsequent access on lookup could run into a use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
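The copy-on-write arrangement can be illustrated with a refcounted snapshot. This is a single-threaded sketch in which `struct pwd_snap`, `struct proc_fs` and the helpers are hypothetical, and ints stand in for vnode pointers. A lookup takes one reference on the whole snapshot rather than one per vnode. chdir never mutates a shared snapshot: it builds a new one and swaps it in. Real code also needs synchronization between reading `fs->cur` and taking the reference, which this sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct pwd_snap {
    atomic_int refs;
    int cdir, rdir, jdir;     /* current, root and jail root "vnodes" */
};

struct proc_fs {
    struct pwd_snap *cur;
};

static struct pwd_snap *snap_new(int cdir, int rdir, int jdir)
{
    struct pwd_snap *p = malloc(sizeof(*p));

    if (p == NULL)
        abort();
    atomic_init(&p->refs, 1);   /* reference owned by proc_fs */
    p->cdir = cdir;
    p->rdir = rdir;
    p->jdir = jdir;
    return p;
}

/* One atomic covers all three directories for the lookup's duration. */
static struct pwd_snap *snap_hold(struct proc_fs *fs)
{
    atomic_fetch_add(&fs->cur->refs, 1);
    return fs->cur;
}

static void snap_drop(struct pwd_snap *p)
{
    if (atomic_fetch_sub(&p->refs, 1) == 1)
        free(p);
}

/* chdir: copy, modify, swap; in-flight lookups keep the old snapshot. */
static void fs_chdir(struct proc_fs *fs, int newdir)
{
    struct pwd_snap *old = fs->cur;

    fs->cur = snap_new(newdir, old->rdir, old->jdir);
    snap_drop(old);
}
```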
Mateusz Guzik
721a81c369 vfs: stop duplicating vnode work in audit during path lookup
Duplicating the work imposed an avoidable requirement that the filedesc
lock be held across the entire operation (otherwise, by the time audit reads
the vnode pointers, another thread in the same process can chdir somewhere
else, making audit log a different vnode than the one used for the actual
lookup).

Do the obvious thing and pass down vnodes which will be used.
2020-02-21 01:44:31 +00:00
Mateusz Guzik
e126c5a3e8 vfs: use new capsicum helpers 2020-02-15 01:28:42 +00:00
Mateusz Guzik
6ebab6bad2 vfs: use mac fastpath for lookup, open, read, write, mmap 2020-02-13 22:22:55 +00:00
Kyle Evans
3d62f685d5 namei: preserve errors from fget_cap_locked
Most notably, we want to make sure we don't clobber any capabilities-related
errors. This is a regression from r357412 (O_SEARCH) that was picked up by
the capsicum tests.

PR:		243839
Reviewed by:	kib (committed form recommended by)
Tested by:	lwhsu
Differential Revision:	https://reviews.freebsd.org/D23479
2020-02-03 18:59:07 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
90f4ec3328 vfs: save on atomics on the root vnode for absolute lookups
There are 2 back-to-back atomics on the vnode, but we can check upfront if one
is sufficient. Similarly we can handle relative lookups where current working
directory == root directory.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23427
2020-02-01 06:40:35 +00:00
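The saving described in this last commit can be sketched with a hypothetical refcounted object standing in for a vnode; `struct obj` and `hold_start_and_root()` are illustrative names. The lookup needs one reference for its starting directory and one for the root. When both are the same object, as in an absolute lookup or when cwd is the root, both references can be taken with a single atomic add.

```c
#include <assert.h>
#include <stdatomic.h>

struct obj {
    atomic_int refs;
};

/* Take a reference on the start directory and on the root, using one
 * atomic operation when they are the same object. */
static void hold_start_and_root(struct obj *start, struct obj *root)
{
    if (start == root) {
        atomic_fetch_add_explicit(&root->refs, 2, memory_order_relaxed);
    } else {
        atomic_fetch_add_explicit(&start->refs, 1, memory_order_relaxed);
        atomic_fetch_add_explicit(&root->refs, 1, memory_order_relaxed);
    }
}
```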