Commit Graph

18161 Commits

Author SHA1 Message Date
Konstantin Belousov
c61fae1475 pgcache read: protect against reads past end of the vm object size
If uio_offset is past end of the object size, calculated resid is negative.
Delegate handling this case to the locked read, as any other non-trivial
situation.

PR:	253158
Reported by:	Harald Schmalzbauer <bugzilla.freebsd@omnilan.de>
Tested by:	cy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-02-16 07:09:37 +02:00
Alex Richardson
0482d7c9e9 Fix fget_only_user() to return ENOTCAPABLE on a failed capsicum check
After eaad8d1303 four additional
capsicum-test tests started failing. It turns out this is because
fget_only_user() was returning EBADF on a failed capsicum check instead
of forwarding the return value of cap_check_inline() like
fget_unlocked_seq().

capsicum-test failures before this:
```
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] Capability.OperationsForked
[  FAILED  ] Capability.NoBypassDAC
[  FAILED  ] Pdfork.OtherUserForked
[  FAILED  ] PipePdfork.WildcardWait
[  FAILED  ] OpenatTest.WithFlag
[  FAILED  ] ForkedOpenatTest_WithFlagInCapabilityMode._
[  FAILED  ] Select.LotsOFileDescriptorsForked
```
After:
```
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] Capability.NoBypassDAC
[  FAILED  ] Pdfork.OtherUserForked
[  FAILED  ] PipePdfork.WildcardWait
```

Reviewed By:	mjg
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D28691
2021-02-15 22:55:12 +00:00
Jason A. Harmening
41032835dc Fix divide-by-zero panic when ASLR is enabled and superpages disabled
When locating the anonymous memory region for a vm_map with ASLR
enabled, we try to keep the slid base address aligned on a superpage
boundary to minimize pagetable fragmentation and maximize the potential
usage of superpage mappings.  We can't (portably) do this if superpages
have been disabled by loader tunable and pagesizes[1] is 0, and it
would be less beneficial in that case anyway.

PR:		253511
Reported by:	johannes@jo-t.de
MFC after:	1 week
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28678
2021-02-15 10:38:04 -08:00
Mateusz Guzik
eac22dd480 lockmgr: shrink struct lock by 8 bytes on LP64
Currently the struct has a 4 byte padding stemming from 3 ints.

1. prio comfortably fits in short, unfortunately there is no dedicated
   type for it and plumbing it throughout the codebase is not worth it
   right now, instead an assert is added which covers also flags for
   safety
2. lk_exslpfail can in principle exceed u_short, but the count is
   already not considered reliable and it only ever gets modified
   straight to 0. In other words it can be incrementing with an upper
   bound of USHRT_MAX

With these in place struct lock shrinks from 48 to 40 bytes.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D28680
2021-02-15 13:57:25 +00:00
Konstantin Belousov
25c6318c79 procstat: distinguish vm map guards in procstat vm output.
Requested and reviewed by:	rwatson (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28658
2021-02-14 03:24:58 +02:00
Konstantin Belousov
adf28ab456 fifo: minor comment and assert improvements.
In particular, replace a note that reload through vget() is obsoleted,
with explanation why this code is required.

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:22 +02:00
Konstantin Belousov
b59a8e63d6 Stop ignoring ERELOOKUP from VOP_INACTIVE()
When possible, relock the vnode and retry inactivation.  Only vunref() is
required not to drop the vnode lock, so handle it specially by not retrying.

This is a part of the efforts to ensure that unlinked not referenced vnode
does not prevent inode from reusing.

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:21 +02:00
Konstantin Belousov
3b2aa36024 Use VOP_VPUT_PAIR() for eligible VFS syscalls.
The current list is limited to the cases where UFS needs to handle
vput(dvp) specially. Which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(),
VOP_LINK(), and VOP_SYMLINK().

Reviewed by:	chs, mkcusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Konstantin Belousov
49c117c193 Add VOP_VPUT_PAIR() with trivial default implementation.
The VOP is intended to be used in situations where VFS has two
referenced locked vnodes, typically a directory vnode dvp and a vnode
vp that is linked from the directory, and at least dvp is vput(9)ed.
The child vnode can be also vput-ed, but optionally left referenced and
locked.

There, at least UFS may need to do some actions with dvp which cannot be
done while vp is also locked, so its lock might be dropped temporary.
For instance, in some cases UFS needs to sync dvp to avoid filesystem
state that is currently not handled by either kernel nor fsck. Having
such VOP provides the neccessary context for filesystem which can do
correct locking and handle potential reclamation of vp after relock.

Trivial implementation does vput(dvp) and optionally vput(vp).

Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Konstantin Belousov
ee965dfa64 vn_open(): If the vnode is reclaimed during open(2), do not return error.
Most future operations on the returned file descriptor will fail
anyway, and application should be ready to handle that failures.  Not
forcing it to understand the transient failure mode on open, which is
implementation-specific, should make us less special without loss of
reporting of errors.

Suggested by: chs
Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2021-02-12 03:02:20 +02:00
Konstantin Belousov
bf0db19339 buf SU hooks: track buf_start() calls with B_IOSTARTED flag
and only call buf_complete() if previously started.  Some error paths,
like CoW failire, might skip buf_start() and do bufdone(), which itself
call buf_complete().

Various SU handle_written_XXX() functions check that io was started
and incomplete parts of the buffer data reverted before restoring them.
This is a useful invariant that B_IO_STARTED on buffer layer allows to
keep instead of changing check and panic into check and return.

Reported by:	pho
Reviewed by:	chs, mckusick
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundations
2021-02-12 03:02:19 +02:00
Mateusz Guzik
39e0c3f686 cache: assorted comment fixups 2021-02-09 17:09:44 +01:00
Kyle Evans
504ebd612e kern: sonewconn: set so_options before pru_attach()
Protocol attachment has historically been able to observe and modify
so->so_options as needed, and it still can for newly created sockets.
779f106aa1 moved this to after pru_attach() when we re-acquire the
lock on the listening socket.

Restore the historical behavior so that pru_attach implementations can
consistently use it. Note that some pru_attach() do currently rely on
this, though that may change in the future. D28265 contains a change to
remove the use in TCP and IB/SDP bits, as resetting the requested linger
time on incoming connections seems questionable at best.

This does move the assignment out from under the head's listen lock, but
glebius notes that head won't be going away and applications cannot
assume any specific ordering with a race between a connection coming in
and the application changing socket options anyways.

Discussed-with:	glebius
MFC-after:	1 week
2021-02-08 21:44:43 -06:00
Alexander V. Chernikov
924d1c9a05 Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca.
2021-02-08 22:32:32 +00:00
Alexander V. Chernikov
4a01b854ca SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
2021-02-08 21:42:20 +00:00
Mark Johnston
b5aa9ad43a ktls: Make configuration sysctls available as tunables
Reviewed by:	gallatin, jhb
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28499
2021-02-08 09:19:02 -05:00
Mark Johnston
1755b2b989 ktls: Use COUNTER_U64_DEFINE_EARLY
This makes it a bit more straightforward to add new counters when
debugging.  No functional change intended.

Reviewed by:	jhb
Sponsored by:	Ampere Computing
Submitted by:	Klara, Inc.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D28498
2021-02-08 09:18:51 -05:00
Mateusz Guzik
2f8a844635 cache: remove the largely obsolete general description
Examples of inconsistencies with the current state:
- references LRU of all entries, removed years ago
- references a non-existent lock (neglist)
- claims negative entries have a NULL target

It will be replaced with a more accurate and more informative
description.

In the meantime take it out so it stops misleading.
2021-02-06 00:28:40 +01:00
Mateusz Guzik
0e1594e60e cache: fix vfs:namecache:lookup:miss probe call sites 2021-02-06 00:28:40 +01:00
Mateusz Guzik
2e96132a7d cache: drop spurious arg from panic in cache_validate
vp is already reported when noting mismatch
2021-02-06 00:28:39 +01:00
Mateusz Guzik
b54ed778fe cache: comment on FNV 2021-02-06 00:13:57 +01:00
Ed Maste
edc374e7c4 Correct description for kern.proc.proc_td
kern.proc.proc_td returns the process table with an entry for each
thread.  Previously the description included "no threads", presumably
a cut-and-pasteo in 2648efa621.

Description suggested by PauAmma.

PR:		253146
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2021-02-02 17:00:05 -05:00
Mateusz Guzik
45456abc4c cache: fix trailing slash support in face of permission problems
Reported by:	Johan Hendriks <joh.hendriks gmail.com>
Tested by:	kevans
2021-02-02 18:13:51 +00:00
Mateusz Guzik
6f19dc2124 cache: add delayed degenerate path handling 2021-02-01 04:53:23 +00:00
Mateusz Guzik
bbfb1edd70 cache: move hash computation into the parsing loop 2021-02-01 04:36:45 +00:00
Mateusz Guzik
e027e24bfa cache: add trailing slash support
Tested by:	pho
2021-01-31 12:02:46 +00:00
Mateusz Guzik
8cbd164a17 cache: handle NOFOLLOW requests for symlinks
Tested by:	pho
2021-01-31 12:02:46 +00:00
Gleb Smirnoff
3f43ada98c Catch up with 6edfd179c8: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address.  Then, this was extended to array of
such pointers.  Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change.  The new name should be generic enough to avoid further
renaming.
2021-01-29 11:46:24 -08:00
Mateusz Guzik
45e1f85414 poll: use fget_unlocked or fget_only_user when feasible
This follows select by eleminating the use of filedesc lock.
This is a win for single-threaded processes and a mixed bag for others
as at small concurrency it is faster to take the lock instead of
refing/unrefing each file descriptor.

Nonetheless, removal of shared lock usage is a step towards a
mtx-protected fd table.
2021-01-29 11:23:44 +00:00
Mateusz Guzik
6affe1b712 select: employ fget_only_user
Since most select users are single-threaded this avoid a lot of work
in the common case.

For example select of 16 fds (ops/s):
before:	2114536
after:	2991010
2021-01-29 11:23:44 +00:00
Mateusz Guzik
eaad8d1303 fd: add fget_only_user
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.

For example select consumers tend to match the requirement and have
several file descriptors to inspect.
2021-01-29 11:23:43 +00:00
Jamie Gritton
c050ea803e jail: Handle a parent jail when a child is added to it
It's possible when adding a jail that its dying parent comes back to
life.  Only allow that to happen when JAIL_DYING is specified.  And if
it does happen, call PR_METHOD_CREATE on it.
2021-01-28 21:51:09 -08:00
Bryan Drewery
c926114f2f Fix getblk() with GB_NOCREAT returning false-negatives.
It is possible for a buf to be reassigned between the dirty and clean
lists while gbincore_unlocked() looks in each list.  Avoid creating
a buffer in that case and fallback to a locked lookup.

This fixes a regression from r363482.

More discussion on potential improvements to the clean and dirty lists
handling is in the review.

Reviewed by:	cem, kib, markj, vangyzen, rlibby
Reported by:	Suraj.Raju at dell.com
Submitted by:	Suraj.Raju at dell.com, cem, [based on both]
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D28375
2021-01-28 11:24:24 -08:00
Mateusz Guzik
5c325977b1 cache: add missing MNT_NOSYMFOLLOW check to symlink traversal 2021-01-27 15:08:38 +00:00
Mateusz Guzik
5fc384d181 cache: fallback when encountering a mount point during .. lookup
The current abort is overzealous.
2021-01-27 16:00:31 +01:00
Bjoern A. Zeeb
6f65b50546 firmware(9): extend firmware_get() by a "no warn" flag.
With the upcoming usage from LinuxKPI but also from drivers
ported natively we are seeing more probing of various
firmware (names).

Add the ability to firmware(9) to silence the
"firmware image loading/registering errors" by adding a new
firmware_get_flags() functions extending firmware_get() and
taking a flags argument as firmware_put() already does.

Requested-by:	zeising (for future LinuxKPI/DRM)
Sponsored-by:	The FreeBSD Foundation
Sponsored-by:	Rubicon Communications, LLC ("Netgate")
MFC after:	3 days
Reviewed-by:	markj
Differential Revision:	https://reviews.freebsd.org/D27413
2021-01-27 13:51:26 +00:00
Mateusz Guzik
a098a831a1 cache: tidy up handling of foo/bar lookups where foo is not a directory
The code was performing an avoidable check for doomed state to account
for foo being a VDIR but turning VBAD. Now that dooming puts a vnode
in a permanent "modify" state this is no longer necessary as the final
status check will catch it.
2021-01-26 20:42:53 +00:00
Mateusz Guzik
a51eca7936 cache: stop referring to removing entries as invalidating them
Said use is a remnant from the old code and clashes with the NCF_INVALID
flag.
2021-01-26 20:42:53 +00:00
Brooks Davis
d89c1c461c Reserve gaps in syscall numbers for local use
It is best for auditing of syscalls.master if we only append to the
file.  Reserving unimplemented system call numbers for local use makes
this policy and provides a large set of syscall numbers FreeBSD
derivatives can use without risk of conflict.

Reviewed by:	jhb, kevans, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:45 +00:00
Brooks Davis
119fa6ee8a syscalls.master: Add a new syscall type: RESERVED
RESERVED syscall number are reserved for local/vendor use.  RESERVED is
identical to UNIMPL except that comments are ignored.

Reviewed by:	kevans
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:44 +00:00
Brooks Davis
65a524b499 Remove documentation of unimplemented syscalls
We have not been able to run binaries from other BSDs well over a
decade.  There is no need to document their allocation decisions here.

We also don't need to reserve syscall numbers of never-implemented
syscalls.

Reviewed by:	jhb, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:44 +00:00
Mateusz Guzik
6943671b48 cache: convert cache_fplookup_parse to void now that it always succeeds 2021-01-26 13:24:03 +01:00
Mateusz Guzik
e7cf562a40 cache: change ->v_cache_dd synchronisation rules
Instead of resorting to seqc modification take advantage of immutability
of entries and check if the entry still matches after everything got
prepared.
2021-01-25 22:41:13 +00:00
Mateusz Guzik
6f08427649 cache: make ->v_cache_dd accesses atomic-clean for lockless usage 2021-01-25 22:41:13 +00:00
Mateusz Guzik
6ef8fede86 cache: make ->nc_flag accesses atomic-clean for lockless usage 2021-01-25 22:41:13 +00:00
Mateusz Guzik
ffcf8f97f8 cache: store vnodes in local vars in cache_zap_locked 2021-01-25 22:41:13 +00:00
Mateusz Guzik
868643e722 cache: assorted cleanups 2021-01-25 19:45:24 +00:00
Mateusz Guzik
1c7a65adb0 cache: track calls to cache_symlink_alloc with unsupported size
While here assert on size passed to free.
2021-01-25 19:45:23 +00:00
Mateusz Guzik
02ec31bdf6 cache: add back target entry on rename 2021-01-23 18:10:16 +00:00
Mateusz Guzik
739ecbcf1c cache: add symlink support to lockless lookup
Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D27488
2021-01-23 15:04:43 +00:00