If uio_offset is past end of the object size, calculated resid is negative.
Delegate handling this case to the locked read, as any other non-trivial
situation.
PR: 253158
Reported by: Harald Schmalzbauer <bugzilla.freebsd@omnilan.de>
Tested by: cy
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
When locating the anonymous memory region for a vm_map with ASLR
enabled, we try to keep the slid base address aligned on a superpage
boundary to minimize pagetable fragmentation and maximize the potential
usage of superpage mappings. We can't (portably) do this if superpages
have been disabled by loader tunable and pagesizes[1] is 0, and it
would be less beneficial in that case anyway.
PR: 253511
Reported by: johannes@jo-t.de
MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28678
Currently the struct has a 4 byte padding stemming from 3 ints.
1. prio comfortably fits in short, unfortunately there is no dedicated
type for it and plumbing it throughout the codebase is not worth it
right now, instead an assert is added which covers also flags for
safety
2. lk_exslpfail can in principle exceed u_short, but the count is
already not considered reliable and it only ever gets modified
straight to 0. In other words it can be incrementing with an upper
bound of USHRT_MAX
With these in place struct lock shrinks from 48 to 40 bytes.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D28680
In particular, replace a note that reload through vget() is obsoleted,
with explanation why this code is required.
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
When possible, relock the vnode and retry inactivation. Only vunref() is
required not to drop the vnode lock, so handle it specially by not retrying.
This is a part of the efforts to ensure that unlinked not referenced vnode
does not prevent inode from reusing.
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The current list is limited to the cases where UFS needs to handle
vput(dvp) specially. Which means VOP_CREATE(), VOP_MKDIR(), VOP_MKNOD(),
VOP_LINK(), and VOP_SYMLINK().
Reviewed by: chs, mkcusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
The VOP is intended to be used in situations where VFS has two
referenced locked vnodes, typically a directory vnode dvp and a vnode
vp that is linked from the directory, and at least dvp is vput(9)ed.
The child vnode can be also vput-ed, but optionally left referenced and
locked.
There, at least UFS may need to do some actions with dvp which cannot be
done while vp is also locked, so its lock might be dropped temporary.
For instance, in some cases UFS needs to sync dvp to avoid filesystem
state that is currently not handled by either kernel nor fsck. Having
such VOP provides the neccessary context for filesystem which can do
correct locking and handle potential reclamation of vp after relock.
Trivial implementation does vput(dvp) and optionally vput(vp).
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Most future operations on the returned file descriptor will fail
anyway, and application should be ready to handle that failures. Not
forcing it to understand the transient failure mode on open, which is
implementation-specific, should make us less special without loss of
reporting of errors.
Suggested by: chs
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
and only call buf_complete() if previously started. Some error paths,
like CoW failire, might skip buf_start() and do bufdone(), which itself
call buf_complete().
Various SU handle_written_XXX() functions check that io was started
and incomplete parts of the buffer data reverted before restoring them.
This is a useful invariant that B_IO_STARTED on buffer layer allows to
keep instead of changing check and panic into check and return.
Reported by: pho
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundations
Protocol attachment has historically been able to observe and modify
so->so_options as needed, and it still can for newly created sockets.
779f106aa1 moved this to after pru_attach() when we re-acquire the
lock on the listening socket.
Restore the historical behavior so that pru_attach implementations can
consistently use it. Note that some pru_attach() do currently rely on
this, though that may change in the future. D28265 contains a change to
remove the use in TCP and IB/SDP bits, as resetting the requested linger
time on incoming connections seems questionable at best.
This does move the assignment out from under the head's listen lock, but
glebius notes that head won't be going away and applications cannot
assume any specific ordering with a race between a connection coming in
and the application changing socket options anyways.
Discussed-with: glebius
MFC-after: 1 week
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.
This makes it a bit more straightforward to add new counters when
debugging. No functional change intended.
Reviewed by: jhb
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28498
Examples of inconsistencies with the current state:
- references LRU of all entries, removed years ago
- references a non-existent lock (neglist)
- claims negative entries have a NULL target
It will be replaced with a more accurate and more informative
description.
In the meantime take it out so it stops misleading.
kern.proc.proc_td returns the process table with an entry for each
thread. Previously the description included "no threads", presumably
a cut-and-pasteo in 2648efa621.
Description suggested by PauAmma.
PR: 253146
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address. Then, this was extended to array of
such pointers. Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change. The new name should be generic enough to avoid further
renaming.
This follows select by eleminating the use of filedesc lock.
This is a win for single-threaded processes and a mixed bag for others
as at small concurrency it is faster to take the lock instead of
refing/unrefing each file descriptor.
Nonetheless, removal of shared lock usage is a step towards a
mtx-protected fd table.
Since most select users are single-threaded this avoid a lot of work
in the common case.
For example select of 16 fds (ops/s):
before: 2114536
after: 2991010
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.
For example select consumers tend to match the requirement and have
several file descriptors to inspect.
It's possible when adding a jail that its dying parent comes back to
life. Only allow that to happen when JAIL_DYING is specified. And if
it does happen, call PR_METHOD_CREATE on it.
It is possible for a buf to be reassigned between the dirty and clean
lists while gbincore_unlocked() looks in each list. Avoid creating
a buffer in that case and fallback to a locked lookup.
This fixes a regression from r363482.
More discussion on potential improvements to the clean and dirty lists
handling is in the review.
Reviewed by: cem, kib, markj, vangyzen, rlibby
Reported by: Suraj.Raju at dell.com
Submitted by: Suraj.Raju at dell.com, cem, [based on both]
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D28375
With the upcoming usage from LinuxKPI but also from drivers
ported natively we are seeing more probing of various
firmware (names).
Add the ability to firmware(9) to silence the
"firmware image loading/registering errors" by adding a new
firmware_get_flags() functions extending firmware_get() and
taking a flags argument as firmware_put() already does.
Requested-by: zeising (for future LinuxKPI/DRM)
Sponsored-by: The FreeBSD Foundation
Sponsored-by: Rubicon Communications, LLC ("Netgate")
MFC after: 3 days
Reviewed-by: markj
Differential Revision: https://reviews.freebsd.org/D27413
The code was performing an avoidable check for doomed state to account
for foo being a VDIR but turning VBAD. Now that dooming puts a vnode
in a permanent "modify" state this is no longer necessary as the final
status check will catch it.
It is best for auditing of syscalls.master if we only append to the
file. Reserving unimplemented system call numbers for local use makes
this policy and provides a large set of syscall numbers FreeBSD
derivatives can use without risk of conflict.
Reviewed by: jhb, kevans, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
RESERVED syscall number are reserved for local/vendor use. RESERVED is
identical to UNIMPL except that comments are ignored.
Reviewed by: kevans
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
We have not been able to run binaries from other BSDs well over a
decade. There is no need to document their allocation decisions here.
We also don't need to reserve syscall numbers of never-implemented
syscalls.
Reviewed by: jhb, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
Instead of resorting to seqc modification take advantage of immutability
of entries and check if the entry still matches after everything got
prepared.