Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address. Then, this was extended to array of
such pointers. Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change. The new name should be generic enough to avoid further
renaming.
This follows select by eleminating the use of filedesc lock.
This is a win for single-threaded processes and a mixed bag for others
as at small concurrency it is faster to take the lock instead of
refing/unrefing each file descriptor.
Nonetheless, removal of shared lock usage is a step towards a
mtx-protected fd table.
Since most select users are single-threaded this avoid a lot of work
in the common case.
For example select of 16 fds (ops/s):
before: 2114536
after: 2991010
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.
For example select consumers tend to match the requirement and have
several file descriptors to inspect.
It's possible when adding a jail that its dying parent comes back to
life. Only allow that to happen when JAIL_DYING is specified. And if
it does happen, call PR_METHOD_CREATE on it.
It is possible for a buf to be reassigned between the dirty and clean
lists while gbincore_unlocked() looks in each list. Avoid creating
a buffer in that case and fallback to a locked lookup.
This fixes a regression from r363482.
More discussion on potential improvements to the clean and dirty lists
handling is in the review.
Reviewed by: cem, kib, markj, vangyzen, rlibby
Reported by: Suraj.Raju at dell.com
Submitted by: Suraj.Raju at dell.com, cem, [based on both]
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D28375
With the upcoming usage from LinuxKPI but also from drivers
ported natively we are seeing more probing of various
firmware (names).
Add the ability to firmware(9) to silence the
"firmware image loading/registering errors" by adding a new
firmware_get_flags() functions extending firmware_get() and
taking a flags argument as firmware_put() already does.
Requested-by: zeising (for future LinuxKPI/DRM)
Sponsored-by: The FreeBSD Foundation
Sponsored-by: Rubicon Communications, LLC ("Netgate")
MFC after: 3 days
Reviewed-by: markj
Differential Revision: https://reviews.freebsd.org/D27413
The code was performing an avoidable check for doomed state to account
for foo being a VDIR but turning VBAD. Now that dooming puts a vnode
in a permanent "modify" state this is no longer necessary as the final
status check will catch it.
It is best for auditing of syscalls.master if we only append to the
file. Reserving unimplemented system call numbers for local use makes
this policy and provides a large set of syscall numbers FreeBSD
derivatives can use without risk of conflict.
Reviewed by: jhb, kevans, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
RESERVED syscall number are reserved for local/vendor use. RESERVED is
identical to UNIMPL except that comments are ignored.
Reviewed by: kevans
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
We have not been able to run binaries from other BSDs well over a
decade. There is no need to document their allocation decisions here.
We also don't need to reserve syscall numbers of never-implemented
syscalls.
Reviewed by: jhb, kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27988
Instead of resorting to seqc modification take advantage of immutability
of entries and check if the entry still matches after everything got
prepared.
The change to use refcounts for pr_uref was mishandled in
prison_proc_free, so killing a jail's last process could add
an extra reference, leaving it an unkillable zombie.
to make it use the right aligned zone.
Reported by: melifaro
Reviewed by: alc, markj (previous version)
Discussed with: jrtc27
Tested by: pho (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28219
UMA page_alloc() does not take an alignment, so UMA can only handle
alignment less then page size.
Noted by: alc
Reviewed by: alc, markj (previous version)
Discussed with: jrtc27
Tested by: pho (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28219
Use refcount(9) for both pr_ref and pr_uref in struct prison. This
allows prisons to held and freed without requiring the prison mutex.
An exception to this is that dropping the last reference will still
lock the prison, to keep the guarantee that a locked prison remains
valid and alive (provided it was at the time it was locked).
Among other things, this honors the promise made in a comment in
crcopy(9), that it will not block, which hasn't been true for two
decades.
This allows slightly more efficient opcode testing in-kernel. It is
transparent to userland, except to applications that sneakily submit
aio fsync or aio mlock operations via lio_listio, which has never been
documented, requires the use of deliberately undefined constants
(LIO_SYNC and LIO_MLOCK), and is arguably a bug.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27942
This patch helped me debug why /sbin/init was not being loaded after
making changes to the image activator in CheriBSD.
Reviewed By: jhb (earlier version), kib
Differential Revision: https://reviews.freebsd.org/D28121
- Only check for empty domains if we actually tried to configure domain
affinity in the first place. Otherwise setting bind_threads=1 will
always cause the sysctl value to be reported as zero. This is
harmless since the threads end up being bound, but it's confusing.
- Try to improve the sysctl description a bit.
Reviewed by: gallatin, jhb
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
Differential Revision: https://reviews.freebsd.org/D28161
TRYUPGRADE requests kept failing when they should not have due to wrong
macro used to count readers.
Fixes: f6b091fbbd ("lockmgr: rewrite upgrade to stop always dropping the lock")
Noted by: asomers
Differential Revision: https://reviews.freebsd.org/D27947
Previously the code would branch on top find out whether it should
branch on SDT probe and bumping the numposhits counter, depending
on cache_fplookup_cross_mount.
Arguably it should be done regardless of what said function returns.
Move prison_hold, prison_hold_locked ,prison_proc_hold, and
prison_proc_free to a more intuitive part of the file (together with
with prison_free and prison_free_locked), and add or improve comments
to these and others, to better describe what's going in the prison
reference cycle.
No functional changes.
Use a machdep.nirq tunable intead of compile-time constant NIRQ
as a value for maximum number of interrupts. It allows keep a system
footprint small by default with an option to increase the limit
for large systems like server-grade ARM64
Reviewd by: mhorne
Differential Revision: https://reviews.freebsd.org/D27844
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
prison_isvalid() checks if a prison record can be used at all, i.e.
pr_ref > 0. This filters out prisons that aren't fully created, and
those that are either in the process of being dismantled, or will be
at the next opportunity. While the check for pr_ref > 0 is simple
enough to make without a convenience function, this prepares the way
for other measures of prison validity.
prison_isalive() checks not only validity as far as the useablity of
the prison structure, but also whether the prison is visible to user
space. It replaces a test for pr_uref > 0, which is currently only
used within kern_jail.c, and not often there.
Both of these functions also assert that either the prison mutex or
allprison_lock is held, since it's generally the case that unlocked
prisons aren't guaranteed to remain useable for any length of time.
This isn't entirely true, for example a thread can assume its own
prison is good, but most exceptions will exist inside of kern_jail.c.