mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.
Reviewed by: markj
They only showed up after I redefined LOCKSTAT_ENABLED to 0.
doing_lockprof in mutex.c is a real (but harmless) bug. Should the
value be non-zero it will do checks for lock profiling which would
otherwise be skipped.
state in rwlock.c is a wart from the compiler, the value can't be
used if lock profiling is not enabled.
Sponsored by: The FreeBSD Foundation
Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths. I did this for mutexes in r306346.
Reported by: Travis Lane <tlane@isilon.com>
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
processors would benefit from avoiding a function call, but bloating
code. In fact, clang created an uninlined real function for many
object files in the network stack.
- Move epoch_private.h into subr_epoch.c. Code copied exactly, avoiding
any changes, including style(9).
- Remove private copies of critical_enter/exit.
Reviewed by: kib, jtl
Differential Revision: https://reviews.freebsd.org/D17879
Remove restrictions that prevent allocation requests to cross the
boundary between two meta nodes.
Replace the bmu_avail field in meta nodes with a bitmap that identifies
which subtrees have some free memory, and iterate over the nonempty
subtrees only in blst_meta_alloc. If free memory is scarce, this should
make searching for it faster.
Put the code for handling the next-leaf allocation in a separate
function. When taking blocks from the next leaf empties the leaf, be
sure to clear the appropriate bit in its parent, and so on, up to the
least-common ancestor of this leaf and the next.
Eliminate special terminator nodes, and rely instead on the fact that
there is a 0-bit at the end of the bitmask at the root of the tree that
will stop a meta_alloc search, or a next-leaf search, before the search
falls off the end of the tree. Make sure that the tree is big enough to
have space for that 0-bit.
Eliminate special all-free indicators. Lazy initialization of subtrees
stands in the way of having an allocation span a meta-node boundary, so
a subtree of all free blocks is not treated specially. Subtrees of
all-allocated blocks are still recognized by looking at the bitmask at
the root and finding 0.
Don't print all-allocated subtrees. Do print the bitmasks for meta
nodes, when tree-printing.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: alc
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12635
Both to formally document the requirement that this not be called after the
dynamic kenv is setup, and to perhaps help static analyzers figure out
what's going on. While calling init_static_kenv this late isn't fatal, there
are some caveats that the caller should be aware of:
- Late calls are effectively a no-op, as far as default FreeBSD is
concerned, as everything will switch to searching the dynamic kenv once it's
available.
- Each of the kern_getenv calls will leak memory, as it's assumed that
these are searching static environment and allocations will not be made.
As such, this usage is not sensible and should be detected.
The vlan interfaces can be created from vnet jails, it seems, so it
sounds logical to allow pcp configuration as well.
Reviewed by: bz, hselasky (previous version)
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17777
Correct boneheaded assertion I added in r339501. Mea culpa.
The intent is to notice when an M_WAITOK zone allocation would fail during
netdump, not to prevent all use of mbufs during netdump.
Reviewed by: markj
X-MFC-With: r339501
Differential Revision: https://reviews.freebsd.org/D17957
This also removes a lot of #ifdefs and cleans up a warning when the
AUDIT kernel option is defined, but neither KDTRACE_HOOKS nor MAC are.
Reported and tested by: danger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The path must have a tail which does not escape starting/topping
directory. The documentation will come shortly, see the man pages
commit message for the reason of separate commit.
Reviewed by: jilles (previous version)
Discussed with: emaste
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17714
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument. 64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations. They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips. Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.
The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.
Use uint32_t consistently as the type of the old dev_t. This matches
the old definition.
Reviewed by: kib
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17928
The previous code required that the return type be a single word. This
allows it to be a pointer without using a typedef.
Update the return types of break, mmap, and shmat to be void * as
declared. This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17873
Different compilation units may otherwise get a different view of the
layout of struct tty depending on whether they include opt_printf.h.
This caused a blowup in the number of types defined in the kernel's
CTF file after r339468; thanks to dim@ for bisecting down to that
revision.
PR: 232675
Reported by: dim
Reviewed by: cem (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17877
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.
Reported by: jhb
Reviewed by: alc, jhb, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17827
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17852
This allows us to build the ubsan code added in r340189 into the kernel
with the KUBSAN option. This will report when undefined behaviour is
detected in the currently running kernel.
As it can be large, the kernel is 65MB on arm64, loader may not be able to
load the kernel on all architectures so is disabled by default for now.
Sponsored by: DARPA, AFRL
This imports revision 1.3 of common/lib/libc/misc/ubsan.c from NetBSD, the
micro-ubsan code. It is an implementation of the Undefined Behavior
Sanitizer runtime for use with recent clang and gcc.
The uubsan code will be used in a later commit to implement kubsan to help
find undefined behavior in the kernel.
Sponsored by: DARPA, AFRL
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI. Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms. However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.
Reviewed by: kib
MFC after: 3 days
We already allow to use poll(2). There is no reason to disallow ppoll(2).
PR: 232495
Submitted by: Stefan Grundmann <sg2342@googlemail.com>
Reviewed by: cem, oshogbo
MFC after: 2 weeks
In discussing D17503 "Run epoch calls sooner and more reliably" with
sbahra@ we came to the conclusion that epoch is currently misusing the
ck_epoch API. It isn't safe to do a "write side" operation (ck_epoch_call
or ck_epoch_poll) in the middle of a "read side" section. Since, by definition,
it's possible to be preempted during the middle of an EPOCH_PREEMPT
epoch the GC task might call ck_epoch_poll or another thread might call
ck_epoch_call on the same section. The right solution is ultimately to change
the way that ck_epoch works for this use case. However, as a stopgap for
12 we agreed to simply have separate records for each use case.
Tested by: pho@
MFC after: 3 days
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.
This change improves the match between syscalls.master and the public
declerations of system calls.
Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17812
Initializing the eflags field of the map->header entry to a value with a
unique new bit set makes a few comparisons to &map->header unnecessary.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: alc, kib
Tested by: pho
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14005
This will enable callers to take const paths as part of syscall
decleration improvements.
Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.
Bump __FreeBSD_version for external API consumers.
Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17805
The comment isn't stale. The check is bogus in the sense that poll(2)
does not require pollfd entries to be unique in fd space, so there is no
reason there cannot be more pollfd entries than open or even allowed
fds. The check is mostly a seatbelt against accidental misuse or
abuse. FD_SETSIZE, while usually unrelated to poll, is used as an
arbitrary floor for systems with very low kern.maxfilesperproc.
Additionally, document this possible EINVAL condition in the poll.2
manual.
No functional change.
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17671
This is more clear and produces better results when generating function
stubs from syscalls.master.
Reviewed by: kib, emaste
Obtained from: CheribSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17784
Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'. Disable the new knob by default. Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob. However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.
Reviewed by: kib, avg
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D17768
This takes advantage of two recents changes to makesyscalls.sh:
r328598: Permit a range of syscall numbers for UNIMPL
r339624: Remove the need for backslashes in syscalls.master
Syscall declerations are now split across multiple lines with the
syscall name and variables each on seperate lines (with an exception for
syscalls taking no arguments.)
Reviewed by: imp
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17706
I erroneously thought that it was two 64bit platforms which use link_elf_obj.c.
PR: 228854
Reported by: ci.f.o.
MFC after: 3 days
X-MFC with: r339931
Pointyhat to: bz
we fail during module load because the pcpu or vnet module sections are
full. We did return a proper error but not leaving any indication to
the user as to what the actual problem was.
Even worse, on 12/13 currently we are seeing an unrelated error (ENOSYS
instead of ENOSPC, which gets skipped over in kern_linker.c) to be
printed which made problem diagnostics even harder.
PR: 228854
MFC after: 3 days
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.
This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.
Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418
- In uma_prealloc(), we need to check for an empty domain before the
first allocation attempt, not after. Fix this by switching
uma_prealloc() to use a vm_domainset iterator, which addresses the
secondary issue of using a signed domain identifier in round-robin
iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
excluding empty domains; otherwise we may frequently specify an empty
domain when calling in to the page allocator, wasting CPU time.
Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
total page counts for now: some vm_phys segments are created before
the SRAT is parsed and are thus always identified as being in domain 0
even when they are not. Then, when bootstrap pages are freed, they
are added to a domain that we had previously thought was empty. Until
this is corrected, we simply exclude them from the per-domain page
count.
Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17704
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.
Reviewed by: kib, cem
Discussed with: jhb
Tested by: pho
MFC after: 2 months
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17687
taskqgroup_detach() would remove the task even if it was running or
enqueued, which could lead to panics (see D17404). With this change,
taskqgroup_detach() drains the task and sets a new flag which prevents the
task from being scheduled again.
I've added grouptask_block() and grouptask_unblock() to allow control
over the flag from other locations as well.
Reviewed by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17674
Some of the poll code used 'fds' and some used 'ufds' to refer to the
uap->fds userspace pointer that was passed around to subroutines. Some of
the poll code used 'fds' to refer to the kernel memory pollfd arrays, which
seemed unnecessarily confusing.
Unify on 'ufds' to refer to the userspace pollfd array.
Additionally, 'bits' is not an accurate description of the kernel pollfd
array in kern_poll, so rename that to 'kfds'. Finally, clean up some logic
with mallocarray() and nitems().
No functional change.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D17670
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.
The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.
Unlike r339174 this change supports both places FIODGNAME is handled.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17475
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory. Supposedly the interface is similar
to the same proposed Linux flags.
Reviewed by: jilles (code, previous version of manpages), 0mp (manpages)
Discussed with: allanjude, emaste, jonathan
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17547