r23081 introduced kern.dummy oid as a semi ABI compat for kern.maxsockbuf
that was moved to a new namespace. It never functioned as an alias of any
kind and was just returning 0 unconditionally, hence it was probably
provided to keep some 3rd party programmes happy about sysctl(3) not
reporting an error because of non-existing oid.
After nearly 23 years it seems reasonable to just hide it from sysctl(8)
list not to cause unnecessary confusion as for its purpose.
Reported by: Antranig Vartanian <antranigv@freebsd.am>
Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D22982
Amount of changes to the original code has been intentionally minimised
to ease diffing.
The changes are mostly mechanical, with the following exceptions:
* lltable handler is now called directly based of RTF_LLINFO flag presense.
* "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified,
fixing several potential use-after-free cases in rt_addrinfo.
* llable asserts has been replaced with error-returning, preventing kernel
crashes when lltable gw af family is invalid (root required).
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22864
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places. It would be cool if we had
more SMP-friendly statistics, but this helps too.
Sponsored by: iXsystems, Inc.
With the previous reviews, the page lock is no longer required in order
to perform queue operations on a page. It is also no longer needed in
the page queue scans. This change effectively eliminates remaining uses
of the page lock and also the false sharing caused by multiple pages
sharing a page lock.
Reviewed by: jeff
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22885
Summary:
r356113 used an older patch, which predated the
freebsd_copyout_auxargs() addition. Fix this by using a private
powerpc_copyout_auxargs() instead, and keep it private to powerpc, not in MI
files.
Reviewed by: kib, bdragon
Differential Revision: https://reviews.freebsd.org/D22935
1. The only place in the tree which calls getnewvnode with mp == NULL does it
for vp_crossmp which will never execute this codepath. Any vnode which legally
has ->v_mount == NULL is also doomed, which once more wont execute this code.
2. Remove an assertion for v_holdcnt from production kernels. It gets taken care
of by refcount macros in debug kernels.
Any code which would want to pass NULL mp can construct a fake one instead.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D22722
To be used when like rmlocks, except when sleeping for readers needs to be
allowed. See the manpage for more information.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D22823
Summary:
As a transition aide, implement an alternative elfN_freebsd_fixup which
is called for old powerpc binaries. Similarly, add a translation to rtld to
convert old values to new ones (as expected by a new rtld).
Translation of old<->new values is incomplete, but sufficient to allow an
installworld of a new userspace from an old one when a new kernel is running.
Test Plan:
Someone needs to see how a new kernel/rtld/libc works with an old
binary. If if works we can probalby ship this. If not we probalby need
some more compat bits.
Submitted by: brooks
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20799
srandom(9) is meaningless on SMP systems or any system with, say,
interrupts. One could never rely on random(9) to produce a reproducible
sequence of outputs on the basis of a specific srandom() seed because the
global state was shared by all kernel contexts. As such, removing it is
literally indistinguishable to random(9) consumers (as compared with
retaining it).
Mark random(9) as deprecated and slated for quick removal. This is not to
say we intend to remove all fast, non-cryptographic PRNG(s) in the kernel.
It/they just won't be random(9), as it exists today, in either name or
implementation.
Before random(9) is removed, a replacement will be provided and in-tree
consumers will be converted.
Note that despite the name, the random(9) interface does not bear any
resemblance to random(3). Instead, it is the same crummy 1988 Park-Miller
LCG used in libc rand(3).
A weak symbol here is decidedly cleaner than any #ifdef soup or relocating
kbdinit, the former leading to maintenance required on addition of any
console/keyboard drivers and the latter pushing kbd init bits away from
where they're used.
This leads to the revert of r355806; this reduces duplication in keyboard
registration and driver switch lookup and leaves us with one authoritative
source for currently registered drivers. The reduced duplication later is
nice as we have more procedure involved in keyboard setup.
keyboard_driver->flags is used to more quickly detect bogus adds/removes.
From KPI consumers' perspective, nothing changes- kbd_add_driver of an
already-registered driver will succeed, and a single kbd_delete_driver will
later remove it as expected. In contrast to historical behavior,
kbd_delete_driver on a driver registered via linker set will now actually
de-register the driver so that it may not be used -- e.g. if kbdmux's
MOD_LOAD handler fails somewhere.
Detection for already-registered drivers in kbd_add_driver has improved, as
the previous SLIST_NEXT(driver) != NULL check would not have caught a driver
that's at the tail end.
kbdinit is now called from cninit() rather than via SYSINIT so that keyboard
drivers are available as early as console drivers. This is particularly
important as cnprobe will, in both syscons and vt, attempt to do any early
configuration of keyboard drivers built-in (see: kbd_configure).
Reviewed by: imp (earlier version, pre-cninit change)
Differential Revision: https://reviews.freebsd.org/D22835
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers. Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST(). Plumb const through all of the
underlying sleepqueue bits.
No functional change.
Reviewed by: rlibby
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D22914
Due to clang and LLD's tendency to use a PLT for builtins, and as they
don't have full support for EABI, we sometimes have to deal with a PLT in
.ko files in a clang-built kernel.
As such, augment the in-kernel linker to support jump table processing.
As there is no particular reason to support lazy binding in kernel modules,
only implement Secure-PLT immediate binding.
As part of these changes, add elf_cpu_parse_dynamic() to the MD API of the
in-kernel linker (except on platforms that use raw object files.)
The new function will allow MD code to act on MD tags in _DYNAMIC.
Use this new function in the PowerPC MD code to ensure BSS-PLT modules using
PLT will be rejected during insertion, and to poison the runtime resolver to
ensure we get a clear panic reason if a call is made to the resolver.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22608
It is UB to evaluate pointer comparisons when pointers do not point within
the same object. Instead, convert the pointers to numbers and compare the
numbers.
Reported by: kib
Discussed with: rlibby
Previously just ensuring that we do not sleep when clustering for
md(4) vnode was enough. Now, with the switch of the pbuf allocator to
uma and completely broken per-subsystem pbuf limits, it might cause
unbounded sleep even for non-md(4) vnodes.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22899
If the devmap entry uses the upper 32 bits they wouldn't be printed in
devmap_dump_table(). This fixes that.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Sponsored by: Axiado
We by definition cannot trace the stack of such a thread. Also remove a
redundant stack_zero() call in the SIGINFO handler, the stack structure
is cleared by the MD stack_capture().
Sponsored by: The FreeBSD Foundation
Now that it is not used after schedlock changes got merged.
Note the unlock routine temporarily still checks for it on account of just using
regular spin unlock.
This is a prelude towards a general clean up.
Don't hold the scheduler lock while doing context switches. Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency. This means that mi_switch() is entered with the thread
locked but returns without. This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.
This change has not been made to 4BSD but in principle it would be
more straightforward.
Discussed with: markj
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22778
Eliminate lock recursion from turnstiles. This was simply used to avoid
tracking the top-level turnstile lock. explicitly check for it before
picking up and dropping locks.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22746
Do all sleepqueue post-processing in sleepq_remove_thread() so that we
do not require the thread lock after a context switch.
Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D22745
The arm kernel stack unwinder has apparently never been able to unwind when
the path of execution leads through a kernel module. There was code that
tried to handle modules by looking for the unwind data in them, but it did
so by trying to find symbols which have never existed in arm kernel
modules. That caused the unwind code to panic, and because part of panic
handling calls into the unwind code, that just created a recursion loop.
Locating the unwind data in a loaded module requires accessing the Elf
section headers to find the SHT_ARM_EXIDX section. For preloaded modules
those headers are present in a metadata blob. For dynamically loaded
modules, the headers are present only while the loading is in progress; the
memory is freed once the module is ready to use. For that reason, there is
new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
unwind info while the headers are loaded. The values are saved into new
fields in the linker_file structure which are also conditional on __arm__.
In arm/unwind.c there is new code to locally cache the per-module info
needed to find the unwind tables. The local cache is crafted for lockless
read access, because the unwind code often needs to run in context where
sleeping is not allowed. A large comment block describes the local cache
list, so I won't repeat it all here.
Eliminate recursion from most thread_lock consumers. Return from
sched_add() without the thread_lock held. This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks. This will eventually allow for lockless remote adds.
Discussed with: kib
Reviewed by: jhb
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22626
an exclusive object lock.
Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy. This may be
done inconsistently and requires the object lock which is not always
convenient.
Instead, track when swap space is present. The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.
Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.
Discussed with: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22654
exec_map_first_page(). This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).
Discussed with: alc
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D22731
bits, by storing and modifying the complement of the original leaf
mask, and by avoiding some unnecessary intermediate variables in
computing the shift amounts. The logic is similar to what has recently
been committed to sys/sys/bitstring.h.
Compute better hint updates for the case when the cursor starts in
mid-leaf, and eliminates some otherwise viable solutions. Assume the
worst case, that all the eliminated offsets could have been solutions,
and you can still compute a better hint than we use now.
Eliminate some unnecessary conditional control flow.
Approved by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22666
Delay the attachment of children, when requested, until after interrutps are
running. This is often needed to allow children to run transactions on i2c or
spi busses. It's a common enough idiom that it will be useful to have its own
wrapper.
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D21465
Allocate the callout structure on-demand from
fail_point_use_timeout_path() since most fail points do not use
timeouts.
Reviewed by: markj (earlier version), cem
Differential Revision: https://reviews.freebsd.org/D22599
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.
Don't supply a NAND (not (A and B)) operation at this time.
Discussed with: jeff
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22791
The simulation cannot be reproduced, so the value of using a deterministic PRNG
like random(3) is dubious. The number of repitions used in the sample isn't a
problem for the Chacha implementation of arc4random we have today. (Also, no
one actually runs this code; it was provided as an example of the work the
author did validating the implementation. It's not even test code.)
r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE)
for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl()
needs to be a global function.
Missed during the code merge for r355677.
via sys_sync(2). Minor cleanup, no functional changes.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19366
This fixes a regression after r355311. Specifically, sched_preempt()
may trigger a context switch by calling thread_lock(), since
thread_lock() calls critical_exit() in its slow path and the interrupted
thread may have already been marked for preemption. This would happen
before tdq_ipipending is cleared, blocking further preemption IPIs. The
CPU can be left in this state indefinitely if the interrupted thread
migrates.
Rename tdq_ipipending to tdq_owepreempt. Any switch satisfies a remote
preemption request, so clear tdq_owepreempt in sched_switch() instead of
sched_preempt() to avoid subtle problems of the sort described above.
Reviewed by: jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22758
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.
On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.
With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.
Reviewed by: jeff (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22665
A system call number should be at least reserved.
We do not expect an attempt to register a fixed number system call
when nothing at all is known about it.
MFC after: 3 weeks
Sponsored by: Panzura
This typedef is the same as timeout_t except that it is in the callout
namespace and header.
Use this typedef in various places of the callout implementation that
were either using the raw type or timeout_t.
While here, add <sys/callout.h> to the manpage.
Reviewed by: kib, imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22751