Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
Always send log(9) messages to the message buffer.
It is truer to the semantics of logging for messages to *always*
go to the message buffer, where they can eventually be collected
and, in fact, be put into a log file.
This restores the behavior prior to r70239, which seems to have
changed it inadvertently.
Submitted by: Eric Badger <eric@badgerio.us>
Obtained from: Dell Inc.
Clean up some cosmetic nits in kern_umtx.c, found during recent work
in this area and by the Clang static analyzer.
Remove some dead assignments.
Fix a typo in a panic string.
Use umtx_pi_disown() instead of duplicate code.
Use an existing variable instead of curthread.
Approved by: kib (mentor until recently)
Sponsored by: Dell Inc.
kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup. To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.
More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.
Reviewed by: kib@
Phabric: D2265
MFC of 287497:
Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.
Reviewed by: kib
Tested by: Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.
Correct the use of an unitialized variable in sendfind_getobj()
When sendfile_getobj() is called on a DTYPE_SHM file, it never
initializes error, which is eventually returned to the caller.
Various fixes to orphan handling which also fix issues with following
forks.
283281:
Always set p_oppid when attaching to an existing process via procfs
tracing. This matches the behavior of ptrace(PT_ATTACH). Also,
the procfs detach request assumes p_oppid is always set.
283282:
Only reparent a traced process to its old parent if the tracing process is
not the old parent. Otherwise, proc_reap() will leave the zombie in place
resulting in the process' status being returned twice to its parent.
Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by
this change.
283562:
Do not allow a process to reap an orphan (a child currently being
traced by another process such as a debugger). The parent process does
need to check for matching orphan pids to avoid returning ECHILD if an
orphan has exited, but it should not return the exited status for the
child until after the debugger has detached from the orphan process
either explicitly or implicitly via wait().
Add two tests for for this case: one where the debugger is the direct
child (thus the parent has a non-empty children list) and one where
the debugger is not a direct child (so the only "child" of the parent
is the orphan).
283647:
Tweak the description of when waitpid() doesn't return any status for a
non-blocking wait to avoid the word "empty".
283836:
Consistently only use one end of the pipe in the parent and debugger
processes and do not rely on EOF due to a close() in the debugger.
284000:
Add a CHILD_REQUIRE macro similar to ATF_REQUIRE for use in child processes
of the main test process.
286158:
Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debugee will be set as an orphan of
the debugger.
Add tests for tracing forks via PT_FOLLOW_FORK.
If a specific timecounter has been chosen via sysctl, and a new
timecounter with higher quality registers (presumably in a module that has
just been loaded), do not undo the user's choice by switching to the new
timecounter.
Document that behavior, and also the fact that there is no way to
unregister a timecounter (and thus no way to unload a module containing
one).
Add an API for easily creating userspace threads in kernelspace.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.
This function is going to be used by the CloudABI compatibility layer.
It looks like the OpenSolaris compatibility framework already provides a
function called thread_create(). Rename this function to
do_thread_create() and use a macro to deal with the namespacing
conflict. A similar approach is already used for thread_exit().
This is a direct commit to 10-STABLE - 11-CURRENT is not affected,
because tunables are automatically fetched there.
MFC after: ASAP
Sponsored by: The FreeBSD Foundation
Build debug version of rmlock's methods only when LOCK_DEBUG > 0.
Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1).
This means that debugging code always built. In addition the kernel
modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code
is always used by kernel modules.
Make setproctitle(3) work in Capsicum capability mode. This makes
ctld(8) child processes to indicate initiator address and name in
their titles, similar to what iscsid(8) child processes do.
PR: 181352
Sponsored by: The FreeBSD Foundation
register stack bottom address, after the merge of r284956 in r285967.
Note: this is a direct commit to stable/10.
Reported and tested by: clusteradm (peter)
Sponsored by: The FreeBSD Foundation
o Revert the other functional half of r239864, i. e. the merge of r134227
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
but also for the MD cache invalidation, TLB demapping and remote register
reading IPIs due to the following reasons:
- The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
sparc64. That's because on sparc64, spin locks don't disable interrupts
completely but only raise the processor interrupt level to PIL_TICK. This
means that IPIs still get delivered and direct dispatch IPIs such as the
cache invalidation etc. IPIs in question are still executed.
- In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
Consequently, smp_ipi_mtx may be locked for an extended amount of time as
queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
scheduled via a soft interrupt. Moreover, given that this soft interrupt
is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
turn leading to the target in question trying to send an IPI by itself
while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
trying to be delivered to APs that in fact aren't up and running, yet.
While at it, move setting of the cpu_ipi_{selected,single}() pointers to
the appropriate delivery functions from mp_init() to cpu_mp_start() where
it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
the delays before completely disabling interrupts again in the CPU-specific
cross-trap delivery functions, previously giving other CPUs a window for
sending IPIs on their part. Actually, we now should be able to entirely get
rid of completely disabling interrupts in these functions. Such a change
needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
necessary for correctness, this avoids page faults when accessing the stack
of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
possible, avoiding jitter.
PR: 201245
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
Ensure that locstat_nsecs() has no effect when lockstat probes are not
enabled or when the profiled lock carries the LO_NOPROFILE flag.
PR: 201642, 201517
Approved by: re (gjb)
Tested by: Jason Unovitch
Use the monotonic (uptime) counter rather than time-of-day to measure
elapsed time between ntp_adjtime() clock offset adjustments. This
eliminates spurious frequency steering after a large clock step (such
as a 1970->2015 step on a system with no battery-backed clock hardware).
This problem was discovered after the import of ntpd 4.2.8, which does
things in a slightly different (but still correct) order than the 4.2.4
we had previously. In particular, 4.2.4 would step the clock then
immediately after use ntp_adjtime() to set the frequency and offset to
zero, which captured the post-step time-of-day as a side effect. In
4.2.8, ntpd sets frequency and offset to zero before any initial clock
step, capturing the time as 1970-ish, then when it next calls
ntp_adjtime() it's with a non-zero offset measurement. This non-zero
value gets multiplied by the apparent 45-year interval, which blows up
into a completely bogus frequency steer. That gets clamped to 500ppm,
but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.
Approved by: re (gjb)
Allow passthrough devices to be hinted.
MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.
MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.
MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.
MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.
MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.
MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.
MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.
MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.
MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)
MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.
MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.
MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.
MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
MFC r281879:
Missing break in switch case (Coverity ID 1292499)
MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.
MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
MFC r282206:
Implement the century byte in the RTC.
leave it initialized at NULL.
Since the affected functions where moved from sys/kern/uipc_syscalls.c
to sys/netinet/sctp_syscalls.c it was not possible to MFC r284613.
Therefore, this is a direct commit with the corresponding changes of r284613.
Reported by: Coverity
CID: 1018058, 1018060
Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).
MFC r282901:
Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.
Note those two are MFC-ed together, because the latter one changes the
name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED. Should have
committed the renaming separately...
Relnotes: yes
Sponsored by: The FreeBSD Foundation
witness: don't warn about matrix inconsistencies without holding the mutex
Lock order checking is done without the witness mutex held, so multiple
threads that are racing to establish a new lock order may read matrix
entries that are in an inconsistent state. Don't print a warning in this
case, but instead just redo the check after taking the witness lock.