Previously the process terminating with SIGABRT at startup was the
only notification.
PR: 200617
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2731
Lock order checking is done without the witness mutex held, so multiple
threads that are racing to establish a new lock order may read matrix
entries that are in an inconsistent state. Don't print a warning in this
case, but instead just redo the check after taking the witness lock.
Differential Revision: https://reviews.freebsd.org/D2713
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
interpreter list to avoid the problem of holding a non-sleep lock during
a page fault as reported by witness. In addition, it consistently uses
memset()/memcpy() instead of bzero()/bcopy() except in the case where
bcopy() is required (i.e. overlapping copy).
Differential Revision: https://reviews.freebsd.org/D2123
Submitted by: sson
MFC after: 2 weeks
Relnotes: Yes
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.
The vm_mmap() function is now split up into two functions. A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.
The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings. For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset. The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.
The fo_mmap() hook is optional. If it is not set, then fo_mmap() will
fail with ENODEV. A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).
While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead. While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.
Differential Revision: https://reviews.freebsd.org/D2658
Reviewed by: alc (glanced over), kib
MFC after: 1 month
Sponsored by: Chelsio
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
MFC after: 2 weeks
Reviewed by: jhb
Approved by: kib (mentor)
Without this, if a process was being traced by truss(1), which
uses different p_stops bits than gdb(1), the latter would
misbehave because of the unexpected bits.
Reported by: jceel
Submitted by: sef
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks
tdsigwakeup() increases the priority of the low-priority threads, to
give them a chance to be terminated timely. Also, kernel allows user
to signal kernel processes. The combined effect is that signalling
idle process bump a priority of the selected delivery thread, which
starts eating CPU.
Check for the delivery thread be an idle thread and do not raise its
priority then.
The signal delivery to the kernel threads must be opt-in feature.
Kernel thread should explicitely declare the ability to handle signals
directed to it. E.g., nfsd threads check for signal as an indication
of exit request.
Most threads do not handle signals at all, and queuing the signal to
them causes odd side-effects. Most innocent consequence is the memory
leak due to queued ksiginfo, which is never deleted from the sigqueue.
Code to prevent even queuing signals to the kernel threads is trivial,
but it requires careful examination of each call to kproc/kthread
creation to decide should the signalling be allowed. The commit is a
stop-gap measure which fixes the immediate case for now.
PR: 200493
Reported and tested by: trasz
Discussed with: trasz, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
buildkernel run.
Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros. It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.
Review: https://reviews.freebsd.org/D2665
Reviewed by: rodrigc
Looked at by: bjk
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Nothing stops a parallel unmount to suceed before the given call to
dounmount() checks and locks the covered vnode. Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced. dounmount()
consumes the reference on return, regardless of the sucessfull or
erronous result.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
vn_start_secondary_write(9) functions. The flag indicates that the
caller already owns a reference on the mount point, and the functions
can consume it. The reference is released by vn_finished_write(9) and
vn_finished_secondary_write(9) in due course.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
limits in the code which is deep in the call stack, and owns several
critical system resources, like vnode locks. Attempt to wait while
the per-mount softupdate thread cleans up the backlog may deadlock,
because the thread might need to lock the same vnode which is owned by
the waiting thread.
Instead of synchronously waiting for the worker, perform the worker'
tickle and pause until the backlog is cleaned, at the safe point
during return from kernel to usermode. A new ast request to call
softdep_ast_cleanup() is created, the SU code now only checks the size
of queue and schedules ast.
There is no ast delivery for the kernel threads, so they are exempted
from the mechanism, except NFS daemon threads. NFS server loop
explicitely checks for the request, and informs the schedule_cleanup()
that it is capable of handling the requests by the process P2_AST_SU
flag. This is needed because nfsd may be the sole cause of the SU
workqueue overflow. But, to not cause nsfd to spawn additional
threads just because we slow down existing workers, only tickle su
threads, without waiting for the backlog cleanup.
Reviewed by: jhb, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
traced by another process such as a debugger). The parent process does
need to check for matching orphan pids to avoid returning ECHILD if an
orphan has exited, but it should not return the exited status for the
child until after the debugger has detached from the orphan process
either explicitly or implicitly via wait().
Add two tests for for this case: one where the debugger is the direct
child (thus the parent has a non-empty children list) and one where
the debugger is not a direct child (so the only "child" of the parent
is the orphan).
Differential Revision: https://reviews.freebsd.org/D2644
Reviewed by: kib
MFC after: 2 weeks
In r256613, taskqueue_enqueue_locked() have been modified to release the
task queue lock before returning. In r276665, taskqueue_drain_all() will
call taskqueue_enqueue_locked() to insert the barrier task into the queue,
but did not reacquire the lock after it but later code expects the lock
still being held (e.g. TQ_SLEEP()).
The barrier task is special and if we release then reacquire the lock,
there would be a small race window where a high priority task could sneak
into the queue. Looking more closely, the race seems to be tolerable but
is undesirable from semantics standpoint.
To solve this, in taskqueue_drain_tq_queue(), instead of directly calling
taskqueue_enqueue_locked(), insert the barrier task directly without
releasing the lock.
1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.
2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp()
counterpart for kern_kevent() with file pointer parameter instead
of file descriptor an pass the buck to it.
Suggested by: mjg [2]
Differential Revision: https://reviews.freebsd.org/D1091
Reviewed by: trasz
threads split sys_sched_getparam(), sys_sched_setparam(),
sys_sched_getscheduler(), sys_sched_setscheduler() to their kern_*
counterparts and add targettd parameter to allow specify the target
thread directly by callee.
Differential Revision: https://reviews.freebsd.org/D1034
Reviewed by: trasz
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).
Linuxulator temporarily uses first thread in proc.
Move linux_sched_rr_get_interval() to the MI part.
Differential Revision: https://reviews.freebsd.org/D1032
Reviewed by: trasz
threads introduce kern_thr_alloc() which will be used later in the
linux_clone().
Differential Revision: https://reviews.freebsd.org/D1029
Reviewed by: trasz
threads split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit().
Move
Where the second will be used in linux_exit() system call later.
Differential Revision: https://reviews.freebsd.org/D1028
Reviewed by: trasz
stopping signals should obey, but also all forms of single-threading.
Otherwise, thread might sleep interruptible while owning some
resources, and single-threading thread could try to access them.
An example is owning vnode lock while dumping core.
Submitted by: Conrad Meyer
Review: https://reviews.freebsd.org/D2612
Tested by: pho
MFC after: 1 week
discrimination between different subarch binaries, at least for mips
and arm. Arm is implemented, mips is still tbd, so not currently
exported. aarch64 does not export this because aarch64 binaries use
different tags and flags than arm.
Differential Revision: https://reviews.freebsd.org/D2611
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
The mask does not really need to be updated with atomic operations and
the downside of losing races during transitions is not great (it is
not marked volatile, so those races are pretty wide open as it is).
Differential Revision: https://reviews.freebsd.org/D2595
Reviewed by: emaste, neel, rpaulo
MFC after: 2 weeks
not the old parent. Otherwise, proc_reap() will leave the zombie in place
resulting in the process' status being returned twice to its parent.
Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by
this change.
Differential Revision: https://reviews.freebsd.org/D2594
Reviewed by: kib
MFC after: 2 weeks
until all threads sleeping on a condvar have resumed execution after being
awakened. However, there are cases where that guarantee is very hard to
provide.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool. This also involved cleaning some type
mismatches and ansifying old C function declarations.
Pointed out by: bde
Discussed with: bde, ian, jhb
longer than 192 bytes will cause the version field of a dump header to
overflow. strncpy doesn't null terminate it, so savecore will print a
corrupted info file. Using strlcpy fixes the bug.
Differential Revision: https://reviews.freebsd.org/D2560
Reviewed by: markj
MFC after: 3 weeks
Sponsored by: Spectra Logic
thread awakened due to a time out, then cv_waiters was not decremented.
If INT_MAX threads timed out on a cv without an intervening cv_broadcast,
then cv_waiters could overflow. To fix this, have each sleeping thread
decrement cv_waiters when it resumes.
Note that previously cv_waiters was protected by the sleepq chain lock.
However, that lock is not held when threads resume from sleep. In
addition, the interlock is also not always reacquired after resuming
(cv_wait_unlock), nor is it always held by callers of cv_signal() or
cv_broadcast(). Instead, use atomic ops to update cv_waiters. Since
the sleepq chain lock is still held on every increment, it should
still be safe to compare cv_waiters against zero while holding the
lock in the wakeup routines as the only way the race should be lost
would result in extra calls to sleepq_signal() or sleepq_broadcast().
Differential Revision: https://reviews.freebsd.org/D2427
Reviewed by: benno
Reported by: benno (wrap of cv_waiters in the field)
MFC after: 2 weeks
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.
pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() find the correct curproc pmap.
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
suspended thread itself, on the return path from
thread_suspend_check(). A consequence is that return from
thread_single_end(SINGLE_BOUNDARY) may leave p_boundary_count
non-zero, it might be even equal to the threads count.
Now, assume that we have two threads in the process, both calling
execve(2). Suppose that the first thread won the race to be the
suspension thread, and that afterward its exec failed for any reason.
After the first thread did thread_single_end(SINGLE_BOUNDARY), second
thread becomes the process suspension thread and checks
p_boundary_count. The non-zero value of the count allows the
suspension loop to finish without actually suspending some threads.
In other words, we enter exec code with some threads not suspended.
Fix this by decrementing p_boundary_count in the
thread_single_end()->thread_unsuspend_one() during marking the thread
as runnable. This way, a return from thread_single_end() guarantees
that the counter is cleared. We do not care whether the unsuspended
thread has a chance to run.
Add some asserts to ensure the state of the process when single
boundary suspension is lifted. Also make thread_unuspend_one()
static.
In collaboration with: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This allows functions that retrieve and inspect pthread_attr_t objects to
work correctly: querying the cpuset_t size is part of querying CPU
affinity information, which is part of creating a complete pthread_attr_t.
Approved by: rwatson (mentor)
Reviewed by: pjd
Sponsored by: NSERC
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.
Differential Revision: https://reviews.freebsd.org/D2407
Reviewed by: emaste@, wblock@
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
allocated from exec_map. If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space. Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.
Reported and tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
fragmented conditions currently just wakes up the pagedaemon. The
kmem arena is significantly smaller then the total available physical
memory, which means that there are loads where kmem arena space could
be exhausted, while there is a lot of pages available still. The
woken up pagedaemon sees vm_pages_needed != 0, verifies the condition
vm_paging_needed() which is false, clears the pass and returns back to
sleep, not calling neither uma_reclaim() nor lowmem handler.
To handle low kmem arena conditions, create additional pagedaemon
thread which calls uma_reclaim() directly. The thread sleeps on the
dedicated channel and kmem_reclaim() wakes the thread in addition to
the pagedaemon.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field.
If original mbuf chain has M_RDONLY flag, its copy also will have it.
Reset this flag explicitly.
MFC after: 2 weeks
The function clearning credentials on failure asserts the process is a
zombie, which is not true when fork fails.
Changing creds to NULL is unnecessary, but is still being done for
consistency with other code.
Pointy hat: mjg
Reported by: pho
interface without breaking ABI or API compatibility with existing drivers.
The existing data structures used to communicate between the kernel and
driver portions of PPS processing contain no spare/padding fields and no
flags field or other straightforward mechanism for communicating changes
in the structures or behaviors of the code. This makes it difficult to
MFC new features added to the PPS facility. ABI compatibility is
important; out-of-tree drivers in module form are known to exist. (Note
that the existing api_version field in the pps_params structure must
contain the value mandated by RFC 2783 and any RFCs that come along after.)
These changes introduce a pair of abi-version fields which are filled in
by the driver and the kernel respectively to indicate the interface
version. The driver sets its version field before calling the new
pps_init_abi() function. That lets the kernel know how much of the
pps_state structure is understood by the driver and it can avoid using
newer fields at the end of the structure that it knows about if the driver
is a lower version. The kernel fills in its version field during the init
call, letting the driver know what features and data the kernel supports.
To implement the new version information in a way that is backwards
compatible with code from before these changes, the high bit of the
lightly-used 'kcmode' field is repurposed as a flag bit that indicates the
driver is aware of the abi versioning scheme. Basically if this bit is
clear that indicates a "version 0" driver and if it is set the driver_abi
field indicates the version.
These changes also move the recently-added 'mtx' field of pps_state from
the middle to the end of the structure, and make the kernel code that uses
this field conditional on the driver being abi version 1 or higher. It
changes the only driver currently supplying the mtx field, usb_serial, to
use pps_init_abi().
Reviewed by: hselasky@