freebsd-skq

Author	SHA1	Message	Date
jhb	9de4cb45db	Revert r284153, as I believe it breaks the dtrace sdt module. I will fix the original issue a different way.	2015-06-08 18:06:00 +00:00
emaste	41e1b133ab	Add user facing errors for exceeding process memory limits Previously the process terminating with SIGABRT at startup was the only notification. PR: 200617 Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2731	2015-06-08 16:07:07 +00:00
jhb	1c5eb55915	Add an internal "locked" variant of linker_file_lookup_set() and change the public function to acquire the global linker lock directly. This permits linker_file_lookup_set() to be safely used from other modules.	2015-06-08 14:06:47 +00:00
markj	575c540dc2	witness: don't warn about matrix inconsistencies without holding the mutex Lock order checking is done without the witness mutex held, so multiple threads that are racing to establish a new lock order may read matrix entries that are in an inconsistent state. Don't print a warning in this case, but instead just redo the check after taking the witness lock. Differential Revision: https://reviews.freebsd.org/D2713 Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2015-06-07 18:59:47 +00:00
sbruno	06ac0dc77f	Revert 284029, update imgact_binmisctl.c change mtx to reader count, at the request of the submitter. Will attempt to use an sx_lock for this fix to WITNESS crashes in a later revision. Submitted by: sson	2015-06-05 18:16:10 +00:00
sbruno	9b1ca91264	This change uses a reader count instead of holding the mutex for the interpreter list to avoid the problem of holding a non-sleep lock during a page fault as reported by witness. In addition, it consistently uses memset()/memcpy() instead of bzero()/bcopy() except in the case where bcopy() is required (i.e. overlapping copy). Differential Revision: https://reviews.freebsd.org/D2123 Submitted by: sson MFC after: 2 weeks Relnotes: Yes	2015-06-05 16:21:43 +00:00
jhb	bba1e1e047	Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio	2015-06-04 19:41:15 +00:00
vangyzen	597cee37df	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
delphij	0226e94a1c	Clear p_stops when doing PT_DETACH. Without this, if a process was being traced by truss(1), which uses different p_stops bits than gdb(1), the latter would misbehave because of the unexpected bits. Reported by: jceel Submitted by: sef Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2015-06-01 18:15:45 +00:00
kib	34f7242bc8	When delivering a signal with default disposition to the thread, tdsigwakeup() increases the priority of the low-priority threads, to give them a chance to be terminated timely. Also, kernel allows user to signal kernel processes. The combined effect is that signalling idle process bump a priority of the selected delivery thread, which starts eating CPU. Check for the delivery thread be an idle thread and do not raise its priority then. The signal delivery to the kernel threads must be opt-in feature. Kernel thread should explicitely declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. Most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide should the signalling be allowed. The commit is a stop-gap measure which fixes the immediate case for now. PR: 200493 Reported and tested by: trasz Discussed with: trasz, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 16:26:08 +00:00
kib	e2f56205b5	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
kib	d77dbf3761	Right now, dounmount() is called with unreferenced mount point. Nothing stops a parallel unmount to suceed before the given call to dounmount() checks and locks the covered vnode. Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced. dounmount() consumes the reference on return, regardless of the sucessfull or erronous result. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:22:50 +00:00
kib	260b7bb259	Add V_MNTREF flag to the vn_start_write(9) and vn_start_secondary_write(9) functions. The flag indicates that the caller already owns a reference on the mount point, and the functions can consume it. The reference is released by vn_finished_write(9) and vn_finished_secondary_write(9) in due course. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:21:47 +00:00
kib	ff588ae9b0	Currently, softupdate code detects overstepping on the workitems limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker' tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitely checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nsfd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:20:42 +00:00
jhb	588ce3236a	Do not allow a process to reap an orphan (a child currently being traced by another process such as a debugger). The parent process does need to check for matching orphan pids to avoid returning ECHILD if an orphan has exited, but it should not return the exited status for the child until after the debugger has detached from the orphan process either explicitly or implicitly via wait(). Add two tests for for this case: one where the debugger is the direct child (thus the parent has a non-empty children list) and one where the debugger is not a direct child (so the only "child" of the parent is the orphan). Differential Revision: https://reviews.freebsd.org/D2644 Reviewed by: kib MFC after: 2 weeks	2015-05-26 10:29:37 +00:00
delphij	3c3588ac3e	MFuser/delphij/zfs-arc-rebase@r281754: In r256613, taskqueue_enqueue_locked() have been modified to release the task queue lock before returning. In r276665, taskqueue_drain_all() will call taskqueue_enqueue_locked() to insert the barrier task into the queue, but did not reacquire the lock after it but later code expects the lock still being held (e.g. TQ_SLEEP()). The barrier task is special and if we release then reacquire the lock, there would be a small race window where a high priority task could sneak into the queue. Looking more closely, the race seems to be tolerable but is undesirable from semantics standpoint. To solve this, in taskqueue_drain_tq_queue(), instead of directly calling taskqueue_enqueue_locked(), insert the barrier task directly without releasing the lock.	2015-05-26 01:40:33 +00:00
jhb	01eefee0f3	Add KTR tracing for some MI ptrace events. Differential Revision: https://reviews.freebsd.org/D2643 Reviewed by: kib	2015-05-25 22:13:22 +00:00
dchagin	e7fb40a8a0	For future use in the Linuxulator: 1. Add a kern_kqueue() counterpart for kqueue() with flags parameter. 2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with file pointer parameter instead of file descriptor an pass the buck to it. Suggested by: mjg [2] Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz	2015-05-24 16:36:29 +00:00
dchagin	ca0fda4077	In preparation for switching linuxulator to the use the native 1:1 threads add a hook for cleaning thread resources before the thread die. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
dchagin	b365a1e86e	In preparation for switching linuxulator to the use the native 1:1 threads split sys_sched_getparam(), sys_sched_setparam(), sys_sched_getscheduler(), sys_sched_setscheduler() to their kern_* counterparts and add targettd parameter to allow specify the target thread directly by callee. Differential Revision: https://reviews.freebsd.org/D1034 Reviewed by: trasz	2015-05-24 14:44:06 +00:00
dchagin	ca1941958a	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
dchagin	4d4fc642c1	In preparation for switching linuxulator to the use the native 1:1 threads introduce kern_thr_alloc() which will be used later in the linux_clone(). Differential Revision: https://reviews.freebsd.org/D1029 Reviewed by: trasz	2015-05-24 14:37:45 +00:00
dchagin	aa80851527	In preparation for switching linuxulator to the use the native 1:1 threads split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit(). Move Where the second will be used in linux_exit() system call later. Differential Revision: https://reviews.freebsd.org/D1028 Reviewed by: trasz	2015-05-24 14:36:33 +00:00
kib	0638a68fde	If thread requested to not stop on non-boundary, then not only stopping signals should obey, but also all forms of single-threading. Otherwise, thread might sleep interruptible while owning some resources, and single-threading thread could try to access them. An example is owning vnode lock while dumping core. Submitted by: Conrad Meyer Review: https://reviews.freebsd.org/D2612 Tested by: pho MFC after: 1 week	2015-05-23 19:09:04 +00:00
imp	93408dd7e4	Fix typo in symbol name. It helps to hit save in all your buffers before committing.	2015-05-22 21:10:14 +00:00
imp	b8ed2d07c4	Export the eflags field from the elf header. This allows better discrimination between different subarch binaries, at least for mips and arm. Arm is implemented, mips is still tbd, so not currently exported. aarch64 does not export this because aarch64 binaries use different tags and flags than arm. Differential Revision: https://reviews.freebsd.org/D2611	2015-05-22 20:50:35 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
jhb	4331ca7813	Expand ktr_mask to be a 64-bit unsigned integer. The mask does not really need to be updated with atomic operations and the downside of losing races during transitions is not great (it is not marked volatile, so those races are pretty wide open as it is). Differential Revision: https://reviews.freebsd.org/D2595 Reviewed by: emaste, neel, rpaulo MFC after: 2 weeks	2015-05-22 11:09:41 +00:00
jhb	5ad66fe3b2	Only reparent a traced process to its old parent if the tracing process is not the old parent. Otherwise, proc_reap() will leave the zombie in place resulting in the process' status being returned twice to its parent. Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by this change. Differential Revision: https://reviews.freebsd.org/D2594 Reviewed by: kib MFC after: 2 weeks	2015-05-22 11:04:54 +00:00
jhb	6064bc7d5f	Revert r282971. It depends on condvar consumers not destroying condvars until all threads sleeping on a condvar have resumed execution after being awakened. However, there are cases where that guarantee is very hard to provide.	2015-05-21 16:43:26 +00:00
pfg	b0d837707d	ddb: finish converting boolean values. The replacement started at r283088 was necessarily incomplete without replacing boolean_t with bool. This also involved cleaning some type mismatches and ansifying old C function declarations. Pointed out by: bde Discussed with: bde, ian, jhb	2015-05-21 15:16:18 +00:00
oshogbo	b887490023	Fix memory leak. Approved by: pjd (mentor)	2015-05-20 17:48:22 +00:00
oshogbo	69a8097d44	Style. Approved by: pjd (mentor)	2015-05-20 17:47:01 +00:00
oshogbo	bfb3720837	Always use the nv_free function. Approved by: pjd (mentor)	2015-05-20 17:44:58 +00:00
asomers	1a7b6ddd5d	Properly null-terminate strings in a kernel dump header. A version string longer than 192 bytes will cause the version field of a dump header to overflow. strncpy doesn't null terminate it, so savecore will print a corrupted info file. Using strlcpy fixes the bug. Differential Revision: https://reviews.freebsd.org/D2560 Reviewed by: markj MFC after: 3 weeks Sponsored by: Spectra Logic	2015-05-19 16:23:47 +00:00
mjg	3dafd57ac7	fd: fix imbalanced fdp unlock in F_SETLK and F_GETLK MFC after: 3 days	2015-05-18 14:27:04 +00:00
mjg	82d355a2e3	Tidy up sys_umask a little bit Consistently use saved fdp pointer as it cannot change. If it could change the code would be already incorrect. No functional changes.	2015-05-18 13:43:33 +00:00
jhb	f0fb852722	Previously, cv_waiters was only updated by cv_signal or cv_wait. If a thread awakened due to a time out, then cv_waiters was not decremented. If INT_MAX threads timed out on a cv without an intervening cv_broadcast, then cv_waiters could overflow. To fix this, have each sleeping thread decrement cv_waiters when it resumes. Note that previously cv_waiters was protected by the sleepq chain lock. However, that lock is not held when threads resume from sleep. In addition, the interlock is also not always reacquired after resuming (cv_wait_unlock), nor is it always held by callers of cv_signal() or cv_broadcast(). Instead, use atomic ops to update cv_waiters. Since the sleepq chain lock is still held on every increment, it should still be safe to compare cv_waiters against zero while holding the lock in the wakeup routines as the only way the race should be lost would result in extra calls to sleepq_signal() or sleepq_broadcast(). Differential Revision: https://reviews.freebsd.org/D2427 Reviewed by: benno Reported by: benno (wrap of cv_waiters in the field) MFC after: 2 weeks	2015-05-15 13:50:37 +00:00
kib	c3a04ab331	On amd64, make proc0 pmap initialization slightly more correct. In particular, switch to the proc0 pmap to have expected %cr3 and PCID for the thread0 during initialization, and the up to date pm_active mask. pmap_pinit0() should be done after proc0->p_vmspace is assigned so that the amd64 pmap_activate() find the correct curproc pmap. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-15 08:30:29 +00:00
kib	bcfe60fa4d	Right now, the process' p_boundary_count counter is decremented by the suspended thread itself, on the return path from thread_suspend_check(). A consequence is that return from thread_single_end(SINGLE_BOUNDARY) may leave p_boundary_count non-zero, it might be even equal to the threads count. Now, assume that we have two threads in the process, both calling execve(2). Suppose that the first thread won the race to be the suspension thread, and that afterward its exec failed for any reason. After the first thread did thread_single_end(SINGLE_BOUNDARY), second thread becomes the process suspension thread and checks p_boundary_count. The non-zero value of the count allows the suspension loop to finish without actually suspending some threads. In other words, we enter exec code with some threads not suspended. Fix this by decrementing p_boundary_count in the thread_single_end()->thread_unsuspend_one() during marking the thread as runnable. This way, a return from thread_single_end() guarantees that the counter is cleared. We do not care whether the unsuspended thread has a chance to run. Add some asserts to ensure the state of the process when single boundary suspension is lifted. Also make thread_unuspend_one() static. In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-15 07:54:31 +00:00
jonathan	1288a9c619	Allow sizeof(cpuset_t) to be queried in capability mode. This allows functions that retrieve and inspect pthread_attr_t objects to work correctly: querying the cpuset_t size is part of querying CPU affinity information, which is part of creating a complete pthread_attr_t. Approved by: rwatson (mentor) Reviewed by: pjd Sponsored by: NSERC	2015-05-14 15:14:03 +00:00
trasz	82bbee8b66	Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf. Differential Revision: https://reviews.freebsd.org/D2407 Reviewed by: emaste@, wblock@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation	2015-05-14 14:03:55 +00:00
kib	f371322983	On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-10 09:00:40 +00:00
kib	71cf7d735d	The vmem callback to reclaim kmem arena address space on low or fragmented conditions currently just wakes up the pagedaemon. The kmem arena is significantly smaller then the total available physical memory, which means that there are loads where kmem arena space could be exhausted, while there is a lot of pages available still. The woken up pagedaemon sees vm_pages_needed != 0, verifies the condition vm_paging_needed() which is false, clears the pass and returns back to sleep, not calling neither uma_reclaim() nor lowmem handler. To handle low kmem arena conditions, create additional pagedaemon thread which calls uma_reclaim() directly. The thread sleeps on the dedicated channel and kmem_reclaim() wakes the thread in addition to the pagedaemon. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-09 20:08:36 +00:00
kib	e0b2902247	Do not return from thread_single(SINGLE_BOUNDARY) until all stopped thread are guarenteed to be removed from the processors. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-09 18:32:13 +00:00
ae	1ea3701ab5	m_dup() is supposed to give a writable copy of an mbuf chain. It uses m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field. If original mbuf chain has M_RDONLY flag, its copy also will have it. Reset this flag explicitly. MFC after: 2 weeks	2015-05-07 18:35:01 +00:00
mjg	14fd588310	Fix up panics when fork fails due to hitting proc limit The function clearning credentials on failure asserts the process is a zombie, which is not true when fork fails. Changing creds to NULL is unnecessary, but is still being done for consistency with other code. Pointy hat: mjg Reported by: pho	2015-05-06 21:03:19 +00:00
ian	76cf7d7e54	Implement a mechanism for making changes in the kernel<->driver PPS interface without breaking ABI or API compatibility with existing drivers. The existing data structures used to communicate between the kernel and driver portions of PPS processing contain no spare/padding fields and no flags field or other straightforward mechanism for communicating changes in the structures or behaviors of the code. This makes it difficult to MFC new features added to the PPS facility. ABI compatibility is important; out-of-tree drivers in module form are known to exist. (Note that the existing api_version field in the pps_params structure must contain the value mandated by RFC 2783 and any RFCs that come along after.) These changes introduce a pair of abi-version fields which are filled in by the driver and the kernel respectively to indicate the interface version. The driver sets its version field before calling the new pps_init_abi() function. That lets the kernel know how much of the pps_state structure is understood by the driver and it can avoid using newer fields at the end of the structure that it knows about if the driver is a lower version. The kernel fills in its version field during the init call, letting the driver know what features and data the kernel supports. To implement the new version information in a way that is backwards compatible with code from before these changes, the high bit of the lightly-used 'kcmode' field is repurposed as a flag bit that indicates the driver is aware of the abi versioning scheme. Basically if this bit is clear that indicates a "version 0" driver and if it is set the driver_abi field indicates the version. These changes also move the recently-added 'mtx' field of pps_state from the middle to the end of the structure, and make the kernel code that uses this field conditional on the driver being abi version 1 or higher. It changes the only driver currently supplying the mtx field, usb_serial, to use pps_init_abi(). Reviewed by: hselasky@	2015-05-04 17:59:39 +00:00
oshogbo	ffeaddc43c	nv_malloc can fail in userland. Add check to prevent a NULL pointer dereference. Pointed out by: mjg Approved by: pjd (mentor)	2015-05-02 18:12:34 +00:00
oshogbo	2cc8650d9b	Remove duplicated code using macro template for the nvlist_add_.* functions. Approved by: pjd (mentor)	2015-05-02 18:10:45 +00:00

1 2 3 4 5 ...

14303 Commits