freebsd-skq

Author	SHA1	Message	Date
Adrian Chadd	5bbb2169d2	Un-static cpuset_which() - it's useful in other contexts, such as some CPU set operations in my upcoming NUMA work. Tested/compiled: * i386 (run) * amd64 (run) * mips (run) * mips64 (run) * armv6 (built) Sponsored by: Norse Corp, Inc.	2015-06-26 04:14:05 +00:00
Mateusz Guzik	7150ce743a	rlimit: deduplicate code in chg* functions	2015-06-25 00:15:37 +00:00
Sean Bruno	4e83b32a80	At the suggestion of jhb, replace atomic_set/clear calls with use of exclusive locks in the enable/disable interpreter path. Tested with WITNESS/INVARIANTS on and off. Reviewed by: sson davide	2015-06-24 15:52:26 +00:00
John-Mark Gurney	1977bd233a	zero this struct as it depends upon it... Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D2890	2015-06-23 18:40:20 +00:00
Konstantin Belousov	b05c401ff6	Only take previous buffer queue lock (olock) when needed for REMFREE in binsfree(). Submitted by: Conrad Meyer Sponsored by: EMC / Isilon Storage Division Review: https://reviews.freebsd.org/D2882 MFC after: 1 week	2015-06-23 06:12:14 +00:00
Sean Bruno	945afa7c25	Make imgact_binmisc_exec() static. Submitted by: kib Reviewed by: sson	2015-06-22 17:04:24 +00:00
Sean Bruno	602ec83516	Remove unneeded NULL check since the malloc is now M_WAITOK Submitted by: mjg	2015-06-19 20:35:17 +00:00
Sean Bruno	e0ae213f63	Must have one of either M_WAITOK or M_NOWAIT, read the man page bruno. Submitted by: mjg	2015-06-19 19:57:39 +00:00
Sean Bruno	a7647ec444	Feedback from commit r284535 davide: imgact_binmisc_clear_entry() needs to use atomic ops to remove the enable bit. kib: M_NOWAIT is not warranted and comment is invalid.	2015-06-19 18:57:36 +00:00
Sean Bruno	5f98711d51	This change replaces the mutex with a sx lock for the interpreter list to avoid the problem of holding a non-sleep lock during a page fault as reported by witness. It also uses atomics where possible to avoid having to acquire the exclusive lock. In addition, it consistently uses memset()/memcpy() instead of bzero()/bcopy(). Differential Revision: https://reviews.freebsd.org/D1971 Submitted by: sson Reviewed by: jhb	2015-06-18 02:04:20 +00:00
Bjoern A. Zeeb	af10bf055f	Initialise pr_enforce_statfs from the "default" sysctl value and not from the compile time constant. The sysctl value is seeded from the compile time constant. MFC after: 2 weeks	2015-06-17 13:15:54 +00:00
Konstantin Belousov	1eabd96728	vfs_msync(), called from syncer vnode fsync VOP, only iterates over the active vnode list for the given mount point, with the assumption that vnodes with dirty pages are active. This is enforced by vinactive() doing vm_object_page_clean() pass over the vnode pages. The issue is, if vinactive() cannot be called during vput() due to the vnode being only shared-locked, we might end up with the dirty pages for the vnode on the free list. Such vnode is invisible to syncer, and pages are only cleaned on the vnode reactivation. In other words, the race results in the broken guarantee that user data, written through the mmap(2), is written to the disk not later than in 30 seconds after the write. Fix this by keeping the vnode which is freed but still owing inactivation, on the active list. When syncer loops find such vnode, it is deactivated and cleaned by the final vput() call. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-17 04:46:58 +00:00
Pedro F. Giffuni	708694f704	Use nitems() macro instead of __arraycount()	2015-06-16 20:19:00 +00:00
Mateusz Guzik	4da8456f0a	Replace struct filedesc argument in getvnode with struct thread This is a step towards removal of spurious arguments.	2015-06-16 13:09:18 +00:00
Mateusz Guzik	9ef8328d52	fd: make rights a mandatory argument to fget_unlocked	2015-06-16 09:52:36 +00:00
Mateusz Guzik	80f3623f2f	fd: don't unnecessarily copy capabilities in _fget	2015-06-16 09:08:30 +00:00
Mateusz Guzik	cedab3c72c	fd: reduce excessive zeroing on fd close fde_file as NULL is already an indicator of an unused fd. All other fields are populated when fp is installed.	2015-06-14 14:10:05 +00:00
Mateusz Guzik	ea31808c3b	fd: move out actual fp installation to _finstall Use it in fd passing functions as the first step towards fd code cleanup.	2015-06-14 14:08:52 +00:00
Jeremie Le Hen	3768a5dfb5	nit: Rename racct_alloc_resource to racct_adjust_resource. This is more accurate as the amount can be negative. MFC after: 2 weeks	2015-06-14 08:33:14 +00:00
Gleb Smirnoff	093c7f396d	Make KPI of vm_pager_get_pages() more strict: if a pager changes a page in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-06-12 11:32:20 +00:00
Andriy Gapon	076dd8eb2e	several lockstat improvements 0. For spin events report time spent spinning, not a loop count. While loop count is much easier and cheaper to obtain it is hard to reason about the reported numbers, especially for adaptive locks where both spinning and sleeping can happen. So, it's better to compare apples and apples. 1. Teach lockstat about FreeBSD rw locks. This is done in part by changing the corresponding probes and in part by changing what probes lockstat should expect. 2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD. 3. Report lock acquisition events for successful rw try-lock operations. 4. Teach lockstat about FreeBSD sx locks. Reporting of events for those locks completely mirrors rw locks. 5. Report spin and block events before acquisition event. This is behavior documented for the upstream, so it makes sense to stick to it. Note that because of FreeBSD adaptive lock implementations both the spin and block events may be reported for the same acquisition while the upstream reports only one of them. Differential Revision: https://reviews.freebsd.org/D2727 Reviewed by: markj MFC after: 17 days Relnotes: yes Sponsored by: ClusterHQ	2015-06-12 10:01:24 +00:00
Mateusz Guzik	3331a33a42	ussreq: use saved fdp pointer instead of td->td_proc->p_fd No functional changes.	2015-06-12 06:28:22 +00:00
Konstantin Belousov	529c97886b	Tweaks for r284178: Do not include machine/atomic.h explicitly, the header is already included by sys/systm.h. Force inlining of tc_getgen() and tc_setgen(). The functions are used more than once, which causes compilers with non-aggressive inlining policies to generate calls. Suggested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-11 04:41:54 +00:00
Mateusz Guzik	21de5aea6c	Fixup the build after r284215. Submitted by: Ivan Klymenko <fidaj ukr.net> [slightly modified]	2015-06-10 12:39:01 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Mateusz Guzik	4ea6a9a28f	Generalised support for copy-on-write structures shared by threads. Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed. This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary.	2015-06-10 10:43:59 +00:00
Mateusz Guzik	3b3eb22ab6	fd: remove fdesc_mtx	2015-06-10 09:40:07 +00:00
Mateusz Guzik	153cc61b54	fd: use atomics to manage fd_refcnt and fd_holdcnt This gets rid of fdesc_mtx.	2015-06-10 09:34:50 +00:00
Kenneth D. Merry	5672fac935	Add support for reading MAM attributes to camcontrol(8) and libcam(3). MAM is Medium Auxiliary Memory and is most commonly found as flash chips on tapes. This includes support for reading attributes and decoding most known attributes, but does not yet include support for writing attributes or reporting attributes in XML format. libsbuf/Makefile: Add subr_prf.c for the new sbuf_hexdump() function. This function is essentially the same function. libsbuf/Symbol.map: Add a new shared library minor version, and include the sbuf_hexdump() function. libsbuf/Version.def: Add version 1.4 of the libsbuf library. libutil/hexdump.3: Document sbuf_hexdump() alongside hexdump(3), since it is essentially the same function. camcontrol/Makefile: Add attrib.c. camcontrol/attrib.c: Implementation of READ ATTRIBUTE support for camcontrol(8). camcontrol/camcontrol.8: Document the new 'camcontrol attrib' subcommand. camcontrol/camcontrol.c: Add the new 'camcontrol attrib' subcommand. camcontrol/camcontrol.h: Add a function prototype for scsiattrib(). share/man/man9/sbuf.9: Document the existence of sbuf_hexdump() and point users to the hexdump(3) man page for more details. sys/cam/scsi/scsi_all.c: Add a table of known attributes, text descriptions and handler functions. Add a new scsi_attrib_sbuf() function along with a number of other related functions that help decode attributes. scsi_attrib_ascii_sbuf() decodes ASCII format attributes. scsi_attrib_int_sbuf() decodes binary format attributes, and will pass them off to scsi_attrib_hexdump_sbuf() if they're bigger than 8 bytes. scsi_attrib_vendser_sbuf() decodes the vendor and drive serial number attribute. scsi_attrib_volcoh_sbuf() decodes the Volume Coherency Information attribute that LTFS writes out. sys/cam/scsi/scsi_all.h: Add a number of attribute-related structure definitions and other defines. Add function prototypes for all of the functions added in scsi_all.c. 
sys/kern/subr_prf.c: Add a new function, sbuf_hexdump(). This is the same as the existing hexdump(9) function, except that it puts the result in an sbuf. This also changes subr_prf.c so that it can be compiled in userland for inclusion in libsbuf. We should work to change this so that the kernel hexdump implementation is a wrapper around sbuf_hexdump() with a statically allocated sbuf with a drain. That will require a drain function that goes to the kernel printf() buffer that can take a non-NUL terminated string as input. That is because an sbuf isn't NUL-terminated until it is finished, and we don't want to finish it while we're still using it. We should also work to consolidate the userland hexdump and kernel hexdump implementations, which are currently separate. This would also mean making applications that currently link in libutil link in libsbuf. sys/sys/sbuf.h: Add the prototype for sbuf_hexdump(), and add another copy of the hexdump flag values if they aren't already defined. Ideally the flags should be defined in one place but the implementation makes it difficult to do properly. (See above.) Sponsored by: Spectra Logic Corporation MFC after: 1 week	2015-06-09 21:39:38 +00:00
Konstantin Belousov	2c6946dca2	When updating/accessing the timehands, barriers are needed to ensure that: - th_generation update is visible after the parameters update is visible; - the read of parameters is not reordered before initial read of th_generation. On UP kernels, compiler barriers are enough. For SMP machines, CPU barriers must be used too, as was confirmed by submitter by testing on the Freescale T4240 platform with 24 PowerPC processors. Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week	2015-06-09 11:49:56 +00:00
John Baldwin	15c2b30155	Revert r284153, as I believe it breaks the dtrace sdt module. I will fix the original issue a different way.	2015-06-08 18:06:00 +00:00
Ed Maste	6b16d66497	Add user facing errors for exceeding process memory limits Previously the process terminating with SIGABRT at startup was the only notification. PR: 200617 Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2731	2015-06-08 16:07:07 +00:00
John Baldwin	69c5c774fe	Add an internal "locked" variant of linker_file_lookup_set() and change the public function to acquire the global linker lock directly. This permits linker_file_lookup_set() to be safely used from other modules.	2015-06-08 14:06:47 +00:00
Mark Johnston	8b84d791a0	witness: don't warn about matrix inconsistencies without holding the mutex Lock order checking is done without the witness mutex held, so multiple threads that are racing to establish a new lock order may read matrix entries that are in an inconsistent state. Don't print a warning in this case, but instead just redo the check after taking the witness lock. Differential Revision: https://reviews.freebsd.org/D2713 Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division	2015-06-07 18:59:47 +00:00
Sean Bruno	280b716943	Revert 284029, update imgact_binmisctl.c change mtx to reader count, at the request of the submitter. Will attempt to use an sx_lock for this fix to WITNESS crashes in a later revision. Submitted by: sson	2015-06-05 18:16:10 +00:00
Sean Bruno	8c8613a14f	This change uses a reader count instead of holding the mutex for the interpreter list to avoid the problem of holding a non-sleep lock during a page fault as reported by witness. In addition, it consistently uses memset()/memcpy() instead of bzero()/bcopy() except in the case where bcopy() is required (i.e. overlapping copy). Differential Revision: https://reviews.freebsd.org/D2123 Submitted by: sson MFC after: 2 weeks Relnotes: Yes	2015-06-05 16:21:43 +00:00
John Baldwin	7077c42623	Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptor's fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. 
Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio	2015-06-04 19:41:15 +00:00
Eric van Gyzen	63e4c6cdf9	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
Xin LI	4c372ca254	Clear p_stops when doing PT_DETACH. Without this, if a process was being traced by truss(1), which uses different p_stops bits than gdb(1), the latter would misbehave because of the unexpected bits. Reported by: jceel Submitted by: sef Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2015-06-01 18:15:45 +00:00
Konstantin Belousov	aef68c961a	When delivering a signal with default disposition to the thread, tdsigwakeup() increases the priority of the low-priority threads, to give them a chance to be terminated timely. Also, kernel allows user to signal kernel processes. The combined effect is that signalling idle process bump a priority of the selected delivery thread, which starts eating CPU. Check for the delivery thread be an idle thread and do not raise its priority then. The signal delivery to the kernel threads must be opt-in feature. Kernel thread should explicitly declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. Most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide should the signalling be allowed. The commit is a stop-gap measure which fixes the immediate case for now. PR: 200493 Reported and tested by: trasz Discussed with: trasz, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 16:26:08 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Konstantin Belousov	780dca1b1e	Right now, dounmount() is called with unreferenced mount point. Nothing stops a parallel unmount to succeed before the given call to dounmount() checks and locks the covered vnode. Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced. dounmount() consumes the reference on return, regardless of the successful or erroneous result. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:22:50 +00:00
Konstantin Belousov	2db0e1f50d	Add V_MNTREF flag to the vn_start_write(9) and vn_start_secondary_write(9) functions. The flag indicates that the caller already owns a reference on the mount point, and the functions can consume it. The reference is released by vn_finished_write(9) and vn_finished_secondary_write(9) in due course. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:21:47 +00:00
Konstantin Belousov	1bc93bb7b9	Currently, softupdate code detects overstepping on the workitems limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker's tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitly checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nfsd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:20:42 +00:00
John Baldwin	57c74f5b64	Do not allow a process to reap an orphan (a child currently being traced by another process such as a debugger). The parent process does need to check for matching orphan pids to avoid returning ECHILD if an orphan has exited, but it should not return the exited status for the child until after the debugger has detached from the orphan process either explicitly or implicitly via wait(). Add two tests for this case: one where the debugger is the direct child (thus the parent has a non-empty children list) and one where the debugger is not a direct child (so the only "child" of the parent is the orphan). Differential Revision: https://reviews.freebsd.org/D2644 Reviewed by: kib MFC after: 2 weeks	2015-05-26 10:29:37 +00:00
Xin LI	eb3d0c5d8c	MFuser/delphij/zfs-arc-rebase@r281754: In r256613, taskqueue_enqueue_locked() have been modified to release the task queue lock before returning. In r276665, taskqueue_drain_all() will call taskqueue_enqueue_locked() to insert the barrier task into the queue, but did not reacquire the lock after it but later code expects the lock still being held (e.g. TQ_SLEEP()). The barrier task is special and if we release then reacquire the lock, there would be a small race window where a high priority task could sneak into the queue. Looking more closely, the race seems to be tolerable but is undesirable from semantics standpoint. To solve this, in taskqueue_drain_tq_queue(), instead of directly calling taskqueue_enqueue_locked(), insert the barrier task directly without releasing the lock.	2015-05-26 01:40:33 +00:00
John Baldwin	515b7a0b97	Add KTR tracing for some MI ptrace events. Differential Revision: https://reviews.freebsd.org/D2643 Reviewed by: kib	2015-05-25 22:13:22 +00:00
Dmitry Chagin	7236f2c220	For future use in the Linuxulator: 1. Add a kern_kqueue() counterpart for kqueue() with flags parameter. 2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with file pointer parameter instead of file descriptor and pass the buck to it. Suggested by: mjg [2] Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz	2015-05-24 16:36:29 +00:00
Dmitry Chagin	91d1786f65	In preparation for switching linuxulator to use the native 1:1 threads add a hook for cleaning thread resources before the thread dies. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
Dmitry Chagin	a93e83c8d7	In preparation for switching linuxulator to use the native 1:1 threads split sys_sched_getparam(), sys_sched_setparam(), sys_sched_getscheduler(), sys_sched_setscheduler() to their kern_* counterparts and add targettd parameter to allow specify the target thread directly by callee. Differential Revision: https://reviews.freebsd.org/D1034 Reviewed by: trasz	2015-05-24 14:44:06 +00:00

14306 Commits