freebsd-skq

Author	SHA1	Message	Date
attilio	d67371dab6	Switch some "low-hanging fruit" to acquire read lock on vmobjects rather than write locks. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-04-08 19:58:32 +00:00
attilio	15bf891afe	Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. Sponsored by: EMC / Isilon storage division	2013-02-20 12:03:20 +00:00
attilio	658534ed5a	Switch vm_object lock to be a rwlock. * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h	2013-02-20 10:38:34 +00:00
kib	de5bb93ec1	When vforked child is traced, the debugging events are not generated until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining generating debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and still did not execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks	2013-02-07 15:34:22 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
kib	2ded2254c0	Always initialize pl_event. Submitted by: Andrey Zonov <andrey@zonov.org> MFC after: 3 days	2012-08-08 00:20:30 +00:00
davidxu	be2413da6d	If you have pressed CTRL+Z and a process is suspended, then you use gdb to attach to the process, it is surprising that the process is resumed without inputting any gdb commands, however ptrace manual said: The tracing process will see the newly-traced process stop and may then control it as if it had been traced all along. But the current code does not work in this way, unless traced process received a signal later, it will continue to run as a background task. To fix this problem, just send signal SIGSTOP to the traced process after we resumed it, this works like that you are attaching to a running process, it is not perfect but better than nothing.	2012-07-09 09:24:46 +00:00
kib	bfe47eb1df	Allow the parent to gather the exit status of the children reparented to the debugger. When reparenting for debugging, keep the child in the new orphan list of old parent. When looping over the children in kern_wait(), iterate over both children list and orphan list to search for the process by pid. Submitted by: Dmitry Mikulin <dmitrym juniper.net> MFC after: 2 weeks	2012-02-23 11:50:23 +00:00
kib	956d09353b	Mark the automatically attached child with PL_FLAG_CHILD in struct lwpinfo flags, for PT_FOLLOWFORK auto-attachment. In collaboration with: Dmitry Mikulin <dmitrym juniper net> MFC after: 1 week	2012-02-10 00:02:13 +00:00
kmacy	99851f359e	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
obrien	d17521adb2	Add comment from CSRG rev 7.27 (1992/06/23 19:56:55; author: mckusick)	2011-06-17 21:44:13 +00:00
obrien	f797e31a8d	We should not return ECHILD when debugging a child and the parent does a "wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the child does not wish to report status at this time. Reviewed by: jhb	2011-06-14 17:09:30 +00:00
dchagin	1e124ec538	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
kib	72112368cf	Allow debugger to specify that children of the traced process should be automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks	2011-01-25 10:59:21 +00:00
alc	be5201b0d1	Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race condition in proc_rwmem() and to (2) simplify the implementation of the cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem() the requested read or write could fail because the targeted page could be reclaimed between the calls to vm_fault() and vm_page_hold(). In collaboration with: kib@ MFC after: 6 weeks	2010-12-20 22:49:31 +00:00
attilio	7718cbcbf4	Add the ability for GDB to printout the thread name along with other thread specific informations. In order to do that, and in order to avoid KBI breakage with existing infrastructure the following semantic is implemented: - For live programs, a new member to the PT_LWPINFO is added (pl_tdname) - For cores, a new ELF note is added (NT_THRMISC) that can be used for storing thread specific, miscellaneous, informations. Right now it is just popluated with a thread name. GDB, then, retrieves the correct informations from the corefile via the BFD interface, as it groks the ELF notes and create appropriate pseudo-sections. Sponsored by: Sandvine Incorporated Tested by: gianni Discussed with: dim, kan, kib MFC after: 2 weeks	2010-11-22 14:42:13 +00:00
davidxu	55194e796c	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian	2010-10-09 02:50:23 +00:00
kib	22a31bdc6e	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused debugee stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks	2010-07-04 11:48:30 +00:00
ed	76489ac1ea	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
jhb	6caceffefa	Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH. PR: kern/146167 MFC after: 1 week	2010-05-25 21:32:37 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
nwhitehorn	142a4d2993	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
marcel	574fc5dc43	Initialize pve_fsid and pve_fileid to VNOVAL.	2010-02-11 21:10:56 +00:00
marcel	59fa82c67c	o Add support for COMPAT_IA32. o Incorporate review comments: - Properly reference and lock the map - Take into account that the VM map can change inbetween requests - Add the fileid and fsid attributes Credits: kib@ Reviewed by: kib@	2010-02-11 18:00:53 +00:00
marcel	bcd7bda0da	Unbreak building kernels with COMPAT_32 enabled. The actual support for the PT_VM_ENTRY request from 32-bit processes will follow. Pointy hat: marcel	2010-02-09 17:20:00 +00:00
marcel	764ce56ace	Add PT_VM_TIMESTAMP and PT_VM_ENTRY so that the tracing process can obtain the memory map of the traced process. PT_VM_TIMESTAMP can be used to check if the memory map changed since the last time to avoid iterating over all the VM entries unnecesarily. MFC after: 1 month	2010-02-09 05:52:35 +00:00
kib	ddbe48bd56	For PT_TO_SCE stop that stops the ptraced process upon syscall entry, syscall arguments are collected before ptracestop() is called. As a consequence, debugger cannot modify syscall or its arguments. For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number and arguments after ptracestop(), if debugger modified anything in the process environment. Since procfs stopeven requires number of syscall arguments in p_xstat, this cannot be solved by moving stop/trace point before argument fetching. Move the code to read arguments into separate function fetch_syscall_args() to avoid code duplication. Note that ktrace point for modified syscall is intentionally recorded twice, once with original arguments, and second time with the arguments set by debugger. PT_TO_SCX stop is executed after cpu_syscall_set_retval() already. Reported by: Ali Polatel <alip exherbo org> Briefly discussed with: jhb MFC after: 3 weeks	2010-01-23 11:45:35 +00:00
alc	2d9252d6c7	Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected but they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy. However, here is where things become confused. It is the debugger, not the process being debugged that requires write access to the copied page. Nonetheless, the copied page is being mapped into the process with write access enabled. In other words, once the debugger sets a breakpoint within a text page, the program can write to its private copy of that text page. Whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access. Reviewed by: kib MFC after: 4 weeks	2009-11-26 05:16:07 +00:00
alc	b4400267f9	Update a comment to reflect the previous change.	2009-10-25 02:48:29 +00:00
marcel	51bb720939	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
rwatson	b803771ae1	Clean up a number of aspects of token generation from audit arguments to system calls: - Centralize generation of argument tokens for VM addresses in a macro, ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments. - Fix up argument numbers across a large number of syscalls so that they match the numeric argument into the system call. - Don't audit the address argument to ioctl(2) or ptrace(2), but do keep generating tokens for mmap(2), minherit(2), since they relate to passing object access across execve(2). Approved by: re (audit argument blanket) Obtained from: TrustedBSD Project MFC after: 1 week	2009-07-02 09:15:30 +00:00
rwatson	da78c9e4a2	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
kib	fa686c638e	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
kib	c672211541	Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process executing on 64bit kernel. This eliminates the direct comparisions of p_sysent with &ia32_freebsd_sysvec, that were left intact after r185169.	2009-03-02 18:43:50 +00:00
davidxu	1ebf3ee9a3	Revert rev 184216 and 184199, due to the way the thread_lock works, it may cause a lockup. Noticed by: peter, jhb	2008-11-05 03:01:23 +00:00
davidxu	2062caca24	Actually, for signal and thread suspension, extra process spin lock is unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.	2008-10-23 07:55:38 +00:00
davidxu	5068f6dcf0	Move per-thread userland debugging flags into seperated field, this eliminates some problems of locking, e.g, a thread lock is needed but can not be used at that time. Only the process lock is needed now for new field.	2008-10-15 06:31:37 +00:00
jeff	46f09d5bc3	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
jeff	acb93d599c	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
ups	c6c6cc7efa	Use VM_FAULT_DIRTY to fault in pages for write access in proc_rwmen. Otherwise copy on write may create an anonymous page that is not marked as dirty. Since writing data to these pages in this function also does not dirty these pages they may be later discarded by the pagedaemon.	2007-11-08 19:35:36 +00:00
jeff	8f126294e7	- Fix from pr kern/115469; Don't redeliver a signal once it has been handled by the target process. Contributed by: Tijl Coosemans <tijl@ulyssis.org> Approved by: re	2007-10-09 00:03:39 +00:00
jeff	3fc0f8b973	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
jeff	91d1501790	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-05 00:00:57 +00:00
rwatson	300d4098cf	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
jb	f82c799735	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
davidxu	4889cc1ac4	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
trhodes	e7d9ab0897	Close a race condition where num can be larger than tmp, giving the user too large of a boundary. Reported by: Ilja Van Sprundel	2006-10-14 10:30:14 +00:00
cperciva	4ec3e871a6	Fix a signedness bug. MFC after: 3 days Security: Local DoS	2006-08-20 10:29:08 +00:00
jhb	ff9c76bccd	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the processing read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simualted via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while the mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week	2006-02-22 18:57:50 +00:00

1 2 3 4

185 Commits