freebsd-nq

Author	SHA1	Message	Date
Pawel Jakub Dawidek	c345faea5a	Replace expand_name() function with corefile_open() function, which not only returns name, but also vnode of corefile to use. This simplifies the code and closes few races, especially in %I handling. Reviewed by: kib Obtained from: WHEEL Systems	2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek	22a5d85aa9	Use correct file permissions when looking for available core file if kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 23:40:02 +00:00
Pawel Jakub Dawidek	07a8e07896	The 'flags' argument can be modified in vn_open_cred(), so we need to set it for every loop interation. Pointed out by: kib	2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek	cc58032c44	Do not audit paths we try when kern.corefile contains %I. Obtained from: WHEEL Systems	2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek	29146f1a7a	Style cleanups.	2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek	086053a370	The expand_name() function isn't called with the process lock held anymore, so we can safely use malloc(M_WAITOK) now. Pointed out by: kib	2012-12-19 12:00:09 +00:00
Pawel Jakub Dawidek	f06f465db7	Minor style tweaks. Obtained from: WHEEL Systems	2012-12-17 10:51:22 +00:00
Pawel Jakub Dawidek	c52ff61196	Better variables naming in expand_name() to be more consistent with coredump(). Obtained from: WHEEL Systems	2012-12-17 10:48:10 +00:00
Pawel Jakub Dawidek	dd57ce87eb	Move expand_name() after process lock is released. This fixed panic where we hold mutex (process lock) and try to obtain sleepable lock (vnode lock in expand_name()). The panic could occur when %I was used in kern.corefile. Additionally we avoid expand_name() overhead when coredumps are disabled. Obtained from: WHEEL Systems	2012-12-16 14:53:27 +00:00
Pawel Jakub Dawidek	2ce1b32df2	Don't add audit record when coredumps are disabled or name cannot be expanded. Discussed with: rwatson Obtained from: WHEEL Systems	2012-12-16 14:24:59 +00:00
Pawel Jakub Dawidek	7e73ee85ab	Make the check easier to read. Obtained from: WHEEL Systems	2012-12-16 14:14:18 +00:00
Pawel Jakub Dawidek	b039f8c2aa	Use 'cred' variable. Obtained from: WHEEL Systems	2012-12-16 13:56:38 +00:00
Pawel Jakub Dawidek	b0c9d4d70e	Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode to dump core. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks	2012-11-27 10:38:11 +00:00
Pawel Jakub Dawidek	8890f5d020	Allow to use kill(2) in capability mode, but process can send a signal only to himself. For example abort(3) at first tries to do kill(getpid(), SIGABRT) which was failing in capability mode, so the code was failing back to exit(1). Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks	2012-11-27 10:22:40 +00:00
Pawel Jakub Dawidek	b62d05fcf9	Allow to modify kern.sugid_coredump and kern.corefile from loader.conf. Obtained from: WHEEL Systems	2012-11-27 10:16:48 +00:00
Pawel Jakub Dawidek	c320984687	More style fixes.	2012-11-27 10:15:58 +00:00
Pawel Jakub Dawidek	23c6445a4b	Style fixes (mostly whitespaces).	2012-11-27 10:11:54 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Eitan Adler	3d74f47b90	Correct the killpg(2) return values: Return EPERM if processes were found but they were unable to be signaled. Return the first error from p_cansignal if no signal was successful. Reviewed by: jilles Approved by: cperciva MFC after: 1 week	2012-10-22 03:43:02 +00:00
Eitan Adler	10950e4651	Colin acked the wrong diff originally. fixed version coming soon. Approved by: cperciva (implicit)	2012-10-22 03:36:44 +00:00
Eitan Adler	2a1c0e4d4e	Correct the killpg(2) return values: Return EPERM if processes were found but they were unable to be signaled. Return the first error from p_cansignal if no signal was successful. Reviewed by: jilles Approved by: cperciva MFC after: 1 week	2012-10-22 03:34:43 +00:00
John Baldwin	0f14f15b62	Ignore stop and continue signals sent to an exiting process. Stop signals set p_xstat to the signal that triggered the stop, but p_xstat is also used to hold the exit status of an exiting process. Without this change, a stop signal that arrived after a process was marked P_WEXIT but before it was marked a zombie would overwrite the exit status with the stop signal number. Reviewed by: kib MFC after: 1 week	2012-09-13 15:51:18 +00:00
Konstantin Belousov	888aefef89	Deliver SIGSYS to the guilty thread, not to the process. MFC after: 1 week	2012-08-18 18:17:10 +00:00
David Xu	7ce60f6013	Always clear p_xthread if current thread no longer needs it, in theory, if debugger exited without calling ptrace(PT_DETACH), there is a time window that the p_xthread may be pointing to non-existing thread, in practical, this is not a problem because child process soon will be killed by parent process.	2012-07-10 05:45:13 +00:00
Konstantin Belousov	2dd9ea6f70	Add thread-private flag to indicate that error value is already placed in td_errno. Flag is supposed to be used by syscalls returning EJUSTRETURN because errno was already placed into the usermode frame by a call to set_syscall_retval(9). Both ktrace and dtrace get errno value from td_errno if the flag is set. Use the flag to fix sigsuspend(2) error return ktrace records. Requested by: bde MFC after: 1 week	2012-04-12 10:48:43 +00:00
Jilles Tjoelker	8a8be77610	Remove unused and wrong SA_PROC internal signal property. The SA_PROC signal property indicated whether each signal number is directed at a specific thread or at the process in general. However, that depends on how the signal was generated and not on the signal number. SA_PROC was not used.	2012-04-09 21:58:58 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Sergey Kandaurov	c241c5e49a	Fix arguments list for proc:::signal-discard DTrace probe. Reported by: Anton Yuzhaninov <citrin citrin ru> MFC after: 1 week	2011-10-28 15:22:51 +00:00
Konstantin Belousov	8e9a54ee46	The sigwait(3) function shall not return EINTR, according to the POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7 contains the wrapper sigwait(3) which hides EINTR from callers. The EINTR return is used by libthr to handle required cancellation point in the sigwait(3). To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and earlier, to have right ABI for sigwait(3), transform EINTR return from sigwait(2) into ERESTART. Discussed with: davidxu MFC after: 1 week	2011-10-01 10:18:55 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Jonathan Anderson	cfb5f76865	Add experimental support for process descriptors A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc	2011-08-18 22:51:30 +00:00
Edward Tomasz Napierala	b8fdb0d94d	Fix support for RACCT_CORE by merging forgotten file.	2011-05-26 18:54:07 +00:00
Jilles Tjoelker	6100955206	ktrace: Log the code for all signals (PSIG events). The code provides information on how the signal was generated. Formerly, the code was only logged for traps, much like only signal handlers for traps received a meaningful si_code before FreeBSD 7.0. In rare cases, no information is available and 0 is still logged. MFC after: 1 week	2011-04-17 14:38:11 +00:00
John Baldwin	e806d352d2	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
John Baldwin	c3b127e022	Small style fix.	2011-03-23 13:44:32 +00:00
Konstantin Belousov	6fa39a7327	Allow debugger to specify that children of the traced process should be automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks	2011-01-25 10:59:21 +00:00
David Xu	407af02b6e	In kern_sigtimedwait(), move initialization code out of process lock, instead of using SIGISMEMBER to test every interesting signal, just unmask the signal set and let cursig() return one, get the signal after it returns, call reschedule_signal() after signals are blocked again. In kern_sigprocmask(), don't call reschedule_signal() when it is unnecessary. In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with sig_ffs(), rename variable 'i' to sig.	2010-10-14 08:01:33 +00:00
David Xu	fc4ecc1d48	sigqueue_collect_set() is no longer needed because other functions maintain pending set correctly.	2010-10-13 06:28:40 +00:00
David Xu	cf7d9a8ca8	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian	2010-10-09 02:50:23 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno that sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
David Xu	137cf33d5e	rescure comments from RELENG_4.	2010-09-01 01:26:07 +00:00
David Xu	83b718eb07	If a process is being debugged, skips job control caused by SIGSTOP/SIGCONT signals, because it is managed by debugger, however a normal signal sent to a interruptibly sleeping thread wakes up the thread so it will handle the signal when the process leaves the stopped state. PR: 150138 MFC after: 1 week	2010-08-31 07:15:50 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
David Xu	212bc4b337	Fix function name in error messages.	2010-07-20 02:23:12 +00:00
John Baldwin	fc8cca02c7	- Various style and whitespace fixes. - Make sugid_coredump and kern_logsigexit private to kern_sig.c. Submitted by: bde (partially) MFC after: 1 month	2010-07-08 19:15:26 +00:00
Konstantin Belousov	8a26007903	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused debugee stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks	2010-07-04 11:48:30 +00:00
John Baldwin	ad6eec7b9e	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib	2010-06-29 20:41:52 +00:00
Konstantin Belousov	c51050129f	Do not report a stack garbage as the old value for debug.ncores sysctl. Reported by: brucec	2010-06-21 09:51:25 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Alfred Perlstein	b7402d8269	Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(), use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:15:00 +00:00
Konstantin Belousov	3f1c4c4f31	When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks	2010-04-06 10:43:01 +00:00
Alfred Perlstein	e722820434	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
Konstantin Belousov	a5799a4f27	Staticise sigqueue manipulation functions used only in kern_sig.c. MFC after: 1 week	2010-01-23 11:43:30 +00:00
Konstantin Belousov	6a671a6bde	When traced process is about to receive the signal, the process is stopped and debugger may modify or drop the signal. After the changes to keep process-targeted signals on the process sigqueue, another thread may note the old signal on the queue and act before the thread removes changed or dropped signal from the process queue. Since process is traced, it usually gets stopped. Or, if the same signal is delivered while process was stopped, the thread may erronously remove it, intending to remove the original signal. Remove the signal from the queue before notifying the debugger. Restore the siginfo to the head of sigqueue when signal is allowed to be delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t flag. This preserves required order of delivery. Always restore the unchanged signal on the curthread sigqueue, not to the process queue, since the thread is about to get it anyway, because sigmask cannot be changed. Handle failure of reinserting the siginfo into the queue by falling back to sq_kill method, calling sigqueue_add with NULL ksi. If debugger changed the signal to be delivered, use sigqueue_add() with NULL ksi instead of only setting sq_signals bit. Reported by: Gardner Bell <gbell72 rogers com> Analyzed and first version of fix by: Tijl Coosemans <tijl coosemans org> PR: 142757 Reviewed by: davidxu MFC after: 2 weeks	2010-01-20 11:58:04 +00:00
Konstantin Belousov	4f17d481ed	Remove wrong assertion. Debugee is allowed to lose a signal. Reported and tested by: jh MFC after: 2 weeks	2009-12-03 20:16:59 +00:00
Konstantin Belousov	a3de221dbe	Among signal generation syscalls, only sigqueue(2) is allowed by POSIX to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag that allows sigqueue_add() to fail while trying to allocate memory for new siginfo. When the flag is not set, behaviour is the same as for KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is kept to preserve KBI. Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is generated by kernel. Deliver siginfo when signal is generated by kill(2) family of syscalls (SI_USER with properly filled si_uid and si_pid), or by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag is not set for the ksi, low memory condition cause old behaviour. Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL si_code. Pgsignal(9) and gsignal(9) now take ksi explicitely. Add pksignal(9) that behaves like psignal but takes ksi, and ddb kill command implemented as pksignal(..., ksi = NULL) to not do allocation while in debugger. While there, remove some register specifiers and use ANSI C prototypes. Reviewed by: davidxu MFC after: 1 month	2009-11-17 11:39:15 +00:00
Konstantin Belousov	75c586a4c8	In r198506, kern_sigsuspend() started doing cursig/postsig loop to make sure that a signal was delivered to the thread before returning from syscall. Signal delivery puts new return frame on the user stack, and modifies trap frame to enter signal handler. As a consequence, syscall return code sets EINTR as error return for signal frame, instead of the syscall return. Also, for ia64, due to different registers layout for those two kind of frames, usermode sigsegfaulted when returned from signal handler. Use newly-introduced cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying this frame [1]. Another issue is that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any catched signal [2]. Modify postsig() to return 1 if signal was posted, and 0 otherwise, and use this in the kern_sigsuspend loop. Proposed by: marcel [1] Noted by: davidxu [2] Reviewed by: marcel, davidxu MFC after: 1 month	2009-11-10 11:46:53 +00:00
Konstantin Belousov	80a8b0f3bf	Trapsignal() and postsig() call kern_sigprocmask() with both process lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have ps_mtx locked to decide and wakeup a thread, causing recursion on the mutex. Inform kern_sigprocmask() and reschedule_signals() about lock state of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion. Reported and tested by: keramida MFC after: 1 month	2009-10-30 10:10:39 +00:00
Konstantin Belousov	8084540253	Trapsignal() calls kern_sigprocmask() when delivering catched signal with proc lock held. Reported and tested by: Mykola Dzham freebsd at levsha org ua MFC after: 1 month	2009-10-29 14:34:24 +00:00
Konstantin Belousov	d6e029adbe	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
Konstantin Belousov	84440afb54	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:42:24 +00:00
Joseph Koshy	8b5adf9dd9	Improve the description of sysctl "kern.sugid_coredump". Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net> on -hackers	2009-10-12 15:49:48 +00:00
Konstantin Belousov	7df8f6ab6f	Fix typo. Submitted by: rdivacky MFC after: 1 month	2009-10-12 10:09:48 +00:00
Konstantin Belousov	6b286ee8b5	Currently, when signal is delivered to the process and there is a thread not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month	2009-10-11 16:49:30 +00:00
Konstantin Belousov	15b7a831df	Fix typo. MFC after: 3 days	2009-10-01 12:46:58 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Konstantin Belousov	f33a947b56	Add new msleep(9) flag PBDY that shall be specified together with PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)	2009-07-14 22:52:46 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Peter Holm	401679debe	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib	2009-06-23 11:29:54 +00:00
Konstantin Belousov	e0c161b89c	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
Robert Watson	885868cd8f	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon	2009-04-10 10:52:19 +00:00
Ed Schouten	c90c9021e9	Remove even more unneeded variable assignments. kern_time.c: - Unused variable `p'. kern_thr.c: - Variable `error' is always caught immediately, so no reason to initialize it. There is no way that error != 0 at the end of create_thread(). kern_sig.c: - Unused variable `code'. kern_synch.c: - `rval' is always assigned in all different cases. kern_rwlock.c: - `v' is always overwritten with RW_UNLOCKED further on. kern_malloc.c: - `size' is always initialized with the proper value before being used. kern_exit.c: - `error' is always caught and returned immediately. abort2() never returns a non-zero value. kern_exec.c: - `len' is always assigned inside the if-statement right below it. tty_info.c: - `td' is always overwritten by FOREACH_THREAD_IN_PROC(). Found by: LLVM's scan-build	2009-02-26 15:51:54 +00:00
David Xu	7b4a950a7d	Revert rev 184216 and 184199, due to the way the thread_lock works, it may cause a lockup. Noticed by: peter, jhb	2008-11-05 03:01:23 +00:00
David Xu	3f9be10eb0	Actually, for signal and thread suspension, extra process spin lock is unnecessary, the normal process lock and thread lock are enough. The spin lock is still needed for process and thread exiting to mimic single sched_lock.	2008-10-23 07:55:38 +00:00
David Xu	904c5ec4e3	Move per-thread userland debugging flags into seperated field, this eliminates some problems of locking, e.g, a thread lock is needed but can not be used at that time. Only the process lock is needed now for new field.	2008-10-15 06:31:37 +00:00
Attilio Rao	0359a12ead	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2008-08-28 15:23:18 +00:00
John Baldwin	da7bbd2c08	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks	2008-08-05 20:02:31 +00:00
John Birrell	5d217f173c	Add DTrace 'proc' provider probes using the Statically Defined Trace (sdt) mechanism.	2008-05-24 06:22:16 +00:00
Jeff Roberson	b7edba7704	- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs to enter thread_suspend_check(). - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the thread_suspend_check() to ast() rather than userret(). - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so that we don't miss a suspend request. If this is set use the expensive signal path. - Set NEEDSUSPCHK when creating a new thread in thr in case the creating thread is due to be suspended as well but has not yet. Reviewed by: davidxu (Authored original patch)	2008-03-21 08:23:25 +00:00
Jeff Roberson	374ae2a393	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.	2008-03-19 06:19:01 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
Robert Watson	36b208e008	Use sbuf routines to construct core dump filenames rather than custom string buffer handling, making the code both easier to read and more robust against string-handling bugs. MFC after: 1 week	2008-03-08 16:31:29 +00:00
Robert Watson	eeccc36738	Unlock the process lock when expand_name() fails, or we may leak the process lock leading to a hang. This bug was introduced in kern_sig.c:1.351, when the call to expand_name() was moved earlier bit this particular error case was not updated.	2008-03-08 15:48:06 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
David E. O'Brien	10c2b8e128	Be more exact with sigaction SA_SIGINFO handling. Reviewed by: marcel	2007-12-18 20:39:13 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Christian S.J. Peron	57274c513c	Implement AUE_CORE, which adds process core dump support into the kernel. This change introduces audit_proc_coredump() which is called by coredump(9) to create an audit record for the coredump event. When a process dumps a core, it could be security relevant. It could be an indicator that a stack within the process has been overflowed with an incorrectly constructed malicious payload or a number of other events. The record that is generated looks like this: header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec argument,0,0xb,signal path,/usr/home/csjp/test.core subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2 return,success,1 trailer,111 - We allocate a completely new record to make sure we arent clobbering the audit data associated with the syscall that produced the core (assuming the core is being generated in response to SIGABRT and not an invalid memory access). - Shuffle around expand_name() so we can use the coredump name at the very beginning of the coredump call. Make sure we free the storage referenced by "name" if we need to bail out early. - Audit both successful and failed coredump creation efforts Obtained from: TrustedBSD Project Reviewed by: rwatson MFC after: 1 month	2007-10-26 01:23:07 +00:00
Christian S.J. Peron	5ff3816d82	Move where we audit the PID argument such that we unconditionally audit it at the beginning of the syscall. This fixes a problem where the user supplies an invalid process ID which is > 0 which results in the PID argument not being audited. Obtained from: TrustedBSD Project MFC after: 1 week	2007-10-24 00:14:19 +00:00
Jeff Roberson	6eeb364b4c	- Calling sched_nice() in tdsigwakeup() is no longer required by ULE and actually causes LORs and other panics. Reported by: mlaier Approved by: re	2007-07-19 08:49:16 +00:00
Jeff Roberson	efe641b939	- Add a missing PROC_SUNLOCK() in tdsignal()	2007-06-11 23:27:03 +00:00
Matt Jacob	9b73d2396a	Initialized ets to zero. This is arguably a gcc bug in that ets is always set to rts when timeout is non-NULL and then timevalid is set and ets is only checked later when timervalid is set.	2007-06-10 01:43:11 +00:00
Jeff Roberson	a54e85fdbf	Commit 4/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. - Move some common code into thread_suspend_switch() to handle the mechanics of suspending a thread. The locking here is incredibly convoluted and should be simplified. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)	2007-06-04 23:52:24 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Konstantin Belousov	9e223287c0	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)	2007-05-31 11:51:53 +00:00
Robert Watson	4dec0e67ea	Comment that tdsignal() may be entered from the debugger.	2007-05-23 17:27:42 +00:00
John Baldwin	aa89d8cd52	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.	2007-03-21 21:20:51 +00:00
Robert Watson	873fbcd776	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.	2007-03-05 13:10:58 +00:00
Robert Watson	0c14ff0eb5	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
Xin LI	d60226bd43	Give which signal caller has attempted to deliver when panicking.	2007-02-09 17:48:28 +00:00

1 2 3 4 5 ...

486 Commits