freebsd-dev

Author	SHA1	Message	Date
Jeff Roberson	68ce4375c4	- textvp may have been from a different mountpoint than ndp->ni_vp and we may need to acquire giant to vrele it. Found by: mjacob MFC After: 3 days	2006-02-02 08:39:39 +00:00
Alan Cox	05406e6f33	Remove unneeded calls to pmap_remove_all(). The given page is not mapped. Reviewed by: tegge	2005-12-11 22:06:57 +00:00
David Xu	d26b1a1fb9	Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().	2005-12-09 05:43:26 +00:00
Alan Cox	8ad398d089	Reduce the scope of the page queues lock in exec_map_first_page(). The vm object lock is sufficient for reading a page's PG_BUSY and busy flags. MFC after: 1 week	2005-12-06 07:39:36 +00:00
David Xu	6d7b314b14	Cleanup some signal interfaces. Now the tdsignal function accepts both proc pointer and thread pointer, if thread pointer is NULL, tdsignal automatically finds a thread, otherwise it sends signal to given thread. Add utility function psignal_event to send a realtime sigevent to a process according to the delivery requirement specified in struct sigevent.	2005-11-03 04:49:16 +00:00
Paul Saab	1471f287e1	Calling setrlimit from 32bit apps could potentially increase certain limits beyond what should be capiable in a 32bit process, so we must fixup the limits. Reviewed by: jhb	2005-11-02 21:18:07 +00:00
David Xu	60354683d9	Make p_itimers as a pointer, so file sys/proc.h does not need to include sys/timers.h.	2005-10-23 12:19:08 +00:00
David Xu	86857b368d	Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.	2005-10-23 04:22:56 +00:00
David Xu	9104847f21	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64	2005-10-14 12:43:47 +00:00
Diomidis Spinellis	9f5c1d1955	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks	2005-10-12 06:56:00 +00:00
Don Lewis	34ea500bea	Add missing word to comment.	2005-10-04 04:02:33 +00:00
Colin Percival	33812c066d	If sufficiently bad things happen during a call to kern_execve(), it is possible for do_execve() to call exit1() rather than returning. As a result, the sequence "allocate memory; call kern_execve; free memory" can end up leaking memory. This commit documents this astonishing behaviour and adds a call to exec_free_args() before the exit1() call in do_execve(). Since all the users of kern_execve() in the tree use exec_free_args() to free the command-line arguments after kern_execve() returns, this should be safe, and it fixes the memory leak which can otherwise occur. Submitted by: Peter Holm MFC after: 3 days Security: Local denial of service	2005-10-03 12:49:54 +00:00
Don Lewis	5997cae9a4	Copy new process argument list in do_execve() before grabbing PROC_LOCK to avoid touching pageable memory while holding a mutex. Simplify argument list replacement logic. PR: kern/84935 Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form) MFC after: 3 days	2005-10-01 08:33:56 +00:00
Joseph Koshy	151392465f	MFP4: - pmcstat(8) gprof output mode fixes: lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h: + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event, so that pmcstat(8) can determine where the runtime loader /libexec/ld-elf.so.1 is getting loaded. sys/kern/kern_exec.c: + Use a local struct to group the entry address of the image being exec()'ed and the process credential changed flag to the exec handling hook inside hwpmc(4). usr.sbin/pmcstat/*: + Support "-k kernelpath", "-D sampledir". + Implement the ELF bits of 'gmon.out' profile generation in a new file "pmcstat_log.c". Move all log related functions to this file. + Move local definitions and prototypes to "pmcstat.h" - Other bug fixes: + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read(). + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all attached PMCs when a process exits. + sys/sys/pmc.h: correct a function prototype. + Improve usage checks in pmcstat(8). Approved by: re (blanket hwpmc)	2005-06-30 19:01:26 +00:00
Peter Wemm	4da0d332f4	Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit being in opt_global.h and forcing a global recompile when only a few files reference it. Approved by: re	2005-06-24 00:16:57 +00:00
Joseph Koshy	f263522a45	MFP4: - Implement sampling modes and logging support in hwpmc(4). - Separate MI and MD parts of hwpmc(4) and allow sharing of PMC implementations across different architectures. Add support for P4 (EMT64) style PMCs to the amd64 code. - New pmcstat(8) options: -E (exit time counts) -W (counts every context switch), -R (print log file). - pmc(3) API changes, improve our ability to keep ABI compatibility in the future. Add more 'alias' names for commonly used events. - bug fixes & documentation.	2005-06-09 19:45:09 +00:00
Ken Smith	6341095e0d	This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@	2005-05-31 19:39:52 +00:00
Jeff Roberson	532c491b2b	- Initialize vfslocked correctly early enough for MAC to compile. - Fix one place where we explicitly drop Giant! Pointy hat to: me Submitted by: Max Laier Warned by: Tinderbox	2005-05-03 16:24:59 +00:00
Jeff Roberson	269576c82a	- Use namei to acquire Giant for VFS if it is necessary. Drop the explicit Giant acquisition. - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have both been locked.	2005-05-03 10:55:05 +00:00
Jeff Roberson	4d7d299383	- Return EACCES if we're trying to exec on a vp with no object. Errno supplied by: cperciva	2005-05-01 00:58:19 +00:00
Jeff Roberson	7625cbf3cc	- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.	2005-04-27 09:05:19 +00:00
Joseph Koshy	ebccf1e3a6	Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities and documentation into -CURRENT. Bump FreeBSD_version. Reviewed by: alc, jhb (kernel changes)	2005-04-19 04:01:25 +00:00
Maxim Sobolev	90dc539be0	Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to PAGE_SIZE. Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition (he has been proposing to replace it with PAGE_SIZE everywhere), not only this reduced the diff significantly, but prevents code obfuscation and also allows to increase/decrease this parameter easily if needed. PR: kern/64196 Submitted by: Magnus Bäckström <b@etek.chalmers.se>	2005-02-25 11:49:42 +00:00
Maxim Sobolev	56c2262c0e	Grrr, this committer needs to have a sleep. Remove lines from the previous delta not intended for public consumption. MFC after: 2 weeks	2005-01-29 23:51:05 +00:00
Maxim Sobolev	c30af53213	Fix small non-conformance introduced in the previous commit: execve() is expected to return ENAMETOOLONG, not E2BIG if first argument doesn't fit into {PATH_MAX} bytes. MFC after: 2 weeks	2005-01-29 23:47:36 +00:00
Maxim Sobolev	610ecfe035	o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks	2005-01-29 23:12:00 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	c113083c5a	Add new function fdunshare() which encapsulates the necessary light magic for ensuring that a process' filedesc is not shared with anybody. Use it in the two places which previously had private implmentations. This collects all fd_refcnt handling in kern_descrip.c	2004-12-14 07:20:03 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
Poul-Henning Kamp	124e4c3be8	Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.	2004-11-13 11:53:02 +00:00
Poul-Henning Kamp	598b7ec86b	Use more intuitive pointer for fdinit() and fdcopy(). Change fdcopy() to take unlocked filedesc.	2004-11-08 12:43:23 +00:00
Peter Wemm	a7bc3102c4	Put on my peril sensitive sunglasses and add a flags field to the internal sysctl routines and state. Add some code to use it for signalling the need to downconvert a data structure to 32 bits on a 64 bit OS when requested by a 32 bit app. I tried to do this in a generic abi wrapper that intercepted the sysctl oid's, or looked up the format string etc, but it was a real can of worms that turned into a fragile mess before I even got it partially working. With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have it not abort. Things like netstat, ps, etc have a long way to go. This also fixes a bug in the kern.ps_strings and kern.usrstack hacks. These do matter very much because they are used by libc_r and other things.	2004-10-11 22:04:16 +00:00
David Xu	84e0b075f6	Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen	2004-10-07 13:50:10 +00:00
David Xu	906ac69d08	In original kern_execve() code, at the start of the function, it forces all other threads to suicide, problem is execve() could be failed, and a failed execve() would change threaded process to unthreaded, this side effect is unexpected. The new code introduces a new single threading mode SINGLE_BOUNDARY, in the mode, all threads should suspend themself at user boundary except the singler. we can not use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful, suspending other threads at unknown point and later resuming them from there and forcing them to exit at user boundary may cause the process to start from a dirty state. If execve() is successful, current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at user boundary, otherwise, other threads will be resumed and their interrupted syscall will be restarted. Reviewed by: julian	2004-10-06 00:40:41 +00:00
John Baldwin	63993cf011	- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.	2004-09-23 21:01:50 +00:00
Julian Elischer	2e2e32b201	Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later. Note that this only affects the case where the execve fails.	2004-09-22 01:30:23 +00:00
Julian Elischer	297800599a	In a threaded process, don't kill off all the other threads until we have a reasonable chance that the eceve() is going to succeeed. I.e. wait until we've done the permission checks etc. MFC after: 1 week	2004-09-21 21:05:13 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Julian Elischer	aa3c8c02ae	White space fix.. diff reduction for upcoming commit.	2004-07-24 04:57:41 +00:00
Alan Cox	ce8da3091f	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().	2004-07-13 02:49:22 +00:00
Tim J. Robbins	aa0aa7a113	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
David Xu	702ac0f112	Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
Alan Cox	59c8bc40ce	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.	2004-04-23 03:01:40 +00:00
Alan Cox	148b3f62a9	Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.	2004-04-11 19:57:11 +00:00
Pawel Jakub Dawidek	2fc0588da2	Remove sysctl kern.ps_argsopen, it is not very useful, one should use security.bsd.see_other_uids instead. Discussed with: phk, rwatson	2004-04-01 00:10:45 +00:00
Peter Wemm	a5bdcb2a2f	Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.	2004-03-14 02:06:28 +00:00
Peter Wemm	37814395c1	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe	2004-03-13 22:31:39 +00:00
Ruslan Ermilov	7700eb86e7	Do what the execve(2) manpage says and enforce what a Strictly Conforming POSIX application should do by disallowing the argv argument to be NULL. PR: kern/33738 Submitted by: Marc Olzheim, Serge van den Boom OK'ed by: nectar	2004-03-12 21:06:20 +00:00
John Baldwin	8144e3b884	Lock Giant around the single threading code in exec() to satisfy an assertion in the single threading code.	2004-03-05 22:38:26 +00:00
Peter Wemm	df7c361e64	Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit kernel. I'm not happy with it yet - refinements are to come. This hack allows the kern.ps_strings and kern.usrstack sysctls to respond to a 32 bit request, such as those coming from emulated i386 binaries.	2004-02-18 00:54:17 +00:00
Bruce Evans	d6c847f378	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
Bruce Evans	ca46e90ef4	Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall function back to near the beginning of the file. Rev.1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev1.1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
Alan Cox	34d2675761	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Marcel Moolenaar	9ee99eb496	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.	2003-10-21 01:13:49 +00:00
Marcel Moolenaar	bab1f05277	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (depended of x11/gnome2)	2003-10-20 05:34:10 +00:00
Alan Cox	6ec2fca505	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
Marcel Moolenaar	c31f2280ed	Remove the regstkpages sysctl variable. We have a growable register stack now.	2003-09-27 23:07:47 +00:00
Marcel Moolenaar	fd75d71049	Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64	2003-09-27 22:28:14 +00:00
Peter Wemm	c460ac3a00	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
Alan Cox	8630c1173e	Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.	2003-06-13 03:02:28 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Alan Cox	06fa71cdcc	Update the vm object and page locking in exec_map_first_page(). Mark the one still anticipated change with XXX. Otherwise, this function is done.	2003-06-09 19:37:14 +00:00
Alan Cox	fd0cc9a862	Lock the vm object when performing vm_page_grab().	2003-06-08 07:14:30 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
Jeff Roberson	2c10d16a4b	- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.	2003-04-01 01:26:20 +00:00
John Baldwin	75b8b3b25c	Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit. Reviewed by: phk, jake, almost complete silence on arch@	2003-03-24 21:15:35 +00:00
John Baldwin	a5881ea55a	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)	2003-03-13 18:24:22 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Jeff Roberson	5215b1872f	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu	2003-02-17 05:14:26 +00:00
Julian Elischer	6f8132a867	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.	2003-02-01 12:17:09 +00:00
David Xu	0dbb100b9b	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian	2003-01-26 11:41:35 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Robert Watson	ec35c2af68	Perform VOP_GETATTR() before mac_check_vnode_exec() so that the cached attributes are available to MAC modules. Submitted by: mike halderman <mrh@nosc.mil> Obtained from: TrustedBSD Project	2003-01-21 03:26:28 +00:00
Matthew Dillon	3db161e079	It is possible for an active aio to prevent shared memory from being dereferenced when a process exits due to the vmspace ref-count being bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead of a process and call it in vmspace_dofree(). This way if it is missed in exit1()'s early-resource-free it will still be caught when the zombie is reaped. Also fix a potential race in shmexit_myhook() by NULLing out vmspace->vm_shm prior to calling shm_delete_mapping() and free(). MFC after: 7 days	2003-01-13 23:04:32 +00:00
David Xu	45f603e21c	Clear some KSE fields after kse mode was turned off.	2003-01-07 06:56:43 +00:00
Jake Burkholder	5dadd17b08	Add a sysctl to get the vm protections for the stack of the current process. On architectures with a non-executable stack, eg sparc64, this is used by libgcc to determine at runtime if its necessary to enable execute permissions on a region of the stack which will be used to execute code, allowing the call to mprotect to be avoided if the kernel is configured to map the stack executable.	2003-01-04 07:54:23 +00:00
Alfred Perlstein	c522c1bf4b	fdcopy() only needs a filedesc pointer.	2003-01-01 01:19:31 +00:00
Alan Cox	ee113343eb	Hold the page queues lock when performing vm_page_busy().	2002-12-18 20:16:22 +00:00
Alfred Perlstein	b80521fee5	remove syscallarg(). Suggested by: peter	2002-12-14 02:07:32 +00:00
Robert Drehmel	d1989db545	To avoid sleeping with all sorts of resources acquired (the reported problem was a locked directory vnode), do not give the process a chance to sleep in state "stopevent" (depends on the S_EXEC bit being set in p_stops) until most resources have been released again. Approved by: re	2002-11-26 17:30:55 +00:00
Alan Cox	2d21129db2	Acquire and release the page queues lock around pmap_remove_pages() because it updates several of vm_page's fields.	2002-11-25 04:37:44 +00:00
Jeff Roberson	a9a088823e	- Release the imgp vnode prior to freeing exec_map resources to avoid deadlock.	2002-11-17 09:33:00 +00:00
Alan Cox	4fec79bef8	Now that pmap_remove_all() is exported by our pmap implementations use it directly.	2002-11-16 07:44:25 +00:00
Alan Cox	d154fb4fe6	When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than indirectly through vm_page_protect(). The one remaining page flag that is updated by vm_page_protect() is already being updated by our various pmap implementations. Note: A later commit will similarly change the VM_PROT_READ case and eliminate vm_page_protect().	2002-11-10 07:12:04 +00:00
Robert Watson	0c93266b9c	Correct merge-o: disable the right execve() variation if !MAC	2002-11-05 18:04:50 +00:00
Robert Watson	670cb89bf4	Bring in two sets of changes: (1) Permit userland applications to request a change of label atomic with an execve() via mac_execve(). This is required for the SEBSD port of SELinux/FLASK. Attempts to invoke this without MAC compiled in result in ENOSYS, as with all other MAC system calls. Complexity, if desired, is present in policy modules, rather than the framework. (2) Permit policies to have access to both the label of the vnode being executed as well as the interpreter if it's a shell script or related UNIX nonsense. Because we can't hold both vnode locks at the same time, cache the interpreter label. SEBSD relies on this because it supports secure transitioning via shell script executables. Other policies might want to take both labels into account during an integrity or confidentiality decision at execve()-time. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-11-05 17:51:56 +00:00
Robert Watson	ccafe7eb35	Hook up the mac_will_execve_transition() and mac_execve_transition() entrypoints, #ifdef MAC. The supporting logic already existed in kern_mac.c, so no change there. This permits MAC policies to cause a process label change as the result of executing a binary -- typically, as a result of executing a specially labeled binary. For example, the SEBSD port of SELinux/FLASK uses this functionality to implement TE type transitions on processes using transitioning binaries, in a manner similar to setuid. Policies not implementing a notion of transition (all the ones in the tree right now) require no changes, since the old label data is copied to the new label via mac_create_cred() even if a transition does occur. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-11-05 14:57:49 +00:00
Robert Watson	450ffb4427	Remove reference to struct execve_args from struct imgact, which describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes. There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-11-05 01:59:56 +00:00
John Baldwin	e1b1aa3bc2	- Move the 'done1' label down below the unlock of the proc lock and move the locking of the proc lock after the goto to done1 to avoid locking the lock in an error case just so we can turn around and unlock it. - Move the exec_setregs() stuff out from under the proc lock and after the p_args stuff. This allows exec_setregs() to be able to sleep or write things out to userland, etc. which ia64 does. Tested by: peter	2002-10-11 21:04:01 +00:00
Jake Burkholder	05ba50f522	Use the fields in the sysentvec and in the vm map header in place of the constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.	2002-09-21 22:07:17 +00:00
Nate Lawson	c1e2d3866f	Move setugidsafety() call outside of process lock. This prevents a lock recursion when closef() calls pfind() which also wants the proc lock. This case only occurred when setugidsafety() needed to close unsafe files. Reviewed by: truckman	2002-09-14 18:55:11 +00:00
Don Lewis	28b325aa60	Drop the proc lock while calling fdcheckstd() which may block to allocate memory. Reviewed by: jhb	2002-09-13 09:31:56 +00:00
David Xu	1279572a92	s/SGNL/SIG/ s/SNGL/SINGLE/ s/SNGLE/SINGLE/ Fix abbreviation for P_STOPPED_* etc flags, in original code they were inconsistent and difficult to distinguish between them. Approved by: julian (mentor)	2002-09-05 07:30:18 +00:00

1 2 3 4 5 ...

338 Commits