freebsd-skq

Author	SHA1	Message	Date
David Xu	906ac69d08	In the original kern_execve() code, the start of the function forced all other threads to suicide. The problem is that execve() can fail, and a failed execve() would change a threaded process to unthreaded; this side effect is unexpected. The new code introduces a new single threading mode, SINGLE_BOUNDARY; in this mode, all threads except the singler should suspend themselves at the user boundary. We cannot use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful; suspending other threads at an unknown point, later resuming them from there, and forcing them to exit at the user boundary may cause the process to start from a dirty state. If execve() is successful, the current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at the user boundary; otherwise, other threads are resumed and their interrupted syscalls are restarted. Reviewed by: julian	2004-10-06 00:40:41 +00:00
John Baldwin	63993cf011	- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.	2004-09-23 21:01:50 +00:00
Julian Elischer	2e2e32b201	Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later. Note that this only affects the case where the execve fails.	2004-09-22 01:30:23 +00:00
Julian Elischer	297800599a	In a threaded process, don't kill off all the other threads until we have a reasonable chance that the execve() is going to succeed. I.e. wait until we've done the permission checks etc. MFC after: 1 week	2004-09-21 21:05:13 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies the scheduler that no more than N threads from that ksegrp should be allowed to be concurrently scheduled. This is also used to enforce 'fairness' at this time so that a ksegrp with 10000 threads can not swamp the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventually develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as libkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. Exit and exec code now transition a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Julian Elischer	aa3c8c02ae	White space fix.. diff reduction for upcoming commit.	2004-07-24 04:57:41 +00:00
Alan Cox	ce8da3091f	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().	2004-07-13 02:49:22 +00:00
Tim J. Robbins	aa0aa7a113	Move TDF_SA from td_flags to td_pflags (and rename it accordingly) so that it is no longer necessary to hold sched_lock while manipulating it. Reviewed by: davidxu	2004-06-02 07:52:36 +00:00
David Xu	702ac0f112	Clear KSE thread flags after KSE thread mode is ended. The side effect of not clearing the flags for execv() syscall will result that a new program runs in KSE thread mode without enabling it. Submitted by: tjr Modified by: davidxu	2004-05-21 14:50:23 +00:00
Alan Cox	59c8bc40ce	Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.	2004-04-23 03:01:40 +00:00
Alan Cox	148b3f62a9	Use vm_page_hold() rather than vm_page_wire() for short-duration page wiring. The reason being that vm_page_hold() is cheaper.	2004-04-11 19:57:11 +00:00
Pawel Jakub Dawidek	2fc0588da2	Remove sysctl kern.ps_argsopen, it is not very useful, one should use security.bsd.see_other_uids instead. Discussed with: phk, rwatson	2004-04-01 00:10:45 +00:00
Peter Wemm	a5bdcb2a2f	Make the process_exit eventhandler run without Giant. Add Giant hooks in the two consumers that need it.. processes using AIO and netncp. Update docs. Say that process_exec is called with Giant, but not to depend on it. All our consumers can handle it without Giant.	2004-03-14 02:06:28 +00:00
Peter Wemm	37814395c1	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() is not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe	2004-03-13 22:31:39 +00:00
Ruslan Ermilov	7700eb86e7	Do what the execve(2) manpage says and enforce what a Strictly Conforming POSIX application should do by disallowing the argv argument to be NULL. PR: kern/33738 Submitted by: Marc Olzheim, Serge van den Boom OK'ed by: nectar	2004-03-12 21:06:20 +00:00
John Baldwin	8144e3b884	Lock Giant around the single threading code in exec() to satisfy an assertion in the single threading code.	2004-03-05 22:38:26 +00:00
Peter Wemm	df7c361e64	Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit kernel. I'm not happy with it yet - refinements are to come. This hack allows the kern.ps_strings and kern.usrstack sysctls to respond to a 32 bit request, such as those coming from emulated i386 binaries.	2004-02-18 00:54:17 +00:00
Bruce Evans	d6c847f378	Fixed some style bugs (mainly, try to always use explicit comparisons with NULL when checking for null pointers).	2003-12-28 04:37:59 +00:00
Bruce Evans	ca46e90ef4	Fixed some disordering in revs. 1.194 and 1.196. Moved the execve() syscall function back to near the beginning of the file. Rev. 1.194 moved it into the middle of auxiliary functions following kern_execve(). Moved the __mac_execve() syscall function up together with execve(). It was new in rev. 1.196 and perfectly misplaced after execve().	2003-12-28 04:18:13 +00:00
Alan Cox	34d2675761	Remove GIANT_REQUIRED from exec_unmap_first_page().	2003-12-27 19:40:03 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Marcel Moolenaar	9ee99eb496	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.	2003-10-21 01:13:49 +00:00
Marcel Moolenaar	bab1f05277	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (depended of x11/gnome2)	2003-10-20 05:34:10 +00:00
Alan Cox	6ec2fca505	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
Marcel Moolenaar	c31f2280ed	Remove the regstkpages sysctl variable. We have a growable register stack now.	2003-09-27 23:07:47 +00:00
Marcel Moolenaar	fd75d71049	Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64	2003-09-27 22:28:14 +00:00
Peter Wemm	c460ac3a00	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
Poul-Henning Kamp	a8d43c90af	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.	2003-07-26 07:32:23 +00:00
David Xu	0e2a4d3aeb	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
Alan Cox	8630c1173e	Add vm object locking to various pagers' "get pages" methods, i386 stack management functions, and a u area management function.	2003-06-13 03:02:28 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Alan Cox	06fa71cdcc	Update the vm object and page locking in exec_map_first_page(). Mark the one still anticipated change with XXX. Otherwise, this function is done.	2003-06-09 19:37:14 +00:00
Alan Cox	fd0cc9a862	Lock the vm object when performing vm_page_grab().	2003-06-08 07:14:30 +00:00
John Baldwin	90af4afacb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
Jeff Roberson	2c10d16a4b	- Borrow the KSE single threading code for exec and exit. We use the check if (p->p_numthreads > 1) and not a flag because action is only necessary if there are other threads. The rest of the system has no need to identify thr threaded processes. - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED is not set.	2003-04-01 01:26:20 +00:00
John Baldwin	75b8b3b25c	Replace the at_fork, at_exec, and at_exit functions with the slightly more flexible process_fork, process_exec, and process_exit eventhandlers. This reduces code duplication and also means that I don't have to go duplicate the eventhandler locking three more times for each of at_fork, at_exec, and at_exit. Reviewed by: phk, jake, almost complete silence on arch@	2003-03-24 21:15:35 +00:00
John Baldwin	a5881ea55a	- Cache a reference to the credential of the thread that starts a ktrace in struct proc as p_tracecred alongside the current cache of the vnode in p_tracep. This credential is then used for all later ktrace operations on this file rather than using the credential of the current thread at the time of each ktrace event. - Now that we have multiple ktrace-related items in struct proc that are pointers, rename p_tracep to p_tracevp to make it less ambiguous. Requested by: rwatson (1)	2003-03-13 18:24:22 +00:00
Julian Elischer	ac2e415327	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.	2003-02-27 02:05:19 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Jeff Roberson	5215b1872f	- Split the struct kse into struct upcall and struct kse. struct kse will soon be visible only to schedulers. This greatly simplifies much the KSE code. Submitted by: davidxu	2003-02-17 05:14:26 +00:00
Julian Elischer	6f8132a867	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.	2003-02-01 12:17:09 +00:00
David Xu	0dbb100b9b	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transferred to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is automatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian	2003-01-26 11:41:35 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Robert Watson	ec35c2af68	Perform VOP_GETATTR() before mac_check_vnode_exec() so that the cached attributes are available to MAC modules. Submitted by: mike halderman <mrh@nosc.mil> Obtained from: TrustedBSD Project	2003-01-21 03:26:28 +00:00
Matthew Dillon	3db161e079	It is possible for an active aio to prevent shared memory from being dereferenced when a process exits due to the vmspace ref-count being bumped. Change shmexit() and shmexit_myhook() to take a vmspace instead of a process and call it in vmspace_dofree(). This way if it is missed in exit1()'s early-resource-free it will still be caught when the zombie is reaped. Also fix a potential race in shmexit_myhook() by NULLing out vmspace->vm_shm prior to calling shm_delete_mapping() and free(). MFC after: 7 days	2003-01-13 23:04:32 +00:00
David Xu	45f603e21c	Clear some KSE fields after kse mode was turned off.	2003-01-07 06:56:43 +00:00
Jake Burkholder	5dadd17b08	Add a sysctl to get the vm protections for the stack of the current process. On architectures with a non-executable stack, eg sparc64, this is used by libgcc to determine at runtime if it's necessary to enable execute permissions on a region of the stack which will be used to execute code, allowing the call to mprotect to be avoided if the kernel is configured to map the stack executable.	2003-01-04 07:54:23 +00:00
Alfred Perlstein	c522c1bf4b	fdcopy() only needs a filedesc pointer.	2003-01-01 01:19:31 +00:00

254 Commits