Commit Graph

338 Commits

Author SHA1 Message Date
Jeff Roberson
68ce4375c4 - textvp may have been from a different mountpoint than ndp->ni_vp and
we may need to acquire giant to vrele it.

Found by:	mjacob
MFC After:	3 days
2006-02-02 08:39:39 +00:00
Alan Cox
05406e6f33 Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
2005-12-11 22:06:57 +00:00
David Xu
d26b1a1fb9 Register itimers_event_hook as a kernel event handler, so I don't
have to duplicate code to call it in exec() and exit1().
2005-12-09 05:43:26 +00:00
Alan Cox
8ad398d089 Reduce the scope of the page queues lock in exec_map_first_page(). The vm
object lock is sufficient for reading a page's PG_BUSY and busy flags.

MFC after: 1 week
2005-12-06 07:39:36 +00:00
David Xu
6d7b314b14 Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
2005-11-03 04:49:16 +00:00
Paul Saab
1471f287e1 Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by:	jhb
2005-11-02 21:18:07 +00:00
David Xu
60354683d9 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
David Xu
86857b368d Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Don Lewis
34ea500bea Add missing word to comment. 2005-10-04 04:02:33 +00:00
Colin Percival
33812c066d If sufficiently bad things happen during a call to kern_execve(), it is
possible for do_execve() to call exit1() rather than returning.  As a
result, the sequence "allocate memory; call kern_execve; free memory"
can end up leaking memory.

This commit documents this astonishing behaviour and adds a call to
exec_free_args() before the exit1() call in do_execve().  Since all
the users of kern_execve() in the tree use exec_free_args() to free
the command-line arguments after kern_execve() returns, this should
be safe, and it fixes the memory leak which can otherwise occur.

Submitted by:	Peter Holm
MFC after:	3 days
Security:	Local denial of service
2005-10-03 12:49:54 +00:00
Don Lewis
5997cae9a4 Copy new process argument list in do_execve() before grabbing PROC_LOCK
to avoid touching pageable memory while holding a mutex.

Simplify argument list replacement logic.

PR:		kern/84935
Submitted by:	"Antoine Pelisse" apelisse AT gmail.com (in a different form)
MFC after:	3 days
2005-10-01 08:33:56 +00:00
Joseph Koshy
151392465f MFP4:
- pmcstat(8) gprof output mode fixes:

  lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
  + Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event
  + Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
    so that pmcstat(8) can determine where the runtime loader
    /libexec/ld-elf.so.1 is getting loaded.

  sys/kern/kern_exec.c:
  + Use a local struct to group the entry address of the image being
    exec()'ed and the process credential changed flag to the exec
    handling hook inside hwpmc(4).

  usr.sbin/pmcstat/*:
  + Support "-k kernelpath", "-D sampledir".
  + Implement the ELF bits of 'gmon.out' profile generation in a new
    file "pmcstat_log.c".  Move all log related functions to this
    file.
  + Move local definitions and prototypes to "pmcstat.h"

- Other bug fixes:
  + lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
  + sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
    attached PMCs when a process exits.
  + sys/sys/pmc.h: correct a function prototype.
  + Improve usage checks in pmcstat(8).

Approved by:	re (blanket hwpmc)
2005-06-30 19:01:26 +00:00
Peter Wemm
4da0d332f4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
Joseph Koshy
f263522a45 MFP4:
- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
  PMC implementations across different architectures.
  Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
  every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
  in the future.  Add more 'alias' names for commonly used events.

- bug fixes & documentation.
2005-06-09 19:45:09 +00:00
Ken Smith
6341095e0d This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
Jeff Roberson
532c491b2b - Initialize vfslocked correctly early enough for MAC to compile.
- Fix one place where we explicitly drop Giant!

Pointy hat to:	me
Submitted by:	Max Laier
Warned by:	Tinderbox
2005-05-03 16:24:59 +00:00
Jeff Roberson
269576c82a - Use namei to acquire Giant for VFS if it is necessary. Drop the explicit
Giant acquisition.
 - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have
   both been locked.
2005-05-03 10:55:05 +00:00
Jeff Roberson
4d7d299383 - Return EACCES if we're trying to exec on a vp with no object.
Errno supplied by:	cperciva
2005-05-01 00:58:19 +00:00
Jeff Roberson
7625cbf3cc - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
Joseph Koshy
ebccf1e3a6 Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by:	alc, jhb (kernel changes)
2005-04-19 04:01:25 +00:00
Maxim Sobolev
90dc539be0 Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to
PAGE_SIZE.

Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition
(he has been proposing to replace it with PAGE_SIZE everywhere), not only
this reduced the diff significantly, but prevents code obfuscation and also
allows to increase/decrease this parameter easily if needed.

PR:		kern/64196
Submitted by:	Magnus Bäckström <b@etek.chalmers.se>
2005-02-25 11:49:42 +00:00
Maxim Sobolev
56c2262c0e Grrr, this committer needs to have a sleep. Remove lines from the previous
delta not intended for public consumption.

MFC after:	2 weeks
2005-01-29 23:51:05 +00:00
Maxim Sobolev
c30af53213 Fix small non-conformance introduced in the previous commit: execve() is
expected to return ENAMETOOLONG, not E2BIG if first argument doesn't
fit into {PATH_MAX} bytes.

MFC after:	2 weeks
2005-01-29 23:47:36 +00:00
Maxim Sobolev
610ecfe035 o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
  completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
  linuxlator/i386.

Obtained from:  DragonFlyBSD (partially)
MFC after:      2 weeks
2005-01-29 23:12:00 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Warner Losh
9454b2d864 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
Poul-Henning Kamp
c113083c5a Add new function fdunshare() which encapsulates the necessary light magic
for ensuring that a process' filedesc is not shared with anybody.

Use it in the two places which previously had private implmentations.

This collects all fd_refcnt handling in kern_descrip.c
2004-12-14 07:20:03 +00:00
David Schultz
6004362e66 Don't include sys/user.h merely for its side-effect of recursively
including other headers.
2004-11-27 06:51:39 +00:00
Poul-Henning Kamp
124e4c3be8 Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
2004-11-13 11:53:02 +00:00
Poul-Henning Kamp
598b7ec86b Use more intuitive pointer for fdinit() and fdcopy().
Change fdcopy() to take unlocked filedesc.
2004-11-08 12:43:23 +00:00
Peter Wemm
a7bc3102c4 Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state.  Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort.  Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.
2004-10-11 22:04:16 +00:00
David Xu
84e0b075f6 Add an execve command for kse_thr_interrupt to allow libpthread to
restore signal mask correctly, this is required by POSIX.

Reviewed by: deischen
2004-10-07 13:50:10 +00:00
David Xu
906ac69d08 In original kern_execve() code, at the start of the function, it forces
all other threads to suicide, problem is execve() could be failed, and
a failed execve() would change threaded process to unthreaded, this side
effect is unexpected.
The new code introduces a new single threading mode SINGLE_BOUNDARY, in
the mode, all threads should suspend themself at user boundary except
the singler. we can not use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful, suspending other threads at unknown
point and later resuming them from there and forcing them to exit at user
boundary may cause the process to start from a dirty state. If execve() is
successful, current thread upgrades to SINGLE_EXIT mode and forces other
threads to suicide at user boundary, otherwise, other threads will be resumed
and their interrupted syscall will be restarted.

Reviewed by: julian
2004-10-06 00:40:41 +00:00
John Baldwin
63993cf011 - Don't try to unlock Giant if single threading fails since we don't have
it locked.
- Unlock Giant before calling exit1() since exit1() does not require Giant.
2004-09-23 21:01:50 +00:00
Julian Elischer
2e2e32b201 Revert the last change..
Better to kill all other threads than to panic the system if 2 threads call
execve() at the same time. A better fix will be committed later.

Note that this only affects the case where the execve fails.
2004-09-22 01:30:23 +00:00
Julian Elischer
297800599a In a threaded process, don't kill off all the other threads until we have a
reasonable chance that the eceve() is going to succeeed. I.e.
wait until we've done the permission checks etc.

MFC after:	1 week
2004-09-21 21:05:13 +00:00
Julian Elischer
ed062c8d66 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Julian Elischer
aa3c8c02ae White space fix..
diff reduction for upcoming commit.
2004-07-24 04:57:41 +00:00
Alan Cox
ce8da3091f Push down the acquisition and release of the page queues lock into
pmap_remove_pages().  (The implementation of pmap_remove_pages() is
optional.  If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
2004-07-13 02:49:22 +00:00
Tim J. Robbins
aa0aa7a113 Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
David Xu
702ac0f112 Clear KSE thread flags after KSE thread mode is ended. The side effect
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.

Submitted by: tjr
Modified by: davidxu
2004-05-21 14:50:23 +00:00
Alan Cox
59c8bc40ce Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimes
kmem_alloc_wait()) for mapping the image header.  On all machines with a
direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
2004-04-23 03:01:40 +00:00
Alan Cox
148b3f62a9 Use vm_page_hold() rather than vm_page_wire() for short-duration page
wiring.  The reason being that vm_page_hold() is cheaper.
2004-04-11 19:57:11 +00:00
Pawel Jakub Dawidek
2fc0588da2 Remove sysctl kern.ps_argsopen, it is not very useful, one should use
security.bsd.see_other_uids instead.

Discussed with:	phk, rwatson
2004-04-01 00:10:45 +00:00
Peter Wemm
a5bdcb2a2f Make the process_exit eventhandler run without Giant. Add Giant hooks
in the two consumers that need it.. processes using AIO and netncp.
Update docs.  Say that process_exec is called with Giant, but not to
depend on it.  All our consumers can handle it without Giant.
2004-03-14 02:06:28 +00:00
Peter Wemm
37814395c1 Push Giant down a little further:
- no longer serialize on Giant for thread_single*() and family in fork,
  exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
  vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
  isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
2004-03-13 22:31:39 +00:00
Ruslan Ermilov
7700eb86e7 Do what the execve(2) manpage says and enforce what a Strictly
Conforming POSIX application should do by disallowing the argv
argument to be NULL.

PR:		kern/33738
Submitted by:	Marc Olzheim, Serge van den Boom
OK'ed by:	nectar
2004-03-12 21:06:20 +00:00
John Baldwin
8144e3b884 Lock Giant around the single threading code in exec() to satisfy an
assertion in the single threading code.
2004-03-05 22:38:26 +00:00
Peter Wemm
df7c361e64 Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit
kernel.  I'm not happy with it yet - refinements are to come.
This hack allows the kern.ps_strings and kern.usrstack sysctls to respond
to a 32 bit request, such as those coming from emulated i386 binaries.
2004-02-18 00:54:17 +00:00
Bruce Evans
d6c847f378 Fixed some style bugs (mainly, try to always use explicit comparisons with
NULL when checking for null pointers).
2003-12-28 04:37:59 +00:00
Bruce Evans
ca46e90ef4 Fixed some disordering in revs.1.194 and 1,196. Moved the exceve() syscall
function back to near the beginning of the file.  Rev.1.194 moved it into
the middle of auxiliary functions following kern_execve().  Moved the
__mac_execve() syscall function up together with execve().  It was new in
rev1.1.196 and perfectly misplaced after execve().
2003-12-28 04:18:13 +00:00
Alan Cox
34d2675761 Remove GIANT_REQUIRED from exec_unmap_first_page(). 2003-12-27 19:40:03 +00:00
Robert Watson
eca8a663d4 Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
Marcel Moolenaar
9ee99eb496 Remove md_bspstore from the MD fields of struct thread. Now that
the backing store is at a fixed address, there's no need for a
per-thread variable.
2003-10-21 01:13:49 +00:00
Marcel Moolenaar
bab1f05277 Put the RSE backing store at a fixed address. This change is triggered
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.

The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.

Port: lang/guile (depended of x11/gnome2)
2003-10-20 05:34:10 +00:00
Alan Cox
6ec2fca505 Eliminate some unnecessary uses of the vm page queues lock around the
vm page's valid field.  This field is being synchronized using the
containing vm object's lock.
2003-10-04 22:47:20 +00:00
Marcel Moolenaar
c31f2280ed Remove the regstkpages sysctl variable. We have a growable register
stack now.
2003-09-27 23:07:47 +00:00
Marcel Moolenaar
fd75d71049 Part 2 of implementing rstacks: add the ability to create rstacks and
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.

Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.

Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.

Tested on: alpha, i386, ia64, sparc64
2003-09-27 22:28:14 +00:00
Peter Wemm
c460ac3a00 Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function.  Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable.  This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'.  And that causes exec etc to fail since it can no
longer find space to mmap things.
2003-09-25 01:10:26 +00:00
Poul-Henning Kamp
a8d43c90af Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.
2003-07-26 07:32:23 +00:00
David Xu
0e2a4d3aeb Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.
2003-06-15 00:31:24 +00:00
Alan Cox
8630c1173e Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.
2003-06-13 03:02:28 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Alan Cox
06fa71cdcc Update the vm object and page locking in exec_map_first_page(). Mark the
one still anticipated change with XXX.  Otherwise, this function is done.
2003-06-09 19:37:14 +00:00
Alan Cox
fd0cc9a862 Lock the vm object when performing vm_page_grab(). 2003-06-08 07:14:30 +00:00
John Baldwin
90af4afacb - Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
  M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
  sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
  that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
  and thread_stopped() are now MP safe.

Reviewed by:	arch@
Approved by:	re (rwatson)
2003-05-13 20:36:02 +00:00
Jeff Roberson
2c10d16a4b - Borrow the KSE single threading code for exec and exit. We use the check
if (p->p_numthreads > 1) and not a flag because action is only necessary
   if there are other threads.  The rest of the system has no need to
   identify thr threaded processes.
 - In kern_thread.c use thr_exit1() instead of thread_exit() if P_THREADED
   is not set.
2003-04-01 01:26:20 +00:00
John Baldwin
75b8b3b25c Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers.  This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by:	phk, jake, almost complete silence on arch@
2003-03-24 21:15:35 +00:00
John Baldwin
a5881ea55a - Cache a reference to the credential of the thread that starts a ktrace in
struct proc as p_tracecred alongside the current cache of the vnode in
  p_tracep.  This credential is then used for all later ktrace operations on
  this file rather than using the credential of the current thread at the
  time of each ktrace event.
- Now that we have multiple ktrace-related items in struct proc that are
  pointers, rename p_tracep to p_tracevp to make it less ambiguous.

Requested by:	rwatson (1)
2003-03-13 18:24:22 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Robert Watson
ec35c2af68 Perform VOP_GETATTR() before mac_check_vnode_exec() so that
the cached attributes are available to MAC modules.

Submitted by:   mike halderman <mrh@nosc.mil>
Obtained from:	TrustedBSD Project
2003-01-21 03:26:28 +00:00
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
David Xu
45f603e21c Clear some KSE fields after kse mode was turned off. 2003-01-07 06:56:43 +00:00
Jake Burkholder
5dadd17b08 Add a sysctl to get the vm protections for the stack of the current process.
On architectures with a non-executable stack, eg sparc64, this is used by
libgcc to determine at runtime if its necessary to enable execute permissions
on a region of the stack which will be used to execute code, allowing the
call to mprotect to be avoided if the kernel is configured to map the stack
executable.
2003-01-04 07:54:23 +00:00
Alfred Perlstein
c522c1bf4b fdcopy() only needs a filedesc pointer. 2003-01-01 01:19:31 +00:00
Alan Cox
ee113343eb Hold the page queues lock when performing vm_page_busy(). 2002-12-18 20:16:22 +00:00
Alfred Perlstein
b80521fee5 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
Robert Drehmel
d1989db545 To avoid sleeping with all sorts of resources acquired (the reported
problem was a locked directory vnode), do not give the process a chance
to sleep in state "stopevent" (depends on the S_EXEC bit being set in
p_stops) until most resources have been released again.

Approved by:	re
2002-11-26 17:30:55 +00:00
Alan Cox
2d21129db2 Acquire and release the page queues lock around pmap_remove_pages() because
it updates several of vm_page's fields.
2002-11-25 04:37:44 +00:00
Jeff Roberson
a9a088823e - Release the imgp vnode prior to freeing exec_map resources to avoid
deadlock.
2002-11-17 09:33:00 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Robert Watson
0c93266b9c Correct merge-o: disable the right execve() variation if !MAC 2002-11-05 18:04:50 +00:00
Robert Watson
670cb89bf4 Bring in two sets of changes:
(1) Permit userland applications to request a change of label atomic
    with an execve() via mac_execve().  This is required for the
    SEBSD port of SELinux/FLASK.  Attempts to invoke this without
    MAC compiled in result in ENOSYS, as with all other MAC system
    calls.  Complexity, if desired, is present in policy modules,
    rather than the framework.

(2) Permit policies to have access to both the label of the vnode
    being executed as well as the interpreter if it's a shell
    script or related UNIX nonsense.  Because we can't hold both
    vnode locks at the same time, cache the interpreter label.
    SEBSD relies on this because it supports secure transitioning
    via shell script executables.  Other policies might want to
    take both labels into account during an integrity or
    confidentiality decision at execve()-time.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 17:51:56 +00:00
Robert Watson
ccafe7eb35 Hook up the mac_will_execve_transition() and mac_execve_transition()
entrypoints, #ifdef MAC.  The supporting logic already existed in
kern_mac.c, so no change there.  This permits MAC policies to cause
a process label change as the result of executing a binary --
typically, as a result of executing a specially labeled binary.

For example, the SEBSD port of SELinux/FLASK uses this functionality
to implement TE type transitions on processes using transitioning
binaries, in a manner similar to setuid.  Policies not implementing
a notion of transition (all the ones in the tree right now) require
no changes, since the old label data is copied to the new label
via mac_create_cred() even if a transition does occur.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 14:57:49 +00:00
Robert Watson
450ffb4427 Remove reference to struct execve_args from struct imgact, which
describes an image activation instance.  Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv.  With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.

There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 01:59:56 +00:00
John Baldwin
e1b1aa3bc2 - Move the 'done1' label down below the unlock of the proc lock and move
the locking of the proc lock after the goto to done1 to avoid locking
  the lock in an error case just so we can turn around and unlock it.
- Move the exec_setregs() stuff out from under the proc lock and after
  the p_args stuff.  This allows exec_setregs() to be able to sleep or
  write things out to userland, etc. which ia64 does.

Tested by:	peter
2002-10-11 21:04:01 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Nate Lawson
c1e2d3866f Move setugidsafety() call outside of process lock. This prevents a lock
recursion when closef() calls pfind() which also wants the proc lock.
This case only occurred when setugidsafety() needed to close unsafe files.

Reviewed by:	truckman
2002-09-14 18:55:11 +00:00
Don Lewis
28b325aa60 Drop the proc lock while calling fdcheckstd() which may block to allocate
memory.

Reviewed by:	jhb
2002-09-13 09:31:56 +00:00
David Xu
1279572a92 s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)
2002-09-05 07:30:18 +00:00