Commit Graph

367 Commits

Author SHA1 Message Date
John Baldwin
75b8b3b25c Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers.  This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by:	phk, jake, almost complete silence on arch@
2003-03-24 21:15:35 +00:00
John Baldwin
a5881ea55a - Cache a reference to the credential of the thread that starts a ktrace in
struct proc as p_tracecred alongside the current cache of the vnode in
  p_tracep.  This credential is then used for all later ktrace operations on
  this file rather than using the credential of the current thread at the
  time of each ktrace event.
- Now that we have multiple ktrace-related items in struct proc that are
  pointers, rename p_tracep to p_tracevp to make it less ambiguous.

Requested by:	rwatson (1)
2003-03-13 18:24:22 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Robert Watson
ec35c2af68 Perform VOP_GETATTR() before mac_check_vnode_exec() so that
the cached attributes are available to MAC modules.

Submitted by:   mike halderman <mrh@nosc.mil>
Obtained from:	TrustedBSD Project
2003-01-21 03:26:28 +00:00
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
David Xu
45f603e21c Clear some KSE fields after kse mode was turned off. 2003-01-07 06:56:43 +00:00
Jake Burkholder
5dadd17b08 Add a sysctl to get the vm protections for the stack of the current process.
On architectures with a non-executable stack, eg sparc64, this is used by
libgcc to determine at runtime if its necessary to enable execute permissions
on a region of the stack which will be used to execute code, allowing the
call to mprotect to be avoided if the kernel is configured to map the stack
executable.
2003-01-04 07:54:23 +00:00
Alfred Perlstein
c522c1bf4b fdcopy() only needs a filedesc pointer. 2003-01-01 01:19:31 +00:00
Alan Cox
ee113343eb Hold the page queues lock when performing vm_page_busy(). 2002-12-18 20:16:22 +00:00
Alfred Perlstein
b80521fee5 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
Robert Drehmel
d1989db545 To avoid sleeping with all sorts of resources acquired (the reported
problem was a locked directory vnode), do not give the process a chance
to sleep in state "stopevent" (depends on the S_EXEC bit being set in
p_stops) until most resources have been released again.

Approved by:	re
2002-11-26 17:30:55 +00:00
Alan Cox
2d21129db2 Acquire and release the page queues lock around pmap_remove_pages() because
it updates several of vm_page's fields.
2002-11-25 04:37:44 +00:00
Jeff Roberson
a9a088823e - Release the imgp vnode prior to freeing exec_map resources to avoid
deadlock.
2002-11-17 09:33:00 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Robert Watson
0c93266b9c Correct merge-o: disable the right execve() variation if !MAC 2002-11-05 18:04:50 +00:00
Robert Watson
670cb89bf4 Bring in two sets of changes:
(1) Permit userland applications to request a change of label atomic
    with an execve() via mac_execve().  This is required for the
    SEBSD port of SELinux/FLASK.  Attempts to invoke this without
    MAC compiled in result in ENOSYS, as with all other MAC system
    calls.  Complexity, if desired, is present in policy modules,
    rather than the framework.

(2) Permit policies to have access to both the label of the vnode
    being executed as well as the interpreter if it's a shell
    script or related UNIX nonsense.  Because we can't hold both
    vnode locks at the same time, cache the interpreter label.
    SEBSD relies on this because it supports secure transitioning
    via shell script executables.  Other policies might want to
    take both labels into account during an integrity or
    confidentiality decision at execve()-time.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 17:51:56 +00:00
Robert Watson
ccafe7eb35 Hook up the mac_will_execve_transition() and mac_execve_transition()
entrypoints, #ifdef MAC.  The supporting logic already existed in
kern_mac.c, so no change there.  This permits MAC policies to cause
a process label change as the result of executing a binary --
typically, as a result of executing a specially labeled binary.

For example, the SEBSD port of SELinux/FLASK uses this functionality
to implement TE type transitions on processes using transitioning
binaries, in a manner similar to setuid.  Policies not implementing
a notion of transition (all the ones in the tree right now) require
no changes, since the old label data is copied to the new label
via mac_create_cred() even if a transition does occur.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 14:57:49 +00:00
Robert Watson
450ffb4427 Remove reference to struct execve_args from struct imgact, which
describes an image activation instance.  Instead, make use of the
existing fname structure entry, and introduce two new entries,
userspace_argv, and userspace_envv.  With the addition of
mac_execve(), this divorces the image structure from the specifics
of the execve() system call, removes a redundant pointer, etc.
No semantic change from current behavior, but it means that the
structure doesn't depend on syscalls.master-generated includes.

There seems to be some redundant initialization of imgact entries,
which I have maintained, but which could probably use some cleaning
up at some point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-05 01:59:56 +00:00
John Baldwin
e1b1aa3bc2 - Move the 'done1' label down below the unlock of the proc lock and move
the locking of the proc lock after the goto to done1 to avoid locking
  the lock in an error case just so we can turn around and unlock it.
- Move the exec_setregs() stuff out from under the proc lock and after
  the p_args stuff.  This allows exec_setregs() to be able to sleep or
  write things out to userland, etc. which ia64 does.

Tested by:	peter
2002-10-11 21:04:01 +00:00
Jake Burkholder
05ba50f522 Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types.  Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.
2002-09-21 22:07:17 +00:00
Nate Lawson
c1e2d3866f Move setugidsafety() call outside of process lock. This prevents a lock
recursion when closef() calls pfind() which also wants the proc lock.
This case only occurred when setugidsafety() needed to close unsafe files.

Reviewed by:	truckman
2002-09-14 18:55:11 +00:00
Don Lewis
28b325aa60 Drop the proc lock while calling fdcheckstd() which may block to allocate
memory.

Reviewed by:	jhb
2002-09-13 09:31:56 +00:00
David Xu
1279572a92 s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviation for P_STOPPED_* etc flags, in original code they were
inconsistent and difficult to distinguish between them.

Approved by: julian (mentor)
2002-09-05 07:30:18 +00:00
Jake Burkholder
f36ba45234 Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec.  Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places.  Provided stack
fixup routines for emulations that previously used the default.
2002-09-01 21:41:24 +00:00
Jake Burkholder
bafbd49201 Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.
2002-08-29 06:17:48 +00:00
Jake Burkholder
f3bec5d746 Don't require that sysentvec.sv_szsigcode be non-NULL. 2002-08-29 01:28:27 +00:00
Jake Burkholder
81f223ca02 Fixed most indentation bugs. 2002-08-25 22:36:52 +00:00
Jake Burkholder
ca0387ef9f Fixed placement of operators. Wrapped long lines. 2002-08-25 20:48:45 +00:00
Jake Burkholder
fd559a8a39 Fixed white space around operators, casts and reserved words.
Reviewed by:	md5
2002-08-24 22:55:16 +00:00
Jake Burkholder
a7cddfed7f return x; -> return (x);
return(x); -> return (x);

Reviewed by:	md5
2002-08-24 22:01:40 +00:00
Julian Elischer
49539972e9 slight cleanup of single-threading code for KSE processes 2002-08-22 21:45:58 +00:00
Jeff Roberson
619eb6e579 - Hold the vnode lock throughout execve.
- Set VV_TEXT in the top level execve code.
 - Fixup the image activators to deal with the newly locked vnode.
2002-08-13 06:55:28 +00:00
Robert Watson
339b79b939 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke an appropriate MAC entry point to authorize execution of
a file by a process.  The check is placed slightly differently
than it appears in the trustedbsd_mac tree so that it prevents
a little more information leakage about the target of the execve()
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 14:31:58 +00:00
Jacques Vidrine
89ab930718 For processes which are set-user-ID or set-group-ID, the kernel performs a few
special actions for safety.  One of these is to make sure that file descriptors
0..2 are in use, by opening /dev/null for those that are not already open.
Another is to close any file descriptors 0..2 that reference procfs.  However,
these checks were made out of order, so that it was still possible for a
set-user-ID or set-group-ID process to be started with some of the file
descriptors 0..2 unused.

Submitted by:	Georgi Guninski <guninski@guninski.com>
2002-07-30 15:38:29 +00:00
Robert Watson
d06c0d4d40 Slight restructuring of the logic for credential change case identification
during execve() to use a 'credential_changing' variable.  This makes it
easier to have outstanding patchsets against this code, as well as to
add conditionally defined clauses.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-27 18:06:49 +00:00
Peter Wemm
3ebc124838 Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time.  Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment.  This is a big help for execing i386 binaries
on ia64.   The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64.  At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from:  dfr (mostly, many tweaks from me).
2002-07-20 02:56:12 +00:00
Alan Cox
b3afd20d9a In execve(), delay the acquisition of Giant until after kmem_alloc_wait().
(Operations on the exec_map don't require Giant.)
2002-07-14 17:58:35 +00:00
John Baldwin
63c9e754e0 We don't need to clear oldcred here since newcred is not NULL yet. 2002-07-13 03:13:15 +00:00
Alan Cox
97646f567d o Lock accesses to the page queues. 2002-07-11 18:48:05 +00:00
Jeff Roberson
0b2ed1aef7 Clean up execve locking:
- Grab the vnode object early in exec when we still have the vnode lock.
 - Cache the object in the image_params.
 - Make use of the cached object in imgact_*.c
2002-07-06 07:00:01 +00:00
Peter Wemm
c781aea8ba #include <sys/ktrace.h> would be useful too. (for ktrace_mtx) 2002-07-01 23:18:08 +00:00
Peter Wemm
1e9b3d9142 Add #include "opt_ktrace.h" 2002-07-01 19:49:04 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Alan Cox
366838ddfe o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
   on some fields.
2002-06-25 18:14:38 +00:00
Alfred Perlstein
69be5db96f Don't leak resources if fdcheckstd() fails during exec.
Submitted by: Mike Makonnen <makonnen@pacbell.net>
2002-06-20 17:27:28 +00:00
Alfred Perlstein
1419eacb86 Squish the "could sleep with process lock" messages caused by calling
uifind() with a proc lock held.

change_ruid() and change_euid() have been modified to take a uidinfo
structure which will be pre-allocated by callers, they will then
call uihold() on the uidinfo structure so that the caller's logic
is simplified.

This allows one to call uifind() before locking the proc struct and
thereby avoid a potential blocking allocation with the proc lock
held.

This may need revisiting, perhaps keeping a spare uidinfo allocated
per process to handle this situation or re-examining if the proc
lock needs to be held over the entire operation of changing real
or effective user id.

Submitted by: Don Lewis <dl-freebsd@catspoiler.org>
2002-06-19 06:39:25 +00:00
John Baldwin
6c84de02e0 Properly lock accesses to p_tracep and p_traceflag. Also make a few
ktrace-only things #ifdef KTRACE that were not before.
2002-06-07 05:41:27 +00:00
John Baldwin
9b3b1c5fdf - Reorder execve() so that it performs blocking operations before it
locks the process.
- Defer other blocking operations such as vrele()'s until after we
  release locks.
- execsigs() now requires the proc lock to be held when it is called
  rather than locking the process internally.
2002-05-02 15:00:14 +00:00
Jacques Vidrine
e983a3762b When exec'ing a set[ug]id program, make sure that the stdio file descriptors
(0, 1, 2) are allocated by opening /dev/null for any which are not already
open.

Reviewed by:	alfred, phk
MFC after:	2 days
2002-04-19 00:45:29 +00:00
Peter Wemm
911fc92344 Increase the size of the register stack storage on ia64 from 32K to 2MB so
that we can compile gcc.  This is a hack because it adds a fixed 2MB to
each process's VSIZE regardless of how much is really being used since
there is no grow-up stack support.  At least it isn't physical memory.
Sigh.

Add a sysctl to enable tweaking it for new processes.
2002-04-05 01:57:45 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Alan Cox
5e20c11f19 Add a local proc *p in exec_new_vmspace() to avoid repeated dereferencing
to obtain it.
2002-03-31 00:05:30 +00:00
Alfred Perlstein
8899023f66 Make the reference counting of 'struct pargs' SMP safe.
There is still some locations where the PROC lock should be held
in order to prevent inconsistent views from outside (like the
proc->p_fd fix for kern/vfs_syscalls.c:checkdirs()) that can be
fixed later.

Submitted by: Jonathan Mini <mini@haikugeek.com>
2002-03-27 21:36:18 +00:00
Alan Cox
cb100b25ce Remove an unnecessary and inconsistently used variable from exec_new_vmspace(). 2002-03-26 19:20:04 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Jake Burkholder
ac59490b5e Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/
pmap_qremove.  pmap_kenter is not safe to use in MI code because it is not
guaranteed to flush the mapping from the tlb on all cpus.  If the process
in question is preempted and migrates cpus between the call to pmap_kenter
and pmap_kremove, the original cpu will be left with stale mappings in its
tlb.  This is currently not a problem for i386 because we do not use PG_G on
SMP, and thus all mappings are flushed from the tlb on context switches, not
just user mappings.  This is not the case on all architectures, and if PG_G
is to be used with SMP on i386 it will be a problem.  This was committed by
peter earlier as part of his fine grained tlb shootdown work for i386, which
was backed out for other reasons.

Reviewed by:	peter
2002-03-17 00:56:41 +00:00
Warner Losh
0cf3c909d8 Remove now unused struct proc *p.
Approved by: jhb
2002-02-27 20:57:57 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Peter Wemm
d1693e1701 Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels.  Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely.  Userland programs still crashed.
2002-02-27 09:51:33 +00:00
Peter Wemm
bd1e3a0f89 Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places.  Do the same for i386.  This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead.  This will help for PAE down the road.

Obtained from:	jake (MI code, suggestions for MD part)
2002-02-27 02:14:58 +00:00
Julian Elischer
079b7badea Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
2002-02-07 20:58:47 +00:00
Alan Cox
6f5dafea75 o Call the functions registered with at_exec() from exec_new_vmspace()
instead of execve().  Otherwise, the possibility still exists
   for a pending AIO to modify the new address space.

Reviewed by:	alfred
2002-01-13 19:36:35 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Alfred Perlstein
21d56e9c33 Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int".  Fix
it by using an inline version of the AS macro against the syscall
arguments.  (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.
2001-12-29 07:13:47 +00:00
David E. O'Brien
91f9161737 Repeat after me -- "Use of ANSI string concatenation can be bad."
In this case, C99's __func__ is properly defined as:

	static const char __func__[] = "function-name";

and GCC 3.1 will not allow it to be used in bogus string concatenation.
2001-12-10 05:40:12 +00:00
Peter Wemm
c3699b5f63 For what its worth, sync up the type of ps_arg_cache_max (unsigned long)
with the sysctl type (signed long).
2001-11-08 00:24:48 +00:00
Dag-Erling Smørgrav
9ca45e813c Add a P_INEXEC flag that indicates that the process has called execve() and
it has not yet returned.  Use this flag to deny debugging requests while
the process is execve()ing, and close once and for all any race conditions
that might occur between execve() and various debugging interfaces.

Reviewed by:	jhb, rwatson
2001-10-27 11:11:25 +00:00
Robert Drehmel
9a024fc559 Use vm_offset_t instead of caddr_t to fix a warning and remove
two casts.
2001-10-24 14:15:28 +00:00
Matthew Dillon
79deba82cd Fix ktrace enablement/disablement races that can result in a vnode
ref count panic.

Bug noticed by:	ps
Reviewed by:	ps
MFC after:	1 day
2001-10-24 01:05:39 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
Doug Rabson
e913ca22e2 Move setregs() out from under the PROC_LOCK so that it can use functions
list suword() which may trap.
2001-10-10 20:04:57 +00:00
John Baldwin
8688bb9383 proces -> process in a comment. 2001-10-09 17:25:30 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
116734c4d1 Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(),
vfork(), rfork(), jail().
2001-09-01 03:04:31 +00:00
Alexander Langer
b8c526df70 Fix a simple typo I just happened to find. 2001-08-22 19:12:24 +00:00
Dima Dorfman
b2c3fa70e3 Correct spelling in a comment and remove trailing newline from a
panic() call (panic() adds it itself).
2001-07-11 02:04:43 +00:00
Guido van Rooij
333ea48563 Don't share sig handlers after an exec
Reviewed by:	Alfred Perlstein
2001-07-09 19:01:42 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
fbd26f7594 Fix some lock order reversals where we called free() while holding a proc
lock.  We now use temporary variables to save the process argument pointer
and just update the pointer while holding the lock.  We then perform the
free on the cached pointer after releasing the lock.
2001-06-20 23:10:06 +00:00
Peter Wemm
b85db19691 Move setugid() a little sooner to before we release tracing in case
crdup() or change_e*id() block on malloc() or mutex.
2001-06-16 23:34:23 +00:00
Robert Watson
7cb8e4d277 o pcred-removal changes included modifications to optimize the setting of
the saved uid and gid during execve().  Unfortunately, the optimizations
  were incorrect in the case where the credential was updated, skipping
  the setting of the saved uid and gid when new credentials were generated.
  This change corrects that problem by handling the newcred!=NULL case
  correctly.

Reported/tested by:	David Malone <dwmalone@maths.tcd.ie>

Obtained from:	TrustedBSD Project
2001-05-26 19:59:44 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
John Baldwin
d8aad40c88 Axe unneeded spl()'s. 2001-05-21 18:30:50 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
John Baldwin
e65897c381 Proc locking. 2001-03-07 03:27:32 +00:00
Jeroen Ruigrok van der Werven
1a6e52d0e9 Fix typo: seperate -> separate.
Seperate does not exist in the english language.
2001-02-06 11:21:58 +00:00
Jake Burkholder
98f03f9030 Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by:	jhb, -smp@
2000-12-23 19:43:10 +00:00
Robert Watson
cf64863a1e o Add a comment to exec_check_permissions() to indicate that the
passed vnode must be locked; this is the case because of calls
  to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN().  This becomes
  more of an issue when VOP_ACCESS() gets a bit more complicated,
  which it does when you introduce ACL, Capability, and MAC
  support.

Obtained from:	TrustedBSD Project
2000-11-30 21:06:05 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
Doug Rabson
63c47a5ca0 Add a gross hack for ia64 to allocate the backing store for a new program. 2000-10-12 14:24:03 +00:00
Takanori Watanabe
b9a22da4cf Make size of dynamic loader argument variable to support
various executable file format.

Reviewed by:	peter
2000-09-26 05:09:21 +00:00
Don Lewis
eabc23efb3 Remove unneeded #include that was a remnant of an earlier version of
my uidinfo patch.

Found by:	phk
2000-09-21 09:04:17 +00:00
Bruce Evans
621dbe43df Added used include of <sys/mutex.h> (don't depend on pollution in
<sys/signalvar.h>).
2000-09-17 12:20:49 +00:00
Boris Popov
9ff5ce6baf Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by:	mckusick, dillon
2000-09-12 09:49:08 +00:00
Don Lewis
f535380cb6 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
John Baldwin
9701cd40b4 Support for unsigned integer and long sysctl variables. Update the
SYSCTL_LONG macro to be consistent with other integer sysctl variables
and require an initial value instead of assuming 0.  Update several
sysctl variables to use the unsigned types.

PR:		15251
Submitted by:	Kelly Yancey <kbyanc@posi.net>
2000-07-05 07:46:41 +00:00
Poul-Henning Kamp
2c9b67a8df Remove unneeded #include <vm/vm_zone.h>
Generated by:	src/tools/tools/kerninclude
2000-04-30 18:52:11 +00:00
Matthew Dillon
d323ddf317 Fix #! script exec under linux emulation. If a script is exec'd from a
program running under linux emulation, the script binary is checked for
    in /compat/linux first.  Without this patch the wrong script binary
    (i.e. the FreeBSD binary) will be run instead of the linux binary.
    For example, #!/bin/sh, thus breaking out of linux compatibility mode.

    This solves a number of problems people have had installing linux
    software on FreeBSD boxes.
2000-04-26 20:58:40 +00:00
Poul-Henning Kamp
ed6aff7387 Remove unneeded <sys/buf.h> includes.
Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.
2000-04-18 15:15:39 +00:00
Jonathan Lemon
cb679c385e Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
Warner Losh
5e2664428c When we are execing a setugid program, and we have a procfs filesystem
file open in one of the special file descriptors (0, 1, or 2), close
it before completing the exec.

Submitted by: nergal@idea.avet.com.pl
Constructive comments: deraadt@openbsd.org, sef, peter, jkh
2000-01-20 07:12:52 +00:00
Bruce Evans
654f6be1c8 Changed the type used to represent the user stack pointer from `long *'
to `register_t *'.  This fixes bugs like misplacement of argc and argv
on the user stack on i386's with 64-bit longs.  We still use longs to
represent "words" like argc and argv, and assume that they are on the
stack (and that there is stack).  The suword() and fuword() families
should also use register_t.
1999-12-27 10:42:55 +00:00
Eivind Eklund
762e6b856c Introduce NDFREE (and remove VOP_ABORTOP) 1999-12-15 23:02:35 +00:00
Poul-Henning Kamp
a8704f8999 Add a sysctl to control if argv is disclosed to the world:
kern.ps_argsopen
It defaults to 1 which means that all users can see all argvs in ps(1).

Reviewed by:	Warner
1999-11-26 08:27:16 +00:00
Poul-Henning Kamp
b9df5231ca Introduce commandline caching in the kernel.
This fixes some nasty procfs problems for SMP, makes ps(1) run much faster,
and makes ps(1) even less dependent on /proc which will aid chroot and
jails alike.

To disable this facility and revert to previous behaviour:
        sysctl -w kern.ps_arg_cache_limit=0

For full details see the current@FreeBSD.org mail-archives.
1999-11-16 20:31:58 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Warner Losh
fdf4e8b30c Stop profiling on exec.
Obtained from: NetBSD
1999-08-11 20:35:38 +00:00
Poul-Henning Kamp
f711d546d2 Suser() simplification:
1:
  s/suser/suser_xxx/

2:
  Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
  s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
1999-04-27 11:18:52 +00:00
Peter Wemm
db42d90829 unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.
1999-04-19 14:14:14 +00:00
John Polstra
4fe88fe637 Restore support for executing BSD/OS binaries on the i386 by passing
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.

Submitted by:	Thomas Stephens <tas@stephens.org>
Reviewed by:	jdp
1999-04-03 22:20:03 +00:00
Luoqi Chen
b1028ad122 Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
d254af07a1 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
Julian Elischer
2267af789e Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>
1999-01-06 23:05:42 +00:00
Doug Rabson
9c0fed3dcf Various changes to support OSF1 emulation:
* Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit
  boundary (needed to support 32bit OSF programs).  This should also save
  one pagetable per process.
* Add cvtqlsv to the set of instructions handled by the floating point
  software completion code.
* Disable all floating point exceptions by default.
* A minor change to execve to allow the OSF1 image activator to support
  dynamic loading.
1998-12-30 10:38:59 +00:00
Doug Rabson
486bddb033 Fix some 64bit truncation problems which crept into SYSCTL_LONG() with the
last cleanup.  Since the oid_arg2 field of struct sysctl_oid is not wide
enough to hold a long, the SYSCTL_LONG() macro has been modified to only
support exporting long variables by pointer instead of by value.

Reviewed by: bde
1998-12-27 18:03:29 +00:00
Bruce Evans
4c56fcdead Removed the cast to a pointer in the definition of PS_STRINGS and
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c.  Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.
1998-12-16 16:28:58 +00:00
Bruce Evans
2caecceeb5 Removed all traces of SYSCTL_INTPTR(). Pointers can't really be passed
across the kernel -> application interface, and for the one sysctl where
they were passed and actually used (kern.ps_strings), the applications
want addresses represented as u_longs anyway (the other sysctl that
passed them, kern.usrstack, has never been used).

Agreed to by:	dfr, phk
1998-12-16 16:06:29 +00:00
David Greenman
730075613a Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.
1998-10-28 13:37:02 +00:00
Peter Wemm
aa855a598d *gulp*. Jordan specifically OK'ed this..
This is the bulk of the support for doing kld modules.  Two linker_sets
were replaced by SYSINIT()'s.  VFS's and exec handlers are self registered.
kld is now a superset of lkm.  I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
1998-10-16 03:55:01 +00:00
Doug Rabson
e69763a315 Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.
1998-09-04 08:06:57 +00:00
Doug Rabson
069e9bc1b4 Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).

Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.

Reviewed by: Bruce Evans <bde@zeta.org.au>
1998-08-24 08:39:39 +00:00
Bruce Evans
aae0aa4593 Cast between longs and pointers via intptr_t. The results of fuword()
should be checked before casting.  The results of suword() should be
checked.
1998-07-15 06:19:33 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
Dag-Erling Smørgrav
dc73342347 Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108. 1998-04-17 22:37:19 +00:00
John Dyson
eed2412e5a Free the first page also if it is not valid. 1998-03-08 06:21:33 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
Peter Wemm
c8a7999933 Update the ELF image activator to use some of the exec resources rather
than rolling it's own.  This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win.  I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.
1998-03-02 05:47:58 +00:00
Bruce Evans
5132080e71 Removed unused #includes. 1998-02-25 13:08:07 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
John Dyson
1616db3cf8 Implement the first page access for object type determination more
VM clean.  Also, use vm_map_insert instead of vm_mmap.
Reviewed by:	dg@freebsd.org
1998-01-11 21:35:38 +00:00
John Dyson
95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
1998-01-06 05:26:17 +00:00
Bruce Evans
675ea6f083 Unspammed nested include of <vm/vm_zone.h>. 1997-12-27 02:56:39 +00:00
Sean Eric Fagan
d5f81602a7 Clear the p_stops field on change of user/group id, unless the correct
flag is set in the p_pfsflags field.  This, essentially, prevents an SUID
proram from hanging after being traced.  (E.g., "truss /usr/bin/rlogin" would
fail, but leave rlogin in a stopevent state.)  Yet another case where procctl
is (hopefully ;)) no longer needed in the general case.

Reviewed by:	bde (thanks bruce :))
1997-12-20 03:05:47 +00:00
David Greenman
c7ce9e2634 Fix bug where a struct buf was free()'d back to the system malloc pool.
Quite amazing that the system runs at all with this bug. Also present in
2.2.5. The bug appears to have come in with changes in rev 1.53.

PR:		might fix PR#5313
Submitted by:	bde
1997-12-16 15:40:29 +00:00
Sean Eric Fagan
2a024a2b05 Changes to allow event-based process monitoring and control. 1997-12-06 04:11:14 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
Guido van Rooij
d021ae3db5 On execing a sgid program, do not set P_SUGID when cr_gid and cr)_uid
do not change.
PR:		4755
Reviewed by:	Bruce Evans
1997-10-15 18:28:34 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Bruce Evans
e4ba6a82b0 Removed unused #includes. 1997-09-02 20:06:59 +00:00
David Greenman
a78e8d2a83 Fixed security hole with sharing the file descriptor table (via rfork)
when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan
(sef@freebsd.org).
Also consolidated the setuid/setgid checks into one place.
Reviewed by:	dyson,sef
1997-08-04 05:39:24 +00:00
Andrey A. Chernov
5cf3d12ca5 Don't clobber user space argv0 memory on shell exec, mainly for vfork()
Fix another bug: if argv[0] is NULL, garbadge args might be added for
shell script
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no> (with yet one fault detect from me)
1997-04-23 22:07:05 +00:00
David Greenman
1ebd0c5945 Brought fix from the 2.2 branch forward (see rev 1.47.2.7): serious bugs
with reading the image header.
1997-04-18 02:43:05 +00:00
John Dyson
492da96c9d Correct the previous thread-fix commit. I made a clerical error. 1997-04-13 03:05:31 +00:00
John Dyson
5856e12e69 Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
	from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
	possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
	thread from the rest of the group.

Fix the case where a thread does an exec.  It is almost nonsense for a thread
	to modify the other threads address space by an exec, so we
	now automatically divorce the address space before modifying it.
1997-04-13 01:48:35 +00:00
John Dyson
c04b956c6f Effectively remove the previous commit to fix threads forking. The
change was a false-start, and needs more work.
1997-04-12 04:07:50 +00:00
John Dyson
af9ec88589 Allow a kernel-supported process thread to do an exec without blasting
away the VM space of all of the other, associated threads.
1997-04-11 23:37:23 +00:00
David Greenman
66141753e6 Killed unnecessary vp == NULL check after namei. 1997-04-04 09:06:20 +00:00
David Greenman
a3cf6ebae3 Oops, only free component name buffer if namei() didn't. This bug has
been in here since I wrote the code 3 years ago! Thanks, Bruce!

Submitted by:	bde
1997-04-04 07:30:06 +00:00
David Greenman
6d5a0a8c23 Various fixes:
1. imgp->image_header needs to be cleared for the bp == NULL && `goto
   interpret' case, else exec_fail_dealloc would free it twice after
   an error.

2. Moved the vp->v_writecount check in exec_check_permissions() to
   near the end.  This fixes execve("/dev/null", ...) returning the
   bogus errno ETXTBSY.  ETXTBSY is still returned for attempts to
   exec interpreted files that are open for writing.  The man page
   is very old and wrong here.  It says that ETXTBSY is for pure
   procedure (shared text) files that are open for writing or reading.

3. Moved the setuid disabling in exec_check_permissions() to the end.
   Cosmetic.  It's more natural to dispose of all the error cases
   first.

...plus a couple of other cosmetic changes.

Submitted by:	bde
1997-04-04 04:17:11 +00:00
David Greenman
8677f5094d Lose the vnode lock on a permissions failure.
Submitted by:	Tor Egge <Tor.Egge@idi.ntnu.no>
1997-04-04 01:30:33 +00:00
David Greenman
9caaadb63a Changed the way that the exec image header is read to be filesystem-
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.
1997-03-31 11:11:26 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
David Greenman
e47bda0730 Fix from PR #2757:
execve() clears the P_SUGID process flag in execve() if the binary
executed does not have suid or sgid permission bits set.

This also happens when the effective uid is different from the real
uid or the effective gid is different from the real gid. Under
these circumstances, the process still has set id privileges and
the P_SUGID flag should not be cleared.

Submitted by:	Tor Egge <Tor.Egge@idt.ntnu.no>
1997-02-19 03:51:34 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
John Dyson
2cb544c3c9 Fix an ordering bug -- pmap_remove_pages should be called BEFORE
vm_map_remove, not after...

2.2-RELEASE candidate.
1996-11-09 03:54:25 +00:00
John Dyson
9d3fbbb5f4 Performance optimizations. One of which was meant to go in before the
previous snap.  Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space.  Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags.  This is an optional optimization, but helpful
on the X86.
1996-10-12 21:35:25 +00:00
John Dyson
67bf686897 Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find.  This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
1996-07-30 03:08:57 +00:00
John Dyson
4f4d35edf0 This commit is meant to solve a couple of VM system problems or
performance issues.

	1) The pmap module has had too many inlines, and so the
	   object file is simply bigger than it needs to be.
	   Some common code is also merged into subroutines.
	2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
	   Unfortunately, a few have needed to be added also.
	   The removal caused the need for more vm_page_lookups.
	   I added lookup hints to minimize the need for the
	   page table lookup operations.
	3) Removal of some bogus performance improvements, that
	   mostly made the code more complex (tracking individual
	   page table page updates unnecessarily).  Those improvements
	   actually hurt 386 processors perf (not that people who
	   worry about perf use 386 processors anymore :-)).
	4) Changed pv queue manipulations/structures to be TAILQ's.
	5) The pv queue code has had some performance problems since
	   day one.  Some significant scalability issues are resolved
	   by threading the pv entries from the pmap AND the physical
	   address instead of just the physical address.  This makes
	   certain pmap operations run much faster.  This does
	   not affect most micro-benchmarks, but should help loaded system
	   performance *significantly*.  DG helped and came up with most
	   of the solution for this one.
	6) Most if not all pmap bit operations follow the pattern:
		pmap_test_bit();
		pmap_clear_bit();
	   That made for twice the necessary pv list traversal.   The
	   pmap interface now supports only pmap_tc_bit type operations:
	   pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
	   Additionally, the modified routine now takes a vm_page_t arg
	   instead of a phys address.  This eliminates a PHYS_TO_VM_PAGE
	   operation.
	7) Several rewrites of routines that contain redundant code to
	   use common routines, so that there is a greater likelihood of
	   keeping the cache footprint smaller.
1996-07-27 03:24:10 +00:00
Bruce Evans
6ab46d52a5 Don't use NULL in non-pointer contexts. 1996-07-12 04:12:25 +00:00
David Greenman
86064318c4 Use kmem_alloc_wait/kmem_free_wakeup() to avoid allocation failures
from running out of string space in the exec_map.
1996-06-03 04:12:18 +00:00
David Greenman
6120fef1bc Fix declaration of ps_strings. 1996-06-03 04:09:36 +00:00
John Dyson
b18bfc3da7 This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

	More usage of the TAILQ macros.  Additional minor fix to queue.h.
	Performance enhancements to the pageout daemon.
		Addition of a wait in the case that the pageout daemon
		has to run immediately.
		Slightly modify the pageout algorithm.
	Significant revamp of the pmap/fork code:
		1) PTE's and UPAGES's are NO LONGER in the process's map.
		2) PTE's and UPAGES's reside in their own objects.
		3) TOTAL elimination of recursive page table pagefaults.
		4) The page directory now resides in the PTE object.
		5) Implemented pmap_copy, thereby speeding up fork time.
		6) Changed the pv entries so that the head is a pointer
		   and not an entire entry.
		7) Significant cleanup of pmap_protect, and pmap_remove.
		8) Removed significant amounts of machine dependent
		   fork code from vm_glue.  Pushed much of that code into
		   the machine dependent pmap module.
		9) Support more completely the reuse of already zeroed
		   pages (Page table pages and page directories) as being
		   already zeroed.
	Performance and code cleanups in vm_map:
		1) Improved and simplified allocation of map entries.
		2) Improved vm_map_copy code.
		3) Corrected some minor problems in the simplify code.
	Implemented splvm (combo of splbio and splimp.)  The VM code now
		seldom uses splhigh.
	Improved the speed of and simplified kmem_malloc.
	Minor mod to vm_fault to avoid using pre-zeroed pages in the case
		of objects with backing objects along with the already
		existant condition of having a vnode.  (If there is a backing
		object, there will likely be a COW...  With a COW, it isn't
		necessary to start with a pre-zeroed page.)
	Minor reorg of source to perhaps improve locality of ref.
1996-05-18 03:38:05 +00:00
Bruce Evans
a794e791c8 Removed unnecessary #includes from <sys/imgact.h> so that it is
self-sufficient and added explicit #includes where required.
1996-05-01 02:43:13 +00:00
Sujal Patel
24b34f097b Fixed two typos in the comment.
Pointed out by:	davidg
1996-04-29 15:07:59 +00:00
David Greenman
39f70d4545 Killed sections 3 and 4 of my copyright as I don't agree with it (I believe
it to be unnecessarily restrictive). For tty_subr.c, update to my standard
copyright.
1996-04-08 01:22:00 +00:00
Søren Schmidt
e1743d02cd First attempt at FreeBSD & Linux ELF support.
Compile and link a new kernel, that will give native ELF support, and
provide the hooks for other ELF interpreters as well.

To make native ELF binaries use John Polstras elf-kit-1.0.1..
For the time being also use his ld-elf.so.1 and put it in
/usr/libexec.

The Linux emulator has been enhanced to also run ELF binaries, it
is however in its very first incarnation.
Just get some Linux ELF libs (Slackware-3.0) and put them in the
prober place (/compat/linux/...).
I've ben able to run all the Slackware-3.0 binaries I've tried
so far.
(No it won't run quake yet :)
1996-03-10 08:42:54 +00:00
Peter Wemm
d66a506616 Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff.  The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*.  Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself.  The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code.  All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first.  Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality().  The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls..  eg:  mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed.  i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS).  This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only.  This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code.  It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area.  This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.
1996-03-02 19:38:20 +00:00
Peter Wemm
99ac3bc8da Add two sysctl variables that can be read by libutil and libkvm so that
they can adapt to simple kernel VM layout changes.
1996-02-24 14:32:53 +00:00
Bruce Evans
9f29a57754 Removed stale #includes of "opt_sysvipc.h". 1996-01-20 21:36:31 +00:00
John Dyson
bd7e5f992e Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
Peter Wemm
81090119af (gulp!) reran makesyscalls..
sysv_ipc.c: add stub functions that either simply return (for the hooks
in kern_fork/kern_exit) or log() a messgae and call enosys() (for the
syscalls).  sysv_ipc.c will become "standard" in conf/files and has
#ifs for all the permutations.
1996-01-08 04:30:48 +00:00
Garrett Wollman
50c73f3620 Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.
1996-01-04 20:29:06 +00:00
Poul-Henning Kamp
87b6de2b76 A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.
1995-12-14 08:32:45 +00:00
Peter Wemm
1ed012f969 Reorganise ps_strings in order to gain BSD/OS 2.0 binary compatability.
This is now in line with NetBSD as well..

Note that once this series of commits is finished, you must recompile
libkvm, then ps and maybe 'w'.  If you are running the recently imported
sendmail-8.7, you should recompile that too (src/conf.c at least).
1995-12-09 04:29:11 +00:00
David Greenman
efeaf95a41 Untangled the vm.h include file spaghetti. 1995-12-07 12:48:31 +00:00
David Greenman
c2f9f36bae Use kmem_alloc_pageable/kmem_free to allocate memory instead of individual
VM map functions.
1995-11-13 10:45:22 +00:00
Bruce Evans
d2d3e8751c Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs.  It's
convenient to have this visible but they are hard to maintain.  Some
are already different from the central declarations.  4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.
1995-11-12 06:43:28 +00:00
David Greenman
c52007c2cc All:
Changed vnodep -> vp for consistency with the rest of the kernel, and
changed iparams -> imgp for brevity.

kern_exec.c:
   Explicitly initialized some additional parts of the image_params struct
to avoid bzeroing it. Rewrote the set-id code to reduce the number of
logical tests. The rewrite exposed a mostly benign bug in the algorithm:
traced set-id images would get ktracing disabled even if the set-id didn't
happen for other reasons.
1995-11-06 12:52:37 +00:00
David Greenman
079cc25b11 Killed a few gratuitous #include's. 1995-10-21 08:38:13 +00:00
Steven Wallace
ad7507e248 Remove prototype definitions from <sys/systm.h>.
Prototypes are located in <sys/sysproto.h>.

Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.

Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.

In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args.  This makes
life easier for prototype generation.
1995-10-08 00:06:22 +00:00
David Greenman
c0e5de7d88 Moved setting of VTEXT flag into the appropriate image activators. This
fixes a bug where linux binaries would get the flag set inappropriately.
1995-08-24 10:32:37 +00:00
Rodney W. Grimes
9b2e535452 Remove trailing whitespace. 1995-05-30 08:16:23 +00:00
David Greenman
3ed8a40336 Use 'p' rather than 'curproc' when appropriate. 1995-03-25 01:34:21 +00:00
David Greenman
b35ba931bd Use NDINIT macro to initialize fields for namei. 1995-03-25 01:20:38 +00:00
David Greenman
9022e4ad9c Fixed bug introduced in the previous commit - the lock must be held until
after the call to exec_check_permissions().
1995-03-19 23:27:57 +00:00