Commit Graph

88 Commits

Author SHA1 Message Date
Poul-Henning Kamp
b9df5231ca Introduce commandline caching in the kernel.
This fixes some nasty procfs problems for SMP, makes ps(1) run much faster,
and makes ps(1) even less dependent on /proc which will aid chroot and
jails alike.

To disable this facility and revert to previous behaviour:
        sysctl -w kern.ps_arg_cache_limit=0

For full details see the current@FreeBSD.org mail-archives.
1999-11-16 20:31:58 +00:00
Poul-Henning Kamp
2e3c8fcbd0 This is a partial commit of the patch from PR 14914:
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 10:56:05 +00:00
Luoqi Chen
645682fd40 Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by:	marcel, jdp, bde
1999-10-11 20:33:17 +00:00
Peter Wemm
0894f4a92c Clean up some cruft. We don't run <= 4.3 binaries on hp300 or luna68k
arches using owait(2).
1999-10-11 15:15:45 +00:00
Marcel Moolenaar
2c42a14602 sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
1999-09-29 15:03:48 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Marcel Moolenaar
c6dfea0ebd Add sysctl variables for the Linuxulator. These reside under `compat.linux' as
discussed on current.

The following variables are defined (for now):

    osname (defaults to "Linux")
        Allow users to change the name of the OS as returned by uname(2),
        specially added for all those Linux Netscape users and statistics
        maniacs :-) We now have what we all wanted!

    osrelease (defaults to "2.2.5")
        Allow users to change the version of the OS as returned by uname(2).
        Since -current supports glibc2.1 now, change the default to 2.2.5
        (was 2.0.36).

    oss_version (defaults to 198144 [0x030600])
        This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
        can commit now that we have the MIB. The default version number is the
        lowest version possible with the current 'encoding'.

A note about imprisoned processes (see jail(2)):
  These variables are copy-on-write (as suggested by phk). This means that
  imprisoned processes will use the system wide value unless it is written/set
  by the process. From that moment on, a copy local to the prison will be
  used.

A note about the implementation:
  I choose to add a single pointer to struct prison, because I didn't like the
  idea of changing struct prison every time I come up with a new variable. As
  a side effect, the extra storage is only needed when a variable is set from
  within the prison. This also minimizes kernel bloat when the Linuxulator is
  not used; both compiled in or as a module.

Reviewed by: bde (first version only) and phk
1999-08-27 19:47:41 +00:00
Mike Smith
79fc0bf4a0 From the submitter:
- this causes POSIX locking to use the thread group leader
   (p->p_leader) as the locking thread for all advisory locks.
   In non-kernel-threaded code p->p_leader == p, so this will have
   no effect.

   This results in (more) correct POSIX threaded flock-ing semantics.

   It also prevents the leader from exiting before any of the children.
   (so that p->p_leader will never be stale) in exit1().

   We have been running this patch for over a month now in our lab
   under load and at customer sites.

Submitted by:	John Plevyak <jplevyak@inktomi.com>
1999-06-07 20:37:29 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Luoqi Chen
5206bca10a Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
  is added to point to per-cpu pages, per-cpu global variables are now
  accessed through this new selector (%fs). The selectors in gdt table are
  rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		John Dyson	<dyson@iquest.net>
		Julian Elischer	<julian@whistel.com>
		Bruce Evans	<bde@zeta.org.au>
		David Greenman	<dg@root.com>
1999-04-28 01:04:33 +00:00
Peter Wemm
e91896117b Well folks, this is it - The second stage of the removal for build support
for LKM's..
1999-04-17 08:36:07 +00:00
Bruce Evans
56ce1a8dc4 Fixed runtime accounting. The time since the previous context switch
was discarded on every call to calcru().  Hacking on the `switchtime'
global for a related fix in rev.1.38 of kern_resource.c was too fragile
and broke when p_switchtime went away.

PR:		10402
1999-03-11 21:53:12 +00:00
Julian Elischer
4ac9ae7083 Fix thread/process tracking and differentiation for Linux threads emulation.
Submitted by:	Richard Seaman, Jr." <dick@tar.com>

Also clean some compiler warnings in surrounding code.
1999-03-02 00:28:09 +00:00
Luoqi Chen
b1028ad122 Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by:	Alan Cox	<alc@cs.rice.edu>
		Matthew Dillion	<dillon@apollo.backplane.com>
1999-02-19 14:25:37 +00:00
Mark Newton
ba198b1c45 Added comments about non-staticization so it doesn't get un-done next
time someone goes on a staticization binge.

Suggested by: eivind
1999-01-31 03:15:13 +00:00
Mark Newton
69a6f20bc8 Unstaticized routines which are needed by the svr4 KLD and the streams
garbage needed to support SysVR4 networking.
1999-01-30 06:25:00 +00:00
Julian Elischer
88c5ea4574 Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-26 02:38:12 +00:00
Julian Elischer
dc9c271aa1 Changes to the LINUX_THREADS support to only allocate extra memory for
shared signal handling when there is shared signal handling being
used.

This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).

Signal handling information remains in the U area until such a time as
it's reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.

A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swapable for all processes that are not sharing that
information with other processes. THis should addres the concerns raised
by Garrett and others.

Submitted by:	"Richard Seaman, Jr." <dick@tar.com>
1999-01-07 21:23:50 +00:00
Julian Elischer
6626c6045c Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by:	 "Richard Seaman, Jr." <lists@tar.com>
Obtained from:	linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.
1998-12-19 02:55:34 +00:00
Don Lewis
831d27a9f5 Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, elvind
1998-11-11 10:04:13 +00:00
Peter Wemm
1c5bb3eaa1 add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE() 1998-11-10 09:16:29 +00:00
David Greenman
b5afad7198 Moved limit frobbing (and the resulting limcopy()) that occurs for
accounting to the accounting function so that this isn't needlessly
done for some process exits.
Reviewed by:	bde,phk
1998-06-05 21:44:20 +00:00
Poul-Henning Kamp
4cf41af3d4 Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by:	bde
1998-04-06 08:26:08 +00:00
John Dyson
2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
Eivind Eklund
5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
Sean Eric Fagan
847e5f5f9a Use at_exit() to invoke procfs_exit() instead of calling it directly.
Note that an unload facility should be used to call rm_at_exit() (if
procfs is being loaded as an LKM and is subsequently removed), but it
was non-obvious how to do this in the VFS framework.

Reviewed by:	Julian Elischer
1997-12-08 01:06:36 +00:00
Sean Eric Fagan
ed1b05436a Surround the call to procfs_exit() by #ifdef PROCFS/#endif -- much to my
surprise, procfs actually is optional, and some people truly do generate
kernels without it.  Wow.  I built a kernel without 'options PROCFS' and
it compiled and linked.
1997-12-07 18:16:43 +00:00
Sean Eric Fagan
2a024a2b05 Changes to allow event-based process monitoring and control. 1997-12-06 04:11:14 +00:00
Bruce Evans
01166e9245 Avoid passing a `retval' to wait1()
Disallow wait options that are not a combination of the standard POSIX
options WUNTRACED and WNOHANG, as is required by POSIX.  BSD doesn't
have any extensions here, but the code was `#ifdef notyet' for some
reason.
1997-11-20 19:09:43 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
55166637cd Distribute and statizice a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
Justin T. Gibbs
ab36c06737 init_main.c subr_autoconf.c:
Add support for "interrupt driven configuration hooks".
	A component of the kernel can register a hook, most likely
	during auto-configuration, and receive a callback once
	interrupt services are available.  This callback will occur before
	the root and dump devices are configured, so the configuration
	task can affect the selection of those two devices or complete
	any tasks that need to be performed prior to launching init.
	System boot is posponed so long as a hook is registered.  The
	hook owner is responsible for removing the hook once their task
	is complete or the system boot can continue.

kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
	Change the interface and implementation for the kernel callout
	service.  The new implemntaion is based on the work of
	Adam M. Costello and George Varghese, published in a technical
	report entitled "Redesigning the BSD Callout and Timer Facilities".
	The interface used in FreeBSD is a little different than the one
	outlined in the paper.  The new function prototypes are:

	struct callout_handle timeout(void (*func)(void *),
				      void *arg, int ticks);

	void untimeout(void (*func)(void *), void *arg,
		       struct callout_handle handle);

	If a client wishes to remove a timeout, it must store the
	callout_handle returned by timeout and pass it to untimeout.

	The new implementation gives 0(1) insert and removal of callouts
	making this interface scale well even for applications that
	keep 100s of callouts outstanding.

	See the updated timeout.9 man page for more details.
1997-09-21 22:00:25 +00:00
Joerg Wunsch
245f17d43c Implement SA_NOCLDWAIT.
The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie.  This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.

Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-)  (Read this: sysinstall doesn't do it right.  This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)

ToDo:		. clarify the correct SA_* flag inheritance, compared
		  to other systems,
		. decide whether the compat cruft (osigvec(9)) should
		  deal with new system additions or not,
		. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by:	bde
1997-09-13 19:42:29 +00:00
Bruce Evans
e4ba6a82b0 Removed unused #includes. 1997-09-02 20:06:59 +00:00
Bruce Evans
eb776aea19 Fixed some gratuitous ANSIisms. 1997-08-26 00:15:04 +00:00
Bruce Evans
b1037dcd53 #include <machine/limits.h> explicitly in the few places that it is required. 1997-08-21 20:33:42 +00:00
John Dyson
5aaef07c50 Clean up some lint associated with the AIO code. 1997-07-17 04:49:43 +00:00
John Dyson
2244ea07dc This is an upgrade so that the kernel supports the AIO calls from
POSIX.4.  Additionally, there is some initial code that supports LIO.
This code supports AIO/LIO for all types of file descriptors, with
few if any restrictions.  There will be a followup very soon that
will support significantly more efficient operation for VCHR type
files (raw.)  This code is also dependent on some kernel features
that don't work under SMP yet.  After I commit the changes to the
kernel to support proper address space sharing on SMP, this code
will also work under SMP.
1997-07-06 02:40:43 +00:00
John Dyson
2c1011f7ef Modifications to existing files to support the initial AIO/LIO and
kernel based threading support.
1997-06-16 00:29:36 +00:00
Poul-Henning Kamp
c5318f2363 Remove cruft relating to p_selbits and p_selbits_size 1997-05-22 07:25:20 +00:00
Peter Wemm
a2a1c95c10 The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space.  Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss.  This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
1997-04-07 07:16:06 +00:00
Bruce Evans
fce002fdef Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used.  In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.
1997-03-24 11:25:10 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
David Nugent
1273ebf576 Copy process resource settings before modifying.
Candidate for 2.2.
1997-01-21 16:37:01 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
John Dyson
9d3fbbb5f4 Performance optimizations. One of which was meant to go in before the
previous snap.  Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space.  Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags.  This is an optional optimization, but helpful
on the X86.
1996-10-12 21:35:25 +00:00
Julian Elischer
dd45d8ad18 If we have no console device it is possible to be
1/ session leader
2/ Have a console device vnode (/dev/console)
3/ have  NULL pointer for a consoel tty struct.

fix the only case where the tty struct is referenced without a prior
check for existance.
1996-10-04 23:43:12 +00:00
Bruce Evans
fc0b1dbf68 Don't use __dead in the kernel. It was an obfuscation for gcc >= 2.5
and a no-op for gcc >= 2.6.
1996-09-13 09:20:15 +00:00