7013 Commits

Author SHA1 Message Date
Don Lewis
47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
Robert Watson
049ffe98a8 Assert pipe mutex in pipeselwakeup(), as we manipulate pipe_state
in a non-atomic manner.  It appears to always be called with the
mutex (good).
2004-02-26 00:18:22 +00:00
Robert Watson
094bdd260c Update comment regarding MAC labels: we no longer pass endpoints
into the MAC Framework, just the pipe pair.

GC 'hadpeer' used in pipedestroy(), which is no longer needed as
we check pipe_present flags on the pair.
2004-02-25 23:30:56 +00:00
Dag-Erling Smørgrav
854a417d92 Whitespace cleanup 2004-02-24 19:31:30 +00:00
Poul-Henning Kamp
652d04726d Fix two oversights here: don't trash the freelist, and properly cleanup
the cdevsw{}.

Submitted by:	tegge
2004-02-23 08:42:55 +00:00
Brian Feldman
240160d48b Correct some major SMP-harmful problems in the pipe implementation. First
of all, PIPE_EOF is not checked pervasively after everything that can drop
the pipe mutex and msleep(), so fix.  Additionally, though it might not
harm anything, pipelock() and pipeunlock() are not used consistently.
Third, the kqueue support functions do not use the pipe mutex correctly.
Last, but absolutely not least, is a race: if pipe_busy is not set on
the closing side of the pipe, the other side that is trying to write to
that will crash BECAUSE PIPE_EOF IS NOT SET!  Unconditionally set
PIPE_EOF, and get rid of all the lockups/crashes I have seen trying
to build ports.
2004-02-22 23:00:14 +00:00
Daniel Eischen
2648efa621 Add sysctls to allow showing threads for pgrp, tty, uid, ruid,
and pid.
2004-02-22 17:54:32 +00:00
Pawel Jakub Dawidek
63dba32b76 Reimplement sysctls handling by MAC framework.
Now I believe it is done in the right way.

Removed some XXMAC cases, we now assume 'high' integrity level for all
sysctls, except those with CTLFLAG_ANYBODY flag set. No more magic.

Reviewed by:	rwatson
Approved by:	rwatson, scottl (mentor)
Tested with:	LINT (compilation), mac_biba(4) (functionality)
2004-02-22 12:31:44 +00:00
Colin Percival
b17dd2bcc0 If we're going to panic(), do it before dereferencing a NULL pointer.
Reported by:	"Ted Unangst" <tedu@coverity.com>
Approved by:	rwatson (mentor)
2004-02-22 01:11:53 +00:00
Robert Watson
f6a4109212 Update my personal copyrights and NETA copyrights in the kernel
to use the "year1-year3" format, as opposed to "year1, year2, year3".
This seems to make lawyers more happy, but also prevents the
lines from getting excessively long as the years start to add up.

Suggested by:	imp
2004-02-22 00:33:12 +00:00
Poul-Henning Kamp
ded67d0f77 Check for NODEV return from udev2dev() 2004-02-21 23:52:03 +00:00
Poul-Henning Kamp
cd690b60de Device megapatch 6/6:
This is what we came here for:  Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list.  When the refcount drops, free it.

Check that cdevsw->d_version is correct.  If not, set all methods
to the dead_*() methods to prevent entrance into driver.  Print
warning on console to this effect.  The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.
2004-02-21 21:57:26 +00:00
Poul-Henning Kamp
816d62bbb9 Device megapatch 5/6:
Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev().  The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()
2004-02-21 21:32:15 +00:00
Poul-Henning Kamp
dc08ffec87 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
Poul-Henning Kamp
8e1f1df080 Device megapatch 3/6:
Add missing D_TTY flags to various drivers.

Complete asserts that dev_t's passed to ttyread(), ttywrite(),
ttypoll() and ttykqwrite() have (d_flags & D_TTY) and a struct tty
pointer.

Make ttyread(), ttywrite(), ttypoll() and ttykqwrite() the default
cdevsw methods for D_TTY drivers and remove the explicit initializations
in various drivers cdevsw structures.
2004-02-21 20:41:11 +00:00
Poul-Henning Kamp
b0b0334878 Device megapatch 2/6:
This commit adds a couple of functions for pseudodrivers to use for
implementing cloning in a manner we will be able to lock down (shortly).

Basically what happens is that pseudo drivers get a way to ask for
"give me the dev_t with this unit number" or alternatively "give
me a dev_t with the lowest guaranteed free unit number" (there is
unfortunately a lot of non-POLA in the exact numeric value of this
number, just live with it for now)

Managing the unit number space this way removes the need to use
rman(9) to do so in the drivers this greatly simplifies the code in
the drivers because even using rman(9) they still needed to manage
their dev_t's anyway.

I have taken the if_tun, if_tap, snp and nmdm drivers through the
mill, partly because they (ab)used makedev(), but mostly because
together they represent three different problems for device-cloning:

if_tun and snp is the plain case: just give me a device.

if_tap has two kinds of devices, with a flag for device type.

nmdm has paired devices (ala pty) can you can clone either of them.
2004-02-21 20:29:52 +00:00
Brian Feldman
fe5f3a72ac Make sure to wake up any select waiters when closing a kqueue (also, not
crash).  I am fairly sure that only people with SMP and multi-threaded
apps using kqueue will be affected by this, so I have a stress-testing
program on my web site:
<URL:http://green.homeunix.org/~green/getaddrinfo-pthreads-stresstest.c>
2004-02-20 04:00:48 +00:00
John Baldwin
712f57d8ab Tidy up the thread taskqueue implementation and close a lost wakeup race.
Instead of creating a mutex that we msleep on but don't actually lock when
doing the corresponding wakeup(), in the kthread, lock the mutex associated
with our taskqueue and msleep while the queue is empty.  Assert that the
queue is locked when the callback function is called to wake the kthread.
2004-02-19 22:03:52 +00:00
Jacques Vidrine
57f22bd4af Rework jail_attach(2) so that an already jailed process cannot hop
to another jail.

Submitted by:	rwatson
2004-02-19 21:03:20 +00:00
Pawel Jakub Dawidek
461167c289 Added sysctl security.jail.jailed.
It returns 1 is process is inside of jail and 0 if it is not.
Information if we are in jail or not is not a secret, there is plenty of
ways to discover it. Many people are using own hack to check this and
this will be a legal way from now on.

It will be great if our starting scripts will take advantage of this sysctl
to allow clean "boot" inside jail.

Approved by:	rwatson, scottl (mentor)
2004-02-19 14:29:14 +00:00
Pawel Jakub Dawidek
f6739b1ddc Simplify check. We are only able to check exclusive lock and if
2nd condition is true, first one is true for sure.

Approved by:	jhb, scottl (mentor)
2004-02-19 14:19:31 +00:00
Don Lewis
cf93aa166c When reparenting a process in the PT_DETACH code, only set p_sigparent
to SIGCHLD if the new parent process is initproc.

MFC after:	2 weeks
2004-02-19 10:39:42 +00:00
Don Lewis
6567eef757 A Linux thread created using clone() should not send SIGCHLD to its
parent if no signal is specified in the clone() flags argument.

PR:		42457
MFC after:	2 weeks
2004-02-19 06:43:48 +00:00
Nate Lawson
32869e71fb Add support for 'h' and 'hh' modifiers for printf(9).
Submitted by:	Bruno Ducrot <ducrot AT poupinou.org>
Reviewed by:	bde
2004-02-19 05:29:39 +00:00
Colin Percival
3a1bdbf8d1 Don't ignore errors from vfs_allocate_syncvnode.
PR:		kern/18503
Submitted by:	Anatoly Vorobey <mellon@pobox.com>
Approved by:	rwatson (mentor)
2004-02-18 05:20:54 +00:00
Peter Wemm
df7c361e64 Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit
kernel.  I'm not happy with it yet - refinements are to come.
This hack allows the kern.ps_strings and kern.usrstack sysctls to respond
to a 32 bit request, such as those coming from emulated i386 binaries.
2004-02-18 00:54:17 +00:00
David Malone
a1cc6206fb Correct a comment.
Reviewed by:	alfred, tanimura
2004-02-17 12:30:32 +00:00
Dag-Erling Smørgrav
963385cf22 Mechanical whistespace cleanup. 2004-02-17 10:21:03 +00:00
Dag-Erling Smørgrav
44f4b94b38 Don't bother storing a result when all you need are the side effects. 2004-02-16 18:38:46 +00:00
David Malone
a82294d01c In fdcheckstd the descriptor table should never be shared, so just
KASSERT this rather than trying to deal with what happens when file
descriptors change out from under us.
2004-02-15 21:14:48 +00:00
Bruce Evans
72632ef235 Fixed style bugs near previous commit (mainly formatting errors and
missing parentheses).  Use default handling (trap to debugger) for
udev2dev(x, 1) since it is an error and doesn't happen anywhere in
the sys tree except in bogusly commented out code in coda.
2004-02-15 20:14:47 +00:00
Colin Percival
a20e9655b9 Remove opv_desc_vector from vfs_add_vnodeops, since it is defined
and given a value, but never used.  This has no effect on the
resulting binaries, since gcc optimizes the variable away anyway.

PR:		kern/62684
Approved by:	rwatson (mentor)
2004-02-15 17:27:33 +00:00
Poul-Henning Kamp
2a3faf2fbd Split the initialization of the cdevsw into a separate function. 2004-02-15 10:35:33 +00:00
Robert Watson
402d7aa884 Remove excess brackets. 2004-02-15 00:43:22 +00:00
Poul-Henning Kamp
d60d18d491 Use standard style for cdevsw initialization. 2004-02-14 20:03:36 +00:00
Robert Watson
679a106075 By default, don't allow processes in a jail to list the set of
jails in the system.  Previous behavior (allowed) may be restored
by setting security.jail.list_allowed=1.
2004-02-14 19:19:47 +00:00
Robert Watson
7e440242e5 Fix mismerge in last commit: check that cred->cr_prison is NULL
before dereferencing the prison pointer.
2004-02-14 18:52:43 +00:00
Robert Watson
f08df373a3 By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0.  Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.
2004-02-14 18:31:11 +00:00
Poul-Henning Kamp
d08c5d0b9b Remove the check which used to protect us against make_dev() being
called until DEVFS had a chance to initialize.  Since DEVFS is mandatory
and things over in that department coincidentally works from without
any initialization now, this is safe.
2004-02-14 17:19:43 +00:00
Brian Feldman
a0ed09c0af T -CURRENT DO NOT CRASH UPON ^T K PLZ THX.
Also, use sched_pctcpu() instead of assuming td->td_kse is non-NULL.
2004-02-14 01:30:06 +00:00
Brian Feldman
f662a93197 Always socantsendmore() before deallocating a socket. This, in turn,
calls selwakeup() if necessary (which it is, if you don't want freed
memory hanging around on your td->td_selq).

Props to:	alfred
2004-02-12 01:48:40 +00:00
Don Lewis
55b5f2a202 When reparenting a process to init, make sure that p_sigparent is
set to SIGCHLD.  This avoids the creation of orphaned Linux-threaded
zombies that init is unable to reap.  This can occur when the parent
process sets its SIGCHLD to SIG_IGN.  Fix a similar situation in the
PT_DETACH code.

Tested by:	"Steven Hartland" <killing AT multiplay.co.uk>
2004-02-11 22:06:02 +00:00
John Baldwin
e7a44cace2 Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather
than the soft (cur) limit.

Submitted by:	bde
2004-02-11 18:04:13 +00:00
Mike Silbersack
b49d824e8b Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.

Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver.  Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.

Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.

Idea by:	Yaoping Ruan & Vivek Pai
2004-02-08 07:35:48 +00:00
Alan Cox
c5aebf380c swp_pager_async_iodone() no longer requires Giant. Modify bufdone()
and swapgeom_done() to perform swp_pager_async_iodone() without Giant.

Reviewed by:	tegge
2004-02-07 08:54:50 +00:00
John Baldwin
a875f38546 - Convert the plimit lock to a pool mutex lock.
- Hide struct plimit from userland.

Submitted by:	bde (2)
2004-02-06 19:35:14 +00:00
John Baldwin
f4daf05619 - Correct the translation of old rlimit values to properly handle the old
RLIM_INFINITY case for ogetrlimit().
- Use %jd and intmax_t to output negative time in usec in calcru().
- Rework getrusage() to make a copy of the rusage struct into a local
  variable while holding Giant and then do the copyout from the local
  variable to avoid having to have the original process rusage struct
  locked while doing the copyout (which would not be safe).  This also
  includes a few style fixes from Bruce to getrusage().

Submitted by:	bde (1, parts of 3)
Suggested by:	bde (2)
2004-02-06 19:30:12 +00:00
John Baldwin
99b6e02ba6 A few more style fixes from Bruce including a few I missed last time.
Submitted by:	bde
2004-02-06 19:25:34 +00:00
John Baldwin
4c3558aa82 Always set a process' state to normal when it is fully constructed in
fork1() rather than only doing it for the RFSTOPPED case and then having
to fix it up in other places later on.
2004-02-05 21:01:37 +00:00
John Baldwin
b4323d7729 - A lot of style and whitespace fixes.
- Update a few comments regarding locking notes.

Submitted by:	bde (1, mostly)
2004-02-05 20:53:25 +00:00