7023 Commits

Author SHA1 Message Date
jhb
d25301c858 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
jhb
d76d631711 Drop sched_lock around the wakeup of the parent process after setting
the process state to zombie when a process exits to avoid a lock order
reversal with the sleepqueue locks.  This appears to be the only place
that we call wakeup() with sched_lock held.
2004-02-27 18:39:09 +00:00
jhb
d07a9130c6 Add an implementation of a generic sleep queue abstraction that is used
to queue threads sleeping on a wait channel similar to how turnstiles are
used to queue threads waiting for a lock.  This subsystem will be used as
the backend for sleep/wakeup and condition variables initially.  Eventually
it will also be used to replace the ithread-specific iwait thread
inhibitor.

Sleep queues are also not locked by sched_lock, so this splits sched_lock
up a bit further increasing concurrency within the scheduler.  Sleep queues
also natively support timeouts on sleeps and interruptible sleeps allowing
for the reduction of a lot of duplicated code between the sleep/wakeup and
condition variable implementations.  For more details on the sleep queue
implementation, check the comments in sys/sleepqueue.h and
kern/subr_sleepqueue.c.
2004-02-27 18:33:09 +00:00
des
51a4ce2dff Add sysctl_move_oid() which reparents an existing OID. 2004-02-27 17:13:23 +00:00
jhb
b23d8371fa Clarify and tweak some comments. 2004-02-27 16:14:27 +00:00
jhb
b7ab1db7c3 Fix _sx_assert() to panic() rather than printf() when an assertion fails
and ignore assertions if we have already paniced.
2004-02-27 16:13:44 +00:00
jhb
b9e0b0f9af Replace the ktrace queue's semaphore with a condition variable instead as
it is slightly more efficient since we already have a mutex to protect the
queue.  Ktrace originally used a semaphore more as a proof of concept.
2004-02-26 19:30:22 +00:00
truckman
1de257deb3 Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
rwatson
50cda16803 Assert pipe mutex in pipeselwakeup(), as we manipulate pipe_state
in a non-atomic manner.  It appears to always be called with the
mutex (good).
2004-02-26 00:18:22 +00:00
rwatson
3bcc63994d Update comment regarding MAC labels: we no longer pass endpoints
into the MAC Framework, just the pipe pair.

GC 'hadpeer' used in pipedestroy(), which is no longer needed as
we check pipe_present flags on the pair.
2004-02-25 23:30:56 +00:00
des
2e0da0ec37 Whitespace cleanup 2004-02-24 19:31:30 +00:00
phk
7c172839f5 Fix two oversights here: don't trash the freelist, and properly cleanup
the cdevsw{}.

Submitted by:	tegge
2004-02-23 08:42:55 +00:00
green
3ae42dea9b Correct some major SMP-harmful problems in the pipe implementation. First
of all, PIPE_EOF is not checked pervasively after everything that can drop
the pipe mutex and msleep(), so fix.  Additionally, though it might not
harm anything, pipelock() and pipeunlock() are not used consistently.
Third, the kqueue support functions do not use the pipe mutex correctly.
Last, but absolutely not least, is a race: if pipe_busy is not set on
the closing side of the pipe, the other side that is trying to write to
that will crash BECAUSE PIPE_EOF IS NOT SET!  Unconditionally set
PIPE_EOF, and get rid of all the lockups/crashes I have seen trying
to build ports.
2004-02-22 23:00:14 +00:00
deischen
7d4838de1e Add sysctls to allow showing threads for pgrp, tty, uid, ruid,
and pid.
2004-02-22 17:54:32 +00:00
pjd
01d59d6bbb Reimplement sysctls handling by MAC framework.
Now I believe it is done in the right way.

Removed some XXMAC cases, we now assume 'high' integrity level for all
sysctls, except those with CTLFLAG_ANYBODY flag set. No more magic.

Reviewed by:	rwatson
Approved by:	rwatson, scottl (mentor)
Tested with:	LINT (compilation), mac_biba(4) (functionality)
2004-02-22 12:31:44 +00:00
cperciva
9f17986268 If we're going to panic(), do it before dereferencing a NULL pointer.
Reported by:	"Ted Unangst" <tedu@coverity.com>
Approved by:	rwatson (mentor)
2004-02-22 01:11:53 +00:00
rwatson
90431761a2 Update my personal copyrights and NETA copyrights in the kernel
to use the "year1-year3" format, as opposed to "year1, year2, year3".
This seems to make lawyers more happy, but also prevents the
lines from getting excessively long as the years start to add up.

Suggested by:	imp
2004-02-22 00:33:12 +00:00
phk
711ff67b90 Check for NODEV return from udev2dev() 2004-02-21 23:52:03 +00:00
phk
5551e292d8 Device megapatch 6/6:
This is what we came here for:  Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list.  When the refcount drops, free it.

Check that cdevsw->d_version is correct.  If not, set all methods
to the dead_*() methods to prevent entrance into driver.  Print
warning on console to this effect.  The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.
2004-02-21 21:57:26 +00:00
phk
39fb4aef3d Device megapatch 5/6:
Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev().  The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()
2004-02-21 21:32:15 +00:00
phk
ad925439e0 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
phk
fcf7e634fb Device megapatch 3/6:
Add missing D_TTY flags to various drivers.

Complete asserts that dev_t's passed to ttyread(), ttywrite(),
ttypoll() and ttykqwrite() have (d_flags & D_TTY) and a struct tty
pointer.

Make ttyread(), ttywrite(), ttypoll() and ttykqwrite() the default
cdevsw methods for D_TTY drivers and remove the explicit initializations
in various drivers cdevsw structures.
2004-02-21 20:41:11 +00:00
phk
32b7c9a433 Device megapatch 2/6:
This commit adds a couple of functions for pseudodrivers to use for
implementing cloning in a manner we will be able to lock down (shortly).

Basically what happens is that pseudo drivers get a way to ask for
"give me the dev_t with this unit number" or alternatively "give
me a dev_t with the lowest guaranteed free unit number" (there is
unfortunately a lot of non-POLA in the exact numeric value of this
number, just live with it for now)

Managing the unit number space this way removes the need to use
rman(9) to do so in the drivers this greatly simplifies the code in
the drivers because even using rman(9) they still needed to manage
their dev_t's anyway.

I have taken the if_tun, if_tap, snp and nmdm drivers through the
mill, partly because they (ab)used makedev(), but mostly because
together they represent three different problems for device-cloning:

if_tun and snp is the plain case: just give me a device.

if_tap has two kinds of devices, with a flag for device type.

nmdm has paired devices (ala pty) can you can clone either of them.
2004-02-21 20:29:52 +00:00
green
59b7a05d47 Make sure to wake up any select waiters when closing a kqueue (also, not
crash).  I am fairly sure that only people with SMP and multi-threaded
apps using kqueue will be affected by this, so I have a stress-testing
program on my web site:
<URL:http://green.homeunix.org/~green/getaddrinfo-pthreads-stresstest.c>
2004-02-20 04:00:48 +00:00
jhb
a0f2609521 Tidy up the thread taskqueue implementation and close a lost wakeup race.
Instead of creating a mutex that we msleep on but don't actually lock when
doing the corresponding wakeup(), in the kthread, lock the mutex associated
with our taskqueue and msleep while the queue is empty.  Assert that the
queue is locked when the callback function is called to wake the kthread.
2004-02-19 22:03:52 +00:00
nectar
e5fb5670ab Rework jail_attach(2) so that an already jailed process cannot hop
to another jail.

Submitted by:	rwatson
2004-02-19 21:03:20 +00:00
pjd
01946bf901 Added sysctl security.jail.jailed.
It returns 1 is process is inside of jail and 0 if it is not.
Information if we are in jail or not is not a secret, there is plenty of
ways to discover it. Many people are using own hack to check this and
this will be a legal way from now on.

It will be great if our starting scripts will take advantage of this sysctl
to allow clean "boot" inside jail.

Approved by:	rwatson, scottl (mentor)
2004-02-19 14:29:14 +00:00
pjd
6bf6911776 Simplify check. We are only able to check exclusive lock and if
2nd condition is true, first one is true for sure.

Approved by:	jhb, scottl (mentor)
2004-02-19 14:19:31 +00:00
truckman
0dbd9a3b16 When reparenting a process in the PT_DETACH code, only set p_sigparent
to SIGCHLD if the new parent process is initproc.

MFC after:	2 weeks
2004-02-19 10:39:42 +00:00
truckman
05861c09f2 A Linux thread created using clone() should not send SIGCHLD to its
parent if no signal is specified in the clone() flags argument.

PR:		42457
MFC after:	2 weeks
2004-02-19 06:43:48 +00:00
njl
a3ce894968 Add support for 'h' and 'hh' modifiers for printf(9).
Submitted by:	Bruno Ducrot <ducrot AT poupinou.org>
Reviewed by:	bde
2004-02-19 05:29:39 +00:00
cperciva
02c3a3d063 Don't ignore errors from vfs_allocate_syncvnode.
PR:		kern/18503
Submitted by:	Anatoly Vorobey <mellon@pobox.com>
Approved by:	rwatson (mentor)
2004-02-18 05:20:54 +00:00
peter
175d4b65a9 Checkpoint a hack to enable running i386 libc_r binaries on a 64 bit
kernel.  I'm not happy with it yet - refinements are to come.
This hack allows the kern.ps_strings and kern.usrstack sysctls to respond
to a 32 bit request, such as those coming from emulated i386 binaries.
2004-02-18 00:54:17 +00:00
dwmalone
35850f8433 Correct a comment.
Reviewed by:	alfred, tanimura
2004-02-17 12:30:32 +00:00
des
196898f65e Mechanical whistespace cleanup. 2004-02-17 10:21:03 +00:00
des
177497ee85 Don't bother storing a result when all you need are the side effects. 2004-02-16 18:38:46 +00:00
dwmalone
824c230543 In fdcheckstd the descriptor table should never be shared, so just
KASSERT this rather than trying to deal with what happens when file
descriptors change out from under us.
2004-02-15 21:14:48 +00:00
bde
70edb52ffb Fixed style bugs near previous commit (mainly formatting errors and
missing parentheses).  Use default handling (trap to debugger) for
udev2dev(x, 1) since it is an error and doesn't happen anywhere in
the sys tree except in bogusly commented out code in coda.
2004-02-15 20:14:47 +00:00
cperciva
d2e6533b8d Remove opv_desc_vector from vfs_add_vnodeops, since it is defined
and given a value, but never used.  This has no effect on the
resulting binaries, since gcc optimizes the variable away anyway.

PR:		kern/62684
Approved by:	rwatson (mentor)
2004-02-15 17:27:33 +00:00
phk
4fea4ee178 Split the initialization of the cdevsw into a separate function. 2004-02-15 10:35:33 +00:00
rwatson
9af8bd8baa Remove excess brackets. 2004-02-15 00:43:22 +00:00
phk
0b3ced28d7 Use standard style for cdevsw initialization. 2004-02-14 20:03:36 +00:00
rwatson
ee9218912a By default, don't allow processes in a jail to list the set of
jails in the system.  Previous behavior (allowed) may be restored
by setting security.jail.list_allowed=1.
2004-02-14 19:19:47 +00:00
rwatson
c7301501a5 Fix mismerge in last commit: check that cred->cr_prison is NULL
before dereferencing the prison pointer.
2004-02-14 18:52:43 +00:00
rwatson
8caf918eda By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0.  Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.
2004-02-14 18:31:11 +00:00
phk
ff78c06dec Remove the check which used to protect us against make_dev() being
called until DEVFS had a chance to initialize.  Since DEVFS is mandatory
and things over in that department coincidentally works from without
any initialization now, this is safe.
2004-02-14 17:19:43 +00:00
green
7211bcf201 T -CURRENT DO NOT CRASH UPON ^T K PLZ THX.
Also, use sched_pctcpu() instead of assuming td->td_kse is non-NULL.
2004-02-14 01:30:06 +00:00
green
5004ca437d Always socantsendmore() before deallocating a socket. This, in turn,
calls selwakeup() if necessary (which it is, if you don't want freed
memory hanging around on your td->td_selq).

Props to:	alfred
2004-02-12 01:48:40 +00:00
truckman
da322e8d35 When reparenting a process to init, make sure that p_sigparent is
set to SIGCHLD.  This avoids the creation of orphaned Linux-threaded
zombies that init is unable to reap.  This can occur when the parent
process sets its SIGCHLD to SIG_IGN.  Fix a similar situation in the
PT_DETACH code.

Tested by:	"Steven Hartland" <killing AT multiplay.co.uk>
2004-02-11 22:06:02 +00:00
jhb
920753994a Argh! Fix a bogon. lim_cur() was returning the hard (max) limit rather
than the soft (cur) limit.

Submitted by:	bde
2004-02-11 18:04:13 +00:00