Commit Graph

3427 Commits

Author SHA1 Message Date
Alfred Perlstein
3a4d365463 Add reserved lkmressys keyword. I swear, this script will die the
next time I need to hack on it.
2000-12-01 08:47:54 +00:00
Alfred Perlstein
1dc8643099 implement the NOSTD syscall type; this creates the syscall args but sticks
an lkmnosys entry into the sysent table so that SYSCALL_MODULE() works
2000-12-01 07:40:20 +00:00
Alfred Perlstein
c5a86b0ab9 Translate alfred to english.
Submitted by: bde
2000-12-01 06:59:18 +00:00
Jake Burkholder
1512b5d6ab Use an mp-safe callout for endtsleep. 2000-12-01 04:55:52 +00:00
John Baldwin
2191340786 Use msleep() instead of mtx_exit()/tsleep() so that we release the lock and
go to sleep as an "atomic" operation.
2000-12-01 03:43:33 +00:00
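For illustration, the before/after pattern this commit describes, as a minimal sketch against the mutex API names of the time (mtx_enter()/mtx_exit()); the lock, wait channel, priority and wmesg values are placeholders:

        /* Old: the mutex is dropped before tsleep(), leaving a window
         * in which a wakeup can be missed. */
        mtx_exit(&foo_mtx, MTX_DEF);
        error = tsleep(&foo_cond, PRIBIO, "foowt", hz);

        /* New: msleep() releases the mutex and goes to sleep atomically,
         * then reacquires it before returning. */
        error = msleep(&foo_cond, &foo_mtx, PRIBIO, "foowt", hz);
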
John Baldwin
472fd56ea5 Don't update p_stat in exit1() to SZOMB until after releasing the allproc
lock.  Otherwise, if we block on the backing mutex while releasing the
allproc lock, then when we resume, we will be at SRUN, and we will stay
that way all the way through cpu_exit.  As a result, our parent will never
harvest us.
2000-12-01 03:42:17 +00:00
Jake Burkholder
96fde7da19 Use msleep instead of mtx_exit; tsleep; mtx_enter, which is not safe. 2000-12-01 02:18:38 +00:00
John Baldwin
6936206ebd Split the WITNESS and MUTEX_DEBUG options apart so that WITNESS does not
depend on MUTEX_DEBUG.  The MUTEX_DEBUG option turns on extra assertions
and checks to verify that mutexes themselves are implemented properly.
The WITNESS option uses extra checks and diagnostics to verify that other
code is using mutexes properly.
2000-12-01 00:10:59 +00:00
Robert Watson
cf64863a1e o Add a comment to exec_check_permissions() to indicate that the
passed vnode must be locked; this is the case because of calls
  to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN().  This becomes
  more of an issue when VOP_ACCESS() gets a bit more complicated,
  which it does when you introduce ACL, Capability, and MAC
  support.

Obtained from:	TrustedBSD Project
2000-11-30 21:06:05 +00:00
Alfred Perlstein
c6ab5768aa only call bwillwrite() to stall on IO when dealing with VNODEs; otherwise
we will stall on non-disk IO for things like fifos and sockets
2000-11-30 20:23:14 +00:00
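A hedged sketch of the check described above as it might sit in a write path; fp, DTYPE_VNODE and the surrounding call are illustrative, not the actual diff:

        /* Throttle against the buffer cache only for vnode-backed I/O;
         * fifos, sockets and pipes should not stall in bwillwrite(). */
        if (fp->f_type == DTYPE_VNODE)
                bwillwrite();
        error = fo_write(fp, &auio, fp->f_cred, flags, p);
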
Alfred Perlstein
237710275e This is a fix for a problem described in PR kern/19572. It was
recently discussed at -hackers. The problem is a null-pointer
    dereference that happens in kern/vfs_lookup.c when accessing ".."
    with a v_mount entry for the current directory vnode of NULL. This
    happens when a volume is forcibly unmounted, and the vnode for a
    working directory in the mounted volume is cleared.

PR: 23191
Submitted by: Thomas Moestl <tmoestl@gmx.net>
2000-11-30 20:04:44 +00:00
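A minimal sketch of the kind of guard this fix implies in the ".." handling of kern/vfs_lookup.c; the variable name dp (current directory vnode) and the error path are assumptions:

        /* A forced unmount can leave the working directory's vnode with
         * v_mount == NULL; fail the lookup instead of dereferencing it. */
        if (dp->v_mount == NULL) {
                error = ENOENT;
                goto bad;
        }
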
Alfred Perlstein
1baf4aabbc use an opportunistic locking strategy with the uidinfo structures to avoid
locking the global hash on each uifree()

make struct uidinfo only visible to the kernel

make uihold() a function rather than a macro to reduce bloat

swap the order of a spl/mutex to maintain consistency
2000-11-30 19:15:22 +00:00
Alfred Perlstein
5c3f70d7c0 make crfree into a function rather than a macro to avoid bloat because of
the mutex acquire/release

reorder struct ucred
2000-11-30 19:09:48 +00:00
Kirk McKusick
6d984dfa6a Get rid of a bogus mtx_exit (it was attempting to release an
already released mutex).

Submitted by:	"Chris Knight" <chris@aims.com.au>
2000-11-30 19:09:29 +00:00
Marcel Moolenaar
d034d459da Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2), the state is still updated if needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information, causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286
2000-11-30 05:23:49 +00:00
John Baldwin
1bd0eefb4c Fix up priority propagation:
- Use a better test for determining when a process is running.
- Convert some checks to assertions.
- Remove unnecessary tests.
- Save the priority before acquiring a mutex rather than in msleep(9).
2000-11-30 00:51:16 +00:00
John Baldwin
86327ad8a4 Set p_mtxname when blocking on a mutex and clear it when waking up. 2000-11-29 20:17:15 +00:00
John Baldwin
62ca2477d8 Save a copy of p_mtxname in e_mtxname when creating an eproc. 2000-11-29 20:14:50 +00:00
John Baldwin
f404050e44 Use an atomic operation with an appropriate memory barrier when releasing
a contested sleep mutex in the case that at least two processes are blocked
on the contested mutex.
2000-11-29 18:41:19 +00:00
John Baldwin
8f838cb563 The sched_lock mutex goes after the sio mutex in the locking order since
a software interrupt can be scheduled in the sio interrupt handler while
the sio mutex is held.
2000-11-29 18:38:14 +00:00
John Baldwin
bbc7a98a31 Save the line number and filename of the last mtx_enter operation for
spin locks.  We already do this for sleep locks.
2000-11-29 18:37:01 +00:00
John Baldwin
e2979dcc85 Don't drop Giant and the passed in mutex incorrectly in the
cold || panicstr case.  Do drop the passed in mutex in that case if
PDROP is specified.
2000-11-29 18:32:50 +00:00
John Baldwin
2bcc63c545 Only print out APIC info on an SMP system during a panic if APIC_IO is
defined.
2000-11-29 01:33:15 +00:00
John Baldwin
8d9888d37a Don't wait forever for CPUs to stop or restart. Instead, give up after a
timeout.  If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart.  This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.
2000-11-28 23:52:36 +00:00
Jordan K. Hubbard
7022a92395 Kernel support for erase2 character.
Submitted by:	Rui Pedro Mendes Salgueiro <rps@mat.uc.pt>
2000-11-28 20:03:23 +00:00
Matthew N. Dodd
46aa504e42 Alter the return value and arguments of the GET_RESOURCE_LIST bus method.
Alter consumers of this method to conform to the new convention.
Minor cosmetic adjustments to bus.h.

This isn't of concern as this interface isn't in use yet.
2000-11-28 06:49:15 +00:00
Jake Burkholder
4f55983606 Use callout_reset instead of timeout(9). Most callouts are statically
allocated; 2 have been added to struct proc for setitimer and sleep.

Reviewed by:	jhb, jlemon
2000-11-27 22:52:31 +00:00
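The migration described above, sketched for the sleep timeout; the callout field name p_slpcallout matches the commit's description of callouts embedded in struct proc but is otherwise an assumption:

        /* Old: timeout(9) consumes a callout slot on every call. */
        timeout(endtsleep, (void *)p, timo);

        /* New: reuse the callout statically embedded in struct proc. */
        callout_reset(&p->p_slpcallout, timo, endtsleep, (void *)p);
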
John Baldwin
91b7c97713 Drop Giant around the mi_switch() call in yield().
Submitted by:	tegge
2000-11-27 18:48:13 +00:00
Alfred Perlstein
1e5d626ad9 ucred system overhaul:
1) mpsafe (protect the refcount with a mutex).
2) reduce duplicated code by removing the inlined crdup() from crcopy()
   and make crcopy() call crdup().
3) use M_ZERO flag when allocating initial structs instead of calling bzero
   after allocation.
4) expand the size of the refcount from a u_short to a u_int; by using
   shorts we might have an overflow.

Glanced at by: jake
2000-11-27 00:09:16 +00:00
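A minimal sketch of item 1 (mutex-protected refcount), using the mtx_enter()/mtx_exit() names of the era; the cr_mtx field name and the M_CRED malloc type are assumptions:

        void
        crhold(struct ucred *cr)
        {
                mtx_enter(&cr->cr_mtx, MTX_DEF);
                cr->cr_ref++;
                mtx_exit(&cr->cr_mtx, MTX_DEF);
        }

        void
        crfree(struct ucred *cr)
        {
                mtx_enter(&cr->cr_mtx, MTX_DEF);
                if (--cr->cr_ref == 0) {
                        mtx_exit(&cr->cr_mtx, MTX_DEF);
                        mtx_destroy(&cr->cr_mtx);
                        FREE(cr, M_CRED);
                        return;
                }
                mtx_exit(&cr->cr_mtx, MTX_DEF);
        }
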
Alfred Perlstein
0931dcefb3 Move the #define of _KERN_MUTEX_C_ so that it's before any system headers
are included.  System headers can include sys/mutex.h and then certain
macros do not get defined.

Reviewed by: jake
2000-11-26 21:14:17 +00:00
Poul-Henning Kamp
a52585d77e Simplify the tprintf() API.
Lose the special <sys/tprintf.h> #include file.
2000-11-26 20:35:21 +00:00
Poul-Henning Kamp
4d88c4598f Make log(-1, ...) do what addlog(...) did.
Replace all uses of addlog(...) with log(-1, ...)

Remove bogus "register" keywords in subr_prf.c

Make log() return void.
2000-11-26 19:34:06 +00:00
Poul-Henning Kamp
cb7e609a3c Make diskerr() always log with printf. 2000-11-26 19:29:15 +00:00
Jake Burkholder
a5d5c61c12 Add uidinfo hash and uidinfo struct to the witness order list. 2000-11-26 15:05:46 +00:00
Alfred Perlstein
9c19bcddf0 Make uidinfo subsystem mpsafe
use a mutex lock when looking up/deleting entries on the hashlist
use a mutex lock on each uidinfo when updating fields

make uifree() a void function rather than 'int' since no one cares

allocate uidinfo structs with the M_ZERO flag and don't explicitly initialize
them

Assisted by: eivind, jhb, jakeb
2000-11-26 12:08:17 +00:00
Jonathan Lemon
e82ac18e52 Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not.  Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake
2000-11-25 06:22:16 +00:00
Jake Burkholder
249849e0b9 - Rename callout_reset to _callout_reset and add a flags argument.
- Add macros callout_reset, which does the obvious, and
  mp_callout_reset, which passes the CALLOUT_MPSAFE flag.
2000-11-25 03:34:49 +00:00
Jake Burkholder
553629ebc9 Protect the following with a lockmgr lock:
allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
John Baldwin
0959cc6680 Ahem, fix the disclaimer portion of the copyright so it disclaims the
voices in my head.  You can sue the voices in Bill Paul's head all you
want.

Noticed by:	jhb
2000-11-21 21:10:15 +00:00
Jonathan Lemon
4a476efa51 Protect p_wchan with sched_lock in selwakeup(). 2000-11-21 20:22:34 +00:00
Alan Cox
c6fa9f78d2 Provide a new interface for the user of aio_read() and aio_write() to request
a kevent upon completion of the I/O.  Specifically, introduce a new type
of sigevent notification, SIGEV_EVENT.  If sigev_notify is SIGEV_EVENT,
then sigev_notify_kqueue names the kqueue that should receive the event
and sigev_value contains the "void *" that is copied into the kevent's udata
field.

In contrast to the existing interface, this one: 1) works on
the Alpha 2) avoids the extra copyin() call for the kevent because all
of the information needed is in the sigevent and 3) could be
applied to request a single kevent upon completion of an entire lio_listio().

Reviewed by:	jlemon
2000-11-21 19:36:36 +00:00
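A userland sketch of the new notification path; SIGEV_EVENT, sigev_notify_kqueue and sigev_value come from the commit message, while the fd/buffer names and the sival_ptr member spelling are assumptions:

        struct aiocb cb;
        struct kevent ev;
        int kq = kqueue();

        bzero(&cb, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_sigevent.sigev_notify = SIGEV_EVENT;     /* deliver a kevent */
        cb.aio_sigevent.sigev_notify_kqueue = kq;       /* target kqueue */
        cb.aio_sigevent.sigev_value.sival_ptr = &cb;    /* lands in ev.udata */

        aio_read(&cb);
        kevent(kq, NULL, 0, &ev, 1, NULL);              /* ev.udata == &cb */
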
Alfred Perlstein
830fedd28f Accept filters broke kernels compiled without options INET.
Make accept filters conditional on INET support to fix.

Pointed out by: bde
Tested and assisted by: Stephen J. Kiernan <sab@vegamuse.org>
2000-11-20 01:35:25 +00:00
Robert Watson
7f112b0489 o Export cp_time ("CPU time statistics") using SYSCTL_OPAQUE.
This removes a reason that systat requires setgid kmem.  More to
  come.
2000-11-20 00:44:58 +00:00
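A sketch of what such an export looks like; the OID location and the "LU" format string are assumptions, cp_time itself being the traditional long cp_time[CPUSTATES] array:

        /* Read-only export so systat(1) can use sysctl(3) rather than
         * reading kernel memory via /dev/kmem. */
        SYSCTL_OPAQUE(_kern, OID_AUTO, cp_time, CTLFLAG_RD,
            &cp_time, sizeof(cp_time), "LU", "CPU time statistics");
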
Robert Watson
aa5429970c o Export nchstats ("VFS cache effectiveness statistics") using
SYSCTL_OPAQUE.  This removes a reason that systat requires
  setgid kmem.  More to come.
2000-11-20 00:41:11 +00:00
David Malone
32af0d74f0 Make sbcompress use the new M_WRITABLE macro. Previously sbcompress
could not compress into clusters. This could result in lots of
wasted clusters while receiving small packets from an interface
that uses clusters for all its packets.

Patch is partially from BSDi (limiting the size of the copy) and
based on a patch for 4.1 by Ian Dowse <iedowse@maths.tcd.ie> and
myself.

Reviewed by:	bmilekic
Obtained From:	BSDi
Submitted by:	iedowse
2000-11-19 22:22:47 +00:00
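The essence of the change, sketched as a generic append path rather than the actual sbcompress() diff; M_WRITABLE() and M_TRAILINGSPACE() are the standard mbuf macros, n and m are placeholders for the previous and incoming mbufs:

        /* Append into the previous mbuf only if its storage is writable,
         * i.e. not read-only and not a shared external buffer.  Unshared
         * clusters now qualify instead of being refused outright. */
        if (n != NULL && M_WRITABLE(n) &&
            M_TRAILINGSPACE(n) >= m->m_len) {
                bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len,
                    (u_int)m->m_len);
                n->m_len += m->m_len;
        }
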
Jake Burkholder
fa2fbc3dac - Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
  softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
  CALLOUT_MPSAFE, which specifies that a callout handler does not
  need Giant.  There is still no way to set this flag when
  registering a callout.

Reviewed by:	-smp@, jlemon
2000-11-19 06:02:32 +00:00
Matthew Dillon
936524aa02 Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory
    situations prior to now.

    The new code is based on the concept that I/O must be able to function in
    a low memory situation.  All major modules related to I/O (except
    networking) have been adjusted to allow allocation out of the system
    reserve memory pool.  These modules now detect a low memory situation but
    rather than block, they instead continue to operate, then return resources
    to the memory pool instead of caching them or leaving them wired.

    Code has been added to stall in a low-memory situation prior to a vnode
    being locked.

    Thus situations where a process blocks in a low-memory condition while
    holding a locked vnode have been reduced to near nothing.  Not only will
    I/O continue to operate, but many prior deadlock conditions simply no
    longer exist.

Implement a number of VFS/BIO fixes

	(found by Ian): in biodone(), bogus-page replacement code, the loop
        was not properly incrementing loop variables prior to a continue
        statement.  We do not believe this code can be hit anyway but we
        aren't taking any chances.  We'll turn the whole section into a
        panic (as it already is in brelse()) after the release is rolled.

	In biodone(), the foff calculation was incorrectly
        clamped to the iosize, causing the wrong foff to be calculated
        for pages in the case of an I/O error or biodone() called without
        initiating I/O.  The problem always caused a panic before.  Now it
        doesn't.  The problem is mainly an issue with NFS.

	Fixed casts for ~PAGE_MASK.  This code worked properly before only
        because the calculations use signed arithmetic.  Better to properly
        extend PAGE_MASK first before inverting it for the 64 bit masking
        op.

	In brelse(), the bogus_page fixup code was improperly throwing
        away the original contents of 'm' when it did the j-loop to
        fix the bogus pages.  The result was that it would potentially
        invalidate parts of the *WRONG* page(!), leading to corruption.

	There may still be cases where a background bitmap write is
        being duplicated, causing potential corruption.  We have identified
        a potentially serious bug related to this but the fix is still TBD.
        So instead this patch contains a KASSERT to detect the problem
  	and panic the machine rather than continue to corrupt the filesystem.
	The problem does not occur very often; it is very hard to
	reproduce, and it may or may not be the cause of the corruption
	people have reported.

Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
2000-11-18 23:06:26 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
John Baldwin
b6b55e27a4 Release sched_lock very briefly to give interrupts a chance to fire if we
are in softclock() for a long time.  The old code already did an
splx()/splhigh() pair here, I just missed adding in the equivalent mutex
operations on sched_lock earlier.
2000-11-18 00:21:00 +00:00
Tor Egge
e5c5b82950 Don't attempt to cluster write buffers where the VMIO flag isn't set. 2000-11-17 23:40:08 +00:00
Jake Burkholder
7da6f97772 - Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by:	John Hay <jhay@icomtek.csir.co.za>
		jhb
2000-11-17 18:09:18 +00:00
John Baldwin
cb799bfef9 The recent changes to msleep() and mawait() resulted in timeout() and
untimeout() not being called with Giant in those functions.  For now,
use the sched_lock to protect the callout wheel in softclock() and in
the various timeout and callout functions.

Noticed by:	tegge
2000-11-16 21:20:52 +00:00
John Baldwin
20cdcc5b73 Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch().  The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by:	witness
2000-11-16 02:16:44 +00:00
John Baldwin
92c79c7e3e Argh, add in a missing release of the sched_lock. 2000-11-16 01:16:54 +00:00
John Baldwin
95de685572 CURSIG() calls functions that acquire sleep mutexes, so it is not a good
idea to be holding the sched_lock while we are calling it.  As such,
release sched_lock before calling CURSIG() in msleep() and mawait() and
reacquire it after CURSIG() returns.

Submitted by:	witness
2000-11-16 01:07:19 +00:00
John Baldwin
b84988521c - Rename await() to mawait(). mawait() is to await() as msleep() is to
tsleep().  Namely, mawait() takes an extra argument which is a mutex
  to drop when going to sleep.  Just as with msleep(), if the priority
  argument includes the PDROP flag, then the mutex will be dropped and will
  not be reacquired when the process wakes up.
- Add in a backwards compatible macro await() that passes in NULL as the
  mutex argument to mawait().
2000-11-15 22:39:35 +00:00
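A short illustration of the msleep()/PDROP convention that mawait() now mirrors; the lock, channel and wmesg names are placeholders:

        /* Normal case: foo_mtx is released while asleep and held again
         * by the time msleep() returns. */
        error = msleep(&foo_cond, &foo_mtx, PRIBIO, "foowt", hz);

        /* PDROP: foo_mtx is released for the sleep and NOT reacquired,
         * so the caller must not touch it afterwards. */
        error = msleep(&foo_cond, &foo_mtx, PRIBIO | PDROP, "foowt", hz);
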
John Baldwin
3ae4dd935b - Replace a KASSERT() that knew too much about mutex internals with a
mtx_assert() that ensures the mutex we release during msleep() is both
  not recursed and owned by the current process.
2000-11-15 22:30:48 +00:00
John Baldwin
f33a072eb9 - Convert references from tsleep() -> msleep()
- Fix a buglet in a comment above await()
2000-11-15 22:27:38 +00:00
John Baldwin
9c36c934a1 Include the right headers to get the DDB #define and the db_active variable. 2000-11-15 22:08:16 +00:00
John Baldwin
896c2303d4 - Replace some instances of sched_ithd with sched_swi in KTR tracepoints.
- Assert that Giant is not owned during the main loop of sithd_loop().
2000-11-15 22:05:23 +00:00
John Baldwin
59f857e4ea Declare the 'witness_spin_check' properly as a per-CPU variable in the
non-SMP case.
2000-11-15 22:02:05 +00:00
John Baldwin
ecbd8e3710 Don't perform witness checks in witness_enter() during a panic. 2000-11-15 22:00:31 +00:00
John Baldwin
22f1b34223 Make ktr_verbose a bit more useful:
- On SMP systems display the cpu number with each message
- If ktr_verbose > 1, then include the filename and line number with each
  trace message
2000-11-15 21:51:53 +00:00
Kirk McKusick
324d6bacc3 Bug fix for revision 1.14 on the replacement of CIRCLEQ with TAILQ.
Submitted by:	Warner Losh <imp@village.org>
2000-11-15 20:07:16 +00:00
Kirk McKusick
a077f63555 In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in resource manager to TAILQ's.

Approved by:	Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
2000-11-14 20:46:02 +00:00
David Greenman
866746b6a6 Fixed a certain panic on IO error in sendfile(): Page must be set PG_BUSY
before calling vm_page_free() on it.
2000-11-12 14:51:15 +00:00
Bosko Milekic
e778918123 * Have m_pulldown() use the new M_WRITABLE() macro in order to determine
whether the given ext_buf is shared.

* Have the sf_bufs be setup with the mbuf subsystem using MEXTADD() with the
two new arguments.

Note: m_pulldown() is somewhat crotchy; the added comment explains the
situation.

Reviewed by: jlemon
2000-11-11 23:04:15 +00:00
Robert Watson
7f73938e96 o Fix a mis-transcription of sef's -STABLE protection fixes--only root
could debug processes after the commit that introduced the typo.
  Security is good, but security is not always the same as turning things
  off :-).

PR:		kern/22711
Obtained from:	brooks@one-eyed-alien.net
2000-11-10 23:57:48 +00:00
John Baldwin
20af769e69 Don't overwrite the filename for KTR_EXTEND with "../../kern/kern_ktr.c". 2000-11-10 22:30:44 +00:00
John Baldwin
9842fc8dda Axe some unused variables. 2000-11-10 21:54:19 +00:00
John Baldwin
bf619f9506 Fix SMP kernel compiles by #include'ing machine/globals.h to get the
cpuid variable.
2000-11-10 21:52:04 +00:00
John Baldwin
0fe4e534b1 Minor whitespace nit in a comment. 2000-11-10 21:21:20 +00:00
John Baldwin
b5d09a79b5 Ignore the INTR_MPSAFE flag when calculating the priority of an interrupt
thread.
2000-11-10 21:19:14 +00:00
Mike Smith
edcb5775ec Implement a trivial but effective interface for obtaining the kernel's
device tree and resource manager contents.  This is the kernel side of
the upcoming libdevinfo, which will expose this information to userspace
applications in a trivial fashion.

Remove the now-obsolete DEVICE_SYSCTLS code.
2000-11-09 10:21:23 +00:00
Marcel Moolenaar
806d7daafe Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec so as to properly take the size of the machine- and
ABI-dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch		MINSIGSTKSZ
----		-----------
alpha		    4096
i386		    2048
ia64		   12288

Reviewed by: mjacob
Suggested by: bde
2000-11-09 08:25:48 +00:00
John Baldwin
d8f03321bd - Remove much of the inlining of the KTR tracepoints into a ktr_tracepoint()
function declared in kern_ktr.c.  The only inline checks left are the
  checks that compare KTR_COMPILE with the supplied mask and thus should
  be optimized away into either nothing or a direct call to ktr_tracepoint().
- Move several KTR-related options to opt_ktr.h now that they are only
  needed by kern_ktr.c and not by ktr.h.
- Add in the ktr_verbose functionality if KTR_EXTEND is turned on.  If the
  global variable 'ktr_verbose' is non-zero, then KTR messages will be
  dumped to the console.  This variable can be set by either kernel code
  or via the 'debug.ktr_verbose' sysctl.  It defaults to off unless the
  KTR_VERBOSE kernel option is specified in which case it defaults to on.
  This can be useful when the machine locks up spinning in a loop with
  interrupts disabled as you might be able to see what it is doing when it
  locks up.

Requested by:	phk
2000-11-07 01:49:48 +00:00
John Baldwin
a924ab9741 Minor nit: missed ithd_loop -> sithd_loop in the KTR tracepoints. 2000-11-07 00:45:18 +00:00
David E. O'Brien
00910f2882 ELF kernels should use an ELF sysvec. This allows us to move a.out
specific files to those platforms that actually support a.out.
2000-11-05 10:41:35 +00:00
Bosko Milekic
fe27eea9d1 Change the sf_bufs wakeups to be wakeup_one(), because we don't want to
wakeup all of the sleeping threads when we free only one buffer. This
avoids us having to needlessly try again (and fail, and go back to
sleep) for all the threads sleeping. We will now only wakeup the
thread we know will succeed.

Reviewed by: green
2000-11-04 21:55:25 +00:00
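The one-line pattern described above, sketched with assumed sf_buf freelist names:

        /* Return one sf_buf to the freelist... */
        SLIST_INSERT_HEAD(&sf_freelist, sf, free_list);
        /* ...and wake exactly one waiter.  wakeup() would wake them all,
         * and all but one would fail the retry and go back to sleep. */
        wakeup_one(&sf_freelist);
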
Bosko Milekic
0eecc42758 Setup and put to use the mutex lock for sf_freelist, the sendfile(2) bufs
freelist. Should now be thread-friendly, in part.

Note: More work is needed in uipc_syscalls.c, but it will have to wait until
the socket locking issues are at least 80% implemented and committed.
2000-11-04 07:16:08 +00:00
Tor Egge
a2d1480cf8 Clear the VFREE flag when the vnode is removed from the free list in
getnewvnode().  Otherwise routines called from VOP_INACTIVE() might
attempt to remove the vnode from a free list the vnode isn't on,
causing corruption.
PR:		18012
2000-11-02 21:42:54 +00:00
Poul-Henning Kamp
1d7e3e42e7 Take VBLK devices further out of their misery.
This should fix the panic I introduced in my previous commit on this topic.
2000-11-02 21:14:13 +00:00
Eivind Eklund
e3c4036b18 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
Poul-Henning Kamp
a16d0eb2d7 Deprecate devsw->d_bmaj entirely.
This removes support for booting current kernels with very old bootblocks.

Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.
2000-10-31 10:58:14 +00:00
Jordan K. Hubbard
e7c2b5a51d Add a new ioctl for doing virgin disklabels.
Submitted by:	dillon
2000-10-31 07:05:40 +00:00
Robert Watson
cb1f0db9db o Deny access to System V IPC from within jail by default, as in the
current implementation, jail neither virtualizes the Sys V IPC namespace,
  nor provides inter-jail protections on IPC objects.
o Support for System V IPC can be enabled by setting jail.sysvipc_allowed=1
  using sysctl.
o This is not the "real fix" which involves virtualizing the System V
  IPC namespace, but prevents processes within jail from influencing those
  outside of jail when not approved by the administrator.

Reported by:	Paulo Fragoso <paulo@nlink.com.br>
2000-10-31 01:34:00 +00:00
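A sketch of how such a knob is typically wired up; the variable name follows the jail.sysvipc_allowed sysctl named above, and the placement of the EPERM check is an assumption:

        int jail_sysvipc_allowed = 0;
        SYSCTL_INT(_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW,
            &jail_sysvipc_allowed, 0,
            "Processes in jail can use System V IPC primitives");

        /* At the top of each System V IPC syscall: */
        if (p->p_prison != NULL && !jail_sysvipc_allowed)
                return (EPERM);
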
Robert Watson
c087a04f6a o Tighten up rules for which processes can't debug which other processes
in the p_candebug() function.  Synchronize with sef's CHECKIO()
  macro from the old procfs, which seems to be a good source of security
  checks.

Obtained from:	TrustedBSD Project
2000-10-30 20:30:03 +00:00
Kenneth D. Merry
2906da29dc Write support for the cd(4) driver.
This allows writing to DVD-RAM, PD and similar drives that probe as CD
devices.  Note that these are randomly writeable devices, not
sequential-only devices like CD-R drives, which are supported by cdrecord.

Add a new flag value for dsopen(), DSO_COMPATLABEL.  The cd(4) driver now
uses this flag instead of the DSO_NOLABELS flag.  The DSO_NOLABELS flag always
used a "fake" disklabel for the entire disk, provided by the caller.

With the DSO_COMPATLABEL flag, dsopen() will first search the media for a
label, and if it finds a label, it will use that label.  Otherwise it will
use the fake disklabel provided by the caller.  This provides backwards
compatibility, since we will still have labels for ISO9660 media.

It also provides new functionality, since you can now have a regular BSD
disklabel on read-only media, or on writeable media (e.g. DVD-RAM).

Bruce and I both think that we should eventually (in a few years) get
away from using disklabels for ISO9660 media, and just use the whole disk
device (/dev/cd0).  At that point disklabel handling in the cd(4) driver
could follow the "normal" model, as used in the da(4) driver.

Also, clean up the path in a couple of places in cdregister().  (Thanks to
Nick Hibma for catching that bug.)

Reviewed by:	bde
2000-10-30 07:03:00 +00:00
Alan Cox
39b2b25fa0 _aio_aqueue(): Change kevent registration to use its own struct file pointer.
Otherwise, aio_read() and aio_write() on sockets are broken if a kevent is
 registered.  (The code after kevent registration for handling sockets assumes
 that the struct file pointer "fp" still refers to the socket, not the kqueue.)
2000-10-29 21:38:28 +00:00
Poul-Henning Kamp
fe4e324374 Allow all users to access the dev -> devname sysctl. 2000-10-29 19:50:06 +00:00
Poul-Henning Kamp
da936bf80a Remove unneeded <stddef.h> #includes. 2000-10-29 16:57:42 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Don Lewis
19c34d1596 Nuke a bit of dead code. 2000-10-29 01:00:36 +00:00
Alan Cox
4a71feb71c Add missing call to knote_fdclose() in setugidsafety() and fdcloseexec().
Reviewed by:	jlemon
2000-10-28 20:27:32 +00:00
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Partial reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
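A sketch of the layering described above; the exact __offsetof() body shown is the conventional pointer-arithmetic form and is an assumption:

        /* <machine/ansi.h> */
        #define __offsetof(type, field) ((size_t)(&((type *)0)->field))

        /* <stddef.h> and <sys/types.h> */
        #define offsetof(type, field)   __offsetof(type, field)

        /* Unlike fldoff(), which took a bare struct tag, this works on
         * typedefs and unions as well: */
        typedef struct foo { int a; char b; } foo_t;
        size_t off = offsetof(foo_t, b);
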
John Baldwin
a5a96a1978 - Use MUTEX_DECLARE() and MTX_COLD for the WITNESS code's internal mutex so
it can function before malloc(9) is up and running.
- Add two new options WITNESS_DDB and WITNESS_SKIPSPIN.  If WITNESS_SKIPSPIN
  is enabled, then spin mutexes are ignored by the WITNESS code.  If
  WITNESS_DDB is turned on and DDB is compiled into the kernel, then the
  kernel will drop into DDB when either a lock hierarchy violation occurs
  or mutexes are held when going to sleep.
- Add some new sysctls:
  debug.witness_ddb is a read-write sysctl that corresponds to WITNESS_DDB.
     The kernel option merely changes the default value to on at boot.
  debug.witness_skipspin is a read-only sysctl that one can use to determine
     if the kernel was compiled with WITNESS_SKIPSPIN.
- Wipe out the BSD/OS-specific lock order lists.  We get to build our own
  lists now as we add mutexes to the kernel.
2000-10-27 02:59:30 +00:00
Andrew Gallatin
810bfc8ea1 unstaticize change_ruid() because it is needed by osf1_setuid() 2000-10-26 15:49:35 +00:00
John Baldwin
8088699f79 - Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt.  Roughly, what used to be a bit in spending
  now maps to a swi thread.  Each thread can have multiple handlers, just
  like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
  software interrupt thread to run, so spending, NSWI, and the shandlers
  array are no longer needed.  We can now have an arbitrary number of
  software interrupt threads.  When you register a software interrupt
  thread via sinthand_add(), you get back a struct intrhand that you pass
  to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
  more intuitive.  Also, prefix all the members of struct intrhand with
  'ih_'.
- Make swi_net() a MI function since there is now no point in it being
  MD.

Submitted by:	cp
2000-10-25 05:19:40 +00:00
John Baldwin
3127162743 Quiet some warnings. 2000-10-25 04:37:54 +00:00
John Baldwin
d543796f86 - Make the eventhandler_mutex mutex a private variable in
subr_eventhandler.c
- Move the extra #include's in sys/eventhandler.h to be protected by
  the #ifndef SYS_EVENTHANDLER/#endif
2000-10-25 00:01:39 +00:00
Warner Losh
bbfe025461 Cleanup the rman_make_alignment_flags function to be much clearer and shorter
than the prior version.
2000-10-22 04:48:11 +00:00
John Baldwin
b67a3e6e85 Propagate the 'const'ness of mutex descriptions to the witness code to
quiet warnings.
2000-10-20 22:45:01 +00:00
John Baldwin
78f0da0373 Actually enable the witness code if the WITNESS kernel option is enabled. 2000-10-20 21:58:11 +00:00
John Baldwin
f5271ebc2f Doh. Fix a 64-bit-ism by using uintptr_t for a temporary lock variable
instead of int.
2000-10-20 20:24:40 +00:00
Poul-Henning Kamp
1921a06d6a Introduce the M_ZERO flag to malloc(9)
Instead of:

        foo = malloc(sizeof(foo), M_WAIT);
        bzero(foo, sizeof(foo));

You can now (and please do) use:

        foo = malloc(sizeof(foo), M_WAIT | M_ZERO);

In the future this will enable us to do idle-time pre-zeroing of
malloc-space.
2000-10-20 17:54:55 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
John Baldwin
700bfa750f - GC some #if 0'd code regarding the non-existent safepri variable.
- Don't dink with the witness state of Giant unless we actually own it
  during mi_switch().
2000-10-20 07:52:10 +00:00
John Baldwin
eec258d257 - machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex
2000-10-20 07:29:16 +00:00
John Baldwin
d8881ca31d - machine/mutex.h -> sys/mutex.h
- The initial lock_mtx mutex used in the lockmgr code is initialized very
  early, so use MUTEX_DECLARE() and MTX_COLD.
2000-10-20 07:28:00 +00:00
John Baldwin
36412d79b4 - Make the mutex code almost completely machine independent. This greatly
reduces the maintenance load for the mutex code.  The only MD portions
  of the mutex code are in machine/mutex.h now, which include the assembly
  macros for handling mutexes as well as optionally overriding the mutex
  micro-operations.  For example, we use optimized micro-ops on the x86
  platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option.  In the new code,
  mtx_assert() only depends on INVARIANTS, allowing other kernel developers
  to have working mutex assertions without having to include all of the
  mutex debugging code.  The SMP_DEBUG kernel option has been renamed to
  MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack.  Instead, we dynamically allocate
  separate mtx_debug structures on the fly in mtx_init, except for mutexes
  that are initiated very early in the boot process.   These mutexes
  are declared using a special MUTEX_DECLARE() macro, and use a new
  flag MTX_COLD when calling mtx_init.  This is still somewhat hackish,
  but it is less evil than the mtx_f filler struct, and the mtx struct is
  now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
  operations on the mutex mtx_lock field to make it easier for other archs
  to override/optimize mutex ops if needed.  These new tiny ops also clean
  up the code in some places by replacing long atomic operation function
  calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
  to obtain a sleep mutex.  Calling mi_switch() would bogusly release
  Giant before switching to the next process.  Instead, inline most of the
  code from mi_switch() in the mtx_enter_hard() function.  Note that when
  we finally kill Giant we can back this out and go back to calling
  mi_switch().
2000-10-20 07:26:37 +00:00
John Baldwin
d8f831678d Reparent a kernel thread to init during kthread_exit() so that the zombie
can be reaped.
2000-10-19 19:53:44 +00:00
Robert Watson
47460a23a0 o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform
"administrative" authorization checks.  In most cases, the VADMIN test
  checks to make sure the credential effective uid is the same as the file
  owner.
o Modify vaccess() to set VADMIN as an available right if the uid is
  appropriate.
o Modify references to uid-based access control operations such that they
  now always invoke VOP_ACCESS() instead of using hard-coded policy checks.
o This allows alternative UFS policies to be implemented by replacing only
  ufs_access() (such as mandatory system policies).
o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the
  vnode: I believe that new invocations of VOP_ACCESS() are always called
  with the lock held.
o Some direct checks of the uid remain, largely associated with the QUOTA
  and SUIDDIR code.

Reviewed by:	eivind
Obtained from:	TrustedBSD Project
2000-10-19 07:53:59 +00:00
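A condensed sketch of the convention described above; the variable names (file_uid, granted) are simplifications of the real vaccess() code:

        /* In vaccess(): the file owner is granted VADMIN. */
        if (cred->cr_uid == file_uid)
                granted |= VADMIN;

        /* In a caller, instead of a hard-coded owner/suser() test for an
         * administrative operation such as chmod: */
        error = VOP_ACCESS(vp, VADMIN, cred, p);
        if (error)
                return (error);
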
John Baldwin
dc13e6dfbb Axe the idle_event eventhandler, and add a MD cpu_idle function used
for things such as halting CPU's, idling CPU's, etc.

Discussed with:	msmith
2000-10-19 07:47:16 +00:00
Peter Wemm
5d391f75d6 EVENTHANDLER_INVOKE() takes two arguments. 2000-10-18 17:56:06 +00:00
John Baldwin
86bc23af90 Don't needlessly pass the diagnostic counter to the idle_event event
handlers.
2000-10-18 08:10:25 +00:00
Matthew N. Dodd
0cb53e2487 Add new bus method 'GET_RESOURCE_LIST' and appropriate generic
implementation.

Add bus_generic_rl_{get,set,delete,release,alloc}_resource() functions
which provide generic operations for devices using resource list style
resource management.

This should simplify a number of bus drivers.  Further commits to follow.
2000-10-18 05:15:40 +00:00
John Baldwin
3650b37578 - Wrap the sanity checks for staying in the idle loop for absurdly long
amounts of time in #ifdef DIAGNOSTIC
- Call vm_page_zero_idle() during the idle loop.
2000-10-17 23:12:37 +00:00
Warner Losh
85d693f9d8 Implement resource alignment as discussed in arch@ a long time ago.
This was implemented by Shigeru YAMAMOTO-san and Jonathan Chen.  I've
cleaned them up somewhat and they seem to work well enough to boot
current (but given current's state it can be hard to tell).  Doug
Rabson also reviewed the design and signed off on it.
2000-10-17 22:08:03 +00:00
Nick Hibma
d686268728 Put the header section in the header file not the c file.
Submitted by:	Jonathan Chen <jon@spock.org>
PR:		21982
2000-10-15 15:19:35 +00:00
Poul-Henning Kamp
db7e3af111 Remove unneeded #include <machine/clock.h> 2000-10-15 14:19:01 +00:00
Bosko Milekic
181d2a1564 Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is the
number of ext_buf counters that are possibly allocatable.

Do this because:

  (i) It will make it easier to influence EXT_COUNTERS for if_sk,
      if_ti (or similar) users where the driver allocates its own
      ext_bufs and where it is important for the mbuf system to take
      it into account when reserving necessary space for counters.

  (ii) Facilitate some percentile calculation for netstat(1)
2000-10-15 06:24:07 +00:00
John W. De Boskey
2ec40c9aac Remove the signal value check from the PT_STEP codepath. It
can cause a bogus failure.

Reviewed by:    Sean Eric Fagan <sef@kithrup.com>
                and no other response to the review request.
2000-10-14 03:56:01 +00:00
Peter Wemm
ac5f943c37 savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().
2000-10-13 22:03:29 +00:00
Paul Saab
16a011f973 Do not allocate a callout for all crashdumps, not just when you panic. 2000-10-13 21:49:19 +00:00
Robert Watson
ab024bb02e o Simplify capability types away from an array of ints to a single
u_int64_t flag field, bounding the number of capabilities at 64,
  but substantially cleaning up capability logic (there are currently
  43 defined capabilities).

o Heads up to anyone actually using capabilities: the constant
  assignments for various capabilities have been redone, so any
  persistent binary capability stores (i.e., '$posix1e.cap' EA
  backing files) must be recreated.  If you have one of these,
  you'll know about it, so if you have no idea what this means,
  don't worry.

o Update libposix1e to reflect this new definition, fixing the
  exposed functions that directly manipulate the flags fields.

Obtained from:	TrustedBSD Project
2000-10-13 17:12:58 +00:00
Jason Evans
9722d88fba For lockmgr mutex protection, use an array of mutexes that are allocated
and initialized during boot.  This avoids bloating sizeof(struct lock).
As a side effect, it is no longer necessary to enforce the assumption that
lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been
removed.

Idea taken from:	BSD/OS.
2000-10-12 22:37:28 +00:00
Doug Rabson
63c47a5ca0 Add a gross hack for ia64 to allocate the backing store for a new program. 2000-10-12 14:24:03 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Jason Evans
39df86086f Do not call lockdestroy() for v_vnlock, which may point to a lock in a
deeper vfs stacking layer.

Submitted by:	bp
2000-10-06 08:04:48 +00:00
John Baldwin
ca29467e9a Correct a warning where the r_debug_state() dummy function used to trigger
a breakpoint in the kernel didn't use the proper argument list.  To avoid
having to include the userland link.h header everywhere that sys/linker.h
is used, make r_debug_state() a static function in link_elf.c as well.
2000-10-06 05:20:02 +00:00
John Baldwin
6c56727456 - Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's.  This is necessary for the
  clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
  This is needed because both hardclock() and statclock() need to run in
  the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
  during the clock interrupt.  Instead, use two p_flag's in the proc struct
  to mark the current process as having a pending SIGVTALRM or a SIGPROF
  and let them be delivered during ast() when hardclock() has finished
  running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways.  It was
  broken on the x86 if it was turned on since cpl is gone.  Its only use
  was to bogusly run softclock() directly during hardclock() rather than
  scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
  mutex.  Since the spin mutex already handles disabling/restoring
  interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
  temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
  signals in hardclock() that are to be delivered in ast().

Submitted by:	jakeb (making statclock safe in a fast interrupt)
Submitted by:	cp (concept of delaying signals until ast())
2000-10-06 02:20:21 +00:00
John Baldwin
a91b7dc11b Various whitespace cleanups after the SMPng commit, which jumbled things
around a bit in the trap handling code.
2000-10-06 01:55:07 +00:00
John Baldwin
0e2aab1237 Don't treat a kernel stack fault the same as a general protection fault or
a segment not present fault in the non-vm86 case.
2000-10-06 01:50:43 +00:00
John Baldwin
1931cf940a - Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
  completely from the x86 hardware interrupt code.
  - The ihandlers array is now gone.  Instead, there is a MI shandlers array
    that just contains SWI handlers.
  - Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by:	dfr
2000-10-05 23:09:57 +00:00
Eivind Eklund
a863c0fb2f Style fixes based on comments by bde 2000-10-05 18:22:46 +00:00
Doug Rabson
c9b004775d Add a workaround for statically linked kernels. 2000-10-04 17:40:24 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
f8be809e0f Move KASSERTs which check the value of v_usecount to after vnode locking, so
they will not produce false alarms.
2000-10-02 09:57:06 +00:00
Mike Smith
aa998012c7 Treat %X the same as %x (not entirely correct, but close enough). 2000-10-02 07:13:10 +00:00
Bosko Milekic
7d03271452 Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accommodate the changes.

 Here's a list of things that have changed (I may have left out a few); for a
 relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

   * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
     nobody uses anymore. It was great while it lasted, but now we're moving
     onto bigger and better things (Approved by: wollman).

   * Practically rewrote the allocation macros in sys/sys/mbuf.h to accommodate
     new allocations which grab the necessary lock.

   * Make sure that necessary mbstat variables are manipulated with
     corresponding atomic() routines.

   * Changed the "wait" routines, cleaned it up, made one routine that does
     the job.

   * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
     are now included in the generalized "wait" routines.

   * Sleep routines now use msleep().

   * Free lists have locks.

   * etc... probably other stuff I'm missing...

  Things to look out for and work on later:

   * find a better way to (dynamically) adjust EXT_COUNTERS

   * move necessity to recurse on a lock from drain routines by providing
     lock-free lower-level version of MFREE() (and possibly m_free()?).

   * checkout include of mutex.h in sys/sys/mbuf.h - probably violating
     general philosophy here.

   The code has been reviewed quite a bit, but problems may arise... please,
   don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?
2000-09-30 06:30:39 +00:00
Doug Rabson
918c9eec57 Add ia64 support. 2000-09-29 13:36:47 +00:00
Doug Rabson
ff2d7ae543 Don't support dynamic linking on ia64 for now - the tools can't cope. 2000-09-29 13:34:04 +00:00
Doug Rabson
b99353b99e Change the conditional so that we only build this on i386 instead of
trying to build it on all non-alpha arches.
2000-09-29 13:32:24 +00:00
Jonathan Lemon
d5aa12349f Check so_error in filt_so{read|write} in order to detect UDP errors.
PR: 21601
2000-09-28 04:41:22 +00:00
Kirk McKusick
02a1e48f02 Do the right thing if bdevvp is called twice for the same device.
Obtained from:	Poul-Henning Kamp <phk@freebsd.org>
2000-09-27 18:03:17 +00:00
Alan Cox
b92bb032d8 aio_qphysio: Eliminate one instance of an out-of-range check that is
performed twice.  Eliminate initialization that is already performed
 by _aio_aqueue.

aio_physwakeup: Eliminate redundant synchronization that is already
 performed by bufdone.
2000-09-26 06:35:22 +00:00
Takanori Watanabe
b9a22da4cf Make the size of the dynamic loader argument variable to support
various executable file formats.

Reviewed by:	peter
2000-09-26 05:09:21 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or kept as the first element of the structure
referenced by the v_data pointer (ffs). Such an organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If a filesystem wishes to export a lockmgr-compatible lock, it can put the address
of this lock in the v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
John Baldwin
fd2802cfe0 Add a KASSERT() to catch instances where the mutex that we pass in to
msleep() is recursed.

Suggested by:	cp
2000-09-24 00:33:51 +00:00
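A sketch of the kind of assertion described above, expressed with mtx_assert() flags rather than by poking at mutex internals (in the spirit of the mtx_assert() commit further up); the exact flag names are partly an assumption:

        /* msleep() can only release a mutex held exactly once by the
         * current process; a recursed mutex would only have its recursion
         * count dropped and would never actually be released. */
        if (mtx != NULL)
                mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
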