Commit Graph

3355 Commits

Author SHA1 Message Date
John Baldwin
b5d09a79b5 Ignore the INTR_MPSAFE flag when calculating the priority of an interrupt
thread.
2000-11-10 21:19:14 +00:00
Mike Smith
edcb5775ec Implement a trivial but effective interface for obtaining the kernel's
device tree and resource manager contents.  This is the kernel side of
the upcoming libdevinfo, which will expose this information to userspace
applications in a trivial fashion.

Remove the now-obsolete DEVICE_SYSCTLS code.
2000-11-09 10:21:23 +00:00
Marcel Moolenaar
806d7daafe Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch		MINSIGSTKSZ
----		-----------
alpha		    4096
i386		    2048
ia64		   12288

Reviewed by: mjacob
Suggested by: bde
2000-11-09 08:25:48 +00:00
John Baldwin
d8f03321bd - Remove much of the inlining of the KTR tracepoints into a ktr_tracepoint()
function declared in kern_ktr.c.  The only inline checks left are the
  checks that compare KTR_COMPILE with the supplied mask and thus should
  be optimized away into either nothing or a direct call to ktr_tracepoint().
- Move several KTR-related options to opt_ktr.h now that they are only
  needed by kern_ktr.c and not by ktr.h.
- Add in the ktr_verbose functionality if KTR_EXTEND is turned on.  If the
  global variable 'ktr_verbose' is non-zero, then KTR messages will be
  dumped to the console.  This variable can be set by either kernel code
  or via the 'debug.ktr_verbose' sysctl.  It defaults to off unless the
  KTR_VERBOSE kernel option is specified in which case it defaults to on.
  This can be useful when the machine locks up spinning in a loop with
  interrupts disabled as you might be able to see what it is doing when it
  locks up.

Requested by:	phk
2000-11-07 01:49:48 +00:00
John Baldwin
a924ab9741 Minor nit: missed ithd_loop -> sithd_loop in the KTR tracepoints. 2000-11-07 00:45:18 +00:00
David E. O'Brien
00910f2882 ELF kernels should use an ELF sysvec. This allows us to move a.out
specific files to those platforms that acutally support a.out.
2000-11-05 10:41:35 +00:00
Bosko Milekic
fe27eea9d1 Change the sf_bufs wakeups to be wakeup_one(), because we don't want to
wakeup all of the sleeping threads when we free only one buffer. This
avoids us having to needlessly try again (and fail, and go back to
sleep) for all the threads sleeping. We will now only wakeup the
thread we know will succeed.

Reviewed by: green
2000-11-04 21:55:25 +00:00
Bosko Milekic
0eecc42758 Setup and put to use the mutex lock for sf_freelist, the sendfile(2) bufs
freelist. Should now be thread-friendly, in part.

Note: More work is needed in uipc_syscalls.c, but it will have to wait until
the socket locking issues are at least 80% implemented and committed.
2000-11-04 07:16:08 +00:00
Tor Egge
a2d1480cf8 Clear the VFREE flag when the vnode is removed from the free list in
getnewvnode().  Otherwise routines called from VOP_INACTIVE() might
attempt to remove the vnode from a free list the vnode isn't on,
causing corruption.
PR:		18012
2000-11-02 21:42:54 +00:00
Poul-Henning Kamp
1d7e3e42e7 Take VBLK devices further out of their missery.
This should fix the panic I introduced in my previous commit on this topic.
2000-11-02 21:14:13 +00:00
Eivind Eklund
e3c4036b18 Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.
2000-11-01 17:57:24 +00:00
Poul-Henning Kamp
a16d0eb2d7 Deprecate devsw->d_bmaj entirely.
This removes support for booting current kernels with very old bootblocks.

Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.
2000-10-31 10:58:14 +00:00
Jordan K. Hubbard
e7c2b5a51d Add a new ioctl for doing virgin disklabels.
Submitted by:	dillon
2000-10-31 07:05:40 +00:00
Robert Watson
cb1f0db9db o Deny access to System V IPC from within jail by default, as in the
current implementation, jail neither virtualizes the Sys V IPC namespace,
  nor provides inter-jail protections on IPC objects.
o Support for System V IPC can be enabled by setting jail.sysvipc_allowed=1
  using sysctl.
o This is not the "real fix" which involves virtualizing the System V
  IPC namespace, but prevents processes within jail from influencing those
  outside of jail when not approved by the administrator.

Reported by:	Paulo Fragoso <paulo@nlink.com.br>
2000-10-31 01:34:00 +00:00
Robert Watson
c087a04f6a o Tighten up rules for which processes can't debug which other processes
in the p_candebug() function.  Synchronize with sef's CHECKIO()
  macro from the old procfs, which seems to be a good source of security
  checks.

Obtained from:	TrustedBSD Project
2000-10-30 20:30:03 +00:00
Kenneth D. Merry
2906da29dc Write support for the cd(4) driver.
This allows writing to DVD-RAM, PD and similar drives that probe as CD
devices.  Note that these are randomly writeable devices, not
sequential-only devices like CD-R drives, which are supported by cdrecord.

Add a new flag value for dsopen(), DSO_COMPATLABEL.  The cd(4) driver now
uses this flag instead of the DSO_NOLABELS flag.  The DSO_NOLABELS always
used a "fake" disklabel for the entire disk, provided by the caller.

With the DSO_COMPATLABEL flag, dsopen() will first search the media for a
label, and if it finds a label, it will use that label.  Otherwise it will
use the fake disklabel provided by the caller.  This provides backwards
compatibility, since we will still have labels for ISO9660 media.

It also provides new functionality, since you can now have a regular BSD
disklabel on read-only media, or on writeable media (e.g. DVD-RAM).

Bruce and I both think that we should eventually (in a few years) get
away from using disklabels for ISO9660 media, and just use the whole disk
device (/dev/cd0).  At that point disklabel handling in the cd(4) driver
could follow the "normal" model, as used in the da(4) driver.

Also, clean up the path in a couple of places in cdregister().  (Thanks to
Nick Hibma for catching that bug.)

Reviewed by:	bde
2000-10-30 07:03:00 +00:00
Alan Cox
39b2b25fa0 _aio_aqueue(): Change kevent registration to use its own struct file pointer.
Otherwise, aio_read() and aio_write() on sockets are broken if a kevent is
 registered.  (The code after kevent registration for handling sockets assumes
 that the struct file pointer "fp" still refers to the socket, not the kqueue.)
2000-10-29 21:38:28 +00:00
Poul-Henning Kamp
fe4e324374 Allow all users to access the dev -> devname sysctl. 2000-10-29 19:50:06 +00:00
Poul-Henning Kamp
da936bf80a Remove unneeded <stddef.h> #includes. 2000-10-29 16:57:42 +00:00
Poul-Henning Kamp
cf9fa8e725 Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
2000-10-29 16:06:56 +00:00
Poul-Henning Kamp
53ce36d17a Remove unneeded #include <sys/proc.h> lines. 2000-10-29 13:57:19 +00:00
Don Lewis
19c34d1596 Nuke a bit of dead code. 2000-10-29 01:00:36 +00:00
Alan Cox
4a71feb71c Add missing call to knote_fdclose() in setugidsafety() and fdcloseexec().
Reviewed by:	jlemon
2000-10-28 20:27:32 +00:00
Poul-Henning Kamp
46aa3347cb Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
John Baldwin
a5a96a1978 - Use MUTEX_DECLARE() and MTX_COLD for the WITNESS code's internal mutex so
it can function before malloc(9) is up and running.
- Add two new options WITNESS_DDB and WITNESS_SKIPSPIN.  If WITNESS_SKIPSPIN
  is enabled, then spin mutexes are ignored by the WITNESS code.  If
  WITNESS_DDB is turned on and DDB is compiled into the kernel, then the
  kernel will drop into DDB when either a lock hierarchy violation occurs
  or mutexes are held when going to sleep.
- Add some new sysctls:
  debug.witness_ddb is a read-write sysctl that corresponds to WITNESS_DDB.
     The kernel option merely changes the default value to on at boot.
  debug.witness_skipspin is a read-only sysctl that one can use to determine
     if the kernel was compiled with WITNESS_SKIPSPIN.
- Wipe out the BSD/OS-specific lock order lists.  We get to build our own
  lists now as we add mutexes to the kernel.
2000-10-27 02:59:30 +00:00
Andrew Gallatin
810bfc8ea1 unstaticize change_ruid() because it is needed by osf1_setuid() 2000-10-26 15:49:35 +00:00
John Baldwin
8088699f79 - Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt.  Roughly, what used to be a bit in spending
  now maps to a swi thread.  Each thread can have multiple handlers, just
  like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
  software interrupt thread to run, so spending, NSWI, and the shandlers
  array are no longer needed.  We can now have an arbitrary number of
  software interrupt threads.  When you register a software interrupt
  thread via sinthand_add(), you get back a struct intrhand that you pass
  to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
  more intuitive.  Also, prefix all the members of struct intrhand with
  'ih_'.
- Make swi_net() a MI function since there is now no point in it being
  MD.

Submitted by:	cp
2000-10-25 05:19:40 +00:00
John Baldwin
3127162743 Quite some warnings. 2000-10-25 04:37:54 +00:00
John Baldwin
d543796f86 - Make the eventhandler_mutex mutex a private variable in
subr_eventhandler.c
- Move the extra #include's in sys/eventhandler.h to be protected by
  the #ifndef SYS_EVENTHANDLER/#endif
2000-10-25 00:01:39 +00:00
Warner Losh
bbfe025461 Cleanup the rman_make_alignment_flags function to be much clearer and shorter
than the prior version.
2000-10-22 04:48:11 +00:00
John Baldwin
b67a3e6e85 Propogate the 'const'ness of mutex descriptions to the witness code to
quiet warnings.
2000-10-20 22:45:01 +00:00
John Baldwin
78f0da0373 Actually enable the witness code if the WITNESS kernel option is enabled. 2000-10-20 21:58:11 +00:00
John Baldwin
f5271ebc2f Doh. Fix a 64-bit-ism by using uintptr_t for a temporary lock variable
instead of int.
2000-10-20 20:24:40 +00:00
Poul-Henning Kamp
1921a06d6a Introduce the M_ZERO flag to malloc(9)
Instead of:

        foo = malloc(sizeof(foo), M_WAIT);
        bzero(foo, sizeof(foo));

You can now (and please do) use:

        foo = malloc(sizeof(foo), M_WAIT | M_ZERO);

In the future this will enable us to do idle-time pre-zeroing of
malloc-space.
2000-10-20 17:54:55 +00:00
John Baldwin
35e0e5b311 Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h
2000-10-20 07:58:15 +00:00
John Baldwin
700bfa750f - GC some #if 0'd code regarding the non-existant safepri variable.
- Don't dink with the witness state of Giant unless we actually own it
  during mi_switch().
2000-10-20 07:52:10 +00:00
John Baldwin
eec258d257 - machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for the malloc_mtx mutex
2000-10-20 07:29:16 +00:00
John Baldwin
d8881ca31d - machine/mutex.h -> sys/mutex.h
- The initial lock_mtx mutex used in the lockmgr code is initialized very
  early, so use MUTEX_DECLARE() and MTX_COLD.
2000-10-20 07:28:00 +00:00
John Baldwin
36412d79b4 - Make the mutex code almost completely machine independent. This greatly
reducues the maintenance load for the mutex code.  The only MD portions
  of the mutex code are in machine/mutex.h now, which include the assembly
  macros for handling mutexes as well as optionally overriding the mutex
  micro-operations.  For example, we use optimized micro-ops on the x86
  platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option.  In the new code,
  mtx_assert() only depends on INVARIANTS, allowing other kernel developers
  to have working mutex assertiions without having to include all of the
  mutex debugging code.  The SMP_DEBUG kernel option has been renamed to
  MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack.  Instead, we dynamically allocate
  seperate mtx_debug structures on the fly in mtx_init, except for mutexes
  that are initiated very early in the boot process.   These mutexes
  are declared using a special MUTEX_DECLARE() macro, and use a new
  flag MTX_COLD when calling mtx_init.  This is still somewhat hackish,
  but it is less evil than the mtx_f filler struct, and the mtx struct is
  now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
  operations on the mutex mtx_lock field to make it easier for other archs
  to override/optimize mutex ops if needed.  These new tiny ops also clean
  up the code in some places by replacing long atomic operation function
  calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
  to obtain a sleep mutex.  Calling mi_switch() would bogusly release
  Giant before switching to the next process.  Instead, inline most of the
  code from mi_switch() in the mtx_enter_hard() function.  Note that when
  we finally kill Giant we can back this out and go back to calling
  mi_switch().
2000-10-20 07:26:37 +00:00
John Baldwin
d8f831678d Reparent a kernel thread to init during kthread_exit() so that the zombie
can be reaped.
2000-10-19 19:53:44 +00:00
Robert Watson
47460a23a0 o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform
"administrative" authorization checks.  In most cases, the VADMIN test
  checks to make sure the credential effective uid is the same as the file
  owner.
o Modify vaccess() to set VADMIN as an available right if the uid is
  appropriate.
o Modify references to uid-based access control operations such that they
  now always invoke VOP_ACCESS() instead of using hard-coded policy checks.
o This allows alternative UFS policies to be implemented by replacing only
  ufs_access() (such as mandatory system policies).
o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the
  vnode: I believe that new invocations of VOP_ACCESS() are always called
  with the lock held.
o Some direct checks of the uid remain, largely associated with the QUOTA
  and SUIDDIR code.

Reviewed by:	eivind
Obtained from:	TrustedBSD Project
2000-10-19 07:53:59 +00:00
John Baldwin
dc13e6dfbb Axe the idle_event eventhandler, and add a MD cpu_idle function used
for things such as halting CPU's, idling CPU's, etc.

Discussed with:	msmith
2000-10-19 07:47:16 +00:00
Peter Wemm
5d391f75d6 EVENTHANDLER_INVOKE() takes two arguments. 2000-10-18 17:56:06 +00:00
John Baldwin
86bc23af90 Don't needlessly pass the diagnostic counter to the idle_event event
handlers.
2000-10-18 08:10:25 +00:00
Matthew N. Dodd
0cb53e2487 Add new bus method 'GET_RESOURCE_LIST' and appropriate generic
implementation.

Add bus_generic_rl_{get,set,delete,release,alloc}_resource() functions
which provide generic operations for devices using resource list style
resource management.

This should simplify a number of bus drivers.  Further commits to follow.
2000-10-18 05:15:40 +00:00
John Baldwin
3650b37578 - Wrap the sanity checks for staying in the idle loop for absurdly long
amounts of time in #ifdef DIAGNOSTIC
- Call vm_page_zero_idle() during the idle loop.
2000-10-17 23:12:37 +00:00
Warner Losh
85d693f9d8 Implement resource alignment as discussed in arch@ a long time ago.
This was implemented by Shigeru YAMAMOTO-san and Jonathan Chen.  I've
cleaned them up somewhat and they seem to work well enough to boot
current (but given current's state it can be hard to tell).  Doug
Rabson also reviewed the design and signed off on it.
2000-10-17 22:08:03 +00:00
Nick Hibma
d686268728 Put the header section in the header file not the c file.
Submitted by:	Jonathan Chen <jon@spock.org>
PR:		21982
2000-10-15 15:19:35 +00:00
Poul-Henning Kamp
db7e3af111 Remove unneeded #include <machine/clock.h> 2000-10-15 14:19:01 +00:00
Bosko Milekic
181d2a1564 Add nmbcnt sysctl and make it tunable at boottime; nmbcnt is the
number of ext_buf counters that are possibly allocatable.

Do this because:

  (i) It will make it easier to influence EXT_COUNTERS for if_sk,
      if_ti (or similar) users where the driver allocates its own
      ext_bufs and where it is important for the mbuf system to take
      it into account when reserving necessary space for counters.

  (ii) Facilitate some percentile calculation for netstat(1)
2000-10-15 06:24:07 +00:00
John W. De Boskey
2ec40c9aac Remove the signal value check from the PT_STEP codepath. It
can cause an bogus failure.

Reviewed by:    Sean Eric Fagan <sef@kithrup.com>
                and no other response to the review request.
2000-10-14 03:56:01 +00:00
Peter Wemm
ac5f943c37 savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().
2000-10-13 22:03:29 +00:00
Paul Saab
16a011f973 Do not allocate a callout for all crashdumps, not just when you panic. 2000-10-13 21:49:19 +00:00
Robert Watson
ab024bb02e o Simplify capability types away from an array of ints to a single
u_int64_t flag field, bounding the number of capabilities at 64,
  but substantially cleaning up capability logic (there are currently
  43 defined capabilities).

o Heads up to anyone actually using capabilities: the constant
  assignments for various capabilities have been redone, so any
  persistent binary capability stores (i.e., '$posix1e.cap' EA
  backing files) must be recreated.  If you have one of these,
  you'll know about it, so if you have no idea what this means,
  don't worry.

o Update libposix1e to reflect this new definition, fixing the
  exposed functions that directly manipulate the flags fields.

Obtained from:	TrustedBSD Project
2000-10-13 17:12:58 +00:00
Jason Evans
9722d88fba For lockmgr mutex protection, use an array of mutexes that are allocated
and initialized during boot.  This avoids bloating sizeof(struct lock).
As a side effect, it is no longer necessary to enforce the assumtion that
lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has been
removed.

Idea taken from:	BSD/OS.
2000-10-12 22:37:28 +00:00
Doug Rabson
63c47a5ca0 Add a gross hack for ia64 to allocate the backing store for a new program. 2000-10-12 14:24:03 +00:00
Eivind Eklund
7eb9fca557 Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)
2000-10-09 17:31:39 +00:00
Jason Evans
39df86086f Do not call lockdestroy() for v_vnlock, which may point to a lock in a
deeper vfs stacking layer.

Submitted by:	bp
2000-10-06 08:04:48 +00:00
John Baldwin
ca29467e9a Correct a warning where the r_debug_state() dummy function used to trigger
a breakpoint in the kernel didn't use the proper argument list.  To avoid
having to include the userland link.h header everyhwere that sys/linker.h
is used, make r_debug_state() a static function in link_elf.c as well.
2000-10-06 05:20:02 +00:00
John Baldwin
6c56727456 - Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's.  This is necessary for the
  clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
  This is needed because both hardclock() and statclock() need to run in
  the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
  during the clock interrupt.  Instead, use two p_flag's in the proc struct
  to mark the current process as having a pending SIGVTALRM or a SIGPROF
  and let them be delivered during ast() when hardclock() has finished
  running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways.  It was
  broken on the x86 if it was turned on since cpl is gone.  It's only use
  was to bogusly run softclock() directly during hardclock() rather than
  scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
  mutex.  Since the spin mutex already handles disabling/restoring
  interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
  temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
  signals in hardclock() that are to be delivered in ast().

Submitted by:	jakeb (making statclock safe in a fast interrupt)
Submitted by:	cp (concept of delaying signals until ast())
2000-10-06 02:20:21 +00:00
John Baldwin
a91b7dc11b Various whitespace cleanups after the SMPng commit, which jumbled things
around a bit in the trap handling code.
2000-10-06 01:55:07 +00:00
John Baldwin
0e2aab1237 Don't treat a kernel stack fault the same as a general protect fault or
a segment not present fault in the non-vm86 case.
2000-10-06 01:50:43 +00:00
John Baldwin
1931cf940a - Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
  completely from the x86 hardware interrupt code.
  - The ihandlers array is now gone.  Instead, there is a MI shandlers array
    that just contains SWI handlers.
  - Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by:	dfr
2000-10-05 23:09:57 +00:00
Eivind Eklund
a863c0fb2f Style fixes based on comments by bde 2000-10-05 18:22:46 +00:00
Doug Rabson
c9b004775d Add a workaround for statically linked kernels. 2000-10-04 17:40:24 +00:00
Jason Evans
a18b1f1d4d Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.
2000-10-04 01:29:17 +00:00
Boris Popov
f8be809e0f Move KASSERTs which checks value of v_usecount after vnode locking, so
it will not produce wrong alarms.
2000-10-02 09:57:06 +00:00
Mike Smith
aa998012c7 Treat %X the same as %x (not entirely correct, but close enough). 2000-10-02 07:13:10 +00:00
Bosko Milekic
7d03271452 Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accomodate the changes.

 Here's a list of things that have changed (I may have left out a few); for a
 relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

   * Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
     nobody uses anymore. It was great while it lasted, but now we're moving
     onto bigger and better things (Approved by: wollman).

   * Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate
     new allocations which grab the necessary lock.

   * Make sure that necessary mbstat variables are manipulated with
     corresponding atomic() routines.

   * Changed the "wait" routines, cleaned it up, made one routine that does
     the job.

   * Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
     are now included in the generalized "wait" routines.

   * Sleep routines now use msleep().

   * Free lists have locks.

   * etc... probably other stuff I'm missing...

  Things to look out for and work on later:

   * find a better way to (dynamically) adjust EXT_COUNTERS

   * move necessity to recurse on a lock from drain routines by providing
     lock-free lower-level version of MFREE() (and possibly m_free()?).

   * checkout include of mutex.h in sys/sys/mbuf.h - probably violating
     general philosophy here.

   The code has been reviewed quite a bit, but problems may arise... please,
   don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?
2000-09-30 06:30:39 +00:00
Doug Rabson
918c9eec57 Add ia64 support. 2000-09-29 13:36:47 +00:00
Doug Rabson
ff2d7ae543 Don't support dynamic linking on ia64 for now - the tools can't cope. 2000-09-29 13:34:04 +00:00
Doug Rabson
b99353b99e Change the conditionaal so that we only build this on i386 instead of
trying to build it on all non-alpha arches.
2000-09-29 13:32:24 +00:00
Jonathan Lemon
d5aa12349f Check so_error in filt_so{read|write} in order to detect UDP errors.
PR: 21601
2000-09-28 04:41:22 +00:00
Kirk McKusick
02a1e48f02 Do the right thing if bdevvp is called twice for the same device.
Obtained from:	Poul-Henning Kamp <phk@freebsd.org>
2000-09-27 18:03:17 +00:00
Alan Cox
b92bb032d8 aio_qphysio: Eliminate one instance of an out-of-range check that is
performed twice.  Eliminate initialization that is already performed
 by _aio_aqueue.

aio_physwakeup: Eliminate redundant synchronization that is already
 performed by bufdone.
2000-09-26 06:35:22 +00:00
Takanori Watanabe
b9a22da4cf Make size of dynamic loader argument variable to support
various executable file format.

Reviewed by:	peter
2000-09-26 05:09:21 +00:00
Boris Popov
67e871664b Add a lock structure to vnode structure. Previously it was either allocated
separately (nfs, cd9660 etc) or keept as a first element of structure
referenced by v_data pointer(ffs). Such organization leads to known problems
with stacked filesystems.

From this point vop_no*lock*() functions maintain only interlock lock.
vop_std*lock*() functions maintain built-in v_lock structure using lockmgr().
vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared
lock on vnode.

If filesystem wishes to export lockmgr compatible lock, it can put an address
of this lock to v_vnlock field. This indicates that the upper filesystem
can take advantage of it and use single lock structure for entire (or part)
of stack of vnodes. This field shouldn't be examined or modified by VFS code
except for initialization purposes.

Reviewed in general by:	mckusick
2000-09-25 15:24:04 +00:00
John Baldwin
fd2802cfe0 Add a KASSERT() to catch instances where the mutex that we pass in to
msleep() are recursed.

Suggested by:	cp
2000-09-24 00:33:51 +00:00
Paul Saab
92b123a002 Move MAXCPU from machine/smp.h to machine/param.h to fix breakage
with !SMP kernels.  Also, replace NCPUS with MAXCPU since they are
redundant.
2000-09-23 12:18:06 +00:00
Jason Evans
9a02e8c68f Don't #include <sys/proc.h>, since machine/mutex.h does it now. 2000-09-23 00:01:37 +00:00
Paul Saab
7321545f26 Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by:	peter
2000-09-22 23:40:10 +00:00
Robert Watson
100d2c187c o Introduce vn_extattr_rm(), a helper function in the style of
vn_extattr_get() and vn_extattr_set().  vn_extattr_rm() removes the
  specified extended attribute from a vnode, authorizing the change as
  the kernel (NULL cred).

Obtained from:	TrustedBSD Project
2000-09-22 22:33:13 +00:00
Eivind Eklund
453aaa0dff Style fixes:
* Add lots of comments
* Convert a couple of assertions to KASSERT()
* Minimal whitespace & misapplied {} fixes
* Convert #if 0 to #if COMPILING_LINT for code we presently do not
  support, but want to keep available.

Reviewed by:	adrian, markm
2000-09-22 12:22:36 +00:00
Eivind Eklund
bba25953af Staticize addalias() 2000-09-22 11:54:48 +00:00
Mike Smith
bdf3d8b954 Create an event (idle_event) which is invoked every time around the
idle loop.  Machine-dependant code can elect to eg. take power-saving
actions when this event is invoked.
2000-09-22 03:19:24 +00:00
Mike Smith
6595c331f3 Make the EVENTHANDLER mechanism MP-safe. Events can now be invoked
without holding the Giant lock.
2000-09-22 03:17:35 +00:00
Robert Watson
988ee790d4 o Change locking rules for VOP_GETACL() to indicate that vnode locks
must be held when retrieving ACLs from vnodes.  This is required for
  EA-based UFS ACL implementations.
o Update vacl_get_acl() so that it does appropriate vnode locking.
o Remove static from M_ACL malloc define so that it is accessible for
  consumers of ACLs other than in kern_acl.c

Obtained from:	TrustedBSD Project
2000-09-21 18:43:32 +00:00
Alfred Perlstein
21a9039725 comment vfs_export functions, requested by: eivind 2000-09-21 15:55:55 +00:00
Don Lewis
eabc23efb3 Remove unneeded #include that was a remnant of an earlier version of
my uidinfo patch.

Found by:	phk
2000-09-21 09:04:17 +00:00
Robert Watson
e084835893 o Add additional comment describing vaccess() behavior.
Requested by:	eivind
Reviewed by:	eivind, adrian
2000-09-20 17:18:12 +00:00
Peter Wemm
6413a4bc9d Fully initialize msqids[]. This could lead to ENOSPC and other strange
stuff.

PR: 21085
Submitted by:  Marcin Cieslak <saper@SYSTEM.PL>
2000-09-19 22:59:22 +00:00
Poul-Henning Kamp
b0d17ba69e Rename lminor() to dev2unit(). This function gives a linear unit number
which hides the 'hole' in the minor bits.

Introduce unit2minor() to do the reverse operation.

Fix some some make_dev() calls which didn't use UID_* or GID_* macros.

Kill the v_hashchain alias macro, it hides the real relationship.

Introduce experimental SI_CHEAPCLONE flag set it on cloned bpfs.
2000-09-19 10:28:44 +00:00
Paul Saab
b429049a5d Add new line character to debugging printf's. 2000-09-18 17:03:03 +00:00
Matthew N. Dodd
098b8a1eb0 Initialize 'hints_loaded' to 0.
This allows static hints to work properly.
2000-09-17 23:57:52 +00:00
Bruce Evans
33510ef17a Unpessimized CURSIG(). The fast path through CURSIG() was broken in
the 128-bit sigset_t changes by moving conditionally (rarely) executed
code to the beginning where it is always executed, and since this code
now involves 3 128-bit operations, the pessimization was relatively
large.  This change speeds up lmbench's pipe latency benchmark by
3.5%.

Fixed style bugs in CURSIG().
2000-09-17 15:12:04 +00:00
Bruce Evans
fbbeeb6cd6 Uninlined CURSIG() and unpolluted <sys/signalvar.h>. CURSIG() had become
very bloated, first with 128-bit sigset_t's, then with locking in the
SMP case, then with locking in all cases.  The space bloat was probably
also time bloat, partly because the fast path through CURSIG() was
pessimized by the sigset_t changes.  This change speeds up lmbench's
pipe-based latency benchmark by 4% on a Celeron.  <sys/signalvar.h>
had become very polluted to support the bloat.
2000-09-17 14:28:33 +00:00
Bruce Evans
621dbe43df Added used include of <sys/mutex.h> (don't depend on pollution in
<sys/signalvar.h>).
2000-09-17 12:20:49 +00:00
Boris Popov
3ff1a2f43e Add new flag PDIRUNLOCK to the component.cn_flags which should be set by
filesystem lookup() routine if it unlocks parent directory. This flag should
be carefully tracked by filesystems if they want to work properly with nullfs
and other stacked filesystems.

VFS takes advantage of this flag to perform symantically correct usage
of vrele() instead of vput() if parent directory already unlocked.

If filesystem fails to track this flag then previous codepath in VFS left
unchanged.

Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystmes will
be changed after some period of testing.

Reviewed in general by:	mckusick, dillon, adrian
Obtained from:	NetBSD
2000-09-17 07:26:42 +00:00
Poul-Henning Kamp
c866ec47e3 Make LINT compile. 2000-09-16 18:55:05 +00:00
Poul-Henning Kamp
fc87418be0 Turn dkcksum() into an __inline function.
Change its type to u_int_16_t.
2000-09-16 13:43:00 +00:00
John Baldwin
4cc6117e1d Remove some commented out cruft. 2000-09-15 23:00:46 +00:00
John Baldwin
7ab37af1ed - Add a new process flag P_NOLOAD that marks a process that should be
ignored during load average calcuations.
- Set this flag for the idle processes and the softinterrupt process.
2000-09-15 22:00:23 +00:00
John Baldwin
f6a0af8015 Idle processes are always runnable, so let them state at SRUN. 2000-09-15 19:49:48 +00:00
John Baldwin
db72809d24 Release Giant before starting up init.
Submitted by:	jake
2000-09-15 19:25:29 +00:00
Don Lewis
42fd51cedc Enforce process limit policy in one place to keep proccnt from diverging
from reality.
2000-09-14 23:07:39 +00:00
John Baldwin
606f8eb27a Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by:	bde
2000-09-14 20:15:16 +00:00
Jonathan Lemon
6f451c99b3 Pipes are not writeable while a direct write is in progress. However,
the kqueue filter got the sense of the test reversed, so fix it.

Spotted by:	Michael Elkins <me@sigpipe.org>
2000-09-14 20:10:19 +00:00
Eivind Eklund
98d39ed48b Add function comments for functions missing them 2000-09-14 19:13:59 +00:00
Eivind Eklund
1d95078aaf Blow away COMPAT_43 support for mount 2000-09-14 18:11:44 +00:00
Eivind Eklund
cb144e905c GC vax-only code 2000-09-14 16:51:47 +00:00
John Baldwin
9a94c9c5c3 - Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
  sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
  and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
  priority out of the x86 nexus code and into a MI ithread_priority()
  function in sys/kern/kern_intr.c.
- Remove now-uneeded x86-specific headers from sys/dev/ata/ata-all.c and
  sys/pci/pci_compat.c.
2000-09-13 18:33:25 +00:00
Bruce Evans
9c15b3c143 Fixed hang on booting with -d. mtx_enter() was called on an uninitialized
lock.  The quick fix in trap.c was not quite the version tested and had no
effect; back it out.
2000-09-13 12:40:43 +00:00
Boris Popov
6413416817 Unlock current directory when calling VFS_ROOT() because underlying
filesystem may hold the lock. Otherwise unavoidable deadlock will occur.
This shouldn't have any side effects as long as we hold vfs lock.

Obtained from:	NetBSD
2000-09-13 08:57:56 +00:00
John Baldwin
77044cb6d9 Clean up process accounting some more. Unfortunately, it is still not
quite right on i386 as the CPU who runs statclock() doesn't have a valid
clockframe to calculate statistics with.
2000-09-12 18:57:59 +00:00
Bruce Evans
bbbb2579b4 Quick fix for hang on booting with -d. mtx_enter() was called before
curproc was initialized.  curproc == NULL was interpreted as matching
the process holding Giant...  Just skip mtx_enter() and mtx_exit() in
trap() if (curproc == NULL && cold) (&& cold for safety).
2000-09-12 18:41:56 +00:00
Boris Popov
9ff5ce6baf Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by:	mckusick, dillon
2000-09-12 09:49:08 +00:00
John Baldwin
4a6404dfc1 Fix some printf format string warnings due to sizeof(int) != sizeof(long) on
the alpha.
2000-09-11 23:55:10 +00:00
Poul-Henning Kamp
5ef2707e6e revent multiple make_dev() calls on the same dev_t and similar bogosities.
A couple of new warnings may be emitted during boot if drivers DTWT.

Tested by:	George Cox <gjvc@gjvc.com>
2000-09-11 17:15:33 +00:00
John Baldwin
b162b45509 When doing statistics for statclock on other CPU's, use the other CPUs'
idleproc pointers instead of our own for comparisons.

Submitted by:	tegge
2000-09-11 04:10:29 +00:00
John Baldwin
a93a7807b2 aio processes need to have the Giant mutex before doing work.
Submitted by:	tegge
2000-09-11 04:06:48 +00:00
Jason Evans
69ef67f983 Add malloc_mtx to protect malloc and friends, so that they're thread-safe.
Reviewed by:	peter
2000-09-11 02:32:30 +00:00
Jake Burkholder
817bf5d4a6 Rename tsleep to msleep and add a mutex argument, which is
released before sleeping and re-acquired before msleep
returns.  A compatibility cpp macro has been provided for
tsleep to avoid changing all occurences of it in the kernel.

Remove an assertion that the Giant mutex be held before
calling tsleep or asleep.

This is intended to serve the same purpose as condition
variables, but does not preclude their addition in the
future.

Approved by:	jasone
Obtained from:	BSD/OS
2000-09-11 00:20:02 +00:00
Jason Evans
62820f25f5 Allow interrupt threads to run during shutdown. This should fix the
"dirty buffers during shutdown" problem introduced by the SMPng commit.

Submitted by:	tegge, cg
2000-09-10 23:06:50 +00:00
Doug Rabson
36240ea5bf Move the include of <sys/systm.h> so that KTR gets a declaration for
snprintf().
2000-09-10 13:54:52 +00:00
Doug Rabson
4eb38057ea Fix printf warnings in CTRx calls. 2000-09-10 13:34:35 +00:00
Doug Rabson
21ac8e0b77 Move the include of <sys/systm.h> so that KTR gets a declaration for
snprintf().
2000-09-10 13:33:31 +00:00
Poul-Henning Kamp
8925e63cd3 Updates to the ntp pll from John Hay.
Submitted by:	jhay
2000-09-10 09:13:34 +00:00
Boris Popov
67b23794b1 Change variable naming to be consistent with the rest of VFS code.
Reduce number of indirections by using already fetched values.
2000-09-10 03:46:12 +00:00
Jason Evans
28d4c2dde3 Back out the addition of malloc_mtx. It was incompletely conceived, and
will be done correctly in the future.
2000-09-10 01:54:15 +00:00
Jason Evans
5340642a2e Style cleanups. No functional changes. 2000-09-09 23:18:48 +00:00
Jason Evans
46bf3fe5a6 Add file and line arguments to WITNESS_ENTER() and WITNESS_EXIT, since
__FILE__ and __LINE__ don't get expanded usefully in inline functions.

Add const to all witness*() arguments that are filenames.
2000-09-09 22:43:22 +00:00
Jason Evans
9360d3ebdd Add a mutex to the malloc interfaces so that it can safely be called
without owning the Giant lock.
2000-09-09 22:27:35 +00:00
Poul-Henning Kamp
8d25eb2c3a Add code to devname(3) so it can find the names of devices which
were not present when dev_mkdb(8) was run.

First the dev_mkdb(8) database is searched, this caters for non-DEVFS
cases where people have renamed a device.

If that fails we ask the kernel using sysctl kern.devname if the device
driver has put a name in the dev_t.  This covers DEVFS cloned devices.

If that also fails we format a string which isn't entirely useless.
2000-09-09 11:39:59 +00:00
Jason Evans
12473b76dc Rename mtx_enter(), mtx_try_enter(), and mtx_exit() and wrap them with cpp
macros that expand to pass filename and line number information.  This is
necessary since we're using inline functions instead of macros now.

Add const to the filename pointers passed througout the mtx and witness
code.
2000-09-08 21:48:06 +00:00
John Baldwin
1baab78f9e Remove an unneeded extern declaration of cp_time. 2000-09-08 20:18:29 +00:00
Jake Burkholder
4ef34f39ec Really fix USER_LDT. (Don't use currentldt as an L-value.) 2000-09-08 03:36:09 +00:00
Jason Evans
0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Jason Evans
62ae6c89ad Add KTR, a facility that logs kernel events in order to to facilitate
debugging.

Acquired from:	BSDi (BSD/OS)
Submitted by:	dfr, grog, jake, jhb
2000-09-07 01:29:44 +00:00
Don Lewis
c5930ee45a Change the calls to panic() in uifree(), chgproccnt(), and chgsbsize()
to printf().  Any errors detected are not likely to be fatal, so it
should be safe to let things keep running.
2000-09-06 19:00:19 +00:00
Alfred Perlstein
34b94e8b82 Accept filter maintainance
Update copyrights.

Introduce a new sysctl node:
  net.inet.accf

Although acceptfilters need refcounting to be properly (safely) unloaded
as a temporary hack allow them to be unloaded if the sysctl
net.inet.accf.unloadable is set, this is really for developers who want
to work on thier own filters.

A near complete re-write of the accf_http filter:
  1) Parse check if the request is HTTP/1.0 or HTTP/1.1 if not dump
     to the application.
     Because of the performance implications of this there is a sysctl
     'net.inet.accf.http.parsehttpversion' that when set to non-zero
     parses the HTTP version.
     The default is to parse the version.
  2) Check if a socket has filled and dump to the listener
  3) optimize the way that mbuf boundries are handled using some voodoo
  4) even though you'd expect accept filters to only be used on TCP
     connections that don't use m_nextpkt I've fixed the accept filter
     for socket connections that use this.

This rewrite of accf_http should allow someone to use them and maintain
full HTTP compliance as long as net.inet.accf.http.parsehttpversion is
set.
2000-09-06 18:49:13 +00:00
Peter Wemm
fc611b0634 Do not panic on an uninitialized VOP_xxx() call. This was meant as a
sanity check, but it is too easy to run into, eg: making an ACL syscall
when no filesystems have the ACL implementation enabled.

The original reason for the panic was that the VOP_ vector had not been
assigned and therefor could not be passed down the stack.. and there
was no point passing it down since nothing implemented it anyway.
vop_defaultop entries could not pass it on because it had a zero (unknown)
vector that was indistinguishable from another unknown VOP vector.

Anyway, we can do something reasonable in this case, we shouldn't need
to panic here as there is a reasonable recovery option (return EOPNOTSUPP
and dont pass it down the stack).

Requested by: rwatson
2000-09-06 17:51:54 +00:00
Robert Watson
728783c27a o Synchronize vaccess() capability access control checks with TrustedBSD
tree.

Obtained from:	TrustedBSD Project
2000-09-06 12:18:24 +00:00
David E. O'Brien
6b6821c771 The kernel is now known as `kernel.ko' and it and its matching modules
live in ``/boot/kernel/''.

Submitted by:	Hisashi Hiramoto <hiramoto@phys.chs.nihon-u.ac.jp>
2000-09-06 06:22:20 +00:00
Boris Popov
9548091b84 Ignore ELF files with 'interpreter' section because KLDs doesn't contain it.
Reviewed by:	peter
2000-09-06 02:21:43 +00:00
Don Lewis
f535380cb6 Remove uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().
2000-09-05 22:11:13 +00:00
Poul-Henning Kamp
64dc16df4a Move extern declaration of dead_vnodeop_p to a .h file.
Remove race condition in vn_isdisk().
2000-09-05 21:09:56 +00:00
Robert Watson
e81c5f4307 o vn_extattr_set() will now call appropriate vn_start_write() and
vn_finished_write() if IO_NODELOCKED is not set.

Obtained from:	TrustedBSD Project
2000-09-05 03:15:02 +00:00
Robert Watson
b4d0de586d o Remove commented out code which modified return values from
extattr_{get,set} syscalls in the face of partial reads or writes.

Obtained from:	TrustedBSD Project
2000-09-05 02:13:14 +00:00
Peter Wemm
58da4602af When we are picking the next available unit number, specifically say
what we picked.  Otherwise it is anybody's guess as to where the
device ended up.
2000-09-05 00:30:46 +00:00
Poul-Henning Kamp
97804a5c99 Update the NTP kernel PLL code to the 2000-08-29 version of Dave Mills
nanokernel.

The FreeBSD private mode hardpps Type 2 PLL has been removed.
2000-09-04 08:19:32 +00:00