Commit Graph

3854 Commits

Author SHA1 Message Date
Boris Popov
71d8277b51 Prevent race condition by using msleep() instead of mtx_unlock()/tsleep().
Reviewed by:	alfred
2001-03-26 03:10:07 +00:00
Bosko Milekic
2ba1a89559 Move the atomic() mbstat.m_drops incrementing to the MGET(HDR) and
MCLGET macros in order to avoid incrementing the drop count twice.
Otherwise, in some cases, we may increment m_drops once in m_mballoc()
for example, and increment it again in m_mballoc_wait() if the
wait fails.
2001-03-24 23:47:52 +00:00
John Baldwin
1f723035c8 Use (..., "%s", foo) instead of (..., foo) to avoid a warning about a
non-constant format string when calling kthread_create() to create an
ithread.
2001-03-24 06:26:47 +00:00
Peter Wemm
32e479705a This is kind of a hack, but it should work. Currently, world is broken
because libc/rpc/key_call.c references uname(), and ps/print.c also
defines uname(), and ps is linked statically.  This leads to a symbol
clash.  The userland uname(3) kinda sucked anyway as the hostname
etc was too short.  And since the libc rpc interface now uses
the utsname.nodename which gets truncated, I was tempted into doing
something about it.  Create a new userland uname function, called
__xuname() which takes an extra argument that allows you to change
the size of the fields.  uname() becomes a static inline function
in sys/utsname.h that passes the extra argument in.  struct utsname
has its field members expanded by default now in userland.
We still provide a 'uname' externally linkable function for things
that either think that they ``know'' the utsname format and assume
32 character strings and bypass the include file, or objects that
are linked against old libcs.  ie: just about every plausible
case that I can think of is covered.  Should we ever change the
default lengths again, a libc major bump should not be required
as the size is now passed to the function.

XXX the uname(2) in the kernel is for FreeBSD 1.1 binary compatability!
All the uname(3) functions that are exported to userland are actually
implemented in libc with sysctl.  uname(1) uses sysctl directly and
does not call uname(3).

PR:		bin/4688
2001-03-24 04:40:49 +00:00
John Baldwin
bae3a80b16 Just use the proc lock to protect read accesses to p_pptr rather than the
more expensive proctree lock.
2001-03-24 04:00:01 +00:00
John Baldwin
8d2725181a Protect p_wmesg and p_wchan with sched_lock while checking for deadlocks
with other byte range file locks.
2001-03-24 03:57:44 +00:00
Alfred Perlstein
ec4dff5e50 replace calls to non-existant bail() subroutine with calls to
the die() builtin function.
2001-03-23 11:48:50 +00:00
Boris Popov
a91f68bca6 o Actually extract version of interface and store it along with the name.
o Add new parameter to the modlist_lookup() function to perform lookups
  with strict version matching.

o Collapse duplicate code to function(s).
2001-03-22 08:58:45 +00:00
Boris Popov
303b15f193 Slightly reorganize code in the linker_load_dependancies() function to make
codepath more straightforward.
2001-03-22 07:55:33 +00:00
Boris Popov
804f27299d Remove support for old way of handling module dependencies.
Approved by:	peter
2001-03-22 07:14:42 +00:00
Poul-Henning Kamp
71d033119f Make the pseudo-driver for "/dev/fd/*" handle fd's larger than 255.
PR:	25936
2001-03-20 13:26:13 +00:00
Poul-Henning Kamp
15b6f00fd1 Add a KASSERT on unit2minor() so that we catch it if people try to pass
us unit numbers which doesn't fit in 24 bits.
2001-03-20 13:24:24 +00:00
Bruce Evans
0abc15fd0b Fixed breakage of access() in rev.1.164. Wrong credentials were used for
the final path component.
2001-03-20 09:38:05 +00:00
Peter Wemm
439fea92c2 Use the same API as the example code.
Allow the initial hash value to be passed in, as the examples do.
Incrementally hash in the dvp->v_id (using the official api) rather than
add it.  This seems to help power-of-two predictable filename trees
where the filenames repeat on a power-of-two cycle and the directory trees
have power-of-two components in it.  The simple add then mask was causing
things like 12000+ entry collision chains while most other entries have
between 0 and 3 entries each.  This way seems to improve things.
2001-03-20 02:10:18 +00:00
Robert Watson
231b9e916a o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.  Part 2 of syscalls.master commit to catch rebuilt
  files.

Submitted by:	jkh
Obtained from:	TrustedBSD Project
2001-03-19 05:48:58 +00:00
Robert Watson
3063207147 o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by:	jkh
Obtained from:	TrustedBSD Project
2001-03-19 05:44:15 +00:00
Bosko Milekic
9612101e4c Fix a couple of things in the internal mbuf allocation interface:
- Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to
  be allocated from mb_map. In the case where nmbufs-reserved space is not
  an exact multiple of PAGE_SIZE (which it should be, but anyway...), we
  hold nmbufs as an absolute maximum which need not ever be reached.

- Clean up m_clalloc(); make it more consistent in the sense that the first
  argument `ncl' really means "the number of clusters ensured to be allocated"
  and not "the number of pages worth of clusters to be allocated," as was
  previously the case. This also makes it consistent with m_mballoc() as well
  as the comment that preceeds it.

Reviewed by: jlemon
2001-03-17 23:23:24 +00:00
Peter Wemm
6eb39ac8fc Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
Jonathan Lemon
4d286823c5 When doing a recv(.. MSG_WAITALL) for a message which is larger than
the socket buffer size, the receive is done in sections.  After completing
a read, call pru_rcvd on the underlying protocol before blocking again.

This allows the the protocol to take appropriate action, such as
sending a TCP window update to the peer, if the window happened to
close because the socket buffer was filled.  If the protocol is not
notified, a TCP transfer may stall until the remote end sends a window
probe.
2001-03-16 22:37:06 +00:00
Peter Wemm
50e2347e68 Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section.  If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping.  This would cause the trampoline
up to high memory to fault.  The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to:	Richard Todd <rmtodd@ichotolot.servalan.com>
2001-03-15 05:10:06 +00:00
Peter Wemm
6fe01250f4 Jake essentially rewrote this. It is not by any stretch of the
imagination a derivative of what I did before.
2001-03-15 05:02:08 +00:00
Peter Wemm
043cc5a602 Regenerate after rwatson's commit to syscalls.master (rev 1.85) 2001-03-15 04:43:57 +00:00
Robert Watson
70f3685105 o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
  character namespace indicator.  This is in line with more recent
  thinking on EA interfaces on various mailing lists, including the
  posix1e, Linux acl-devel, and trustedbsd-discuss forums.  Two namespaces
  are defined by default, EXTATTR_NAMESPACE_SYSTEM and
  EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
  access control model: user EAs are accessible based on the normal
  MAC and DAC file/directory protections, and system attributes are
  limited to kernel-originated or appropriately privileged userland
  requests.

o These API changes occur at several levels: the namespace argument is
  introduced in the extattr_{get,set}_file() system call interfaces,
  at the vnode operation level in the vop_{get,set}extattr() interfaces,
  and in the UFS extended attribute implementation.  Changes are also
  introduced in the VFS extattrctl() interface (system call, VFS,
  and UFS implementation), where the arguments are modified to include
  a namespace field, as well as modified to advoid direct access to
  userspace variables from below the VFS layer (in the style of recent
  changes to mount by adrian@FreeBSD.org).  This required some cleanup
  and bug fixing regarding VFS locks and the VFS interface, as a vnode
  pointer may now be optionally submitted to the VFS_EXTATTRCTL()
  call.  Updated documentation for the VFS interface will be committed
  shortly.

o In the near future, the auto-starting feature will be updated to
  search two sub-directories to the ".attribute" directory in appropriate
  file systems: "user" and "system" to locate attributes intended for
  those namespaces, as the single filename is no longer sufficient
  to indicate what namespace the attribute is intended for.  Until this
  is committed, all attributes auto-started by UFS will be placed in
  the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
  been updated to no longer include the '$' in their filename.  As such,
  if you're using these features, you'll need to rename the attribute
  backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
  be committed shortly.  These include modifications to the extended
  attribute utilities, as well as to libutil for new namespace
  string conversion routines.  Once the matching userland changes are
  committed, a buildworld is recommended to update all the necessary
  include files and verify that the kernel and userland environments
  are in sync.  Note: If you do not use extended attributes (most people
  won't), upgrading is not imperative although since the system call
  API has changed, the new userland extended attribute code will no longer
  compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
  conditional on FFS_EXTATTR, which should recover a bit of space on
  kernels running without EA's, as well as update copyright dates.

Obtained from:	TrustedBSD Project
2001-03-15 02:54:29 +00:00
Søren Schmidt
b417a1a8c8 Dont call device close and ioctl functions if device has disappeared.
Reviewed by: phk
2001-03-13 08:45:05 +00:00
Dag-Erling Smørgrav
9cbd039343 Assert that the process we're trying to enqueue isn't already there. 2001-03-11 18:57:30 +00:00
Alan Cox
136446540a When aio_read/write() is used on a raw device, physical buffers are
used for up to "vfs.aio.max_buf_aio" of the requests.  If a request
size is MAXPHYS, but the request base isn't page aligned, vmapbuf()
will map the end of the user space buffer into the start of the kva
allocated for the next physical buffer.  Don't use a physical buffer
in this case.  (This change addresses problem report 25617.)

When an aio_read/write() on a raw device has completed, timeout() is
used to schedule a signal to the process.  Thus, the reporting is
delayed up to 10 ms (assuming hz is 100).  The process might have
terminated in the meantime, causing a trap 12 when attempting to
deliver the signal.  Thus, the timeout must be cancelled when removing
the job.

aio jobs in state JOBST_JOBQGLOBAL should be removed from the
kaio_jobqueue list during process rundown.

During process rundown, some aio jobs might move from one list to a
different list that has already been "emptied", causing the rundown to
be incomplete.  Retry the rundown.

A call to BUF_KERNPROC() is needed after obtaining a physical buffer
to disassociate the lock from the running process since it can return
to userland without releasing that lock.

PR:		25617
Submitted by:	tegge
2001-03-10 22:47:57 +00:00
Alfred Perlstein
9708152c20 Don't call malloc with M_WAITOK while holding a mutex. 2001-03-09 18:40:34 +00:00
Jonathan Lemon
c0647e0d07 Push the test for a disconnected socket when accept()ing down to the
protocol layer.  Not all protocols behave identically.  This fixes the
brokenness observed with unix-domain sockets (and postfix)
2001-03-09 08:16:40 +00:00
John Baldwin
5db078a9be Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks.  Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock.  At
least on the i386.  To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks.  Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes.  Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.

Consulting from:	cp
2001-03-09 07:24:17 +00:00
Alan Cox
c9a970a79f Use the kthread API to create and destroy AIO daemons.
Submitted by:	jhb
2001-03-09 06:27:01 +00:00
John Baldwin
3a3f608288 Add a new informative KASSERT to ensure that a process is in the SRUN state
before we return it to cpu_switch().
2001-03-09 03:59:50 +00:00
Bosko Milekic
4bde2ac539 Fix is a similar race condition as existed in the mbuf code. When we go
into an interruptable sleep and we increment a sleep count, we make sure
that we are the thread that will decrement the count when we wakeup.
Otherwise, what happens is that if we get interrupted (signal) and we
have to wake up, but before we get our mutex, some thread that wants
to wake us up detects that the count is non-zero and so enters wakeup_one(),
but there's nothing on the sleep queue and so we don't get woken up. The
thread will still decrement the sleep count, which is bad because we will
also decrement it again later (as we got interrupted) and are already off
the sleep queue.
2001-03-08 19:21:45 +00:00
David Malone
2239c07de9 Make the wait for sendfile buffers interruptable. Stops one process
consuming them all and then getting stuck.

Reviewed by:	dg
Reviewed by:	bmilekic
Observed by:	Andreas Persson <pap@garen.net>
2001-03-08 16:28:10 +00:00
Thomas Moestl
3a51557243 Make the SYSCTL_OUT handlers sysctl_old_user() and sysctl_old_kernel()
more robust. They would correctly return ENOMEM for the first time when
the buffer was exhausted, but subsequent calls in this case could cause
writes ouside of the buffer bounds.

Approved by:	rwatson
2001-03-08 01:20:43 +00:00
Kirk McKusick
589c7af992 Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).
2001-03-07 07:09:55 +00:00
Kirk McKusick
393d77ffad Bitch more loudly when someone botches changes to kinfo_proc
in the hopes that they will actually *read* the comment above
it and *follow* the instructions so as to cause all the rest
of us less a lot less grief.
2001-03-07 06:52:12 +00:00
John Baldwin
5641ae5dc3 - Don't hold the proc lock across VREF and the fd* functions to avoid lock
order reversals.
- Add some preliminary locking in the !RF_PROC case.
- Protect p_estcpu with sched_lock.
2001-03-07 05:21:47 +00:00
John Baldwin
f227364a17 - Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
  needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.
2001-03-07 03:53:39 +00:00
John Baldwin
19eb87d22a Grab the process lock while calling psignal and before calling psignal. 2001-03-07 03:37:06 +00:00
John Baldwin
15e9ec5153 Proc locking including using proc lock in place of proctree where
appropriate and locking processes while we signal them.
2001-03-07 03:28:50 +00:00
John Baldwin
e65897c381 Proc locking. 2001-03-07 03:27:32 +00:00
John Baldwin
28aa95b6ee Use the proc lock to protect access to p_sigacts->ps_sigintr. 2001-03-07 03:26:39 +00:00
John Baldwin
731a1aea4c - Proc locking.
- Remove some unneeded spl()'s.
2001-03-07 03:06:18 +00:00
John Baldwin
378240232a Lock the process while sending it SIGARLM and updating p_realtimer. 2001-03-07 03:02:56 +00:00
John Baldwin
eed4805444 - Proc locking.
- Remove unneeded spl()'s.
2001-03-07 03:01:53 +00:00
John Baldwin
628d2653d6 - Proc locking. Most of signal handling is now MP safe and doesn't require
Giant.  The only exception is the CANSIGNAL() macro.  Unlocking the proc
  lock around sendsig() in trapsignal() is also questionable.  Note that
  the functions sigexit(), psignal(), and issignal() must be called with
  the proc lock of the process in question held.  postsig() and
  trapsignal() should not be called with the proc lock held, but they
  also do not require Giant anymore either.
- Remove spl's that are now no longer needed as they are fully replaced.
2001-03-07 02:59:54 +00:00
John Baldwin
87729a2b64 Lock initproc when we send SIGINT to init during shutdown. 2001-03-07 02:50:09 +00:00
John Baldwin
1b43703b47 - Add an extra check in priority_propagation() for UP systems to ensure we
don't end up back at ourselves which would indicate deadlock.
- Add the proc lock to the witness dup_list as we may hold more than one
  process lock at a time.
- Don't assert a mutex is owned in _mtx_unlock_sleep() as that is too late.
  We do the checks in the macros instead.
2001-03-07 02:45:15 +00:00
John Baldwin
6451855f6d - Use _PHOLD and move it before a PROC_UNLOCK to reduce the number of
mutex operations in kthread_create().
- Lock a kthread's proc before changing its parent via proc_reparent().
- Test P_KTHREAD not P_SYSTEM in kthread_suspend() and kthread_resume().
  P_SYSTEM just means that the process shouldn't be swapped and is used
  for vinum's daemon for example.
- Lock all the signal state used for suspending and resuming kthreads with
  the proc lock.
2001-03-07 02:36:47 +00:00
John Baldwin
57934cd3c8 - Lock the forklist with an sx lock.
- Add proc locking to fork1().  Always lock the child procoess (new
  process) first when both processes need to be locked at the same
  time.
- Remove unneeded spl()'s as the data they protected is now locked.
- Ensure that the proctree is exclusively locked and the new process is
  locked when setting up the parent process pointer.
- Lock the check for P_KTHREAD in p_flag in fork_exit().
2001-03-07 02:30:39 +00:00
John Baldwin
2aa33d2f1e Check to see if p_fd is NULL before derferencing it in checkdirs(). It's
possible for us to see a process in the early stages of fork before p_fd
has been initialized.  Ideally, we wouldn't stick a process on the allproc
list until it was fully created however.
2001-03-07 02:25:13 +00:00
John Baldwin
c65437a326 - Call proc_reparent() when handing a process off to init in exit rather
than dinking around in the process lists explicitly.
- Hold both the proctree lock and proc lock of the child process when
  reparenting a process via proc_reparent.
- Lock processes while sending them signals.
- Miscellaenous proc locking.
- proc_reparent() now asserts that the child is locked in addition to an
  exclusive proctree lock.
2001-03-07 02:22:31 +00:00
John Baldwin
7331c2a252 In order to avoid recursing on the backing mutex for sx locks in the
INVARIANTS case, define the actual KASSERT() in _SX_ASSERT_[SX]LOCKED
macros that are used in the sx code itself and convert the
SX_ASSERT_[SX]LOCKED macros to simple wrappers that grab the mutex for the
duration of the check.
2001-03-06 23:13:15 +00:00
Dag-Erling Smørgrav
cab5b963a0 Make the KASSERTs report the correct function names.
Fix two off-by-one errors that would sometimes cause the final length of
the sbuf to include the trailing zero.
2001-03-06 17:48:26 +00:00
Robert Watson
5293465fef o Introduce filesystem-independent POSIX.1e ACL utility routines to
support implementations of ACLs in file systems.  Introduce the
  following new functions:

      vaccess_acl_posix1e()          vaccess() that accepts an ACL
      acl_posix1e_mode_to_perm()     Convert mode bits to ACL rights
      acl_posix1e_mode_to_entry()    Build ACL entry from mode/uid/gid
      acl_posix1e_perms_to_mode()    Generate file mode from ACL
      acl_posix1e_check()            Syntax verification for ACL

  These functions allow a file system to rely on central ACL evaluation
  and syntax checking, as well as providing useful utilities to
  allow ACL-based file systems to generate mode/owner/etc information
  to return via VOP_GETATTR(), and to support file systems that split
  their ACL information over their existing inode storage (mode, uid,
  gid) and extended ACL into extended attributes (additional users,
  groups, ACL mask).

o Add prototypes for exported functions to sys/acl.h, sys/vnode.h

Reviewed by:	trustedbsd-discuss, freebsd-arch
Obtained from:	TrustedBSD Project
2001-03-06 17:28:24 +00:00
Alan Cox
9c8a2647f6 Add a missing splx() to aio_fphysio(). (This change is a no-op in -5.0,
but potentially significant in -4.x.)

Eliminate a pointless parameter to aio_fphysio().

Remove unnecessary casts from aio_fphysio() and aio_physwakeup().
2001-03-06 15:54:38 +00:00
Bosko Milekic
af76144992 - Add sx_descr description member to sx lock structure
- Add sx_xholder member to sx struct which is used for INVARIANTS-enabled
  assertions. It indicates the thread that presently owns the xlock.
- Add some assertions to the sx lock code that will detect the fatal
  API abuse:
     xlock --> xlock
     xlock --> slock
  which now works thanks to sx_xholder.
  Notice that the remaining two problematic cases:
     slock --> xlock
     slock --> slock (a little less problematic, but still recursion)
  will need to be handled by witness eventually, as they are more
  involved.

Reviewed by: jhb, jake, jasone
2001-03-06 06:17:05 +00:00
Jason Evans
6281b30a73 Implement shared/exclusive locks.
Reviewed by:	bmilekic, jake, jhb
2001-03-05 19:59:41 +00:00
Alan Cox
88ed460e6b Eliminate the aio_freejobs list. Its purpose was to store free
aiocb's allocated by zalloc().  In other words, zfree() was never
 called.  Now, we call zfree().  Why eliminate this micro-
 optimization?  At some later point, when we multithread the AIO
 system, we would need a mutex to synchronize access to aio_freejobs,
 making its use nearly indistinguishable in cost from zalloc() and
 zfree().

Remove unnecessary fhold() and fdrop() calls from aio_qphysio(),
 undo'ing a part of revision 1.86.  The reference count on the file
 structure is already incremented by _aio_aqueue() before it calls
 aio_qphysio().  (Update the comments to document this fact.)

Remove unnecessary casts from _aio_aqueue(), aio_read(), aio_write()
 and aio_waitcomplete().

Remove an unnecessary "return;" from aio_process().

Add "static" in various places.
2001-03-05 01:30:23 +00:00
David E. O'Brien
828c9e13a3 Do not set a default ELF syscall ABI fallback.
If one runs an un-branded Linux static binary that calls Linux's fcntl
the machine will reboot when interupted by the FreeBSD syscall ABI.
2001-03-04 11:58:50 +00:00
Assar Westerlund
3617ddfc33 implement OCRNL, ONOCR, and ONLRET
Obtained from:	NetBSD
2001-03-04 06:04:50 +00:00
Alan Cox
fb579e9a61 Remove the field privatemodes from struct __aiocb_private and the
related code from aio_read() and aio_write().  This field was
intended, but never used, to allow a mythical user-level library to
make an aio_read() or aio_write() behave like an ordinary read() or
write(), i.e., a blocking I/O operation.
2001-03-04 01:22:23 +00:00
Adrian Chadd
fbedc11796 Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath.
Submitted by:	Naoki Kobayashi <shibata@geo.titech.ac.jp>
2001-03-02 14:05:49 +00:00
John Baldwin
003fb9ec2f Ok, the kernel will panic in kmem_malloc() if the kernel map is full, so
malloc with M_WAITOK can't actually return NULL.  I wish I could get two
people to give me the same answer about this when I ask...

Submitted by:	jake
2001-03-02 06:07:38 +00:00
John Baldwin
653dd8c243 - Check to see if malloc() returned NULL even with M_WAITOK.
- Add a KASSERT() to ensure an ithread has a backing kernel thread when we
  schedule it.
- Don't attempt to preemptively switch to an ithread if p_stat of curproc
  is not SRUN.
2001-03-02 05:33:03 +00:00
Adrian Chadd
f3a90da995 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
Ian Dowse
a90ef2ae0f The kernel did not hold a vnode reference associated with the
`rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that
it did. This bug reliably caused a panic at reboot time if any
filesystem had been mounted directly over /.

The checkdirs() function is called at mount time to find any process
fd_cdir or fd_rdir pointers referencing the covered mountpoint
vnode. It transfers these to point at the root of the new filesystem.
However, this process was not reversed at unmount time, so processes
with a cwd/root at a mount point would unexpectedly lose their
cwd/root following a mount-unmount cycle at that mountpoint.

This change should fix both of the above issues. Start_init() now
holds an extra vnode reference corresponding to `rootvnode', and
dounmount() releases this reference when the root filesystem is
unmounted just before reboot. Dounmount() now undoes the actions
taken by checkdirs() at mount time; any process cdir/rdir pointers
that reference the root vnode of the unmounted filesystem are
transferred to the now-uncovered vnode.

Reviewed by:	bde, phk
2001-02-28 20:54:28 +00:00
Julian Elischer
a96dcd84d2 Shuffle netgraph mutexes a bit and hold a reference on a node
from the function that is calling the destructor.
2001-02-28 18:49:09 +00:00
Matthew Dillon
63692125a9 Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)

Testing by: mjacob, dillon
2001-02-28 04:13:11 +00:00
Jake Burkholder
5b270b2a55 Sigh. Try to get priorities sorted out. Don't bother trying to
update native priority, it is diffcult to get right and likely
to end up horribly wrong.  Use an honestly wrong fixed value
that seems to work; PUSER for user threads, and the interrupt
priority for ithreads.  Set it once when the process is created
and forget about it.

Suggested by:	bde
Pointy hat:	me
2001-02-28 02:53:44 +00:00
Jonathan Lemon
ea0237ed11 Correctly declare variables as u_int rather than doing typecasts.
Kill some register declarations while I'm here.

Submitted by:  bde (1)
2001-02-27 15:11:31 +00:00
Ruslan Ermilov
8ac6dca795 In soshutdown(), use SHUT_{RD,WR,RDWR} instead of FREAD and FWRITE.
Also, return EINVAL if `how' is invalid, as required by POSIX spec.
2001-02-27 13:48:07 +00:00
Jonathan Lemon
0b7088c4d0 Cast nfds to u_int before range checking it in order to catch negative
values.

PR:	25393
2001-02-27 00:50:20 +00:00
Jake Burkholder
be15bfc091 Initialize native priority to PRI_MAX. It was usually 0 which made a
process's priority go through the roof when it released a (contested)
mutex.  Only set the native priority in mtx_lock if hasn't already
been set.

Reviewed by:	jhb
2001-02-26 23:27:35 +00:00
Jake Burkholder
a10f496636 Remove brackets around variables in a function that used to be
a macro.
2001-02-25 16:18:13 +00:00
Peter Wemm
d6df01d823 Make this compile in a.out mode. link.h has extra dependencies for a.out. 2001-02-25 07:26:54 +00:00
Peter Wemm
1a5f13cfbf Manually add an extra _ to _DYNAMIC since it is provided by ld, not gcc.
Make the rest compile.
2001-02-25 07:25:05 +00:00
Bosko Milekic
096e2dd9d8 Remove superfluous m_pkthdr.rcv_if = NULL assignment following
m_gethdr() mbuf allocation, which already does this for us.
2001-02-25 06:33:50 +00:00
Julian Elischer
7433466190 Move netgraph spimlock order entries out of
the #ifdef SMP section. They need to be there for UP too.
2001-02-25 04:56:23 +00:00
Jake Burkholder
631d7bf3da - Rename the lcall system call handler from Xsyscall to Xlcall_syscall
to be more like Xint0x80_syscall and less like c function syscall().
- Reduce code duplication between the int0x80 and lcall handlers by
  shuffling the elfags into the right place, saving the sizeof the
  instruction in tf_err and jumping into the common int0x80 code.

Reviewed by:	peter
2001-02-25 02:53:06 +00:00
David E. O'Brien
21a3ee0ead MFS: bring the consistent `compat_3_brand' support into -CURRENT
(the work was first done in the RELENG_4 branch near a release
	 during a MFC to make the code cleaner and more consistent)
2001-02-24 22:20:11 +00:00
John Baldwin
1103f3b05b Grrr, s/INVARIANTS_SUPPORT/INVARIANT_SUPPORT/. 2001-02-24 21:29:32 +00:00
John Baldwin
15ec816acc - Axe RETIP() as it was very i386 specific and unwieldy. Instead, use the
passed in filename and line number in the KTR tracepoint message.
- Even though it is #if 0'd code, change the code to detect that a process
  is an interrupt thread to check p->p_ithd against NULL rather than
  checking non-existant process flags from BSD/OS.
- Use '%p' to print pointers in KTR log messages instead of assuming
  sizeof(int) == sizeof(void *).
- Don't set p_mtxname to NULL when releasing a mutex.  It doesn't hurt
  to leave it set (we don't clear w_mesg for example) and at least at
  one time in the past, there used to be race conditions in the kernel
  that would result in setting this to NULL causing the kernel to
  dereference NULL.
- Make the _mtx_assert() function be compiled in if INVARIANTS_SUPPORT is
  defined rather than if INVARIANTS is defined so that a KLD compiled
  with INVARIANTS that uses mtx_assert() can be used with a kernel that
  just has INVARIANT_SUPPORT compiled in.
2001-02-24 19:36:13 +00:00
Boris Popov
d8589bd5cb Introduce API for sequential reads/writes (build/dissect) of mbuf chains.
Reviewed by:	Ian Dowse <iedowse@maths.tcd.ie>,
		Bosko Milekic <bmilekic@technokratis.com>,
		Julian Elischer <julian@elischer.org> and arch@/net@
Obtained from:	smbfs
2001-02-24 15:44:30 +00:00
Julian Elischer
33338e7370 Add knowledge of the netgraph spinlocks into the Witness code.
Well, at least I think that's how it's done.
2001-02-24 14:29:47 +00:00
Jake Burkholder
f32ded2fb5 - Assert that the proc to return is not NULL in runq_choose the
same as runq_remove.
- bzero the whole struct runq in runq_init just in case its not
  statically allocated.
2001-02-24 14:06:36 +00:00
John Baldwin
130c1f25a4 It turns out the kernel console works fine and thus doesn't need quite this
much extra testing.
2001-02-24 03:40:23 +00:00
Jonathan Lemon
24607d88ed Add an EV_SET() convenience macro for initializing struct kevent prior
to the call to kevent().

Update the copyright notices as well.
2001-02-24 01:44:03 +00:00
Jonathan Lemon
da403b9df8 Introduce a NOTE_LOWAT flag for use with the read/write filters, which
allow the watermark to be passed in via the data field during the EV_ADD
operation.

Hook this up to the socket read/write filters; if specified, it overrides
the so_{rcv|snd}.sb_lowat values in the filter.

Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>
2001-02-24 01:41:31 +00:00
Jonathan Lemon
b07540c837 When returning EV_EOF for the socket read/write filters, also return
the current socket error in fflags.  This may be useful for determining
why a connect() request fails.

Inspired by:  "Jonathan Graehl" <jonathan@graehl.org>
2001-02-24 01:33:12 +00:00
Peter Wemm
3e688165a9 Stricter style(9) conformance - remove unnecessary blank lines in previous
commit.
2001-02-23 23:05:46 +00:00
Jonathan Lemon
89bbe051bb Fix typo in comment (knode -> knote). 2001-02-23 20:32:42 +00:00
Jonathan Lemon
7df2842dee Add a NOTE_REVOKE flag for vnodes, which is triggered from within vclean().
Use this to tell a filter attached to a vnode that the underlying vnode is
no longer valid, by returning EV_EOF.

PR: kern/25309, kern/25206
2001-02-23 20:06:01 +00:00
John Baldwin
0b1d793211 Test out the kernel console just before launching the AP's. 2001-02-23 19:44:25 +00:00
Peter Wemm
f1532aadee Activate USER_LDT by default. The new thread libraries are going to
depend on this.  The linux ABI emulator tries to use it for some linux
binaries too.  VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by:	jhb, imp
2001-02-23 01:25:02 +00:00
Tor Egge
9d0ddf1861 Streamline updating of switchtime (don't copy code from kern_sync.c).
Submitted by:	jhb
2001-02-22 20:16:51 +00:00
Tor Egge
35030da9f8 Backout previous commit. sched_lock is held, thus interrupts are prevented
here.

Submitted by:	jhb
2001-02-22 20:12:52 +00:00
Tor Egge
0d139b3741 Protect update of the per processor switchtime variable against
interrupts.

Protect usage of the per processor switchtime variable against
interrupts in calcru().

This seem to eliminate the "microuptime() went backwards" warnings.
2001-02-22 19:50:37 +00:00
John Baldwin
feb43c5f37 The p_md.md_regs member of proc is used in signal handling to reference
the the original trapframe of the syscall, trap, or interrupt that entered
the kernel.  Before SMPng, ast's were handled via a psuedo trap at the
end of doerti.  With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures.  Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updateda in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode.  The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.
2001-02-22 19:35:20 +00:00
John Baldwin
51c9129957 Since the PC is a pointer to a code address, change the second parameter of
addupc_task() and addupc_intr() to be a uintptr_t instead of a u_long.
2001-02-22 18:07:31 +00:00
John Baldwin
f308e0d714 - Change ast() to take a pointer to a trapframe like other architectures.
- Don't use an atomic operation to update cnt.v_soft in ast().  This is
  the only place the variable is written to, and sched_lock is always
  held when it is written, so it is already protected and the mutex release
  of sched_lock asserts a memory barrier that ensures the value will be
  updated in a timely fashion.
2001-02-22 18:05:15 +00:00
John Baldwin
26f9f5c7c7 - Use TRAPF_PC() on the alpha to acess the PC in the trap frame.
- Don't hold sched_lock around addupc_task() as this apparently breaks
  profiling badly due to sched_lock being held across copyin().

Reported by:	bde (2)
2001-02-22 16:23:12 +00:00
John Baldwin
c978f49e20 Add a mtx_assert() in maybe_resched() just to be sure it's always called
with sched_lock held.
2001-02-22 13:47:01 +00:00
John Baldwin
3a18729505 Lock need_resched with sched_lock.
Reported by:	des
2001-02-22 13:46:09 +00:00
John Baldwin
de271f01c2 Work around a race condition where an interrupt handler can be removed from
an interrupt thread while the interrupt thread is blocked on Giant waiting
to execute the interrupt handler being removed.  The result was that the
intrhand structure would be free'd, and we would call 0xdeadc0de.  The work
around is to check to see if the interrupt thread is idle when removing a
handler.  If not, then we mark the interrupt handler as being dead using
the new IH_DEAD flag and don't remove it from the interrupt threads' list
of handlers.  When the interrupt thread resumes, it will see a dead handler
while traversing the list of handlers and will remove the handler then.
2001-02-22 02:18:32 +00:00
John Baldwin
60f2b032fe Just use the ithread->it_proc directly in a KTR tracepoint instead of
assigning a local var to it and using it, as otherwise the local var wasn't
used, and generated a warning in the !KTR case.

Noticed by:	bde
2001-02-22 02:15:57 +00:00
John Baldwin
addec20c38 Add KTR tracepoints for adding/removing interrupt handlers,
creating/destroying interrupt threads, and updating the state of an
interrupt thread.
2001-02-22 02:14:08 +00:00
John Baldwin
25d209f260 - Use the NOCPU constant.
- Move the ithread spin locks before sched lock and clk in preparation for
  future commits to the ithread code.
2001-02-22 02:12:54 +00:00
John Baldwin
9764c9d36e Quiet a warning with a uintptr_t cast.
Noticed by:	bde
2001-02-22 02:10:33 +00:00
John Baldwin
5a93f3e851 - Use the new NOCPU constant.
- Fix a warning.

Noticed by:	bde (2)
2001-02-22 00:32:13 +00:00
John Baldwin
76bd604e7d Fix a bug where the 'ithread' variable was being set in a KASSERT()
condition and thus was not initialized properly in the !INVARIANTS case.

Noticed by:	bde
Pointy hat to:	me
2001-02-22 00:23:56 +00:00
John Baldwin
719f43d3df Remove attempt to add in PREEMPTION #ifdef test in MI code that didn't
work because opt_preemption.h wasn't #include'd.  Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.
2001-02-21 22:51:00 +00:00
Boris Popov
03137ec82e Fix parameter order in the calls to MGET(). 2001-02-21 09:24:13 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Tor Egge
d82b3e319a Ensure that RLIMIT_NPROC limits are at least 1 to avoid bad interaction
with chgproccnt.  MFC candiate.

Reviewed by:	alfred
2001-02-20 23:34:16 +00:00
John Baldwin
62d654c142 - In the KTR_EXTEND case, use a const char * to point to the passed in
filename insteada of copying the first 32 characters of it.
- Add in const modifiers for the passed in format strings and filenames
  and their respective members in the ktr_entry struct.
2001-02-20 10:39:55 +00:00
John Baldwin
3e5da75445 - Add a new ithread_schedule() function to do the bulk of the work of
scheduling an interrupt thread to run when needed.  This has the side
  effect of enabling support for entropy gathering from interrupts on
  all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
  to use ithread_schedule() for most of their processing when scheduling
  an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
  enabled.  I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
  returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
  on every interrupt thread whose pri was SWI_CLOCK, set the flag
  explicity for clk_ithd's proc during start_softintr().
2001-02-20 10:25:29 +00:00
John Baldwin
591faa2e45 - Abolish the 'show ktr_first' and 'show ktr_next' commands.
- Add pager capability to the 'show ktr' command.  It functions much like
  'ps': Enter at the prompt displays one more entry, Space displays
  another page, and any other key quits.
2001-02-20 09:53:27 +00:00
Luigi Rizzo
5fe86675f0 Preserve alignment of first mbuf in m_copypacket.
This is useful when doing copies of packet where some leading
space has been preallocated to insert protocol headers.
Note that there are in fact almost no users of m_copypacket.

MFC candidate.
2001-02-20 08:23:41 +00:00
John Baldwin
5813dc03bd - Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
  after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
  cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
  proc in mi_switch(), and handle the sched_lock state change across a
  context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
  have to setup an initial state for sched_lock in fork_exit() before we
  release it.
2001-02-20 05:26:15 +00:00
Bruce Evans
0ad74739ac Removed all traces of T_ASTFLT (except for gaps where it was). It became
unused except in dead code when ast() was split off from trap().
2001-02-19 15:47:38 +00:00
Bruce Evans
d2ef4060d7 Fixed a longstanding latency bug in signal delivery. When a signal
is sent to a process, psignal() needs to schedule an AST for the
process if the process is runnable, not just if it is current, so that
pending signals get checked for on the next return of the process to
user mode.  This wasn't practical until recently because the AST flag
was per-cpu so setting it for a non-current process would usually just
cause a bogus AST for the current process.

For non-current processes looping in user mode, it took accidental
(?) magic to deliver signals at all.  Signals were usually delivered
late as a side effect of rescheduling (need_resched() sets astpending,
etc.).  In pre-SMPng, delivery was delayed by at most 1 quantum (the
need_resched() call in roundrobin() is certain to occur within 1
quantum for looping processes).  In -current, things are complicated
by normal interrupt handlers being threads.  Missing handling of the
complications makes roundrobin() a bogus no-op, but preemptive
scheduling sort of works anyway due to even larger bogons elsewhere.
2001-02-19 09:40:58 +00:00
Bruce Evans
866546105a Changed the aston() family to operate on a specified process instead of
always on curproc.  This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).

Debogotified the definition of aston().  aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
2001-02-19 04:15:59 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00
Jeroen Ruigrok van der Werven
d7d97eb0aa Preceed/preceeding are not english words. Use precede and preceding. 2001-02-18 10:43:53 +00:00
Bruce Evans
a25f057175 Added a dummy lookup vop. Specfs was broken by removing its dummy
lookup vop so that it defaulted to using vop_eopnotsupp for strange
lookups like the ones for open("/dev/null/", ...) and stat("/dev/null/",
...).  This mainly caused the wrong errno to be returned by vfs syscalls
(EOPNOTSUPP is not in POSIX, and is not documented in connection with
specfs in open.2 and is not documented in stat.2 at all).  Also, lookup
vops are apparently required to set *ap->a_vpp to NULL on error, but
vop_eopnotsupp is too broken to do this.
2001-02-18 02:22:58 +00:00
Jonathan Lemon
9bfd6482c8 Fix tab breakage from last commit.
Spotted by: bde
2001-02-17 19:40:22 +00:00
Jonathan Lemon
c3d7bcdfc9 Introduce copyinfrom and copyinstrfrom, which can copy data from either
user or kernel space.  This will allow layering of os-compat (e.g.: linux)
system calls.  Apply the changes to mount.
2001-02-16 14:31:49 +00:00
Jonathan Lemon
608a3ce62a Extend kqueue down to the device layer.
Backwards compatible approach suggested by: peter
2001-02-15 16:34:11 +00:00
Robert Watson
661702ab20 o Fix spellign in a comment: s/referernce/reference/ 2001-02-14 06:53:57 +00:00
Bosko Milekic
fffd12bd72 Implement m_getm() which will perform an "all or nothing" mbuf + cluster
allocation, as required.

If m_getm() receives NULL as a first argument, then it allocates `len'
(second argument) bytes worth of mbufs + clusters and returns the chain
only if it was able to allocate everything.
If the first argument is non-NULL, then it should be an existing mbuf
chain (e.g. pre-allocated mbuf sitting on a ring, on some list, etc.) and
so it will allocate `len' bytes worth of clusters and mbufs, as needed,
and append them to the tail of the passed in chain, only if it was able
to allocate everything requested.

If allocation fails, only what was allocated by the routine will be freed,
and NULL will be returned.

Also, get rid of existing m_getm() in netncp code and replace calls to it
to calls to this new generic code.

Heavily Reviewed by: bp
2001-02-14 05:13:04 +00:00
Jonathan Lemon
2fd7d53d36 Return ECONNABORTED from accept if connection is closed while on the
listen queue, as well as the current behavior of a zero-length sockaddr.

Obtained from: KAME
Reviewed by: -net
2001-02-14 02:09:11 +00:00
Robert Watson
d941d4752c o Export the nextpid variable via SYSCTL as kern.lastpid, decreasing by
one the number of variables needed for top and other setgid kmem
  utilities that could only be accessed via /dev/kmem previously.

Submitted by:	Thomas Moestl <tmoestl@gmx.net>
Reviewed by:	freebsd-audit
2001-02-12 17:59:01 +00:00
Bosko Milekic
2786342687 Change all instances of CURPROC' and CURTHD' to `curproc,' in order
to stay consistent.

Requested by: bde
2001-02-12 03:15:43 +00:00
Jake Burkholder
d5a08a6065 Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different
  scheduling classes using different portions of the array.  This
  allows user processes to have their priorities propogated up into
  interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
  32.  We used to have 4 separate arrays of 32 queues each, so this
  may not be optimal.  The new run queue code was written with this
  in mind; changing the number of run queues only requires changing
  constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter.  This
  is intended to be used to create per-cpu run queues.  Implement
  wrappers for compatibility with the old interface which pass in
  the global run queue structure.
- Group the priority level, user priority, native priority (before
  propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
  symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
  This was used to detect when a process' priority had lowered and
  it should yield.  We now effectively yield on every interrupt.
- Activate propogate_priority().  It should now have the desired
  effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
  idle loop.  It interfered with propogate_priority() because
  the idle process needed to do a non-blocking acquire of Giant
  and then other processes would try to propogate their priority
  onto it.  The idle process should not do anything except idle.
  vm_page_zero_idle() will return in the form of an idle priority
  kernel thread which is woken up at apprioriate times by the vm
  system.
- Update struct kinfo_proc to the new priority interface.  Deliberately
  change its size by adjusting the spare fields.  It remained the same
  size, but the layout has changed, so userland processes that use it
  would parse the data incorrectly.  The size constraint should really
  be changed to an arbitrary version number.  Also add a debug.sizeof
  sysctl node for struct kinfo_proc.
2001-02-12 00:20:08 +00:00
Mark Murray
d888fc4e73 RIP <machine/lock.h>.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by:	jhb
2001-02-11 10:44:09 +00:00
Bosko Milekic
122a814af5 Long awaited style fixup in mbuf code. Get rid of K&R style prototyping
and function argument declarations. Make sure that functions that are
supposed to return a pointer return NULL in case of failure. Don't cast
NULL. Finally, get rid of annoying `register' uses.
2001-02-11 05:02:06 +00:00
Bosko Milekic
5746a1d866 - Place back STR string declarations for lock/unlock strings used for KTR_LOCK
tracing in order to avoid duplication.
- Insert some tracepoints back into the mutex acq/rel code, thus ensuring
  that we can trace all lock acq/rel's again.
- All CURPROC != NULL checks are MPASS()es (under MUTEX_DEBUG) because they
  signify a serious mutex corruption.
- Change up some KASSERT()s to MPASS()es, and vice-versa, depending on the
  type of problem we're debugging (INVARIANTS is used here to check that
  the API is being used properly whereas MUTEX_DEBUG is used to ensure that
  something general isn't happening that will have bad impact on mutex
  locks).

Reminded by: jhb, jake, asmodai
2001-02-11 02:54:16 +00:00
Jake Burkholder
3cbe75a414 Clear the reschedule flag after finding it set in userret(). This
used to be in cpu_switch(), but I don't see any difference between
doing it here.
2001-02-10 20:33:35 +00:00
Jake Burkholder
c11f93b3e7 Acquire sched_lock around need_resched() in roundrobin() to satisfy
assertions that it is held.  Since roundrobin() is a timeout there's
no possible way that it could be called with sched_lock held.
2001-02-10 19:07:32 +00:00
John Baldwin
142ba5f3d7 - Make astpending and need_resched process attributes rather than CPU
attributes.  This is needed for AST's to be properly posted in a preemptive
  kernel.  They are backed by two new flags in p_sflag: PS_ASTPENDING and
  PS_NEEDRESCHED.  They are still accesssed by their old macros:
  aston(), astoff(), etc.  For completeness, an astpending() macro has been
  added to check for a pending AST, and clear_resched() has been added to
  clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
  other architectures.
2001-02-10 02:20:34 +00:00
John Baldwin
c75e5182ce Unify the two sleep lock order lists to enforce the process lock ->
uidinfo lock locking order.
2001-02-09 20:52:02 +00:00
John Baldwin
c3a6f33758 Revert the previous revision for two reasons:
- I can't seem to reproduce the warning I got from WITNESS anymore.
- The fix was wrong.  Since a uidinfo struct is a member of proc, it
  makes sense for the locking order to be such that you are allowed to
  hold proc and then grab the uidinfo lock.
2001-02-09 20:51:11 +00:00
John Baldwin
1aa97cdea7 Work around some sizeof(long) != sizeof(int) bogons. 2001-02-09 19:02:39 +00:00
John Baldwin
062d8ff5a0 - Catch up to the new swi API changes:
- Use swi_* function names.
  - Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
  lock to minimize diffs with cy(4).
2001-02-09 17:46:35 +00:00
John Baldwin
b4151f7101 - Move struct ithd to sys/interrupt.h.
- Add a set of MI helper functions for interrupt threads:
  - ithread_create() creates a new interrupt thread
  - ithread_destroy() destroys an interrupt thread
  - ithread_add_handler() attaches a new handler to an interrupt thread
  - ithread_remove_handler() detaches a handler from an interrupt thread
- Rename sinthand_add() and sched_swi() to swi_add() and swi_sched()
  respectively so that they live in a consistent namespace.
- struct intrhand is no longer a public type.  It would be private to
  kern_intr.c but the current implementation of fast interrupts on the
  alpha requires the type to be exported.  However, all handlers should
  be treated as void * cookies in the way that new-bus treats them.  This
  includes references to software interrupt handlers.
2001-02-09 17:42:43 +00:00
John Baldwin
8ad802d82c Release the proc lock around crfree() and uifree() in wait1(). It leads to
a lock order violation, and since p is already a zombie at this point,
I'm not sure that we even need all the locking currently in wait1().
2001-02-09 16:43:18 +00:00
John Baldwin
635962afdf Proc locking. 2001-02-09 16:27:41 +00:00
John Baldwin
929604ec9b Move the initailization of the proc lock for proc0 very early into the MD
startup code.
2001-02-09 16:25:16 +00:00
John Baldwin
a91fe908db Woops, remove an obsolete reference to gd_cpu_lockid. 2001-02-09 16:13:57 +00:00
John Baldwin
e910ba59fc - Change the 'witness_list' ddb command to 'show mutexes'. Note that this
will only display sleep mutexes held by the current process.
- Clean up some nits in the witness_display() function and add a ddb
  command 'show witness' that dumps the hierarchy and order lists to the
  console.
- Use queue(3) macros where appropriate.
- Resort the spin lock order list so that "com" is before "sched_lock".
  Also, add appropriate #ifdef's around SMP and i386-specific mutexes.
- Add two new mutexes used to protect the ithread lists and tables to the
  order list.

Requested by:	bde (1)
2001-02-09 15:19:41 +00:00
John Baldwin
cd85c9e17c Change the ktr ddb commands to be show commands. The commands are now as
follows:
 - show ktr_first	display the first entry
 - show ktr_next	display the next entry
 - show ktr		display the entire buffer

The /v modifiers continue to work as described previously.

Requested by:	bde
2001-02-09 15:07:30 +00:00
John Baldwin
7ecfc090c0 - Point out that we don't lock anything during the idle setup because
only the boot processor should be running in the comments.
- Initialize curproc to point to each CPU's respective idleproc if their
  curproc is NULL.
- Keep track of the number of context switches performed by idleproc.
2001-02-09 14:59:43 +00:00
Peter Wemm
2bd5ac330f poll(2) array limits (take 2) - after some input from bde. 2001-02-09 08:10:22 +00:00
Bosko Milekic
9ed346bab0 Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
2001-02-09 06:11:45 +00:00
John Baldwin
5dbc7fe2d7 Don't bother with acquiring/releasing Giant around kmem_malloc() and
kmem_free() for now.  Kmem_malloc() and kmem_free() now have appropriate
assertions in place, and these checks aren't feasible until more of the
networking code is locked down.  Also, the extra assertions here should
already be caught by the WITNESS code as lock order violations should
mutex operations on Giant be reintroduced here later.
2001-02-08 00:27:38 +00:00
John Baldwin
297c46b68c Don't enable interrupts for a kernel breakpoint or trace trap. Otherwise,
this negates the explicit disabling of interrupts when entering the
debugger in Debugger().
2001-02-08 00:10:07 +00:00
Peter Wemm
89b716473e The code I picked up from NetBSD in '97 had a nasty bug. It limited
the index of the pollfd array to the number of fd's currently open, not
the maximum number of fd's.  ie: if you had 0,1,2 open, you could not
use pollfd slots higher than 20.  The specs say we only have to support
OPEN_MAX [64] entries but we allow way more than that.
2001-02-07 23:28:01 +00:00
Jeroen Ruigrok van der Werven
2fa72ea7d4 Fix typo: compatability -> compatibility.
Compatability is not an existing english word.
2001-02-06 12:05:58 +00:00
Jeroen Ruigrok van der Werven
1a6e52d0e9 Fix typo: seperate -> separate.
Seperate does not exist in the english language.
2001-02-06 11:21:58 +00:00
Jeroen Ruigrok van der Werven
f09deb6962 Fix typo: wierd -> weird.
There is no such thing as wierd in the english language.
2001-02-06 09:25:10 +00:00
Jeroen Ruigrok van der Werven
ba091d9673 Fix typo: teh -> the. 2001-02-06 09:18:39 +00:00
Brian Feldman
a02f31364e It is _DEFINITELY_ not okay to change shmseg on a running system. 2001-02-04 20:10:32 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Matthew Dillon
4e71e795a1 This commit represents work mainly submitted by Tor and slightly modified
by myself.  It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used.  This code has been
heavily tested in -stable but only tested somewhat on -current.  An MFC
will occur in a few days.  My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.

Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.

Ensure that VM objects are not allocated for system maps.  There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map.  Previously only a simple
linear optimization was made.  (The buffer vm_map typically has only a
handful of vm_map_entry's.  This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge
2001-02-04 06:19:28 +00:00
Brian Somers
115867175a KASSERT that the minor number passed to make_dev() is valid. 2001-02-02 03:32:11 +00:00
Boris Popov
f3f1af390d Properly lock new vnode.
Reminded by:	tegge
2001-01-31 04:54:23 +00:00
Boris Popov
1707240d2a Let M_PANIC go back to the private tree as its intention isn't understood well
for now.
2001-01-31 04:50:20 +00:00
Peter Wemm
2508f69037 Zap last remaining references to (and a use use of) of simple_locks. 2001-01-31 04:29:52 +00:00
Peter Wemm
aa0b4c590f Remove some leftovers from the CMAP* stuff in globaldata and the
BSP and AP startup.
2001-01-30 04:02:28 +00:00
Peter Wemm
fea8c9f26d Remove unused variable 'int n;' 2001-01-29 13:05:21 +00:00
Boris Popov
9211b0b657 Add M_PANIC flag to the list of available flags passed to malloc().
With this flag set malloc() will panic if memory allocation failed.
This usable only in critical places where failed allocation is fatal.

Reviewed by:	peter
2001-01-29 12:48:37 +00:00
Peter Wemm
c83abe02aa Remove unused #include "snp.h" 2001-01-29 10:06:22 +00:00
Peter Wemm
810d0bd1a9 Turn '#if NSNP > 0' into an option. 2001-01-29 09:43:36 +00:00
Peter Wemm
03927d3c33 Send "#if NISA > 0" to the bit-bucket and replace it with an option.
These were compile-time "is the isa code present?" tests and not
'how many isa busses' tests.
2001-01-29 09:38:39 +00:00
Marcel Moolenaar
5bcc1e51a0 Don't hard-code alignment and data declarations valid for 64-bit
machines (duh!). This was one reason why this script broke on
i386. The other being that on i386 sections did not have the
proper alignment. This has been fixed in sys/sys/linker_set.h.
2001-01-29 01:55:54 +00:00
Marcel Moolenaar
136345c019 Improve kernel bootstrapping:
o  Use objdump instead of gensetdefs(1) to build the linker sets.
o  Allow overriding of nm and objdump in resp. genassym.sh and
   gensetdefs.pl for non-native toolchains.

Reviewed by: arch
Perl improvements: Jos Backus <josb@cncdsl.com>, benno
2001-01-28 06:39:56 +00:00
Bosko Milekic
84e11fbc2e Move the setting of curproc to idleproc up earlier in ap_init(). The
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.

Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.

Solved by: jake (the allmighty) :-)
2001-01-28 03:41:01 +00:00
Tor Egge
48bed92485 Defer assignment of low level interrupt handlers for PCI interrupts
described in the MP table until something asks for the interrupt number
later on.
2001-01-28 01:07:54 +00:00
Dag-Erling Smørgrav
9fa2ef3da2 Remove an assertion I forgot to remove in the previous commit: sbuf_len()
may now be called with an unfinished sbuf.
For consistency, copy the related comment from sbuf_delete() to sbuf_clear()
and sbuf_len().
2001-01-28 00:33:58 +00:00
Dag-Erling Smørgrav
4dc1413915 Add sbuf_clear() and sbuf_overflowed().
Move the helper macros from sbuf.h to sbuf.c
Use ints instead of size_ts.
Relax the requirements for sbuf_finish(): it is now possible to finish an
overflowed buffer.
Make sbuf_len() return -1 instead of 0 if the sbuf overflowed.

Requested by:	gibbs
2001-01-28 00:13:01 +00:00
John Baldwin
d38b8dbfc8 Add a new ddb command 'witness_list' that lists the mutexes held by
curproc.

Requested by:	peter
2001-01-27 07:51:34 +00:00
Peter Wemm
0fee3d3550 p->p_intr_nesting_level is MI now and initialized to 0 in kern_fork.c,
so it should be save to KASSERT() on it even on an arch that may not
use it.
2001-01-27 06:32:20 +00:00
John Baldwin
ba88dfc733 Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.
2001-01-27 00:01:31 +00:00
John Baldwin
8865286b9c Fix fork_exit() to take a pointer to a function that returns void as its
first argument rather than a function that returns a void *.

Noticed by:	jake
2001-01-26 23:51:41 +00:00
Jake Burkholder
28df158b49 Push Giant down into the trap handlers that need it, instead of
acquiring it unconditionally.

Reviewed by:	jhb
2001-01-26 04:16:16 +00:00
John Baldwin
2a36ec35ae - Change fork_exit() to take a pointer to a trapframe as its 3rd argument
instead of a trapframe directly.  (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
  fork_return() instead of child_return().
- Axe child_return().
2001-01-24 21:59:25 +00:00
John Baldwin
6d3e7b9b0b Add a new item to kinfo_proc: ki_sflag to mirror p_sflag. 2001-01-24 12:49:52 +00:00
Jason Evans
1b367556b5 Convert all simplelocks to mutexes and remove the simplelock implementations. 2001-01-24 12:35:55 +00:00
John Baldwin
168666be74 - Catch up to proc flag changes.
- Assert sched_lock is held in proc_compare.
2001-01-24 11:15:59 +00:00
John Baldwin
3897ca7c61 - Catch up to proc flag changes.
- Update stopevent() to assert that the proc lock is held when it is
  held and is not recursed.  Note that the STOPEVENT() macro obtains
  the proc lock when calling this function.
2001-01-24 11:15:24 +00:00
John Baldwin
e04ac2fe6b - Catch up to proc flag changes.
- Add proc locking for selwakeup() and selrecord().
2001-01-24 11:12:37 +00:00
John Baldwin
ec5a741d77 - Catch up to proc flag changes. 2001-01-24 11:11:35 +00:00
John Baldwin
1899325c72 - Catch up to proc flag changes.
- Add in some locking ops that might fix SIGXCPU, but don't enable them
  yet.
- Assert that sched_lock is not recursed when mi_switch() is called.
2001-01-24 11:10:55 +00:00
John Baldwin
40447cd4aa - Proc locking.
- Catch up to proc flag changes.
2001-01-24 11:08:02 +00:00
John Baldwin
7871c121ff - Add a mtx_assert() for sched_lock in calcru().
- Protect calcru() with sched_lock later on in the file when it is called.
2001-01-24 11:06:39 +00:00
John Baldwin
42a4ed9975 - Proc locking.
- Catch up to proc flag changes.
- Reorder the way we get things in fill_kinfoproc() to minimize the
  number of locking operations.
2001-01-24 11:05:50 +00:00
John Baldwin
8484de7555 - Don't use a union and fun tricks to shave one extra pointer off of struct
mtx right now as it makes debugging harder.  When we are in optimizing
  mode, we can revisit this.
- Fix the KTR trace messages to use %p rather than 0x%p to avoid duplicate
  0x's in KTR output.
- During witness_fixup, release Giant so that witness doesn't get confused.
  Also, grab all_mtx while walking the list of mutexes.
- Remove w_sleep and w_recurse.  Instead, perform checks on mutexes using
  the mutex's mtx_flags field.
- Allow debug.witness_ddb and debug.witness_skipspin to be set from the
  loader.
- Add Giant to the front of existing order_list entries to help ensure
  Giant is always first.
- Add an order entry for the various proc locks.  Note that this only
  helps keep proc in order mostly as the allproc and proctree mutexes are
  only obtained during a lockmgr operation on the specified mutex.
2001-01-24 10:57:01 +00:00
John Baldwin
f1dea27db4 - Catch up to proc flag changes.
- Set the new P_KTHREAD flag for kthreads during kthread_create.
2001-01-24 10:47:50 +00:00
John Baldwin
a7b124c3c7 - Catch up to proc flag changes.
- Add new fork_exit() and fork_return() MI C functions.
2001-01-24 10:47:14 +00:00
John Baldwin
981808d145 Catch up to P_FOO -> PS_FOO changes in proc flags. 2001-01-24 10:44:01 +00:00
John Baldwin
01cd094c32 - Proc locking.
- P_FOO -> PS_FOO.
2001-01-24 10:43:25 +00:00
John Baldwin
f202965e95 - Catch up to p_sflag changes.
- The MD code now initializes proc0.p_heldmtx, proc0.p_contested, and
  curproc.
- The MD code calls here with Giant already held.
- Proc locking.
2001-01-24 10:40:56 +00:00
John Baldwin
625c76db3a - Kill the have_giant parameter to userret() along with all instances of
that name as a variable.  Use mtx_owned(&Giant) where appropriate
  instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enable interrupts during trap and why this may be
  bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
  returning from fork.  The child hasn't returned from fork through syscall
  in a while.
- Remove fork_return() as it has been superseded by the MI version.
2001-01-24 09:53:49 +00:00
John Baldwin
4cbdef8448 - Relocate portions of this file to get it into an order closer to that of
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
  signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions.   (Brought over
  from the alpha port)
2001-01-24 09:48:52 +00:00
John Baldwin
45e889896e Fix a typo.
Reported by:	albert
2001-01-24 08:42:39 +00:00
Kirk McKusick
5ed57d323b Never reuse AUTO_OID values.
Approved by:	Alfred Perlstein <bright@wintelcom.net>
2001-01-24 04:35:13 +00:00
John Baldwin
0d6d6aa373 Don't grab Giant when calling kmem_alloc/kmem_free as this is just
encouraging other people to follow the same practice.  If this is going
to be done, then it should be done inside of those two functions instead.
2001-01-24 00:36:03 +00:00
John Baldwin
e5690aadaa Proc locking. 2001-01-24 00:35:12 +00:00
John Baldwin
a914fb6b27 - Proc locking.
- Protect calcru() with sched_lock.
2001-01-24 00:33:44 +00:00
John Baldwin
762dba203e - Proc locking.
- Protect calcru() with sched_lock.
2001-01-24 00:28:07 +00:00
John Baldwin
611d940790 Proc locking. 2001-01-24 00:27:28 +00:00
Matt Jacob
15516f16d2 Do not do the commenting out the way that saves bytes and looks cleaner
to you. Do it the way Vox Populi wants it.
2001-01-23 16:35:33 +00:00
Hajimu UMEMOTO
5d22597f3a Add mibs to hold the number of forks since boot. New mibs are:
vm.stats.vm.v_forks
	vm.stats.vm.v_vforks
	vm.stats.vm.v_rforks
	vm.stats.vm.v_kthreads
	vm.stats.vm.v_forkpages
	vm.stats.vm.v_vforkpages
	vm.stats.vm.v_rforkpages
	vm.stats.vm.v_kthreadpages

Submitted by:	Paul Herman <pherman@frenchfries.net>
Reviewed by:	alfred
2001-01-23 14:32:01 +00:00
Robert Watson
02b65ffb64 o The move to using VADMIN under vaccess() resulted in some system
calls returning EACCES instead of EPERM.  This patch modifies vaccess()
  to return EPERM instead of EACCES if VADMIN is among the requested
  rights.  This affects functions normally limited to the owners of
  a file, such as chmod(), as EPERM is the error indicating that
  privilege would allow the operation, rather than a chance in mandatory
  or discretionary rights.

Reported by:	bde
2001-01-23 04:15:19 +00:00
Matt Jacob
462574faf5 Move (now) unused variable declaration inside the block (now commented out). 2001-01-22 22:22:38 +00:00
Jason Evans
56771ca74b Print correct file name and line number in mtx_assert().
Noticed by:	jake
2001-01-22 05:56:55 +00:00
Jason Evans
0cde2e34af Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined.  Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by:	peter
2001-01-21 22:34:43 +00:00
Dag-Erling Smørgrav
a3ea6d41b9 First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
 - remove zalloci() and zfreei(), which are now redundant.

Reviewed by:	bmilekic, jasone
2001-01-21 22:23:11 +00:00
Poul-Henning Kamp
1fd7b93f3f Convert a Debugger(3) to a panic(9) and a EINVAL.
Reminded by:	bde
2001-01-21 21:19:49 +00:00
Jake Burkholder
a448b62ac9 Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context.  This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By:	peter
2001-01-21 19:25:07 +00:00
Jason Evans
527c2fd277 Make the order of the static initializer for all_mtx match the order of
fields in struct mtx.

Found by:	jake
2001-01-21 11:05:02 +00:00
Peter Wemm
654c30a008 Remove APIC_INTR_DIAGNOSTIC - this has been disabled for some time now.
Remove some leftovers of removed SMP options.
2001-01-21 07:54:10 +00:00
Jason Evans
d1c1b8413e Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.
2001-01-21 07:52:20 +00:00
Jake Burkholder
3e899e1063 Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP.  These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.
2001-01-21 06:50:03 +00:00
John Baldwin
27e864e300 - All of proc_compare needs sched_lock, so hold it for the for loop that
calls it rather than obtaining and releasing it a lot in proc_compare.
- Collect all of the data gathering and stick it just after the
  proc_compare loop.  This way, we only have to grab sched_lock once now
  when handling SIGINFO.  All the printf's are done after the values are
  calculated.

Submitted mostly by:	bde
2001-01-20 23:03:20 +00:00
Bosko Milekic
56acb799b2 When short of mbufs or mbuf clusters, we sleep on appropriate "counters."
The counters are incremented when a thread goes to sleep and decremented
either when a thread is woken up by another thread or when the sleep
times out. There existed a race where the sleep count could be decremented
twice resulting in an eventual underflow.
Move the decrementing of the "counters" to the thread initiating the sleep
and thus remedy the problem.
2001-01-20 21:29:10 +00:00
John Baldwin
049ebc15a1 Temporarily disable the printf() for micruptime() going backwards, the
SIGXCPU signal, and killing of processes that exceed their allowed run
time until they can play nice with sched_lock.  Right now they are just
potentital panics waiting to happen.  The printf() has bitten several
people.
2001-01-20 02:57:59 +00:00
Jake Burkholder
c1ef8aac9e - Make npx_intr INTR_MPSAFE and move acquiring Giant into the
function itself.
- Remove a hack to allow acquiring Giant from the npx asm trap
  vector.
2001-01-20 02:30:58 +00:00
John Baldwin
4848fbae35 Be more careful with sched_lock in the SIGINFO handler. Specifically, do
not hold sched_lock while calling ttyprintf().  If we are on a serial
console, then ttyprintf() will end up getting the sio lock, resulting in
a lock order violation.

Noticed by:	des
2001-01-20 02:04:44 +00:00
Peter Wemm
558226eae7 Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h 2001-01-19 13:19:02 +00:00
Peter Wemm
f7b6e45d5b apic_itrace_splz[] is unused 2001-01-19 10:48:35 +00:00
Peter Wemm
198c5b0891 Remove the static splXXX functions and replace them by static __inline
stubs.  Remove the xxx_imask variables which have been all but gone for
a while.
2001-01-19 09:57:29 +00:00
John Baldwin
568ae39fd5 Revert revision 1.102. I don't think p_nice needs to be protected with
sched_lock, and I'm fairly certain P_TRACED will be protected with the
proc lock instead.

Pointed out indirectly by:	bde
2001-01-19 08:23:22 +00:00
Matthew Dillon
bcc740c453 Do not cluster with B_LOCKED buffers.
This is an odd one.  This patch appears to fix a panic related to background
bitmap writes (for FFS), though neither Kirk, Ian, or I can figure out how
B_CLUSTEROK could possibly be set on a bitmap block to cause the clustering
code to improperly cluster with a buffer undergoing a background write.

In anycase, the clustering code is very fragile and this patch helps with
that, as well as possibly fixing a bug Andre was having.

Suggested by: Ian Dowse <iedowse@maths.tcd.ie>
Testing by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
2001-01-19 05:31:07 +00:00
Bosko Milekic
08812b3925 Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb
2001-01-19 01:59:14 +00:00
John Baldwin
dcfc09d931 Protect p_stat and p_oncpu with sched_lock in forward_signal(). 2001-01-18 08:19:25 +00:00
Bosko Milekic
35c05ac61b Add some KASSERTs valid if WITNESS is defined to verify that the mbuf
allocation routines are being called safely. Since we drop our relevant
mbuf mutex and acquire Giant before we call kmem_malloc(), we have
to make sure that this does not pave the way for a fatal lock order
reversal. Check that either Giant is already held (in which case it's safe
to grab it again and recurse on it) or, if Giant is not held, that no
other locks are held before we try to acquire Giant.

Similarily, add a KASSERT valid in the WITNESS case in m_reclaim() to
nail callers who end up in m_reclaim() and hold a lock.

Pointed out by: jhb
2001-01-16 01:53:13 +00:00
Jason Evans
238510fc46 Implement condition variables. 2001-01-16 01:00:43 +00:00
Poul-Henning Kamp
9039f19fa0 A bit of sanity-checking in bioqdisksort(): panic if we recurse. 2001-01-14 18:48:42 +00:00
Dag-Erling Smørgrav
faa784b70c Use predictable internal names for the sysvipc modules, so we have a
chance of getting dependencies working.
2001-01-14 18:04:30 +00:00
John Baldwin
b947e93403 - Use sched_lock to prevent the mutex name from changing out from under us
while we are copying it to the kinfo_proc structure.
- Test against p_stat to see if we are blocked on a mutex.
- Terminate ki_mtxname with a null char rather than ki_wmesg.
2001-01-13 23:08:34 +00:00
Ben Smithurst
4c061a9da1 Fix getsid() to use "=" instead of "==".
Not objected to by:	audit
2001-01-13 22:49:59 +00:00
Jake Burkholder
063415120b Change return ??? to return -1 in some #if 0'ed code. 2001-01-12 08:24:25 +00:00
David Malone
3b54736e19 Style improvements for last fix. Should be functionally the same.
Submitted by:	bde
2001-01-11 00:13:54 +00:00
Jake Burkholder
ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.
2001-01-10 04:43:51 +00:00
Bosko Milekic
d113d3857e In m_mballoc_wait(), drop the mmbfree mutex lock prior to calling
m_reclaim() and re-acquire it when m_reclaim() returns. This means that
we now call the drain routines without holding the mutex lock and
recursing into it. This was done for mainly two reasons:

(i) Avoid the long recursion; long recursions are typically bad and this
    is the case here because we block all other code from freeing mbufs
    if they need to. Doing that is kind of counter-productive, since we're
    really hoping that someone will free.

(ii) More importantly, avoid a potential lock order reversal. Right now,
     not all the locks have been added to our networking code; but
     without this change, we're introducing the possibility for deadlock.
     Consider for example ip_drain(). We will likely eventually introduce
     a lock for ipq there, and so ip_freef() will be called with ipq lock
     held. But, ip_freef() calls m_freem() which in turn acquires the
     mmbfree lock. Since we were previously calling ip_drain() with mmbfree
     held, our lock order would be: mmbfree->ipq->mmbfree. Some other code
     may very well lock ipq first and then call ip_freef(). This would
     result in the regular lock order, ipq->mmbfree. Clearly, we have
     deadlock if one thread acquires the ipq lock and sits waiting for
     mmbfree while another thread calling m_reclaim() acquires mmbfree
     and sits waiting for the ipq lock.

Also, make sure to add a comment above m_reclaim()'s definition briefly
explaining this. Also document this above the call to m_reclaim() in
m_mballoc_wait().

Suggested and reviewed by: alfred
2001-01-09 23:58:56 +00:00
Garrett Wollman
0a2c3d48c6 select() DKI is now in <sys/selinfo.h>. 2001-01-09 04:33:49 +00:00
Nick Hibma
11a8d6c202 Unset the devclass if the attach fails and the devclass was not set to
begin with.

Reviewed by:	dfr
2001-01-08 22:16:26 +00:00