Commit Graph

3980 Commits

Author SHA1 Message Date
Robert Watson
6bd1912df4 o Modify access control checks in p_candebug() such that the policy is as
follows: the effective uid of p1 (subject) must equal the real, saved,
  and effective uids of p2 (object), p2 must not have undergone a
  credential downgrade.  A subject with appropriate privilege may override
  these protections.

  In the future, we will extend these checks to require that p1 effective
  group membership must be a superset of p2 effective group membership.

Obtained from:	TrustedBSD Project
2001-05-17 21:48:44 +00:00
Alfred Perlstein
0fd061c0c4 Cleanup
Remove comment about setting error for reads on EOF, read returns 0 on
EOF so the code should be ok.

Remove non-effective priority boost, PRIO+1 doesn't do anything
(according to McKusick), if a real priority boost is needed it should
have been +4.

Style fixes:
.) return foo -> return (foo)
.) FLAG1|FlAG2 -> FLAG1 | FlAG2
.) wrap long lines
.) unwrap short lines
.) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++)
.) remove braces for some conditionals with a single statement
.) fix continuation lines.

md5 couldn't verify the binary because some code had to
be shuffled around to address the style issues.
2001-05-17 19:47:09 +00:00
Alfred Perlstein
2deb4a20c3 initialize pipe pointers 2001-05-17 18:22:58 +00:00
Alfred Perlstein
82a283fcf3 pipe_create has to zero out the select record earlier to avoid
returning a half-initialized pipe and causing pipeclose() to follow
a junk pointer.

Discovered by: "Nick S" <snicko@noid.org>
2001-05-17 17:59:28 +00:00
Ian Dowse
0864ef1e8a Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
Alfred Perlstein
a428c5ffef remove include of ipl.h because it no longer exists 2001-05-16 02:52:06 +00:00
John Baldwin
8bd57f8fc2 Remove unneeded includes of sys/ipl.h and machine/ipl.h. 2001-05-15 23:22:29 +00:00
John Baldwin
74fc745594 - Remove unneeded include of sys/ipl.h.
- Lock the process before calling killproc() to kill it for exceeding the
  maximum CPU limit.
2001-05-15 23:15:06 +00:00
John Baldwin
9081e5e826 - Remove unneeded include of sys/ipl.h.
- Require the proc lock be held for killproc() to allow for the vmdaemon to
  kill a process when memory is exhausted while holding the lock of the
  process to kill.
2001-05-15 23:13:58 +00:00
Brian Somers
eeee064735 Support /dev/ctty again
Submitted by:	peter
2001-05-15 18:12:38 +00:00
Seigo Tanimura
1b36970495 Back out scanning file descriptors with holding a process lock.
selrecord() requires allproc sx in pfind(), resulting in lock order
reversal between allproc and a process lock.
2001-05-15 10:19:57 +00:00
Jonathan Lemon
97f6754ff1 When calling poll() on a fd associated with a filesystem, let POLLIN/POLLOUT
behave identically to POLLRDNORM/POLLWRNORM.

Submitted by: bde
PR: 27287
merge after: 1 week
2001-05-14 14:37:25 +00:00
Poul-Henning Kamp
241e77c8a5 Use the new ability to avoid practically all the gunk in this file.
When people access /dev/tty, locate their controlling tty and return
the dev_t of it to them.  This basically makes /dev/tty act like
a variant symlink sort of thing which is much simpler than all the
mucking about with vnodes.
2001-05-14 08:22:56 +00:00
Seigo Tanimura
265fc98f36 - Convert msleep(9) in select(2) and poll(2) to cv_*wait*(9).
- Since polling should not involve sleeping, keep holding a
  process lock upon scanning file descriptors.

- Hold a reference to every file descriptor prior to entering
  polling loop in order to avoid lock order reversal between
  lockmgr and p_mtx upon calling fdrop() in fo_poll().
  (NOTE: this work has not been done for netncp and netsmb
  yet because a socket itself has no reference counts.)

Reviewed by:	jhb
2001-05-14 05:26:48 +00:00
John Baldwin
1efb92b7ca Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.
2001-05-11 23:50:08 +00:00
Ian Dowse
1feb7a6efa In vrele() and vput(), avoid triggering the confusing "missed vn_close"
KASSERT when vp->v_usecount is zero or negative. In this case, the
"v*: negative ref cnt" panic that follows is much more appropriate.

Reviewed by:	mckusick
2001-05-11 20:42:41 +00:00
John Baldwin
9e5620599e Check witness_dead in more functions to avoid panic'ing when assertions
fail due to witness exhausting its internal resources and shutting down.

Reported by:	Szilveszter Adam <sziszi@petra.hos.u-szeged.hu>
Tested by:	David Wolfskill <david@catwhisker.org>
2001-05-11 20:25:29 +00:00
Tor Egge
dd1c45f3ca Regenerate. 2001-05-11 17:05:47 +00:00
Tor Egge
b4b469e6bb gettimeofday() is MP safe on both -current and -stable. 2001-05-11 17:05:12 +00:00
John Baldwin
ba228f6d96 - Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by:	jake
2001-05-10 17:45:49 +00:00
Alfred Perlstein
97d4578662 Remove an 'optimization' I hope to never see again.
The pipe code could not handle running out of kva, it would panic
if that happened.  Instead return ENFILE to the application which
is an acceptable error return from pipe(2).

There was some slightly tricky things that needed to be worked on,
namely that the pipe code can 'realloc' the size of the buffer if
it detects that the pipe could use a bit more room.  However if it
failed the reallocation it could not cope and would panic.  Fix
this by attempting to grow the pipe while holding onto our old
resources.  If all goes well free the old resources and use the
new ones, otherwise continue to use the smaller buffer already
allocated.

While I'm here add a few blank lines for style(9) and remove
'register'.
2001-05-08 09:09:18 +00:00
Poul-Henning Kamp
e0e0b6610e Always initialize bio_resid from bio_bcount in the disk mini-layer so
that the drivers don't have to do it umpteen times.
2001-05-08 08:24:54 +00:00
Akinori MUSHA
3b26be6ae1 Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child
process on fork(2).

It is the supposed behavior stated in the manpage of sigaction(2), and
Solaris, NetBSD and FreeBSD 3-STABLE correctly do so.

The previous fix against libc_r/uthread/uthread_fork.c fixed the
problem only for the programs linked with libc_r, so back it out and
fix fork(2) itself to help those not linked with libc_r as well.

PR:		kern/26705
Submitted by:	KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp>
Tested by:	knu, GOTOU Yuuzou <gotoyuzo@notwork.org>,
		and some other people
Not objected by:	hackers
MFC in:		3 days
2001-05-07 18:07:29 +00:00
Poul-Henning Kamp
079f2df393 Make the disk mini-layer check for and handle zero-length transfers
instead of the underlying drivers.
2001-05-06 21:55:22 +00:00
Poul-Henning Kamp
a468031ce8 Actually biofinish(struct bio *, struct devstat *, int error) is more general
than the bioerror().

Most of this patch is generated by scripts.
2001-05-06 20:00:03 +00:00
Poul-Henning Kamp
b966319db7 Fix return type of vop_stdputpages()
Noticed by:	rwatson
2001-05-06 17:40:22 +00:00
Robert Watson
29b2efeb6b o First step in cleaning up authorization code for the posix4
implementation.  Move from direct uid 0 comparision to using suser_xxx()
  call with the same semantics.  Simplify CAN_AFFECT() macro as passed
  pcred was redundant.  The checks here still aren't "right", but they
  are probably "better".

Obtained from:	TrustedBSD Project
2001-05-06 16:15:42 +00:00
Matthew Dillon
1766b2e5fa Raise the SysV shared memory defaults to more reasonable values.
Mainly increases the shared memory limit from 4M to 32M (approx).
Many more programs these days use SysV shared memory, especially X-related
programs.
2001-05-04 18:43:19 +00:00
John Baldwin
6c49a8e295 Fix a bug in the pfind() changes due to confusing the process returned by
pfind() ('pp') with the process being detached from ptrace.

Reported by:	bde
2001-05-04 18:13:11 +00:00
John Baldwin
2d96f0b145 - Move state about lock objects out of struct lock_object and into a new
struct lock_instance that is stored in the per-process and per-CPU lock
  lists.  Previously, the lock lists just kept a pointer to each lock held.
  That pointer is now replaced by a lock instance which contains a pointer
  to the lock object, the file and line of the last acquisition of a lock,
  and various flags about a lock including its recursion count.
- If we sleep while holding a sleepable lock, then mark that lock instance
  as having slept and ignore any lock order violations that occur while
  acquiring Giant when we wake up with slept locks.  This is ok because of
  Giant's special nature.
- Allow witness to differentiate between shared and exclusive locks and
  unlocks of a lock.  Witness will now detect the case when a lock is
  acquired first in one mode and then in another.  Mutexes are always
  locked and unlocked exclusively.  Witness will also now detect the case
  where a process attempts to unlock a shared lock while holding an
  exclusive lock and vice versa.
- Fix a bug in the lock list implementation where we used the wrong
  constant to detect the case where a lock list entry was full.
2001-05-04 17:15:16 +00:00
John Baldwin
ac07d659c3 Don't hold the process mutex across calls to FREE() since the vm system
uses lockmgr locks and this leads to a lock order reversal.  At this point
in wait1() the process is not on any process lists or in the process tree,
so no other process should be able to find it or have a reference to it
anyways, so the locking is not needed.
2001-05-04 16:13:28 +00:00
Poul-Henning Kamp
a62615e59b Implement vop_std{get|put}pages() and add them to the default vop[].
Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but
the default.
2001-05-01 08:34:45 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Alfred Perlstein
aad7597ce0 When panic()'ing because of recursion on a non-recursive mutex, print
out the location it was initially locked.

Ok'd by: jake
2001-04-30 01:01:52 +00:00
Jake Burkholder
e6af1080c2 Make rtprio work again.
- add a missing break which caused RTP_SET to always return EINVAL
- break instead of returning if p_can fails so proc_lock is always
  dropped correctly
- only copyin data that is actually needed
- use break instead of goto
- make rtp_to_pri return EINVAL instead of -1 if the values are out
  or range so we don't have to translate
2001-04-29 22:09:26 +00:00
Robert Watson
46157a65d7 o As part of the move to not maintaining copies of the vnode owning uid
and gid in the ACL, vaccess_acl_posix1e() was changed to accept
  explicit file_uid and file_gid as arguments.  However, in making the
  change, I explicitly checked file_gid against cr->cr_groups[0], rather
  than using groupmember, resulting in ACL_GROUP_OBJ entries being
  compared to the caller's effective gid only, not the remainder of
  its groups.  This was recently corrected for the version of the
  group call without privilege, but the second test (when privilege is
  added) was missed.  This change replaces an additiona cr->cr_groups[0]
  check with groupmember().

Pointed out by:	jedgar
Reviewed by:	jedgar
Obtained from:	TrustedBSD Project
2001-04-29 19:53:50 +00:00
Poul-Henning Kamp
855aa097af VOP_BALLOC was never really a VOP in the first place, so convert it
to UFS_BALLOC like the other "between UFS and FFS function interfaces".
2001-04-29 12:36:52 +00:00
Poul-Henning Kamp
b7ebffbc08 Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
2001-04-29 11:48:41 +00:00
Greg Lehey
60fb0ce365 Revert consequences of changes to mount.h, part 2.
Requested by:	bde
2001-04-29 02:45:39 +00:00
Alfred Perlstein
6157b69f4a Instead of asserting that a mutex is not still locked after unlocking it,
assert that the mutex is owned and not recursed prior to unlocking it.

This should give a clearer diagnostic when a programming error is caught.
2001-04-28 12:11:01 +00:00
John Baldwin
6caa8a1501 Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
  into hardclock_process() and statclock_process() respectively.  hardclock()
  and statclock() call the *_process() functions for the current process so
  that UP systems will run as before.  For SMP systems, it is simply necessary
  to ensure that all other processors execute the *_process() functions when the
  main clock functions are triggered on one CPU by an interrupt.  For the alpha
  4100, clock interrupts are delievered in a staggered broadcast fashion, so
  we simply call hardclock/statclock on the boot CPU and call the *_process()
  functions on the secondaries.  For x86, we call statclock and hardclock as
  usual and then call forward_hardclock/statclock in the MD code to send an IPI
  to cause the AP's to execute forwared_hardclock/statclock which then call the
  *_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
  involve less hackery.  Now the cpu doing the forward sets any flags, etc. and
  sends a very simple IPI_AST to the other cpu(s).  AST IPIs now just basically
  return so that they can execute ast() and don't bother with setting the
  astpending or needresched flags themselves.  This also removes the loop in
  forward_signal() as sched_lock closes the race condition that the loop worked
  around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
  a process to act on rather than assuming curproc so that they can be used to
  implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
  header sys/smp.h declares MI SMP variables and API's.   The IPI API's from
  machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
  SLIST of globaldata structures has become MI and moved into subr_smp.c.
  Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by:	jake, peter
Looked over by:	eivind
2001-04-27 19:28:25 +00:00
Alfred Perlstein
3abedb4e01 Actually show the values that tripped the assertion "receive 1" 2001-04-27 13:42:50 +00:00
Robert Watson
80c9c40df9 o Remove the disabled p_cansched() test cases that permitted users to
modify the scheduling properties of processes with a different real
  uid but the same effective uid (i.e., daemons, et al).  (note: these
  cases were previously commented out, so this does not change the
  compiled code at al)

Obtained from:	TrustedBSD Project
2001-04-27 01:56:32 +00:00
Poul-Henning Kamp
8ee8b21b48 vfs_subr.c is getting rather fat. The underlying repocopy and this
commit moves the filesystem export handling code to vfs_export.c
2001-04-26 20:47:14 +00:00
Alfred Perlstein
06336fb26d Sendfile is documented to return 0 on success, however if when a
sf_hdtr is used to provide writev(2) style headers/trailers on the
sent data the return value is actually either the result of writev(2)
from the trailers or headers of no tailers are specified.

Fix sendfile to comply with the documentation, by returning 0 on
success.

Ok'd by: dg
2001-04-26 00:14:14 +00:00
Seigo Tanimura
ebdc3f1d2d Do not leave a process with no credential in zombproc.
Reviewed by:	jhb
2001-04-25 10:22:35 +00:00
Kirk McKusick
112f737245 When closing the last reference to an unlinked file, it is freed
by the inactive routine. Because the freeing causes the filesystem
to be modified, the close must be held up during periods when the
filesystem is suspended.

For snapshots to be consistent across crashes, they must write
blocks that they copy and claim those written blocks in their
on-disk block pointers before the old blocks that they referenced
can be allowed to be written.

Close a loophole that allowed unwritten blocks to be skipped when
doing ffs_sync with a request to wait for all I/O activity to be
completed.
2001-04-25 08:11:18 +00:00
Poul-Henning Kamp
a13234bb35 Move the netexport structure from the fs-specific mountstructure
to struct mount.

This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
2001-04-25 07:07:52 +00:00
Thomas Moestl
83f3198b2b Change uipc_sockaddr so that a sockaddr_un without a path is returned
nam for an unbound socket instead of leaving nam untouched in that case.
This way, the getsockname() output can be used to determine the address
family of such sockets (AF_LOCAL).

Reviewed by:	iedowse
Approved by:	rwatson
2001-04-24 19:09:23 +00:00
John Baldwin
33a9ed9d0e Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by:	-smp, dfr, jake
2001-04-24 00:51:53 +00:00
Thomas Moestl
e15480f8dd Fix a bug introduced in the last commit: vaccess_acl_posix1 only checked
the file gid gainst the egid of the accessing process for the
ACL_GROUP_OBJ case, and ignored supplementary groups.

Approved by:	rwatson
2001-04-23 22:52:26 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Robert Watson
5ea6583e2d o Remove comment indicating policy permits loop-back debugging, but
semantics don't: in practice, both policy and semantics permit
  loop-back debugging operations, only it's just a subset of debugging
  operations (i.e., a proc can open its own /dev/mem), and that's at a
  higher layer.
2001-04-21 22:41:45 +00:00
John Baldwin
9d4f526475 Spelling nit: acquring -> acquiring.
Reported by:	T. William Wells <bill@twwells.com>
2001-04-21 01:50:32 +00:00
Alfred Perlstein
98689e1e70 Assert that when using an interlock mutex it is not recursed when lockmgr()
is called.

Ok'd by: jhb
2001-04-20 22:38:40 +00:00
John Baldwin
242d02a13f Make the ap_boot_mtx mutex static. 2001-04-20 01:09:05 +00:00
John Baldwin
d8915a7f34 - Whoops, forgot to enable the clock lock in the spin order list on the
alpha.
- Change the Debugger() functions to pass in the real function name.
2001-04-19 15:49:54 +00:00
Bosko Milekic
d04d50d1f7 Fix inconsistency in setup of kernel_map: we need to make sure that
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.

Pointed out by: bde
2001-04-18 23:54:13 +00:00
Alfred Perlstein
2f3cf91876 Check validity of signal callback requested via aio routines.
Also move the insertion of the request to after the request is validated,
there's still looks like there may be some problems if an invalid address
is passed to the aio routines, basically a possible leak or having a
not completely initialized structure on the queue may still be possible.

A new sig macro was made _SIG_VALID to check the validity of a signal,
it would be advisable to use it from now on (in kern/kern_sig.c) rather
than rolling your own.

PR: kern/17152
2001-04-18 22:18:39 +00:00
Seigo Tanimura
759cb26335 Reclaim directory vnodes held in namecache if few free vnodes are
available.

Only directory vnodes holding no child directory vnodes held in
v_cache_src are recycled, so that directory vnodes near the root of
the filesystem hierarchy remain in namecache and directory vnodes are
not reclaimed in cascade.

The period of vnode reclaiming attempt and the number of vnodes
attempted to reclaim can be tuned via sysctl(2).

Suggested by:	tegge
Approved by:	phk
2001-04-18 11:19:50 +00:00
Poul-Henning Kamp
793d6d5d57 bread() is a special case of breadn(), so don't replicate code. 2001-04-18 07:16:07 +00:00
Dima Dorfman
25c7870e5d Make this driver play ball with devfs(5).
Reviewed by:	brian
2001-04-17 20:53:11 +00:00
Alfred Perlstein
e04670b734 Add a sanity check on ucred refcount.
Submitted by: Terry Lambert <terry@lambert.org>
2001-04-17 20:50:43 +00:00
Alfred Perlstein
603c86672c Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
Poul-Henning Kamp
0dfba3cef1 Write a switch statement as less obscure if statements. 2001-04-17 20:22:07 +00:00
John Baldwin
e3ee8974e3 Fix an old bug related to BETTER_CLOCK. Call forward_*clock if SMP
and __i386__ are defined rather than if SMP and BETTER_CLOCK are defined.
The removal of BETTER_CLOCK would have broken this except that kern_clock.c
doesn't include <machine/smptests.h>, so it doesn't see the definition of
BETTER_CLOCK, and forward_*clock aren't called, even on 4.x.  This seems to
fix the problem where a n-way SMP system would see 100 * n clk interrupts
and 128 * n rtc interrupts.
2001-04-17 17:53:36 +00:00
Poul-Henning Kamp
f84e29a06c This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
Kirk McKusick
5819ab3f12 Add debugging option to always read/write cylinder groups as full
sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'.
Add debugging option to disable background writes of cylinder
groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'.
These debugging options should be tried on systems that are panicing
with corrupted cylinder group maps to see if it makes the problem
go away. The set of panics in question are:

	ffs_clusteralloc: map mismatch
	ffs_nodealloccg: map corrupted
	ffs_nodealloccg: block not in map
	ffs_alloccg: map corrupted
	ffs_alloccg: block not in map
	ffs_alloccgblk: cyl groups corrupted
	ffs_alloccgblk: can't find blk in cyl
	ffs_checkblk: partially free fragment

The following panics are less likely to be related to this problem,
but might be helped by these debugging options:

	ffs_valloc: dup alloc
	ffs_blkfree: freeing free block
	ffs_blkfree: freeing free frag
	ffs_vfree: freeing free inode

If you try these options, please report whether they helped reduce your
bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com>
and to Matt Dillon <dillon@earth.backplane.com>.
2001-04-17 05:37:51 +00:00
Robert Watson
b114e127e6 In my first reading of POSIX.1e, I misinterpreted handling of the
ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the
access ACL could be used by privileged processes to change file/directory
ownership.  In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and
ACL_OTHER) should have undefined ae_id fields; this commit attempts
to correct that misunderstanding.

o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid
  associated with the vnode, as those can no longer be extracted from
  the ACL passed as an argument.  Perform all comparisons against
  the passed arguments.  This actually has the effect of simplifying
  a number of components of this call, as well as reducing the indent
  level, but now seperates handling of ACL_GROUP_OBJ from ACL_GROUP.

o Modify acl_posix1e_check() to return EINVAL if the ae_id field of
  any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value
  other than ACL_UNDEFINED_ID.  As a temporary work-around to allow
  clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before
  each check so that this cannot cause a failure in the short term
  (this work-around will be removed when the userland libraries and
  utilities are updated to take this change into account).

o Modify ufs_sync_acl_from_inode() so that it forces
  ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID
  when synchronizing the ACL from the inode.

o Modify ufs_sync_inode_from_acl to not propagate uid and gid
  information to the inode from the ACL during ACL update.  Also
  modify the masking of permission bits that may be set from
  ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not
  carry none-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT).

o Modify ufs_getacl() so that when it emulates an access ACL from
  the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID.

o Clean up ufs_setacl() substantially since it is no longer possible
  to perform chown/chgrp operations using vop_setacl(), so all the
  access control for that can be eliminated.

o Modify ufs_access() so that it passes owner uid and gid information
  into vaccess_acl_posix1e().

Pointed out by:	jedger
Obtained from:	TrustedBSD Project
2001-04-17 04:33:34 +00:00
John Baldwin
abd9053ee4 Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable.  I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead.  Also, this can avoid
recursive panics if, for example, witness is broken.
2001-04-17 04:18:08 +00:00
John Baldwin
3c41f323c9 Check to see if enroll() returns NULL in the witness initialization. This
can happen if witness runs out of resources during initialization or if
witness_skipspin is enabled.

Sleuthing by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2001-04-17 03:35:38 +00:00
John Baldwin
7141f2ad46 Exit and re-enter the critical section while spinning for a spinlock so
that interrupts can come in while we are waiting for a lock.
2001-04-17 03:34:52 +00:00
John Hay
24dbea46a9 Update to the 2001-04-02 version of the nanokernel code from Dave Mills. 2001-04-16 13:05:05 +00:00
Brian Somers
56700d4634 Call strlen() once instead of twice. 2001-04-14 21:33:58 +00:00
Robert Watson
e9e7ff5b22 o Since uid checks in p_cansignal() are now identical between P_SUGID
and non-P_SUGID cases, simplify p_cansignal() logic so that the
  P_SUGID masking of possible signals is independent from uid checks,
  removing redundant code and generally improving readability.

Reviewed by:	tmm
Obtained from:	TrustedBSD Project
2001-04-13 14:33:45 +00:00
Alfred Perlstein
1375ed7eb7 convert if/panic -> KASSERT, explain what triggered the assertion 2001-04-13 10:15:53 +00:00
Murray Stokely
a4e6da691f Generate useful error messages. 2001-04-13 09:37:25 +00:00
Mark Murray
f0b60d7560 Handle a rare but fatal race invoked sometimes when SIGSTOP is
invoked.
2001-04-13 09:29:34 +00:00
John Baldwin
7a9aa5d372 - Add a comment at the start of the spin locks list.
- The alpha SMP code uses an "ap boot" spinlock as well.
2001-04-13 08:31:38 +00:00
Robert Watson
44c3e09cdc o Disallow two "allow this" exceptions in p_cansignal() restricting
the ability of unprivileged processes to deliver arbitrary signals
  to daemons temporarily taking on unprivileged effective credentials
  when P_SUGID is not set on the target process:
  Removed:
     (p1->p_cred->cr_ruid != ps->p_cred->cr_uid)
     (p1->p_ucred->cr_uid != ps->p_cred->cr_uid)
o Replace two "allow this" exceptions in p_cansignal() restricting
  the ability of unprivileged processes to deliver arbitrary signals
  to daemons temporarily taking on unprivileged effective credentials
  when P_SUGID is set on the target process:
  Replaced:
     (p1->p_cred->p_ruid != p2->p_ucred->cr_uid)
     (p1->p_cred->cr_uid != p2->p_ucred->cr_uid)
  With:
     (p1->p_cred->p_ruid != p2->p_ucred->p_svuid)
     (p1->p_ucred->cr_uid != p2->p_ucred->p_svuid)
o These changes have the effect of making the uid-based handling of
  both P_SUGID and non-P_SUGID signal delivery consistent, following
  these four general cases:
     p1's ruid equals p2's ruid
     p1's euid equals p2's ruid
     p1's ruid equals p2's svuid
     p1's euid equals p2's svuid
  The P_SUGID and non-P_SUGID cases can now be largely collapsed,
  and I'll commit this in a few days if no immediate problems are
  encountered with this set of changes.
o These changes remove a number of warning cases identified by the
  proc_to_proc inter-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
  possible side effects on running code: they seem reasonable to me,
  but it's possible this change might have to be backed out if problems
  are experienced.

Submitted by:		src/tools/regression/security/proc_to_proc/testuid
Reviewed by:		tmm
Obtained from:	TrustedBSD Project
2001-04-13 03:06:22 +00:00
Robert Watson
0489082737 o Disable two "allow this" exceptions in p_cansched()m retricting the
ability of unprivileged processes to modify the scheduling properties
  of daemons temporarily taking on unprivileged effective credentials.
  These cases (p1->p_cred->p_ruid == p2->p_ucred->cr_uid) and
  (p1->p_ucred->cr_uid == p2->p_ucred->cr_uid), respectively permitting
  a subject process to influence the scheduling of a daemon if the subject
  process has the same real uid or effective uid as the daemon's effective
  uid.  This removes a number of the warning cases identified by the
  proc_to_proc iner-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
  possible side effects on running code: they seem reasonable to me,
  but it's possible this change might have to be backed out if problems
  are experienced.

Reported by:	src/tools/regression/security/proc_to_proc/testuid
Obtained from:	TrustedBSD Project
2001-04-12 22:46:07 +00:00
Robert Watson
e386f9bda3 o Make kqueue's filt_procattach() function use the error value returned
by p_can(...P_CAN_SEE), rather than returning EACCES directly.  This
  brings the error code used here into line with similar arrangements
  elsewhere, and prevents the leakage of pid usage information.

Reviewed by:	jlemon
Obtained from:	TrustedBSD Project
2001-04-12 21:32:02 +00:00
Robert Watson
d34f8d3030 o Limit process information leakage by introducing a p_can(...P_CAN_SEE...)
in rtprio()'s RTP_LOOKIP implementation.

Obtained from:	TrustedBSD Project
2001-04-12 20:46:26 +00:00
Robert Watson
eb9e5c1d72 o Reduce information leakage into jails by adding invocations of
p_can(...P_CAN_SEE...) to getpgid(), getsid(), and setpgid(),
  blocking these operations on processes that should not be visible
  by the requesting process.  Required to reduce information leakage
  in MAC environments.

Obtained from:	TrustedBSD Project
2001-04-12 19:39:00 +00:00
Robert Watson
4c5eb9c397 o Replace p_cankill() with p_cansignal(), remove wrappage of p_can()
from signal authorization checking.
o p_cansignal() takes three arguments: subject process, object process,
  and signal number, unlike p_cankill(), which only took into account
  the processes and not the signal number, improving the abstraction
  such that CANSIGNAL() from kern_sig.c can now also be eliminated;
  previously CANSIGNAL() special-cased the handling of SIGCONT based
  on process session.  privused is now deprecated.
o The new p_cansignal() further limits the set of signals that may
  be delivered to processes with P_SUGID set, and restructures the
  access control check to allow it to be extended more easily.
o These changes take into account work done by the OpenBSD Project,
  as well as by Robert Watson and Thomas Moestl on the TrustedBSD
  Project.

Obtained from:  TrustedBSD Project
2001-04-12 02:38:08 +00:00
Robert Watson
40829dd2dc o Regenerated following introduction of __setugid() system call for
"options REGRESSION".

Obtained from:	TrustedBSD Project
2001-04-11 20:21:37 +00:00
Robert Watson
130d0157d1 o Introduce a new system call, __setsugid(), which allows a process to
toggle the P_SUGID bit explicitly, rather than relying on it being
  set implicitly by other protection and credential logic.  This feature
  is introduced to support inter-process authorization regression testing
  by simplifying userland credential management allowing the easy
  isolation and reproduction of authorization events with specific
  security contexts.  This feature is enabled only by "options REGRESSION"
  and is not intended to be used by applications.  While the feature is
  not known to introduce security vulnerabilities, it does allow
  processes to enter previously inaccessible parts of the credential
  state machine, and is therefore disabled by default.  It may not
  constitute a risk, and therefore in the future pending further analysis
  (and appropriate need) may become a published interface.

Obtained from:	TrustedBSD Project
2001-04-11 20:20:40 +00:00
John Baldwin
7b531e6037 Stick proc0 in the PID hash table. 2001-04-11 18:50:50 +00:00
John Baldwin
2fea957dc5 Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by:	bde
2001-04-11 17:06:02 +00:00
Chris D. Faulhaber
fb1af1f2bf Correct the following defines to match the POSIX.1e spec:
ACL_PERM_EXEC  -> ACL_EXECUTE
  ACL_PERM_READ  -> ACL_READ
  ACL_PERM_WRITE -> ACL_WRITE

Obtained from:	TrustedBSD
2001-04-11 02:19:01 +00:00
Peter Wemm
9d10eb0c0c Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to
enable easy access to the hash chain stats.  The raw prefixed versions
dump an integer array to userland with the chain lengths.  This cheats
and calls it an array of 'struct int' rather than 'int' or sysctl -a
faithfully dumps out the 128K array on an average machine.  The non-raw
versions return 4 integers: count, number of chains used, maximum chain
length, and percentage utilization (fixed point, multiplied by 100).
The raw forms are more useful for analyzing the hash distribution, while
the other form can be read easily by humans and stats loggers.
2001-04-11 00:39:20 +00:00
John Baldwin
ca7ef17c08 Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by:	peter (the idea)
2001-04-10 21:34:13 +00:00
John Baldwin
6a0fa9a023 Add an MI API for sending IPI's. I used the same API present on the alpha
because:
 - it used a better namespace (smp_ipi_* rather than *_ipi),
 - it used better constant names for the IPI's (IPI_* rather than
   X*_OFFSET), and
 - this API also somewhat exists for both alpha and ia64 already.
2001-04-10 21:04:32 +00:00
Boris Popov
681a5bbef2 Import kernel part of SMB/CIFS requester.
Add smbfs(CIFS) filesystem.

Userland part will be in the ports tree for a while.

Obtained from:	smbfs-1.3.7-dev package.
2001-04-10 07:59:06 +00:00
Boris Popov
16162e5789 Avoid endless recursion on panic.
Reviewed by:	jhb
2001-04-10 00:56:19 +00:00
John Baldwin
d53d22496f Maintain a reference count on the witness struct. When the reference
count drops to 0 in witness_destroy, set the w_name and w_file pointers
to point to the string "(dead)" and the w_line field to 0.  This way,
if a mutex of a given name is used only in a module, then as long as
all mutexes in the module are destroyed when the module is unloaded,
witness will not maintain stale references to the mutex's name in the
module's data section causing a panic later on when the w_name or w_file
field's are examined.
2001-04-09 22:34:05 +00:00
Nick Hibma
7032578aac Remove a stale file. 2001-04-09 10:28:33 +00:00
Jake Burkholder
1681b00a79 Fix a precedence bug. ! has higher precedence than &. 2001-04-08 04:15:26 +00:00
Nick Hibma
6d0b13c1ba Use getopt instead of a home grown one
Submitted by:	DES
2001-04-07 20:51:24 +00:00
John Baldwin
3dcb6789d7 - Split out the functionality of displaying the contents of a single lock
list into a public witness_list_locks() function.  Call this function
  twice in witness_list() instead of using an evil goto.
- Adjust the 'show locks' command to take an optional parameter which
  specifies the pid of a process to list the locks of.  By default the
  locks held by the current process are displayed.
2001-04-06 21:37:52 +00:00
Bosko Milekic
4b8ae40a7c - Change the msleep()s to condition variables.
The mbuf and mcluster free lists now each "own" a condition variable,
  m_starved.

- Clean up minor indentention issues in sys/mbuf.h caused by previous
  commit.
2001-04-03 04:50:13 +00:00
Alfred Perlstein
5ea487f34d Use only one mutex for the entire mbuf subsystem.
Don't use atomic operations for the stats updating, instead protect
the counts with the mbuf mutex.  Most twiddling of the stats was
done right before or after releasing a mutex.  By doing this we
reduce the number of locked ops needed as well as allow a sysctl
to gain a consitant view of the entire stats structure.

In the future...

  This will allow us to chain common mbuf operations that would
  normally need to aquire/release 2 or 3 of the locks to build an
  mbuf with a cluster or external data attached into a single op
  requiring only one lock.

  Simplify the per-cpu locks that are planned.

There's also some if (1) code that should check if the "how"
operation specifies blocking/non-blocking behavior, we _could_ make
it so that we hold onto the mutex through calls into kmem_alloc
when non-blocking requests are made, but for safety reasons we
currently drop and reaquire the mutex around the calls.

Also, note that calling kmem_alloc is rare and only happens during
a shortage so drop/re-getting the mutex will not be a common
occurance.

Remove some #define's that seemed to obfuscate the code to me.

Remove an extranious comment.

Remove an XXX, including mutex.h isn't a crime.

Reviewed by: bmilekic
2001-04-03 03:15:11 +00:00
John Baldwin
5b3047d59f Change stop() to require the sched_lock as well as p's process lock to
avoid silly lock contention on sched_lock since in 2 out of the 3 places
that we call stop(), we get sched_lock right after calling it and we were
locking sched_lock inside of stop() anyways.
2001-04-03 01:39:23 +00:00
John Baldwin
1333047621 - Move the second stop() of process 'p' in issignal() to be after we send
SIGCHLD to our parent process.  Otherwise, we could block while obtaining
  the process lock for our parent process and switch out while we were
  in SSTOP.  Even worse, when we try to resume from the mutex being blocked
  on our p_stat will be SRUN, not SSTOP.
- Fix a comment above stop() to indicate that it requires that the proc lock
  be held, not a proctree lock.

Reported by:	markm
Sleuthing by:	jake
2001-04-02 17:26:51 +00:00
Robert Watson
685574864e o Part two of introduction of extattr_{delete,get,set}_fd() system calls,
regenerate necessary automatically-generated code.

Obtained from:	TrustedBSD Project
2001-03-31 16:21:19 +00:00
Robert Watson
fec605c882 o Introduce extattr_{delete,get,set}_fd() to allow extended attribute
operations on file descriptors, which complement the existing set of
  calls, extattr_{delete,get,set}_file() which act on paths.  In doing
  so, restructure the system call implementation such that the two sets
  of functions share most of the relevant code, rather than duplicating
  it.  This pushes the vnode locking into the shared code, but keeps
  the copying in of some arguments in the system call code.  Allowing
  access via file descriptors reduces the opportunity for race
  conditions when managing extended attributes.

Obtained from:	TrustedBSD Project
2001-03-31 16:20:05 +00:00
Robert Watson
f8e6ab29c2 o Restructure privilege check associated with process visibility for
ps_showallprocs such that if superuser is present to override process
  hiding, the search falls through [to success].  When additional
  restrictions are placed on process visibility, such as MAC, new clauses
  will be placed above the return(0).

Obtained from:	TrustedBSD Project
2001-03-29 22:59:44 +00:00
Robert Watson
ed6397209d o introduce u_cansee(), which performs access control checks between
two subject ucreds.  Unlike p_cansee(), u_cansee() doesn't have
  process lock requirements, only valid ucred reference requirements,
  so is prefered as process locking improves.  For now, back p_cansee()
  into u_cansee(), but eventually p_cansee() will go away.

Reviewed by:	jhb, tmm
Obtained from:	TrustedBSD Project
2001-03-28 20:50:15 +00:00
John Baldwin
026e76f43e Close a race condition where if we were obtaining a sleep lock and no spin
locks were held, we could be preempted and switch CPU's in between the time
that we set a variable to the list of spin locks on our CPU and the time
that we checked that variable to ensure no spinlocks were held while
grabbing a sleep lock.  Losing the race resulted in checking some other
CPU's spin lock list and bogusly panicing.
2001-03-28 16:11:51 +00:00
John Baldwin
f7012f592a - s/mutexes/locks/g in appropriate comments.
- Rename the 'show mutexes' ddb command to 'show locks' since it shows
  a list of all the lock objects held by the current process.
2001-03-28 12:39:40 +00:00
John Baldwin
1005a129e5 Convert the allproc and proctree locks from lockmgr locks to sx locks. 2001-03-28 11:52:56 +00:00
John Baldwin
c739adbf42 Pass in a pointer to the mutex's lock_object as the second argument to
WITNESS_SLEEP() rather than the mutex itself.
2001-03-28 10:41:15 +00:00
John Baldwin
f34fa851e0 Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
2001-03-28 09:17:56 +00:00
John Baldwin
192846463a Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects.  Each lock class specifies a
  name and set of flags (or properties) shared by all locks of a given
  type.  Currently there are three lock classes: spin mutexes, sleep
  mutexes, and sx locks.  A lock object specifies properties of an
  additional lock along with a lock name and all of the extra stuff needed
  to make witness work with a given lock.  This abstract lock stuff is
  defined in sys/lock.h.  The lockmgr constants, types, and prototypes have
  been moved to sys/lockmgr.h.  For temporary backwards compatability,
  sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
  locks held.  By making this per-cpu, we do not have to jump through
  magic hoops to deal with sched_lock changing ownership during context
  switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
  proc->p_sleeplocks, which is a list of held sleep locks including sleep
  mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
  level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
  - MTX_NOWITNESS - specifies that this lock should be ignored by witness.
    This is used for the mutex that blocks a sx lock for example.
  - MTX_QUIET - this is not new, but you can pass this to mtx_init() now
    and no events will be logged for this lock, so that one doesn't have
    to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag.  Use this flag to export
  a mtx_initialized() macro that can be safely called from drivers.  Also,
  we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
  performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
  more accurate file and line numbers.
2001-03-28 09:03:24 +00:00
John Baldwin
c31146a14e - Resort some includes to deal with the new witness code coming in shortly.
- Make sure we have Giant locked before calling coredump() in sigexit().

Spotted by:	peter (2)
2001-03-28 08:41:04 +00:00
John Baldwin
486b8ac04a Don't explicitly zero p_intr_nesting_level and p_aioinfo in fork. 2001-03-28 03:14:14 +00:00
John Baldwin
0006681fe6 Switch from save/disable/restore_intr() to critical_enter/exit(). 2001-03-28 03:06:10 +00:00
John Baldwin
b944b9033a Catch up to the mtx_saveintr -> mtx_savecrit change. 2001-03-28 02:46:21 +00:00
John Baldwin
35a472461a Use mtx_intr_enable() on sched_lock to ensure child processes always start
with interrupts enabled rather than calling the no-longer MI function
enable_intr().  This is bogus anyways and in theory shouldn't even be
needed.
2001-03-28 02:44:11 +00:00
John Baldwin
6283b7d01b - Switch from using save/disable/restore_intr to using critical_enter/exit
and change the u_int mtx_saveintr member of struct mtx to a critical_t
  mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
  an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules.  Change all the
  _mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
  parameters and rename them to use a prefix of two underscores.  Inside
  of kern_mutex.c, generate wrapper functions for
  _mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
  that are called from modules.  The macros mtx_{un,}lock_{spin,}_flags()
  are mapped to the __mtx_* macros inside of the kernel to inline the
  usual case of mutex operations and map to the internal _mtx_* functions
  in the module case so that modules will use WITNESS and KTR logging if
  the kernel is compiled with support for it.
2001-03-28 02:40:47 +00:00
Paul Saab
6b8b8c7fdc Last commit was broken.. It always prints '[CTRL-C to abort]'.
Move duplicate code for printing the status of the dump and checking
for abort into a separate function.

Pointy hat to:	me
2001-03-28 01:37:29 +00:00
David Malone
d1cadeb02a Don't leak the memory we've just malloced if we can't find the
process we're looking for. (I don't think this can currently
happen, but it depends how the function is called).

PR:		25932
Submitted by:	David Xu <davidx@viasoft.com.cn>
2001-03-27 20:49:51 +00:00
Yaroslav Tykhiy
ae5fa19aa9 Make cblock_alloc_cblocks() spell its own name
correctly in its warning message.

PR: kern/7693
2001-03-27 10:21:26 +00:00
Kenneth D. Merry
3393f8daa3 Rewrite of the CAM error recovery code.
Some of the major changes include:

	- The SCSI error handling portion of cam_periph_error() has
	  been broken out into a number of subfunctions to better
	  modularize the code that handles the hierarchy of SCSI errors.
	  As a result, the code is now much easier to read.

	- String handling and error printing has been significantly
	  revamped.  We now use sbufs to do string formatting instead
	  of using printfs (for the kernel) and snprintf/strncat (for
	  userland) as before.

	  There is a new catchall error printing routine,
	  cam_error_print() and its string-based counterpart,
	  cam_error_string() that allow the kernel and userland
	  applications to pass in a CCB and have errors printed out
	  properly, whether or not they're SCSI errors.  Among other
	  things, this helped eliminate a fair amount of duplicate code
	  in camcontrol.

	  We now print out more information than before, including
	  the CAM status and SCSI status and the error recovery action
	  taken to remedy the problem.

	- sbufs are now available in userland, via libsbuf.  This
	  change was necessary since most of the error printing code
	  is shared between libcam and the kernel.

	- A new transfer settings interface is included in this checkin.
	  This code is #ifdef'ed out, and is primarily intended to aid
	  discussion with HBA driver authors on the final form the
	  interface should take.  There is example code in the ahc(4)
	  driver that implements the HBA driver side of the new
	  interface.  The new transfer settings code won't be enabled
	  until we're ready to switch all HBA drivers over to the new
	  interface.

src/Makefile.inc1,
lib/Makefile:		Add libsbuf.  It must be built before libcam,
			since libcam uses sbuf routines.

libcam/Makefile:	libcam now depends on libsbuf.

libsbuf/Makefile:	Add a makefile for libsbuf.  This pulls in the
			sbuf sources from sys/kern.

bsd.libnames.mk:	Add LIBSBUF.

camcontrol/Makefile:	Add -lsbuf.  Since camcontrol is statically
			linked, we can't depend on the dynamic linker
			to pull in libsbuf.

camcontrol.c:		Use cam_error_print() instead of checking for
			CAM_SCSI_STATUS_ERROR on every failed CCB.

sbuf.9:			Change the prototypes for sbuf_cat() and
			sbuf_cpy() so that the source string is now a
			const char *.  This is more in line wth the
			standard system string functions, and helps
			eliminate warnings when dealing with a const
			source buffer.

			Fix a typo.

cam.c:			Add description strings for the various CAM
			error status values, as well as routines to
			look up those strings.

			Add new cam_error_string() and
			cam_error_print() routines for userland and
			the kernel.

cam.h:			Add a new CAM flag, CAM_RETRY_SELTO.

			Add enumerated types for the various options
			available with cam_error_print() and
			cam_error_string().

cam_ccb.h:		Add new transfer negotiation structures/types.

			Change inq_len in the ccb_getdev structure to
			be "reserved".  This field has never been
			filled in, and will be removed when we next
			bump the CAM version.

cam_debug.h:		Fix typo.

cam_periph.c:		Modularize cam_periph_error().  The SCSI error
			handling part of cam_periph_error() is now
			in camperiphscsistatuserror() and
			camperiphscsisenseerror().

			In cam_periph_lock(), increase the reference
			count on the periph while we wait for our lock
			attempt to succeed so that the periph won't go
			away while we're sleeping.

cam_xpt.c:		Add new transfer negotiation code.  (ifdefed
			out)

			Add a new function, xpt_path_string().  This
			is a string/sbuf analog to xpt_print_path().

scsi_all.c:		Revamp string handing and error printing code.
			We now use sbufs for much of the string
			formatting code.  More of that code is shared
			between userland the kernel.

scsi_all.h:		Get rid of SS_TURSTART, it wasn't terribly
			useful in the first place.

			Add a new error action, SS_REQSENSE.  (Send a
			request sense and then retry the command.)
			This is useful when the controller hasn't
			performed autosense for some reason.

			Change the default actions around a bit.

scsi_cd.c,
scsi_da.c,
scsi_pt.c,
scsi_ses.c:		SF_RETRY_SELTO -> CAM_RETRY_SELTO.  Selection
			timeouts shouldn't be covered by a sense flag.

scsi_pass.[ch]:		SF_RETRY_SELTO -> CAM_RETRY_SELTO.

			Get rid of the last vestiges of a read/write
			interface.

libkern/bsearch.c,
sys/libkern.h,
conf/files:		Add bsearch.c, which is needed for some of the
			new table lookup routines.

aic7xxx_freebsd.c:	Define AHC_NEW_TRAN_SETTINGS if
			CAM_NEW_TRAN_CODE is defined.

sbuf.h,
subr_sbuf.c:		Add the appropriate #ifdefs so sbufs can
			compile and run in userland.

			Change sbuf_printf() to use vsnprintf()
			instead of kvprintf(), which is only available
			in the kernel.

			Change the source string for sbuf_cpy() and
			sbuf_cat() to be a const char *.

			Add __BEGIN_DECLS and __END_DECLS around
			function prototypes since they're now exported
			to userland.

kdump/mkioctls:		Include stdio.h before cam.h since cam.h now
			includes a function with a FILE * argument.

Submitted by:	gibbs (mostly)
Reviewed by:	jdp, marcel (libsbuf makefile changes)
Reviewed by:	des (sbuf changes)
Reviewed by:	ken
2001-03-27 05:45:52 +00:00
Boris Popov
602ef63172 Previous commit broke interlock locking for !LK_RETRY case. 2001-03-26 12:45:35 +00:00
Poul-Henning Kamp
f83880518b Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.
2001-03-26 12:41:29 +00:00
Boris Popov
71d8277b51 Prevent race condition by using msleep() instead of mtx_unlock()/tsleep().
Reviewed by:	alfred
2001-03-26 03:10:07 +00:00
Bosko Milekic
2ba1a89559 Move the atomic() mbstat.m_drops incrementing to the MGET(HDR) and
MCLGET macros in order to avoid incrementing the drop count twice.
Otherwise, in some cases, we may increment m_drops once in m_mballoc()
for example, and increment it again in m_mballoc_wait() if the
wait fails.
2001-03-24 23:47:52 +00:00
John Baldwin
1f723035c8 Use (..., "%s", foo) instead of (..., foo) to avoid a warning about a
non-constant format string when calling kthread_create() to create an
ithread.
2001-03-24 06:26:47 +00:00
Peter Wemm
32e479705a This is kind of a hack, but it should work. Currently, world is broken
because libc/rpc/key_call.c references uname(), and ps/print.c also
defines uname(), and ps is linked statically.  This leads to a symbol
clash.  The userland uname(3) kinda sucked anyway as the hostname
etc was too short.  And since the libc rpc interface now uses
the utsname.nodename which gets truncated, I was tempted into doing
something about it.  Create a new userland uname function, called
__xuname() which takes an extra argument that allows you to change
the size of the fields.  uname() becomes a static inline function
in sys/utsname.h that passes the extra argument in.  struct utsname
has its field members expanded by default now in userland.
We still provide a 'uname' externally linkable function for things
that either think that they ``know'' the utsname format and assume
32 character strings and bypass the include file, or objects that
are linked against old libcs.  ie: just about every plausible
case that I can think of is covered.  Should we ever change the
default lengths again, a libc major bump should not be required
as the size is now passed to the function.

XXX the uname(2) in the kernel is for FreeBSD 1.1 binary compatability!
All the uname(3) functions that are exported to userland are actually
implemented in libc with sysctl.  uname(1) uses sysctl directly and
does not call uname(3).

PR:		bin/4688
2001-03-24 04:40:49 +00:00
John Baldwin
bae3a80b16 Just use the proc lock to protect read accesses to p_pptr rather than the
more expensive proctree lock.
2001-03-24 04:00:01 +00:00
John Baldwin
8d2725181a Protect p_wmesg and p_wchan with sched_lock while checking for deadlocks
with other byte range file locks.
2001-03-24 03:57:44 +00:00
Alfred Perlstein
ec4dff5e50 replace calls to non-existant bail() subroutine with calls to
the die() builtin function.
2001-03-23 11:48:50 +00:00
Boris Popov
a91f68bca6 o Actually extract version of interface and store it along with the name.
o Add new parameter to the modlist_lookup() function to perform lookups
  with strict version matching.

o Collapse duplicate code to function(s).
2001-03-22 08:58:45 +00:00
Boris Popov
303b15f193 Slightly reorganize code in the linker_load_dependancies() function to make
codepath more straightforward.
2001-03-22 07:55:33 +00:00
Boris Popov
804f27299d Remove support for old way of handling module dependencies.
Approved by:	peter
2001-03-22 07:14:42 +00:00
Poul-Henning Kamp
71d033119f Make the pseudo-driver for "/dev/fd/*" handle fd's larger than 255.
PR:	25936
2001-03-20 13:26:13 +00:00
Poul-Henning Kamp
15b6f00fd1 Add a KASSERT on unit2minor() so that we catch it if people try to pass
us unit numbers which doesn't fit in 24 bits.
2001-03-20 13:24:24 +00:00
Bruce Evans
0abc15fd0b Fixed breakage of access() in rev.1.164. Wrong credentials were used for
the final path component.
2001-03-20 09:38:05 +00:00
Peter Wemm
439fea92c2 Use the same API as the example code.
Allow the initial hash value to be passed in, as the examples do.
Incrementally hash in the dvp->v_id (using the official api) rather than
add it.  This seems to help power-of-two predictable filename trees
where the filenames repeat on a power-of-two cycle and the directory trees
have power-of-two components in it.  The simple add then mask was causing
things like 12000+ entry collision chains while most other entries have
between 0 and 3 entries each.  This way seems to improve things.
2001-03-20 02:10:18 +00:00
Robert Watson
231b9e916a o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.  Part 2 of syscalls.master commit to catch rebuilt
  files.

Submitted by:	jkh
Obtained from:	TrustedBSD Project
2001-03-19 05:48:58 +00:00
Robert Watson
3063207147 o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by:	jkh
Obtained from:	TrustedBSD Project
2001-03-19 05:44:15 +00:00
Bosko Milekic
9612101e4c Fix a couple of things in the internal mbuf allocation interface:
- Make sure that m_mballoc() really doesn't allow over nmbufs mbufs to
  be allocated from mb_map. In the case where nmbufs-reserved space is not
  an exact multiple of PAGE_SIZE (which it should be, but anyway...), we
  hold nmbufs as an absolute maximum which need not ever be reached.

- Clean up m_clalloc(); make it more consistent in the sense that the first
  argument `ncl' really means "the number of clusters ensured to be allocated"
  and not "the number of pages worth of clusters to be allocated," as was
  previously the case. This also makes it consistent with m_mballoc() as well
  as the comment that preceeds it.

Reviewed by: jlemon
2001-03-17 23:23:24 +00:00
Peter Wemm
6eb39ac8fc Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t.  This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this.  It is
around 45% faster with -march=pentiumpro on a p6 cpu.
2001-03-17 09:31:06 +00:00
Jonathan Lemon
4d286823c5 When doing a recv(.. MSG_WAITALL) for a message which is larger than
the socket buffer size, the receive is done in sections.  After completing
a read, call pru_rcvd on the underlying protocol before blocking again.

This allows the the protocol to take appropriate action, such as
sending a TCP window update to the peer, if the window happened to
close because the socket buffer was filled.  If the protocol is not
notified, a TCP transfer may stall until the remote end sends a window
probe.
2001-03-16 22:37:06 +00:00
Peter Wemm
50e2347e68 Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section.  If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping.  This would cause the trampoline
up to high memory to fault.  The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to:	Richard Todd <rmtodd@ichotolot.servalan.com>
2001-03-15 05:10:06 +00:00
Peter Wemm
6fe01250f4 Jake essentially rewrote this. It is not by any stretch of the
imagination a derivative of what I did before.
2001-03-15 05:02:08 +00:00
Peter Wemm
043cc5a602 Regenerate after rwatson's commit to syscalls.master (rev 1.85) 2001-03-15 04:43:57 +00:00
Robert Watson
70f3685105 o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
  character namespace indicator.  This is in line with more recent
  thinking on EA interfaces on various mailing lists, including the
  posix1e, Linux acl-devel, and trustedbsd-discuss forums.  Two namespaces
  are defined by default, EXTATTR_NAMESPACE_SYSTEM and
  EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
  access control model: user EAs are accessible based on the normal
  MAC and DAC file/directory protections, and system attributes are
  limited to kernel-originated or appropriately privileged userland
  requests.

o These API changes occur at several levels: the namespace argument is
  introduced in the extattr_{get,set}_file() system call interfaces,
  at the vnode operation level in the vop_{get,set}extattr() interfaces,
  and in the UFS extended attribute implementation.  Changes are also
  introduced in the VFS extattrctl() interface (system call, VFS,
  and UFS implementation), where the arguments are modified to include
  a namespace field, as well as modified to advoid direct access to
  userspace variables from below the VFS layer (in the style of recent
  changes to mount by adrian@FreeBSD.org).  This required some cleanup
  and bug fixing regarding VFS locks and the VFS interface, as a vnode
  pointer may now be optionally submitted to the VFS_EXTATTRCTL()
  call.  Updated documentation for the VFS interface will be committed
  shortly.

o In the near future, the auto-starting feature will be updated to
  search two sub-directories to the ".attribute" directory in appropriate
  file systems: "user" and "system" to locate attributes intended for
  those namespaces, as the single filename is no longer sufficient
  to indicate what namespace the attribute is intended for.  Until this
  is committed, all attributes auto-started by UFS will be placed in
  the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
  been updated to no longer include the '$' in their filename.  As such,
  if you're using these features, you'll need to rename the attribute
  backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
  be committed shortly.  These include modifications to the extended
  attribute utilities, as well as to libutil for new namespace
  string conversion routines.  Once the matching userland changes are
  committed, a buildworld is recommended to update all the necessary
  include files and verify that the kernel and userland environments
  are in sync.  Note: If you do not use extended attributes (most people
  won't), upgrading is not imperative although since the system call
  API has changed, the new userland extended attribute code will no longer
  compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
  conditional on FFS_EXTATTR, which should recover a bit of space on
  kernels running without EA's, as well as update copyright dates.

Obtained from:	TrustedBSD Project
2001-03-15 02:54:29 +00:00
Søren Schmidt
b417a1a8c8 Dont call device close and ioctl functions if device has disappeared.
Reviewed by: phk
2001-03-13 08:45:05 +00:00
Dag-Erling Smørgrav
9cbd039343 Assert that the process we're trying to enqueue isn't already there. 2001-03-11 18:57:30 +00:00
Alan Cox
136446540a When aio_read/write() is used on a raw device, physical buffers are
used for up to "vfs.aio.max_buf_aio" of the requests.  If a request
size is MAXPHYS, but the request base isn't page aligned, vmapbuf()
will map the end of the user space buffer into the start of the kva
allocated for the next physical buffer.  Don't use a physical buffer
in this case.  (This change addresses problem report 25617.)

When an aio_read/write() on a raw device has completed, timeout() is
used to schedule a signal to the process.  Thus, the reporting is
delayed up to 10 ms (assuming hz is 100).  The process might have
terminated in the meantime, causing a trap 12 when attempting to
deliver the signal.  Thus, the timeout must be cancelled when removing
the job.

aio jobs in state JOBST_JOBQGLOBAL should be removed from the
kaio_jobqueue list during process rundown.

During process rundown, some aio jobs might move from one list to a
different list that has already been "emptied", causing the rundown to
be incomplete.  Retry the rundown.

A call to BUF_KERNPROC() is needed after obtaining a physical buffer
to disassociate the lock from the running process since it can return
to userland without releasing that lock.

PR:		25617
Submitted by:	tegge
2001-03-10 22:47:57 +00:00
Alfred Perlstein
9708152c20 Don't call malloc with M_WAITOK while holding a mutex. 2001-03-09 18:40:34 +00:00
Jonathan Lemon
c0647e0d07 Push the test for a disconnected socket when accept()ing down to the
protocol layer.  Not all protocols behave identically.  This fixes the
brokenness observed with unix-domain sockets (and postfix)
2001-03-09 08:16:40 +00:00
John Baldwin
5db078a9be Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks.  Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock.  At
least on the i386.  To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks.  Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes.  Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.

Consulting from:	cp
2001-03-09 07:24:17 +00:00
Alan Cox
c9a970a79f Use the kthread API to create and destroy AIO daemons.
Submitted by:	jhb
2001-03-09 06:27:01 +00:00
John Baldwin
3a3f608288 Add a new informative KASSERT to ensure that a process is in the SRUN state
before we return it to cpu_switch().
2001-03-09 03:59:50 +00:00
Bosko Milekic
4bde2ac539 Fix is a similar race condition as existed in the mbuf code. When we go
into an interruptable sleep and we increment a sleep count, we make sure
that we are the thread that will decrement the count when we wakeup.
Otherwise, what happens is that if we get interrupted (signal) and we
have to wake up, but before we get our mutex, some thread that wants
to wake us up detects that the count is non-zero and so enters wakeup_one(),
but there's nothing on the sleep queue and so we don't get woken up. The
thread will still decrement the sleep count, which is bad because we will
also decrement it again later (as we got interrupted) and are already off
the sleep queue.
2001-03-08 19:21:45 +00:00
David Malone
2239c07de9 Make the wait for sendfile buffers interruptable. Stops one process
consuming them all and then getting stuck.

Reviewed by:	dg
Reviewed by:	bmilekic
Observed by:	Andreas Persson <pap@garen.net>
2001-03-08 16:28:10 +00:00
Thomas Moestl
3a51557243 Make the SYSCTL_OUT handlers sysctl_old_user() and sysctl_old_kernel()
more robust. They would correctly return ENOMEM for the first time when
the buffer was exhausted, but subsequent calls in this case could cause
writes ouside of the buffer bounds.

Approved by:	rwatson
2001-03-08 01:20:43 +00:00
Kirk McKusick
589c7af992 Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).
2001-03-07 07:09:55 +00:00
Kirk McKusick
393d77ffad Bitch more loudly when someone botches changes to kinfo_proc
in the hopes that they will actually *read* the comment above
it and *follow* the instructions so as to cause all the rest
of us less a lot less grief.
2001-03-07 06:52:12 +00:00
John Baldwin
5641ae5dc3 - Don't hold the proc lock across VREF and the fd* functions to avoid lock
order reversals.
- Add some preliminary locking in the !RF_PROC case.
- Protect p_estcpu with sched_lock.
2001-03-07 05:21:47 +00:00
John Baldwin
f227364a17 - Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
  needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.
2001-03-07 03:53:39 +00:00
John Baldwin
19eb87d22a Grab the process lock while calling psignal and before calling psignal. 2001-03-07 03:37:06 +00:00
John Baldwin
15e9ec5153 Proc locking including using proc lock in place of proctree where
appropriate and locking processes while we signal them.
2001-03-07 03:28:50 +00:00
John Baldwin
e65897c381 Proc locking. 2001-03-07 03:27:32 +00:00
John Baldwin
28aa95b6ee Use the proc lock to protect access to p_sigacts->ps_sigintr. 2001-03-07 03:26:39 +00:00
John Baldwin
731a1aea4c - Proc locking.
- Remove some unneeded spl()'s.
2001-03-07 03:06:18 +00:00
John Baldwin
378240232a Lock the process while sending it SIGARLM and updating p_realtimer. 2001-03-07 03:02:56 +00:00
John Baldwin
eed4805444 - Proc locking.
- Remove unneeded spl()'s.
2001-03-07 03:01:53 +00:00
John Baldwin
628d2653d6 - Proc locking. Most of signal handling is now MP safe and doesn't require
Giant.  The only exception is the CANSIGNAL() macro.  Unlocking the proc
  lock around sendsig() in trapsignal() is also questionable.  Note that
  the functions sigexit(), psignal(), and issignal() must be called with
  the proc lock of the process in question held.  postsig() and
  trapsignal() should not be called with the proc lock held, but they
  also do not require Giant anymore either.
- Remove spl's that are now no longer needed as they are fully replaced.
2001-03-07 02:59:54 +00:00
John Baldwin
87729a2b64 Lock initproc when we send SIGINT to init during shutdown. 2001-03-07 02:50:09 +00:00
John Baldwin
1b43703b47 - Add an extra check in priority_propagation() for UP systems to ensure we
don't end up back at ourselves which would indicate deadlock.
- Add the proc lock to the witness dup_list as we may hold more than one
  process lock at a time.
- Don't assert a mutex is owned in _mtx_unlock_sleep() as that is too late.
  We do the checks in the macros instead.
2001-03-07 02:45:15 +00:00
John Baldwin
6451855f6d - Use _PHOLD and move it before a PROC_UNLOCK to reduce the number of
mutex operations in kthread_create().
- Lock a kthread's proc before changing its parent via proc_reparent().
- Test P_KTHREAD not P_SYSTEM in kthread_suspend() and kthread_resume().
  P_SYSTEM just means that the process shouldn't be swapped and is used
  for vinum's daemon for example.
- Lock all the signal state used for suspending and resuming kthreads with
  the proc lock.
2001-03-07 02:36:47 +00:00
John Baldwin
57934cd3c8 - Lock the forklist with an sx lock.
- Add proc locking to fork1().  Always lock the child procoess (new
  process) first when both processes need to be locked at the same
  time.
- Remove unneeded spl()'s as the data they protected is now locked.
- Ensure that the proctree is exclusively locked and the new process is
  locked when setting up the parent process pointer.
- Lock the check for P_KTHREAD in p_flag in fork_exit().
2001-03-07 02:30:39 +00:00
John Baldwin
2aa33d2f1e Check to see if p_fd is NULL before derferencing it in checkdirs(). It's
possible for us to see a process in the early stages of fork before p_fd
has been initialized.  Ideally, we wouldn't stick a process on the allproc
list until it was fully created however.
2001-03-07 02:25:13 +00:00
John Baldwin
c65437a326 - Call proc_reparent() when handing a process off to init in exit rather
than dinking around in the process lists explicitly.
- Hold both the proctree lock and proc lock of the child process when
  reparenting a process via proc_reparent.
- Lock processes while sending them signals.
- Miscellaenous proc locking.
- proc_reparent() now asserts that the child is locked in addition to an
  exclusive proctree lock.
2001-03-07 02:22:31 +00:00
John Baldwin
7331c2a252 In order to avoid recursing on the backing mutex for sx locks in the
INVARIANTS case, define the actual KASSERT() in _SX_ASSERT_[SX]LOCKED
macros that are used in the sx code itself and convert the
SX_ASSERT_[SX]LOCKED macros to simple wrappers that grab the mutex for the
duration of the check.
2001-03-06 23:13:15 +00:00
Dag-Erling Smørgrav
cab5b963a0 Make the KASSERTs report the correct function names.
Fix two off-by-one errors that would sometimes cause the final length of
the sbuf to include the trailing zero.
2001-03-06 17:48:26 +00:00
Robert Watson
5293465fef o Introduce filesystem-independent POSIX.1e ACL utility routines to
support implementations of ACLs in file systems.  Introduce the
  following new functions:

      vaccess_acl_posix1e()          vaccess() that accepts an ACL
      acl_posix1e_mode_to_perm()     Convert mode bits to ACL rights
      acl_posix1e_mode_to_entry()    Build ACL entry from mode/uid/gid
      acl_posix1e_perms_to_mode()    Generate file mode from ACL
      acl_posix1e_check()            Syntax verification for ACL

  These functions allow a file system to rely on central ACL evaluation
  and syntax checking, as well as providing useful utilities to
  allow ACL-based file systems to generate mode/owner/etc information
  to return via VOP_GETATTR(), and to support file systems that split
  their ACL information over their existing inode storage (mode, uid,
  gid) and extended ACL into extended attributes (additional users,
  groups, ACL mask).

o Add prototypes for exported functions to sys/acl.h, sys/vnode.h

Reviewed by:	trustedbsd-discuss, freebsd-arch
Obtained from:	TrustedBSD Project
2001-03-06 17:28:24 +00:00
Alan Cox
9c8a2647f6 Add a missing splx() to aio_fphysio(). (This change is a no-op in -5.0,
but potentially significant in -4.x.)

Eliminate a pointless parameter to aio_fphysio().

Remove unnecessary casts from aio_fphysio() and aio_physwakeup().
2001-03-06 15:54:38 +00:00
Bosko Milekic
af76144992 - Add sx_descr description member to sx lock structure
- Add sx_xholder member to sx struct which is used for INVARIANTS-enabled
  assertions. It indicates the thread that presently owns the xlock.
- Add some assertions to the sx lock code that will detect the fatal
  API abuse:
     xlock --> xlock
     xlock --> slock
  which now works thanks to sx_xholder.
  Notice that the remaining two problematic cases:
     slock --> xlock
     slock --> slock (a little less problematic, but still recursion)
  will need to be handled by witness eventually, as they are more
  involved.

Reviewed by: jhb, jake, jasone
2001-03-06 06:17:05 +00:00
Jason Evans
6281b30a73 Implement shared/exclusive locks.
Reviewed by:	bmilekic, jake, jhb
2001-03-05 19:59:41 +00:00
Alan Cox
88ed460e6b Eliminate the aio_freejobs list. Its purpose was to store free
aiocb's allocated by zalloc().  In other words, zfree() was never
 called.  Now, we call zfree().  Why eliminate this micro-
 optimization?  At some later point, when we multithread the AIO
 system, we would need a mutex to synchronize access to aio_freejobs,
 making its use nearly indistinguishable in cost from zalloc() and
 zfree().

Remove unnecessary fhold() and fdrop() calls from aio_qphysio(),
 undo'ing a part of revision 1.86.  The reference count on the file
 structure is already incremented by _aio_aqueue() before it calls
 aio_qphysio().  (Update the comments to document this fact.)

Remove unnecessary casts from _aio_aqueue(), aio_read(), aio_write()
 and aio_waitcomplete().

Remove an unnecessary "return;" from aio_process().

Add "static" in various places.
2001-03-05 01:30:23 +00:00
David E. O'Brien
828c9e13a3 Do not set a default ELF syscall ABI fallback.
If one runs an un-branded Linux static binary that calls Linux's fcntl
the machine will reboot when interupted by the FreeBSD syscall ABI.
2001-03-04 11:58:50 +00:00
Assar Westerlund
3617ddfc33 implement OCRNL, ONOCR, and ONLRET
Obtained from:	NetBSD
2001-03-04 06:04:50 +00:00
Alan Cox
fb579e9a61 Remove the field privatemodes from struct __aiocb_private and the
related code from aio_read() and aio_write().  This field was
intended, but never used, to allow a mythical user-level library to
make an aio_read() or aio_write() behave like an ordinary read() or
write(), i.e., a blocking I/O operation.
2001-03-04 01:22:23 +00:00
Adrian Chadd
fbedc11796 Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath.
Submitted by:	Naoki Kobayashi <shibata@geo.titech.ac.jp>
2001-03-02 14:05:49 +00:00
John Baldwin
003fb9ec2f Ok, the kernel will panic in kmem_malloc() if the kernel map is full, so
malloc with M_WAITOK can't actually return NULL.  I wish I could get two
people to give me the same answer about this when I ask...

Submitted by:	jake
2001-03-02 06:07:38 +00:00
John Baldwin
653dd8c243 - Check to see if malloc() returned NULL even with M_WAITOK.
- Add a KASSERT() to ensure an ithread has a backing kernel thread when we
  schedule it.
- Don't attempt to preemptively switch to an ithread if p_stat of curproc
  is not SRUN.
2001-03-02 05:33:03 +00:00
Adrian Chadd
f3a90da995 Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
  kernel variables in vfs_mount(). `data' remains a pointer into
  userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
  aren't too long.

* rework mount() and linux_mount() to take the userland parameters
  (besides data, as mentioned) and pass kernel variables to vfs_mount().
  (linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
  since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
  filesystem.  This variable is generally initialised with `path', and
  each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
2001-03-01 21:00:17 +00:00
Ian Dowse
a90ef2ae0f The kernel did not hold a vnode reference associated with the
`rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that
it did. This bug reliably caused a panic at reboot time if any
filesystem had been mounted directly over /.

The checkdirs() function is called at mount time to find any process
fd_cdir or fd_rdir pointers referencing the covered mountpoint
vnode. It transfers these to point at the root of the new filesystem.
However, this process was not reversed at unmount time, so processes
with a cwd/root at a mount point would unexpectedly lose their
cwd/root following a mount-unmount cycle at that mountpoint.

This change should fix both of the above issues. Start_init() now
holds an extra vnode reference corresponding to `rootvnode', and
dounmount() releases this reference when the root filesystem is
unmounted just before reboot. Dounmount() now undoes the actions
taken by checkdirs() at mount time; any process cdir/rdir pointers
that reference the root vnode of the unmounted filesystem are
transferred to the now-uncovered vnode.

Reviewed by:	bde, phk
2001-02-28 20:54:28 +00:00
Julian Elischer
a96dcd84d2 Shuffle netgraph mutexes a bit and hold a reference on a node
from the function that is calling the destructor.
2001-02-28 18:49:09 +00:00
Matthew Dillon
63692125a9 Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be
hit on the client side and prevent the server side from retiring writes.
Pipeline operations turned off for all READs (no big loss since reads are
usually synchronous) and for NFS writes, and left on for the default bwrite().
(MFC expected prior to 4.3 freeze)

Testing by: mjacob, dillon
2001-02-28 04:13:11 +00:00
Jake Burkholder
5b270b2a55 Sigh. Try to get priorities sorted out. Don't bother trying to
update native priority, it is diffcult to get right and likely
to end up horribly wrong.  Use an honestly wrong fixed value
that seems to work; PUSER for user threads, and the interrupt
priority for ithreads.  Set it once when the process is created
and forget about it.

Suggested by:	bde
Pointy hat:	me
2001-02-28 02:53:44 +00:00
Jonathan Lemon
ea0237ed11 Correctly declare variables as u_int rather than doing typecasts.
Kill some register declarations while I'm here.

Submitted by:  bde (1)
2001-02-27 15:11:31 +00:00
Ruslan Ermilov
8ac6dca795 In soshutdown(), use SHUT_{RD,WR,RDWR} instead of FREAD and FWRITE.
Also, return EINVAL if `how' is invalid, as required by POSIX spec.
2001-02-27 13:48:07 +00:00
Jonathan Lemon
0b7088c4d0 Cast nfds to u_int before range checking it in order to catch negative
values.

PR:	25393
2001-02-27 00:50:20 +00:00
Jake Burkholder
be15bfc091 Initialize native priority to PRI_MAX. It was usually 0 which made a
process's priority go through the roof when it released a (contested)
mutex.  Only set the native priority in mtx_lock if hasn't already
been set.

Reviewed by:	jhb
2001-02-26 23:27:35 +00:00
Jake Burkholder
a10f496636 Remove brackets around variables in a function that used to be
a macro.
2001-02-25 16:18:13 +00:00
Peter Wemm
d6df01d823 Make this compile in a.out mode. link.h has extra dependencies for a.out. 2001-02-25 07:26:54 +00:00
Peter Wemm
1a5f13cfbf Manually add an extra _ to _DYNAMIC since it is provided by ld, not gcc.
Make the rest compile.
2001-02-25 07:25:05 +00:00
Bosko Milekic
096e2dd9d8 Remove superfluous m_pkthdr.rcv_if = NULL assignment following
m_gethdr() mbuf allocation, which already does this for us.
2001-02-25 06:33:50 +00:00
Julian Elischer
7433466190 Move netgraph spimlock order entries out of
the #ifdef SMP section. They need to be there for UP too.
2001-02-25 04:56:23 +00:00
Jake Burkholder
631d7bf3da - Rename the lcall system call handler from Xsyscall to Xlcall_syscall
to be more like Xint0x80_syscall and less like c function syscall().
- Reduce code duplication between the int0x80 and lcall handlers by
  shuffling the elfags into the right place, saving the sizeof the
  instruction in tf_err and jumping into the common int0x80 code.

Reviewed by:	peter
2001-02-25 02:53:06 +00:00
David E. O'Brien
21a3ee0ead MFS: bring the consistent `compat_3_brand' support into -CURRENT
(the work was first done in the RELENG_4 branch near a release
	 during a MFC to make the code cleaner and more consistent)
2001-02-24 22:20:11 +00:00
John Baldwin
1103f3b05b Grrr, s/INVARIANTS_SUPPORT/INVARIANT_SUPPORT/. 2001-02-24 21:29:32 +00:00
John Baldwin
15ec816acc - Axe RETIP() as it was very i386 specific and unwieldy. Instead, use the
passed in filename and line number in the KTR tracepoint message.
- Even though it is #if 0'd code, change the code to detect that a process
  is an interrupt thread to check p->p_ithd against NULL rather than
  checking non-existant process flags from BSD/OS.
- Use '%p' to print pointers in KTR log messages instead of assuming
  sizeof(int) == sizeof(void *).
- Don't set p_mtxname to NULL when releasing a mutex.  It doesn't hurt
  to leave it set (we don't clear w_mesg for example) and at least at
  one time in the past, there used to be race conditions in the kernel
  that would result in setting this to NULL causing the kernel to
  dereference NULL.
- Make the _mtx_assert() function be compiled in if INVARIANTS_SUPPORT is
  defined rather than if INVARIANTS is defined so that a KLD compiled
  with INVARIANTS that uses mtx_assert() can be used with a kernel that
  just has INVARIANT_SUPPORT compiled in.
2001-02-24 19:36:13 +00:00
Boris Popov
d8589bd5cb Introduce API for sequential reads/writes (build/dissect) of mbuf chains.
Reviewed by:	Ian Dowse <iedowse@maths.tcd.ie>,
		Bosko Milekic <bmilekic@technokratis.com>,
		Julian Elischer <julian@elischer.org> and arch@/net@
Obtained from:	smbfs
2001-02-24 15:44:30 +00:00
Julian Elischer
33338e7370 Add knowledge of the netgraph spinlocks into the Witness code.
Well, at least I think that's how it's done.
2001-02-24 14:29:47 +00:00
Jake Burkholder
f32ded2fb5 - Assert that the proc to return is not NULL in runq_choose the
same as runq_remove.
- bzero the whole struct runq in runq_init just in case its not
  statically allocated.
2001-02-24 14:06:36 +00:00
John Baldwin
130c1f25a4 It turns out the kernel console works fine and thus doesn't need quite this
much extra testing.
2001-02-24 03:40:23 +00:00
Jonathan Lemon
24607d88ed Add an EV_SET() convenience macro for initializing struct kevent prior
to the call to kevent().

Update the copyright notices as well.
2001-02-24 01:44:03 +00:00
Jonathan Lemon
da403b9df8 Introduce a NOTE_LOWAT flag for use with the read/write filters, which
allow the watermark to be passed in via the data field during the EV_ADD
operation.

Hook this up to the socket read/write filters; if specified, it overrides
the so_{rcv|snd}.sb_lowat values in the filter.

Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>
2001-02-24 01:41:31 +00:00
Jonathan Lemon
b07540c837 When returning EV_EOF for the socket read/write filters, also return
the current socket error in fflags.  This may be useful for determining
why a connect() request fails.

Inspired by:  "Jonathan Graehl" <jonathan@graehl.org>
2001-02-24 01:33:12 +00:00
Peter Wemm
3e688165a9 Stricter style(9) conformance - remove unnecessary blank lines in previous
commit.
2001-02-23 23:05:46 +00:00
Jonathan Lemon
89bbe051bb Fix typo in comment (knode -> knote). 2001-02-23 20:32:42 +00:00
Jonathan Lemon
7df2842dee Add a NOTE_REVOKE flag for vnodes, which is triggered from within vclean().
Use this to tell a filter attached to a vnode that the underlying vnode is
no longer valid, by returning EV_EOF.

PR: kern/25309, kern/25206
2001-02-23 20:06:01 +00:00
John Baldwin
0b1d793211 Test out the kernel console just before launching the AP's. 2001-02-23 19:44:25 +00:00
Peter Wemm
f1532aadee Activate USER_LDT by default. The new thread libraries are going to
depend on this.  The linux ABI emulator tries to use it for some linux
binaries too.  VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by:	jhb, imp
2001-02-23 01:25:02 +00:00
Tor Egge
9d0ddf1861 Streamline updating of switchtime (don't copy code from kern_sync.c).
Submitted by:	jhb
2001-02-22 20:16:51 +00:00
Tor Egge
35030da9f8 Backout previous commit. sched_lock is held, thus interrupts are prevented
here.

Submitted by:	jhb
2001-02-22 20:12:52 +00:00
Tor Egge
0d139b3741 Protect update of the per processor switchtime variable against
interrupts.

Protect usage of the per processor switchtime variable against
interrupts in calcru().

This seem to eliminate the "microuptime() went backwards" warnings.
2001-02-22 19:50:37 +00:00
John Baldwin
feb43c5f37 The p_md.md_regs member of proc is used in signal handling to reference
the the original trapframe of the syscall, trap, or interrupt that entered
the kernel.  Before SMPng, ast's were handled via a psuedo trap at the
end of doerti.  With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures.  Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updateda in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode.  The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.
2001-02-22 19:35:20 +00:00
John Baldwin
51c9129957 Since the PC is a pointer to a code address, change the second parameter of
addupc_task() and addupc_intr() to be a uintptr_t instead of a u_long.
2001-02-22 18:07:31 +00:00
John Baldwin
f308e0d714 - Change ast() to take a pointer to a trapframe like other architectures.
- Don't use an atomic operation to update cnt.v_soft in ast().  This is
  the only place the variable is written to, and sched_lock is always
  held when it is written, so it is already protected and the mutex release
  of sched_lock asserts a memory barrier that ensures the value will be
  updated in a timely fashion.
2001-02-22 18:05:15 +00:00
John Baldwin
26f9f5c7c7 - Use TRAPF_PC() on the alpha to acess the PC in the trap frame.
- Don't hold sched_lock around addupc_task() as this apparently breaks
  profiling badly due to sched_lock being held across copyin().

Reported by:	bde (2)
2001-02-22 16:23:12 +00:00
John Baldwin
c978f49e20 Add a mtx_assert() in maybe_resched() just to be sure it's always called
with sched_lock held.
2001-02-22 13:47:01 +00:00
John Baldwin
3a18729505 Lock need_resched with sched_lock.
Reported by:	des
2001-02-22 13:46:09 +00:00
John Baldwin
de271f01c2 Work around a race condition where an interrupt handler can be removed from
an interrupt thread while the interrupt thread is blocked on Giant waiting
to execute the interrupt handler being removed.  The result was that the
intrhand structure would be free'd, and we would call 0xdeadc0de.  The work
around is to check to see if the interrupt thread is idle when removing a
handler.  If not, then we mark the interrupt handler as being dead using
the new IH_DEAD flag and don't remove it from the interrupt threads' list
of handlers.  When the interrupt thread resumes, it will see a dead handler
while traversing the list of handlers and will remove the handler then.
2001-02-22 02:18:32 +00:00
John Baldwin
60f2b032fe Just use the ithread->it_proc directly in a KTR tracepoint instead of
assigning a local var to it and using it, as otherwise the local var wasn't
used, and generated a warning in the !KTR case.

Noticed by:	bde
2001-02-22 02:15:57 +00:00
John Baldwin
addec20c38 Add KTR tracepoints for adding/removing interrupt handlers,
creating/destroying interrupt threads, and updating the state of an
interrupt thread.
2001-02-22 02:14:08 +00:00
John Baldwin
25d209f260 - Use the NOCPU constant.
- Move the ithread spin locks before sched lock and clk in preparation for
  future commits to the ithread code.
2001-02-22 02:12:54 +00:00
John Baldwin
9764c9d36e Quiet a warning with a uintptr_t cast.
Noticed by:	bde
2001-02-22 02:10:33 +00:00
John Baldwin
5a93f3e851 - Use the new NOCPU constant.
- Fix a warning.

Noticed by:	bde (2)
2001-02-22 00:32:13 +00:00
John Baldwin
76bd604e7d Fix a bug where the 'ithread' variable was being set in a KASSERT()
condition and thus was not initialized properly in the !INVARIANTS case.

Noticed by:	bde
Pointy hat to:	me
2001-02-22 00:23:56 +00:00
John Baldwin
719f43d3df Remove attempt to add in PREEMPTION #ifdef test in MI code that didn't
work because opt_preemption.h wasn't #include'd.  Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.
2001-02-21 22:51:00 +00:00
Boris Popov
03137ec82e Fix parameter order in the calls to MGET(). 2001-02-21 09:24:13 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Tor Egge
d82b3e319a Ensure that RLIMIT_NPROC limits are at least 1 to avoid bad interaction
with chgproccnt.  MFC candiate.

Reviewed by:	alfred
2001-02-20 23:34:16 +00:00
John Baldwin
62d654c142 - In the KTR_EXTEND case, use a const char * to point to the passed in
filename insteada of copying the first 32 characters of it.
- Add in const modifiers for the passed in format strings and filenames
  and their respective members in the ktr_entry struct.
2001-02-20 10:39:55 +00:00
John Baldwin
3e5da75445 - Add a new ithread_schedule() function to do the bulk of the work of
scheduling an interrupt thread to run when needed.  This has the side
  effect of enabling support for entropy gathering from interrupts on
  all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
  to use ithread_schedule() for most of their processing when scheduling
  an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
  enabled.  I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
  returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
  on every interrupt thread whose pri was SWI_CLOCK, set the flag
  explicity for clk_ithd's proc during start_softintr().
2001-02-20 10:25:29 +00:00
John Baldwin
591faa2e45 - Abolish the 'show ktr_first' and 'show ktr_next' commands.
- Add pager capability to the 'show ktr' command.  It functions much like
  'ps': Enter at the prompt displays one more entry, Space displays
  another page, and any other key quits.
2001-02-20 09:53:27 +00:00
Luigi Rizzo
5fe86675f0 Preserve alignment of first mbuf in m_copypacket.
This is useful when doing copies of packet where some leading
space has been preallocated to insert protocol headers.
Note that there are in fact almost no users of m_copypacket.

MFC candidate.
2001-02-20 08:23:41 +00:00
John Baldwin
5813dc03bd - Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
  after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
  cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
  proc in mi_switch(), and handle the sched_lock state change across a
  context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
  have to setup an initial state for sched_lock in fork_exit() before we
  release it.
2001-02-20 05:26:15 +00:00
Bruce Evans
0ad74739ac Removed all traces of T_ASTFLT (except for gaps where it was). It became
unused except in dead code when ast() was split off from trap().
2001-02-19 15:47:38 +00:00
Bruce Evans
d2ef4060d7 Fixed a longstanding latency bug in signal delivery. When a signal
is sent to a process, psignal() needs to schedule an AST for the
process if the process is runnable, not just if it is current, so that
pending signals get checked for on the next return of the process to
user mode.  This wasn't practical until recently because the AST flag
was per-cpu so setting it for a non-current process would usually just
cause a bogus AST for the current process.

For non-current processes looping in user mode, it took accidental
(?) magic to deliver signals at all.  Signals were usually delivered
late as a side effect of rescheduling (need_resched() sets astpending,
etc.).  In pre-SMPng, delivery was delayed by at most 1 quantum (the
need_resched() call in roundrobin() is certain to occur within 1
quantum for looping processes).  In -current, things are complicated
by normal interrupt handlers being threads.  Missing handling of the
complications makes roundrobin() a bogus no-op, but preemptive
scheduling sort of works anyway due to even larger bogons elsewhere.
2001-02-19 09:40:58 +00:00
Bruce Evans
866546105a Changed the aston() family to operate on a specified process instead of
always on curproc.  This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).

Debogotified the definition of aston().  aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
2001-02-19 04:15:59 +00:00
Brian Feldman
c0511d3b58 Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel.  This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there.  As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size.  This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by:	bde
2001-02-18 13:30:20 +00:00