Commit Graph

3741 Commits

Author SHA1 Message Date
Alfred Perlstein
6157b69f4a Instead of asserting that a mutex is not still locked after unlocking it,
assert that the mutex is owned and not recursed prior to unlocking it.

This should give a clearer diagnostic when a programming error is caught.
2001-04-28 12:11:01 +00:00
John Baldwin
6caa8a1501 Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
  into hardclock_process() and statclock_process() respectively.  hardclock()
  and statclock() call the *_process() functions for the current process so
  that UP systems will run as before.  For SMP systems, it is simply necessary
  to ensure that all other processors execute the *_process() functions when the
  main clock functions are triggered on one CPU by an interrupt.  For the alpha
  4100, clock interrupts are delievered in a staggered broadcast fashion, so
  we simply call hardclock/statclock on the boot CPU and call the *_process()
  functions on the secondaries.  For x86, we call statclock and hardclock as
  usual and then call forward_hardclock/statclock in the MD code to send an IPI
  to cause the AP's to execute forwared_hardclock/statclock which then call the
  *_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
  involve less hackery.  Now the cpu doing the forward sets any flags, etc. and
  sends a very simple IPI_AST to the other cpu(s).  AST IPIs now just basically
  return so that they can execute ast() and don't bother with setting the
  astpending or needresched flags themselves.  This also removes the loop in
  forward_signal() as sched_lock closes the race condition that the loop worked
  around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
  a process to act on rather than assuming curproc so that they can be used to
  implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
  header sys/smp.h declares MI SMP variables and API's.   The IPI API's from
  machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
  SLIST of globaldata structures has become MI and moved into subr_smp.c.
  Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by:	jake, peter
Looked over by:	eivind
2001-04-27 19:28:25 +00:00
Alfred Perlstein
3abedb4e01 Actually show the values that tripped the assertion "receive 1" 2001-04-27 13:42:50 +00:00
Robert Watson
80c9c40df9 o Remove the disabled p_cansched() test cases that permitted users to
modify the scheduling properties of processes with a different real
  uid but the same effective uid (i.e., daemons, et al).  (note: these
  cases were previously commented out, so this does not change the
  compiled code at al)

Obtained from:	TrustedBSD Project
2001-04-27 01:56:32 +00:00
Poul-Henning Kamp
8ee8b21b48 vfs_subr.c is getting rather fat. The underlying repocopy and this
commit moves the filesystem export handling code to vfs_export.c
2001-04-26 20:47:14 +00:00
Alfred Perlstein
06336fb26d Sendfile is documented to return 0 on success, however if when a
sf_hdtr is used to provide writev(2) style headers/trailers on the
sent data the return value is actually either the result of writev(2)
from the trailers or headers of no tailers are specified.

Fix sendfile to comply with the documentation, by returning 0 on
success.

Ok'd by: dg
2001-04-26 00:14:14 +00:00
Seigo Tanimura
ebdc3f1d2d Do not leave a process with no credential in zombproc.
Reviewed by:	jhb
2001-04-25 10:22:35 +00:00
Kirk McKusick
112f737245 When closing the last reference to an unlinked file, it is freed
by the inactive routine. Because the freeing causes the filesystem
to be modified, the close must be held up during periods when the
filesystem is suspended.

For snapshots to be consistent across crashes, they must write
blocks that they copy and claim those written blocks in their
on-disk block pointers before the old blocks that they referenced
can be allowed to be written.

Close a loophole that allowed unwritten blocks to be skipped when
doing ffs_sync with a request to wait for all I/O activity to be
completed.
2001-04-25 08:11:18 +00:00
Poul-Henning Kamp
a13234bb35 Move the netexport structure from the fs-specific mountstructure
to struct mount.

This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
2001-04-25 07:07:52 +00:00
Thomas Moestl
83f3198b2b Change uipc_sockaddr so that a sockaddr_un without a path is returned
nam for an unbound socket instead of leaving nam untouched in that case.
This way, the getsockname() output can be used to determine the address
family of such sockets (AF_LOCAL).

Reviewed by:	iedowse
Approved by:	rwatson
2001-04-24 19:09:23 +00:00
John Baldwin
33a9ed9d0e Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by:	-smp, dfr, jake
2001-04-24 00:51:53 +00:00
Thomas Moestl
e15480f8dd Fix a bug introduced in the last commit: vaccess_acl_posix1 only checked
the file gid gainst the egid of the accessing process for the
ACL_GROUP_OBJ case, and ignored supplementary groups.

Approved by:	rwatson
2001-04-23 22:52:26 +00:00
Greg Lehey
d98dc34f52 Correct #includes to work with fixed sys/mount.h. 2001-04-23 09:05:15 +00:00
Robert Watson
5ea6583e2d o Remove comment indicating policy permits loop-back debugging, but
semantics don't: in practice, both policy and semantics permit
  loop-back debugging operations, only it's just a subset of debugging
  operations (i.e., a proc can open its own /dev/mem), and that's at a
  higher layer.
2001-04-21 22:41:45 +00:00
John Baldwin
9d4f526475 Spelling nit: acquring -> acquiring.
Reported by:	T. William Wells <bill@twwells.com>
2001-04-21 01:50:32 +00:00
Alfred Perlstein
98689e1e70 Assert that when using an interlock mutex it is not recursed when lockmgr()
is called.

Ok'd by: jhb
2001-04-20 22:38:40 +00:00
John Baldwin
242d02a13f Make the ap_boot_mtx mutex static. 2001-04-20 01:09:05 +00:00
John Baldwin
d8915a7f34 - Whoops, forgot to enable the clock lock in the spin order list on the
alpha.
- Change the Debugger() functions to pass in the real function name.
2001-04-19 15:49:54 +00:00
Bosko Milekic
d04d50d1f7 Fix inconsistency in setup of kernel_map: we need to make sure that
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.

Pointed out by: bde
2001-04-18 23:54:13 +00:00
Alfred Perlstein
2f3cf91876 Check validity of signal callback requested via aio routines.
Also move the insertion of the request to after the request is validated,
there's still looks like there may be some problems if an invalid address
is passed to the aio routines, basically a possible leak or having a
not completely initialized structure on the queue may still be possible.

A new sig macro was made _SIG_VALID to check the validity of a signal,
it would be advisable to use it from now on (in kern/kern_sig.c) rather
than rolling your own.

PR: kern/17152
2001-04-18 22:18:39 +00:00
Seigo Tanimura
759cb26335 Reclaim directory vnodes held in namecache if few free vnodes are
available.

Only directory vnodes holding no child directory vnodes held in
v_cache_src are recycled, so that directory vnodes near the root of
the filesystem hierarchy remain in namecache and directory vnodes are
not reclaimed in cascade.

The period of vnode reclaiming attempt and the number of vnodes
attempted to reclaim can be tuned via sysctl(2).

Suggested by:	tegge
Approved by:	phk
2001-04-18 11:19:50 +00:00
Poul-Henning Kamp
793d6d5d57 bread() is a special case of breadn(), so don't replicate code. 2001-04-18 07:16:07 +00:00
Dima Dorfman
25c7870e5d Make this driver play ball with devfs(5).
Reviewed by:	brian
2001-04-17 20:53:11 +00:00
Alfred Perlstein
e04670b734 Add a sanity check on ucred refcount.
Submitted by: Terry Lambert <terry@lambert.org>
2001-04-17 20:50:43 +00:00
Alfred Perlstein
603c86672c Implement client side NFS locks.
Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
2001-04-17 20:45:23 +00:00
Poul-Henning Kamp
0dfba3cef1 Write a switch statement as less obscure if statements. 2001-04-17 20:22:07 +00:00
John Baldwin
e3ee8974e3 Fix an old bug related to BETTER_CLOCK. Call forward_*clock if SMP
and __i386__ are defined rather than if SMP and BETTER_CLOCK are defined.
The removal of BETTER_CLOCK would have broken this except that kern_clock.c
doesn't include <machine/smptests.h>, so it doesn't see the definition of
BETTER_CLOCK, and forward_*clock aren't called, even on 4.x.  This seems to
fix the problem where a n-way SMP system would see 100 * n clk interrupts
and 128 * n rtc interrupts.
2001-04-17 17:53:36 +00:00
Poul-Henning Kamp
f84e29a06c This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine.  For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
2001-04-17 08:56:39 +00:00
Kirk McKusick
5819ab3f12 Add debugging option to always read/write cylinder groups as full
sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'.
Add debugging option to disable background writes of cylinder
groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'.
These debugging options should be tried on systems that are panicing
with corrupted cylinder group maps to see if it makes the problem
go away. The set of panics in question are:

	ffs_clusteralloc: map mismatch
	ffs_nodealloccg: map corrupted
	ffs_nodealloccg: block not in map
	ffs_alloccg: map corrupted
	ffs_alloccg: block not in map
	ffs_alloccgblk: cyl groups corrupted
	ffs_alloccgblk: can't find blk in cyl
	ffs_checkblk: partially free fragment

The following panics are less likely to be related to this problem,
but might be helped by these debugging options:

	ffs_valloc: dup alloc
	ffs_blkfree: freeing free block
	ffs_blkfree: freeing free frag
	ffs_vfree: freeing free inode

If you try these options, please report whether they helped reduce your
bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com>
and to Matt Dillon <dillon@earth.backplane.com>.
2001-04-17 05:37:51 +00:00
Robert Watson
b114e127e6 In my first reading of POSIX.1e, I misinterpreted handling of the
ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the
access ACL could be used by privileged processes to change file/directory
ownership.  In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and
ACL_OTHER) should have undefined ae_id fields; this commit attempts
to correct that misunderstanding.

o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid
  associated with the vnode, as those can no longer be extracted from
  the ACL passed as an argument.  Perform all comparisons against
  the passed arguments.  This actually has the effect of simplifying
  a number of components of this call, as well as reducing the indent
  level, but now seperates handling of ACL_GROUP_OBJ from ACL_GROUP.

o Modify acl_posix1e_check() to return EINVAL if the ae_id field of
  any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value
  other than ACL_UNDEFINED_ID.  As a temporary work-around to allow
  clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before
  each check so that this cannot cause a failure in the short term
  (this work-around will be removed when the userland libraries and
  utilities are updated to take this change into account).

o Modify ufs_sync_acl_from_inode() so that it forces
  ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID
  when synchronizing the ACL from the inode.

o Modify ufs_sync_inode_from_acl to not propagate uid and gid
  information to the inode from the ACL during ACL update.  Also
  modify the masking of permission bits that may be set from
  ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not
  carry none-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT).

o Modify ufs_getacl() so that when it emulates an access ACL from
  the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID.

o Clean up ufs_setacl() substantially since it is no longer possible
  to perform chown/chgrp operations using vop_setacl(), so all the
  access control for that can be eliminated.

o Modify ufs_access() so that it passes owner uid and gid information
  into vaccess_acl_posix1e().

Pointed out by:	jedger
Obtained from:	TrustedBSD Project
2001-04-17 04:33:34 +00:00
John Baldwin
abd9053ee4 Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable.  I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead.  Also, this can avoid
recursive panics if, for example, witness is broken.
2001-04-17 04:18:08 +00:00
John Baldwin
3c41f323c9 Check to see if enroll() returns NULL in the witness initialization. This
can happen if witness runs out of resources during initialization or if
witness_skipspin is enabled.

Sleuthing by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2001-04-17 03:35:38 +00:00
John Baldwin
7141f2ad46 Exit and re-enter the critical section while spinning for a spinlock so
that interrupts can come in while we are waiting for a lock.
2001-04-17 03:34:52 +00:00
John Hay
24dbea46a9 Update to the 2001-04-02 version of the nanokernel code from Dave Mills. 2001-04-16 13:05:05 +00:00
Brian Somers
56700d4634 Call strlen() once instead of twice. 2001-04-14 21:33:58 +00:00
Robert Watson
e9e7ff5b22 o Since uid checks in p_cansignal() are now identical between P_SUGID
and non-P_SUGID cases, simplify p_cansignal() logic so that the
  P_SUGID masking of possible signals is independent from uid checks,
  removing redundant code and generally improving readability.

Reviewed by:	tmm
Obtained from:	TrustedBSD Project
2001-04-13 14:33:45 +00:00
Alfred Perlstein
1375ed7eb7 convert if/panic -> KASSERT, explain what triggered the assertion 2001-04-13 10:15:53 +00:00
Murray Stokely
a4e6da691f Generate useful error messages. 2001-04-13 09:37:25 +00:00
Mark Murray
f0b60d7560 Handle a rare but fatal race invoked sometimes when SIGSTOP is
invoked.
2001-04-13 09:29:34 +00:00
John Baldwin
7a9aa5d372 - Add a comment at the start of the spin locks list.
- The alpha SMP code uses an "ap boot" spinlock as well.
2001-04-13 08:31:38 +00:00
Robert Watson
44c3e09cdc o Disallow two "allow this" exceptions in p_cansignal() restricting
the ability of unprivileged processes to deliver arbitrary signals
  to daemons temporarily taking on unprivileged effective credentials
  when P_SUGID is not set on the target process:
  Removed:
     (p1->p_cred->cr_ruid != ps->p_cred->cr_uid)
     (p1->p_ucred->cr_uid != ps->p_cred->cr_uid)
o Replace two "allow this" exceptions in p_cansignal() restricting
  the ability of unprivileged processes to deliver arbitrary signals
  to daemons temporarily taking on unprivileged effective credentials
  when P_SUGID is set on the target process:
  Replaced:
     (p1->p_cred->p_ruid != p2->p_ucred->cr_uid)
     (p1->p_cred->cr_uid != p2->p_ucred->cr_uid)
  With:
     (p1->p_cred->p_ruid != p2->p_ucred->p_svuid)
     (p1->p_ucred->cr_uid != p2->p_ucred->p_svuid)
o These changes have the effect of making the uid-based handling of
  both P_SUGID and non-P_SUGID signal delivery consistent, following
  these four general cases:
     p1's ruid equals p2's ruid
     p1's euid equals p2's ruid
     p1's ruid equals p2's svuid
     p1's euid equals p2's svuid
  The P_SUGID and non-P_SUGID cases can now be largely collapsed,
  and I'll commit this in a few days if no immediate problems are
  encountered with this set of changes.
o These changes remove a number of warning cases identified by the
  proc_to_proc inter-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
  possible side effects on running code: they seem reasonable to me,
  but it's possible this change might have to be backed out if problems
  are experienced.

Submitted by:		src/tools/regression/security/proc_to_proc/testuid
Reviewed by:		tmm
Obtained from:	TrustedBSD Project
2001-04-13 03:06:22 +00:00
Robert Watson
0489082737 o Disable two "allow this" exceptions in p_cansched()m retricting the
ability of unprivileged processes to modify the scheduling properties
  of daemons temporarily taking on unprivileged effective credentials.
  These cases (p1->p_cred->p_ruid == p2->p_ucred->cr_uid) and
  (p1->p_ucred->cr_uid == p2->p_ucred->cr_uid), respectively permitting
  a subject process to influence the scheduling of a daemon if the subject
  process has the same real uid or effective uid as the daemon's effective
  uid.  This removes a number of the warning cases identified by the
  proc_to_proc iner-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
  possible side effects on running code: they seem reasonable to me,
  but it's possible this change might have to be backed out if problems
  are experienced.

Reported by:	src/tools/regression/security/proc_to_proc/testuid
Obtained from:	TrustedBSD Project
2001-04-12 22:46:07 +00:00
Robert Watson
e386f9bda3 o Make kqueue's filt_procattach() function use the error value returned
by p_can(...P_CAN_SEE), rather than returning EACCES directly.  This
  brings the error code used here into line with similar arrangements
  elsewhere, and prevents the leakage of pid usage information.

Reviewed by:	jlemon
Obtained from:	TrustedBSD Project
2001-04-12 21:32:02 +00:00
Robert Watson
d34f8d3030 o Limit process information leakage by introducing a p_can(...P_CAN_SEE...)
in rtprio()'s RTP_LOOKIP implementation.

Obtained from:	TrustedBSD Project
2001-04-12 20:46:26 +00:00
Robert Watson
eb9e5c1d72 o Reduce information leakage into jails by adding invocations of
p_can(...P_CAN_SEE...) to getpgid(), getsid(), and setpgid(),
  blocking these operations on processes that should not be visible
  by the requesting process.  Required to reduce information leakage
  in MAC environments.

Obtained from:	TrustedBSD Project
2001-04-12 19:39:00 +00:00
Robert Watson
4c5eb9c397 o Replace p_cankill() with p_cansignal(), remove wrappage of p_can()
from signal authorization checking.
o p_cansignal() takes three arguments: subject process, object process,
  and signal number, unlike p_cankill(), which only took into account
  the processes and not the signal number, improving the abstraction
  such that CANSIGNAL() from kern_sig.c can now also be eliminated;
  previously CANSIGNAL() special-cased the handling of SIGCONT based
  on process session.  privused is now deprecated.
o The new p_cansignal() further limits the set of signals that may
  be delivered to processes with P_SUGID set, and restructures the
  access control check to allow it to be extended more easily.
o These changes take into account work done by the OpenBSD Project,
  as well as by Robert Watson and Thomas Moestl on the TrustedBSD
  Project.

Obtained from:  TrustedBSD Project
2001-04-12 02:38:08 +00:00
Robert Watson
40829dd2dc o Regenerated following introduction of __setugid() system call for
"options REGRESSION".

Obtained from:	TrustedBSD Project
2001-04-11 20:21:37 +00:00
Robert Watson
130d0157d1 o Introduce a new system call, __setsugid(), which allows a process to
toggle the P_SUGID bit explicitly, rather than relying on it being
  set implicitly by other protection and credential logic.  This feature
  is introduced to support inter-process authorization regression testing
  by simplifying userland credential management allowing the easy
  isolation and reproduction of authorization events with specific
  security contexts.  This feature is enabled only by "options REGRESSION"
  and is not intended to be used by applications.  While the feature is
  not known to introduce security vulnerabilities, it does allow
  processes to enter previously inaccessible parts of the credential
  state machine, and is therefore disabled by default.  It may not
  constitute a risk, and therefore in the future pending further analysis
  (and appropriate need) may become a published interface.

Obtained from:	TrustedBSD Project
2001-04-11 20:20:40 +00:00
John Baldwin
7b531e6037 Stick proc0 in the PID hash table. 2001-04-11 18:50:50 +00:00
John Baldwin
2fea957dc5 Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by:	bde
2001-04-11 17:06:02 +00:00