4309 Commits

Author SHA1 Message Date
Michael Reifenberger
491dec936c Introduce [IPC|SHM]_[INFO|STAT] to shmctl to make
`/compat/linux/usr/bin/ipcs -m` happy.
2001-10-28 09:29:10 +00:00
Matthew Dillon
4ffa210b94 syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's,
and can also be made static.
2001-10-27 19:58:56 +00:00
Poul-Henning Kamp
4e4a76633b Nudge the axe a bit closer to cdevsw[]:
Make it a panic to repeat make_dev() or destroy_dev(), this check
   should maybe be neutered when -current goes -stable.

   Whine if devsw() is called on anon dev_t's in a devfs system.

   Make a hack to avoid our lazy-eval disk code triggering the above whine.

   Fix the multiple make_dev() in disk code by making ${disk}${unit}s${slice}
   an alias/symlink to ${disk}${unit}s${slice}c
2001-10-27 17:44:21 +00:00
Dag-Erling Smørgrav
9ca45e813c Add a P_INEXEC flag that indicates that the process has called execve() and
it has not yet returned.  Use this flag to deny debugging requests while
the process is execve()ing, and close once and for all any race conditions
that might occur between execve() and various debugging interfaces.

Reviewed by:	jhb, rwatson
2001-10-27 11:11:25 +00:00
Robert Watson
48be932ac0 o Update copyright dates.
Obtained from:	TrustedBSD Project
2001-10-27 05:46:43 +00:00
Robert Watson
fdba6d3a1e o Improve style(9) compliance following KSE modifications. In particular,
strip the space from '( struct thread *...', wrap long lines.
o Remove an unneeded comment on the topic of no lock being required as
  part of the NDINIT() in __acl_get_file(), as it's really not required
  there.

Obtained from:	TrustedBSD Project
2001-10-27 05:45:42 +00:00
Matthew Dillon
d23f5958bc Add mtx_lock_giant() and mtx_unlock_giant() wrappers for sysctl management
of Giant during the Giant unwinding phase, and start work on instrumenting
Giant for the file and proc mutexes.

These wrappers allow developers to turn on and off Giant around various
subsystems.  DEVELOPERS SHOULD NEVER TURN OFF GIANT AROUND A SUBSYSTEM JUST
BECAUSE THE SYSCTL EXISTS!  General developers should only consider
turning on Giant for a subsystem whose default is off (to help track down
bugs).  Only developers working on particular subsystems who know what
they are doing should consider turning off Giant.

These wrappers will greatly improve our ability to unwind Giant and test
the kernel on a (mostly) subsystem by subsystem basis.   They allow Giant
unwinding developers (GUDs) to emplace appropriate subsystem and structural
mutexes in the main tree and then request that the larger community test
the work by turning off Giant around the subsystem(s), without the larger
community having to mess around with patches.  These wrappers also allow
GUDs to boot into a (more likely to be) working system in the midst of
their unwinding work and to test that work under more controlled
circumstances.

There is a master sysctl, kern.giant.all, which defaults to 0 (off).  If
turned on it overrides *ALL* other kern.giant sysctls and forces Giant to
be turned on for all wrapped subsystems.  If turned off then Giant around
individual subsystems is controlled by various other kern.giant.XXX sysctls.

Code which overlaps multiple subsystems must have all related subsystem Giant
sysctls turned off in order to run without Giant.
2001-10-26 20:48:04 +00:00
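The precedence between the master sysctl and the per-subsystem sysctls described in this entry can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the kernel's code: `struct sk_giant_knobs` and `sk_giant_needed()` are hypothetical names standing in for kern.giant.all, one kern.giant.XXX knob, and the decision a wrapper would make.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kern.giant.all and one kern.giant.XXX sysctl. */
struct sk_giant_knobs {
	int giant_all;     /* master switch, defaults to 0 (off) */
	int giant_subsys;  /* per-subsystem switch */
};

/*
 * Returns true if Giant should be taken around the subsystem.
 * When the master switch is on it overrides the per-subsystem
 * setting; otherwise the per-subsystem setting decides.
 */
static bool
sk_giant_needed(const struct sk_giant_knobs *k)
{
	if (k->giant_all != 0)
		return (true);
	return (k->giant_subsys != 0);
}
```

With both knobs at 0 the sketch reports that Giant is not needed; setting either knob to 1 makes it required again.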
John Baldwin
282873e2c0 - Change the taskqueue locking to protect the necessary parts of a task
while it is on a queue with the queue lock and remove the per-task locks.
- Remove TASK_DESTROY now that it is no longer needed.
- Go back to inlining TASK_INIT now that it is short again.

Inspired by:	dfr
2001-10-26 18:46:48 +00:00
Poul-Henning Kamp
5f7806ab69 Make cdevsw[] static. 2001-10-26 15:31:22 +00:00
John Baldwin
8e2e767b1f Add a per-thread ucred reference for syscalls and synchronous traps from
userland.  The per thread ucred reference is immutable and thus needs no
locks to be read.  However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on:	x86 (SMP), alpha, sparc64
2001-10-26 08:12:54 +00:00
John Baldwin
1de1c550b1 Add locking to taskqueues. There is one mutex per task, one mutex per
queue, and a mutex to protect the global list of taskqueues.  The only
visible change is that a TASK_DESTROY() macro has been added to mirror
the TASK_INIT() macro to destroy a task before it is free'd.

Submitted by:	Andrew Reiter <awr@watson.org>
2001-10-26 06:32:21 +00:00
John Baldwin
40c6d2be16 Use msleep() to avoid lost wakeups instead of doing an ineffective
splhigh() before the mtx_unlock and tsleep().  The splhigh() was probably
correct in the original code using simplelocks but is not correct in
5.0-current.

Noticed by:	Andrew Reiter <awr@FreeBSD.org>
2001-10-26 06:09:01 +00:00
Matthew Dillon
245df27cee Implement kern.maxvnodes. Adjusting kern.maxvnodes now actually has a
real effect.

Optimize vfs_msync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  Improves looping case by 500%.

Optimize ffs_sync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  This makes a couple of assumptions,
which I believe are ok, in regards to vnode stability when the mount list
mutex is held.  Improves looping case by 500%.

(more optimization work is needed on top of these fixes)

MFC after:	1 week
2001-10-26 00:08:05 +00:00
Matthew Dillon
f92dcd3e4a Add missing TAILQ_INSERT_TAIL's which somehow didn't get committed with
the recent vnode cleanup.
2001-10-25 23:13:56 +00:00
Matthew Dillon
f02098e59c In cluster_rbuild(), 'size' had better match buf->b_bcount and buf->b_bufsize
or the cluster will not be properly merged.  Dup the code from
cluster_wbuild() and add some printf()s to see if bad cases are present.

MFC after:	2 weeks
2001-10-25 22:49:48 +00:00
John Baldwin
5a08b84f83 Fix an inverted test case. Success of getenv() is determined by a return
value of !NUL rather than NUL.

Submitted by:	luigi
Pointy hat to:	jhb
2001-10-25 17:22:31 +00:00
Jonathan Lemon
18bfd58110 cnclose() can potentially race against itself. To avoid vn_close() races,
NULL-out cnd_vp before calling the latter, as it may block.

Submitted by: dillon
2001-10-25 04:51:37 +00:00
Jonathan Lemon
7ce26133ea Force FWRITE on when opening the console, so that the flags passed to
vn_close match those from vn_open.  This fixes the panic some people
were seeing about "vrele: missed vn_close".
2001-10-25 00:14:16 +00:00
John Baldwin
882bcf5879 Document the requirements and nature of the logical CPU IDs. It isn't
very strict and leaves much up to the platform so that it can define a
convenient mapping.

Requested by:	mjacob
2001-10-24 22:15:38 +00:00
Matthew Dillon
a06fe5111e unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after:	3 days
2001-10-24 18:32:17 +00:00
John Baldwin
781a35df6b Fix this to actually compile in the !INVARIANTS case.
Reported by:	Maxime Henrion <mux@qualys.com>
2001-10-24 14:18:33 +00:00
Robert Drehmel
9a024fc559 Use vm_offset_t instead of caddr_t to fix a warning and remove
two casts.
2001-10-24 14:15:28 +00:00
Matthew Dillon
79deba82cd Fix ktrace enablement/disablement races that can result in a vnode
ref count panic.

Bug noticed by:	ps
Reviewed by:	ps
MFC after:	1 day
2001-10-24 01:05:39 +00:00
John Baldwin
4e5e677bc0 Change the sx(9) assertion API to use a sx_assert() function similar to
mtx_assert(9) rather than several SX_ASSERT_* macros.
2001-10-23 22:39:11 +00:00
John Baldwin
21cbf0cc8b - Change getenv_quad() to return an int instead of a quad_t since it
returns a success/failure code rather than the actual value.
- Add getenv_string() which copies a string from the environment to another
  string and returns true on success.
2001-10-23 22:34:36 +00:00
Jonathan Lemon
991f976036 Implement multiple low-level console support. 2001-10-23 20:25:50 +00:00
Robert Watson
fc2749a40c o vn_open() fails to call VOP_CLOSE() if vfs_object_create fails. Ideally
all successful calls to VOP_OPEN() might be reflected in a call to
  VOP_CLOSE().  For now, simply add a comment reflecting this problem;
  this should be fixed at some point.
2001-10-23 19:09:01 +00:00
John Baldwin
ac9a258074 Assert that Giant is not held in mi_switch() unless the process state
is SMTX or SRUN.
2001-10-23 17:52:49 +00:00
Matthew Dillon
4f467cb8c1 Fix incorrect double-termination of vm_object. When a vm_object is
terminated and flushes pending dirty pages it is possible for the
object to be ref'd (0->1) and then deref'd (1->0) during termination.
We do not terminate the object a second time.

Document vop_stdgetvobject() to explicitly allow it to be called without
the vnode interlock held (for upcoming sync_msync() and ffs_sync()
performance optimizations)

MFC after:	3 days
2001-10-23 01:23:41 +00:00
Matthew Dillon
c72ccd014d Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
Poul-Henning Kamp
5015bb7f85 disk_clone() was a bit too eager to please: "md0s1ec" is not a valid
device.

Noticed by:	Chad David <davidc@acns.ab.ca>
2001-10-22 10:18:45 +00:00
Dag-Erling Smørgrav
7c62990641 Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.
2001-10-21 23:57:24 +00:00
Dag-Erling Smørgrav
45fb069ac9 Convert textvp_fullpath() into the more generic vn_fullpath() which takes a
struct thread * and a struct vnode * instead of a struct proc *.

Temporarily add a textvp_fullpath macro for compatibility.
2001-10-21 15:52:51 +00:00
Matthew Dillon
5eb13f768c Documentation
MFC after:	1 day
2001-10-21 06:26:55 +00:00
Matthew Dillon
57601bcb5d Syntax cleanup and documentation, no operational changes.
MFC after:	1 day
2001-10-21 06:12:06 +00:00
Ian Dowse
72ec63a53d Introduce some jitter to the timing of the samples that determine
the system load average. Previously, the load average measurement
was susceptible to synchronisation with processes that run at
regular intervals such as the system bufdaemon process.

Each interval is now chosen at random within the range of 4 to 6
seconds. This large variation is chosen so that over the shorter
5-minute load average timescale there is a good dispersion of
samples across the 5-second sample period (the time to perform 60
5-second samples now has a standard deviation of approx 4.5 seconds).
2001-10-20 16:07:17 +00:00
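A minimal sketch of picking a sample interval between 4 and 6 seconds, as this entry describes. The names `SK_HZ` and `sk_next_interval()` are hypothetical, the tick rate is an assumed value, and the caller is assumed to supply the random number; the kernel's actual code may differ.

```c
#include <stdint.h>

#define SK_HZ 100  /* assumed clock ticks per second */

/*
 * Map a random 32-bit value onto an interval of 4 to 6 seconds,
 * expressed in ticks: 4 seconds plus an offset in [0, 2 seconds].
 */
static int
sk_next_interval(uint32_t rnd)
{
	return (4 * SK_HZ + (int)(rnd % (2 * SK_HZ + 1)));
}
```

The mean interval stays at 5 seconds, so the averaging timescales are preserved while the sample points no longer line up with periodic processes.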
Ian Dowse
0eb6ce3169 Move the code that computes the system load average from vm_meter.c
to kern_synch.c in preparation for adding some jitter to the
inter-sample time.

Note that the "vm.loadavg" sysctl still lives in vm_meter.c which
isn't the right place, but it is appropriate for the current (bad)
name of that sysctl.

Suggested by:	jhb (some time ago)
Reviewed by:	bde
2001-10-20 13:10:43 +00:00
John Baldwin
7ada587697 The mtx_init() and sx_init() functions bzero'd locks before handing them
off to witness_init(), making the check for double initialization of a lock by
testing the LO_INITIALIZED flag moot.  Work around this by checking the
LO_INITIALIZED flag ourselves before we bzero the lock structure.
2001-10-20 01:22:42 +00:00
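The ordering problem in this entry can be illustrated with a small user-space sketch. `struct sk_lock`, `SK_LO_INITIALIZED`, and `sk_lock_init()` are hypothetical names modeled loosely on the description; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <string.h>

#define SK_LO_INITIALIZED 0x1  /* hypothetical flag, modeled on LO_INITIALIZED */

struct sk_lock {
	unsigned int flags;
	int owner;
};

/*
 * Check the flag before zeroing the structure. Zeroing first would
 * erase the evidence of an earlier initialization, so the check
 * could never fire. Returns false on a double initialization.
 */
static bool
sk_lock_init(struct sk_lock *l)
{
	if (l->flags & SK_LO_INITIALIZED)
		return (false);
	memset(l, 0, sizeof(*l));
	l->flags |= SK_LO_INITIALIZED;
	return (true);
}
```

A check of this shape only works if the caller hands in memory whose flag bit is clear, which is why the 6a40eccec3 and f21fc12736 entries further down zero the mutex before initializing it.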
Peter Wemm
259ed91740 Add a sysctl for preventing the sync() in panic() recovery. This can
be so dangerous it isn't funny.  eg: if you panic inside NFS or softdep,
and then try and sync you run into held locks and cause either deadlocks,
recursive panics or other interesting chaos.  Default is unchanged.
2001-10-19 23:32:03 +00:00
Jonathan Lemon
7e7c3f3f33 Add dev_named(dev, name), which is similar in spirit to devtoname().
This function returns success if the device is known by either 'name'
or any of its aliases.
2001-10-17 18:47:12 +00:00
Matthew Dillon
2210e5d9fa fix minor bug in kern.minvnodes sysctl. Use OID_AUTO. 2001-10-16 23:08:09 +00:00
Robert Watson
ab323a7d45 o Update init_sysent.c and friends for allocation of afs_syscall. 2001-10-13 13:30:21 +00:00
Robert Watson
b55abfd929 o Reserve system call 377 for afs_syscall; by reserving a system call
number, portable OpenAFS applications don't have to attempt to determine
  what system call number was dynamically allocated.  No system call
  prototype or implementation is defined.

Requested by:	Tom Maher <tardis@watson.org>
2001-10-13 13:19:34 +00:00
Poul-Henning Kamp
ce9d2b59b2 Regenerate syscall stuff.
Remove syscall-hide.h
2001-10-13 09:18:28 +00:00
Poul-Henning Kamp
5ab1bfacb1 Don't generate <sys/syscalls-hide.h> it has never had any users anywhere in
the source tree.
2001-10-13 09:17:49 +00:00
Peter Pentchev
88fbb423d4 Remove the panic when trying to register a sysctl with an oid too high.
This stops panics on unloading modules which define their own sysctl sets.

However, this also removes the protection against somebody actually
defining a static sysctl with an oid in the range of the dynamic ones,
which would break badly if there is already a dynamic sysctl with
the requested oid.

Apparently, the algorithm for removing sysctl sets needs a bit more work.
For the present, the panic I introduced only leads to Bad Things (tm).

Submitted by:	many users of -current :(
Pointy hat to:	roam (myself) for not testing rev. 1.112 enough.
2001-10-12 09:16:36 +00:00
John Baldwin
a2f2b3afcd - Catch up to the new ucred API.
- Add proc locking to the jail() syscall.  This mostly involved shuffling
  a few things around so that blockable things like malloc and copyin
  were performed before acquiring the lock and checking the existing
  ucred and then updating the ucred as one "atomic" change under the proc
  lock.
2001-10-11 23:39:43 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
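The three primitives listed in this entry might look roughly like the following toy versions. This is a sketch under assumptions: `struct sk_cred` is a made-up two-field credential, the `sk_` functions are hypothetical analogues, and no locking is shown, unlike a real kernel implementation.

```c
#include <stdbool.h>

/* Toy credential; a real struct ucred carries many more fields. */
struct sk_cred {
	int refcount;
	int uid;
};

/* Analogue of crhold(): bump the refcount and return the same pointer. */
static struct sk_cred *
sk_crhold(struct sk_cred *cr)
{
	cr->refcount++;
	return (cr);
}

/*
 * Analogue of crcopy(): copy credential data only, no return value.
 * Leaving the destination's refcount untouched is an assumption.
 */
static void
sk_crcopy(struct sk_cred *dest, const struct sk_cred *src)
{
	dest->uid = src->uid;
}

/* Analogue of crshared(): true if more than one reference exists. */
static bool
sk_crshared(const struct sk_cred *cr)
{
	return (cr->refcount > 1);
}
```

Returning the pointer from the hold function allows assignments of the form `newref = sk_crhold(cr);` in a single statement.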
John Baldwin
698166ca55 Whitespace fixes. 2001-10-11 22:49:27 +00:00
John Baldwin
6a90c862d3 Rework some code to be a bit simpler by inverting a few tests and using
else clauses instead of goto's.
2001-10-11 22:48:37 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
John Baldwin
7106ca0d1a Add missing includes of sys/lock.h. 2001-10-11 17:52:20 +00:00
Michael Reifenberger
91a701cd13 Fix SysV Semaphore Handling.
Updated by peter following KSE and Giant pushdown.
I've been running with this patch for two weeks with no ill side effects.

PR:		kern/12014: Fix SysV Semaphore handling
Submitted by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2001-10-11 08:15:14 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
John Baldwin
f21fc12736 Add a temporary hack that will go away with the ucred API update to bzero
the duplicated mutex before initializing it to avoid triggering the check
for init'ing an already initialized mutex.
2001-10-10 20:45:40 +00:00
John Baldwin
6a40eccec3 Malloc mutexes pre-zero'd as random garbage (including 0xdeadcode) may
trigger the check to make sure we don't initialize a mutex twice.
2001-10-10 20:43:50 +00:00
Doug Rabson
e913ca22e2 Move setregs() out from under the PROC_LOCK so that it can use functions
like suword() which may trap.
2001-10-10 20:04:57 +00:00
Robert Watson
8a7d8cc675 - Combine kern.ps_showallprocs and kern.ipc.showallsockets into
a single kern.security.seeotheruids_permitted, described as:
  "Unprivileged processes may see subjects/objects with different real uid"
  NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
  an API change.  kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
  the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
  domain socket credential instead of comparing root vnodes for the
  UDS and the process.  This allows multiple jails to share the same
  chroot() and not see each other's UNIX domain sockets.
- Remove unused socheckproc().

Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions.  This also better-supports
the introduction of additional MAC models.

Reviewed by:	ps, billf
Obtained from:	TrustedBSD Project
2001-10-09 21:40:30 +00:00
John Baldwin
8688bb9383 proces -> process in a comment. 2001-10-09 17:25:30 +00:00
Robert Watson
32d186043b o Recent addition of (p1==p2) exception in p_candebug() permitted
processes to attach debugging to themselves even though the
  global kern_unprivileged_procdebug_permitted policy might disallow
  this.
o Move the kern_unprivileged_procdebug_permitted check above the
  (p1==p2) check.

Reviewed by:	des
2001-10-09 16:56:29 +00:00
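The effect of reordering the two checks can be sketched as below. The function `sk_candebug()` and its parameters are hypothetical; in particular, letting a privileged caller bypass the policy and denying everything else by default are assumptions made to keep the sketch short, not a description of p_candebug() itself.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Returns 0 if debugging is allowed, or an errno value.
 * policy_permits models kern_unprivileged_procdebug_permitted;
 * privileged models a successful superuser check (assumption).
 */
static int
sk_candebug(int pid1, int pid2, bool privileged, bool policy_permits)
{
	/* The global policy is evaluated first ... */
	if (!policy_permits && !privileged)
		return (EPERM);
	/* ... so the self-debugging shortcut cannot bypass it. */
	if (pid1 == pid2)
		return (0);
	/* Remaining checks elided; deny by default in this sketch. */
	return (privileged ? 0 : EPERM);
}
```

With the checks in this order, an unprivileged process asking to debug itself is refused whenever the policy disallows unprivileged debugging.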
John Baldwin
74e4502e62 Replace 'curproc' with 'td->td_proc'. 2001-10-08 21:05:46 +00:00
Matthew Dillon
917efbaaba WS Cleanup 2001-10-08 19:51:13 +00:00
Dag-Erling Smørgrav
3da3249106 Dissociate ptrace from procfs.
Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested.  Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".
2001-10-07 20:08:42 +00:00
Dag-Erling Smørgrav
23fad5b6c9 Always succeed if the target process is the same as the requesting process. 2001-10-07 20:06:03 +00:00
Ian Dowse
80f42b555d Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
confused. Since sa_sigaction and sa_handler alias each other in a
union, the bug was completely harmless. This had been fixed as part
of the SIGCHLD changes in revision 1.125, but it was reverted when
they were backed out in revision 1.126.
2001-10-07 16:11:37 +00:00
Robert Watson
c175d2226f o Introduce an 'options REGRESSION'-dependent sysctl namespace,
'regression.*'.
o Add 'regression.securelevel_nonmonotonic', conditional on 'options
  REGRESSION', which allows the securelevel to be lowered for the purposes
  of efficient regression testing of securelevel policy decisions.
  Regression tests for securelevels will be committed shortly.

NOTE: 'options REGRESSION' should never be used on production machines, as
it permits violation of system invariants so as to improve the ability to
effectively test edge cases, and improve testing efficiency.
2001-10-07 03:51:22 +00:00
Marcel Moolenaar
49ead724c6 Fix breakage caused by previous commit. The lkmnosys and lkmressys
syscalls are of type NODEF but not in a way that fits the given
definition of that type. The exact difference of lkmressys and
lkmnosys is unclear, which makes it all the more confusing. A
reevaluation of what we have and what we really need is in order.

Spotted by: Maxime Henrion <mux@qualys.com>
Pointy hat: marcel
2001-10-07 00:16:31 +00:00
Matthew Dillon
845bd795c9 vinvalbuf() was only waiting for write-I/O to complete. It really has to
wait for both read AND write I/O to complete.  Only NFS calls vinvalbuf()
on an active vnode (when the server indicates that the file is stale), so
this bug fix only affects NFS clients.

MFC after:	3 days
2001-10-05 20:10:32 +00:00
John Baldwin
43150722c9 The aio kthreads start off with a root credential just like all other
kthreads, so don't malloc a ucred just so we can create a duplicate of the
one we already have.
2001-10-05 17:55:11 +00:00
Paul Saab
4787fd37af Only allow users to see their own socket connections if
kern.ipc.showallsockets is set to 0.

Submitted by:	billf (with modifications by me)
Inspired by:	Dave McKay (aka pm aka Packet Magnet)
Reviewed by:	peter
MFC after:	2 weeks
2001-10-05 07:06:32 +00:00
Dag-Erling Smørgrav
50f74e92b8 Final style(9) commit: placement of opening brace; a continuation indent I
missed in the previous commit; a line that exceeded 80 characters.  No
functional changes, but the object file's md5 checksum changes because some
lines have been displaced.
2001-10-04 16:35:44 +00:00
Dag-Erling Smørgrav
8a8d4e459c More style(9) fixes: no spaces between function name and parameter list;
some indentation fixes (particularly continuation lines).

Reviewed by:	md5(1)
2001-10-04 16:29:45 +00:00
Dag-Erling Smørgrav
c5799337ea This file had a mixture of "return foo;" and "return (foo);"; standardize
on "return (foo);" as mandated by style(9).

Reviewed by:	md5(1)
2001-10-04 16:09:22 +00:00
David Malone
2bc21ed985 Hopefully improve control message passing over Unix domain sockets.
1) Allow the sending of more than one control message at a time
over a unix domain socket. This should cover the PR 29499.

2) This requires that unp_{ex,in}ternalize and unp_scan understand
mbufs with more than one control message at a time.

3) Internalize and externalize used to work on the mbuf in-place.
This made life quite complicated and the code for sizeof(int) <
sizeof(file *) could end up doing the wrong thing. The patch always
creates a new mbuf/cluster now. This resulted in the change of the
prototype for the domain externalise function.

4) You can now send SCM_TIMESTAMP messages.

5) Always use CMSG_DATA(cm) to determine where the data starts
in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1)
in some places, which gives the wrong alignment on the alpha.
(NetBSD made this fix some time ago).

This results in an ABI change for descriptor passing and creds
passing on the alpha. (Probably on the IA64 and Sparc ports too).

6) Fix userland programs to use CMSG_* macros too.

7) Be more careful about freeing mbufs containing (file *)s.
This is made possible by the prototype change of externalise.

PR:		29499
MFC after:	6 weeks
2001-10-04 13:11:48 +00:00
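Items 5 and 6 of this entry concern userland use of the CMSG_* macros. A minimal sketch of packing and unpacking one descriptor with those macros follows; `sk_pack_fd()` and `sk_unpack_fd()` are hypothetical helper names, the buffer is assumed to be suitably aligned for a `struct cmsghdr`, and no socket I/O is performed.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Fill buf with one SCM_RIGHTS control message carrying fd.
 * buf must have room for CMSG_SPACE(sizeof(int)) bytes.
 * Returns the length a caller would store in msg_controllen.
 */
static size_t
sk_pack_fd(void *buf, int fd)
{
	struct cmsghdr *cm = buf;

	memset(buf, 0, CMSG_SPACE(sizeof(int)));
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	/* CMSG_DATA() accounts for alignment padding; (cm + 1) may not. */
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));
	return (CMSG_SPACE(sizeof(int)));
}

/* Read the descriptor number back out of a control message. */
static int
sk_unpack_fd(void *buf)
{
	struct cmsghdr *cm = buf;
	int fd;

	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return (fd);
}
```

Using the macros on both the sending and receiving side keeps the two in agreement about where the payload begins, whatever padding the platform requires.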
David Malone
59bdd40568 Allow sbcreatecontrol to make cluster sized control messages. 2001-10-04 12:59:53 +00:00
John Baldwin
0479e3d339 Move the ap boot spin lock earlier in the lock order before the sio(4)
lock since we occasionally call printf() while holding the ap boot lock
which can call down into the sio(4) driver if using a serial console.
2001-10-01 22:50:30 +00:00
Robert Watson
c6ab2f6b4e o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
        (error = suser_td(td)) != 0) {
            unwrap_lots_of_stuff();
            return (error);
    }

  to:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
            error = suser_td(td);
            if (error) {
                unwrap_lots_of_stuff();
                return (error);
            }
    }

  This makes the code more readable when complex clauses are in use,
  and minimizes conflicts for large outstanding patchsets modifying the
  kernel authorization code (of which I have several), especially where
  existing authorization and context code are combined in the same if()
  conditional.

Obtained from:	TrustedBSD Project
2001-10-01 20:01:07 +00:00
Matthew Dillon
b5810bab2d After extensive testing it has been determined that adding complexity
to avoid removing higher level directory vnodes from the namecache has
no perceivable effect and will be removed.  This is especially true
when vmiodirenable is turned on, which it is by default now.  ( vmiodirenable
makes a huge difference in directory caching ).  The vfs.vmiodirenable and
vfs.nameileafonly sysctls have been left in to allow further testing, but
I expect to rip out vfs.nameileafonly soon too.

I have also determined through testing that the real problem with numvnodes
getting too large is due to the VM Page cache preventing the vnode from
being reclaimed.  The directory stuff made only a tiny dent relative
to Poul's original code, enough so that some tests succeeded.  But tests
with several million small files show that the bigger problem is the VM Page
cache.  This will have to be addressed by a future commit.

MFC after:	3 days
2001-10-01 04:33:35 +00:00
Jonathan Lemon
1a6fc8ef63 When FREE()ing kqueue related structures, charge them to the correct bucket.
Submitted by: iedowse
Forgotten by: jlemon
2001-09-30 17:00:56 +00:00
Bosko Milekic
70a61707f6 Re-enable mbtypes statistics in the mbuf allocator. I disabled these
when I changed the allocator bits. This implements per-CPU mbtypes
stats by keeping net number of decrements/increments of a given mbtype
per-CPU and then summing all of the per-CPU mbtypes to produce the total
net number of allocated mbufs of the given mbtype.
Counters are carefully balanced to avoid/prevent underflows/overflows.

mbtypes stats are re-enabled with the idea that we may occasionally
(although very rarely) observe slight inconsistencies in the stat
reporting. Most of the time, we should be fine, though.

Also make appropriate modifications to netstat(1) and systat(1) to do
the necessary reporting.

Submitted by: Jiangyi Liu <jyliu@163.net>
2001-09-30 01:58:39 +00:00
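The per-CPU accounting described in this entry can be sketched as follows. The names, the fixed CPU count, and the use of signed counters are all assumptions for illustration; the sketch also ignores the synchronization a real allocator needs.

```c
#define SK_NCPU 4  /* assumed CPU count */

/* Per-CPU net count (allocations minus frees) for one mbuf type. */
struct sk_mbtype_stats {
	long net[SK_NCPU];
};

/* Record an allocation of this type on the given CPU. */
static void
sk_mb_alloc(struct sk_mbtype_stats *s, int cpu)
{
	s->net[cpu]++;
}

/* Record a free of this type on the given CPU. */
static void
sk_mb_free(struct sk_mbtype_stats *s, int cpu)
{
	s->net[cpu]--;
}

/* The reported total is the sum of the per-CPU net counts. */
static long
sk_mb_total(const struct sk_mbtype_stats *s)
{
	long total = 0;

	for (int i = 0; i < SK_NCPU; i++)
		total += s->net[i];
	return (total);
}
```

In this sketch a buffer allocated on one CPU and freed on another leaves a negative count on the second CPU, so only the sum is meaningful. A reader that sums while other CPUs are updating may see a slightly stale total, which matches the occasional inconsistencies the entry mentions.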
Jonathan Lemon
0217f5c71e Have EVFILT_TIMERS allocate their callouts via malloc() instead of using
the static callout list allocated by the system.

Change malloc type from M_TEMP to M_KQUEUE to better track memory.

Add a kern.kq_calloutmax to globally limit the amount of kernel memory
that can be allocated by callouts.

Submitted by: iedowse  (items 1, 2)
2001-09-29 17:48:39 +00:00
Dag-Erling Smørgrav
5b6db47748 Add a couple of API functions I need for my pseudofs WIP. Documentation
will follow when I've decided whether to keep this API or ditch it in
favor of something slightly more subtle.
2001-09-29 00:32:46 +00:00
Marcel Moolenaar
4166877345 Make the NODEF type usable. A syscall of type NODEF will only
have its entry in the syscall table added. Nothing else is
done. This differs from type NOPROTO in that NOPROTO adds a
definition to syscall.h besides adding a sysent. A syscall can
now have multiple entries without conflict. Note that the
argssize is fixed and depends on the syscall name.
2001-09-28 01:21:57 +00:00
Robert Watson
87fce2bb96 o When performing a securelevel check as part of securelevel_ge() or
securelevel_gt(), determine first if a local securelevel exists --
  if so, perform the check based on imax(local, global).  Otherwise,
  simply use the global value.
o Note: even though local securelevels might lag below the global one,
  if the global value is updated to higher than local values, maximum
  will still be used, making the global dominant even if there is local
  lag.

Obtained from:	TrustedBSD Project
2001-09-26 20:41:48 +00:00
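The rule in this entry, imax(local, global) when a local securelevel exists and the global value otherwise, can be sketched as below. The structure and the `sk_` functions are hypothetical, and returning a boolean rather than an error code is a simplification.

```c
#include <stdbool.h>

/* Hypothetical model: has_local is false when no local securelevel exists. */
struct sk_seclevel {
	bool has_local;
	int local;
	int global;
};

static int
sk_imax(int a, int b)
{
	return (a > b ? a : b);
}

/* Effective level: imax(local, global) if a local level exists, else global. */
static int
sk_effective_level(const struct sk_seclevel *s)
{
	if (s->has_local)
		return (sk_imax(s->local, s->global));
	return (s->global);
}

/* Simplified analogue of securelevel_gt(): effective level > level. */
static bool
sk_securelevel_gt(const struct sk_seclevel *s, int level)
{
	return (sk_effective_level(s) > level);
}

/* Simplified analogue of securelevel_ge(): effective level >= level. */
static bool
sk_securelevel_ge(const struct sk_seclevel *s, int level)
{
	return (sk_effective_level(s) >= level);
}
```

Because the maximum is taken at check time, a local level that lags behind a raised global level does not weaken the check.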
Robert Watson
8a528812a0 o Modify kern.securelevel MIB entry to return a local securelevel, if
one is present in the current jail, otherwise, to return the global
  securelevel.
o If the securelevel is being updated, require that it be greater than
  the maximum of local and global, if a local securelevel exists,
  otherwise, just maximum of the global.  If there is a local
  securelevel, update the local one instead of the global one.
o Note: this does allow local securelevels to lag behind the global one
  as long as the local one is not updated following a global increase.

Obtained from:	TrustedBSD Project
2001-09-26 20:39:48 +00:00
Robert Watson
567931c8f6 o Initialize per-jail securelevel from global securelevel as part of
jail creation.

Obtained from:	TrustedBSD Project
2001-09-26 20:37:15 +00:00
Robert Watson
d501d04b9e o Modify static settime() to accept the proc * for the process requesting
a time change, and callers so that they provide td->td_proc.
o Modify settime() to use securelevel_gt() for securelevel checking.

Obtained from:	TrustedBSD Project
2001-09-26 19:53:57 +00:00
Robert Watson
c2f413af19 o Modify sysctl access control check to use securelevel_gt(), and
clarify sysctl access control logic.

Obtained from:	TrustedBSD Project
2001-09-26 19:51:25 +00:00
Matthew Dillon
46cad5761c Enable vmiodirenable by default. Remove incorrect comment from sysctl.conf.
MFC after:	1 week
2001-09-26 19:35:04 +00:00
Matthew Dillon
3418ebebfe Make uio_yield() a global. Call uio_yield() between chunks
in vn_rdwr_inchunks(), allowing other processes to gain an exclusive
lock on the vnode.  Specifically: directory scanning, to avoid a race to the
root directory, and multiple child processes coring simultaneously so they
can figure out that some other core'ing child has an exclusive adv lock and
just exit instead.

This completely fixes performance problems when large programs core.  You
can have hundreds of copies (forked children) of the same binary core all
at once and not notice.

MFC after:	3 days
2001-09-26 06:54:32 +00:00
Paul Saab
88b1d98f31 Lock the vnode while truncating the corefile. This fixes a panic
with softupdates dangling deps.

Submitted by:	peter
MFC:		ASAP :)
2001-09-26 01:24:07 +00:00
John Baldwin
21377ce065 Remove superfluous parens after de-macroizing. 2001-09-26 00:05:18 +00:00
Robert Watson
75bc5b3f22 o So, when <dd> e-mailed me and said that the comment was inverted
for securelevel_ge() and securelevel_gt(), I was a little surprised,
  but fixed it.  Turns out that it was the code that was inverted, during
  a whitespace cleanup in my commit tree.  This commit inverts the
  checks, and restores the comment.
2001-09-25 21:08:33 +00:00
John Baldwin
dde96c9933 Since we no longer inline any debugging code in the mutex operations, move
all the debugging code into the function versions of the mutex operations
in kern_mutex.c.  This reduced the __mtx_* macros to simply wrappers of
the _{get,rel}_lock_* macros, so the __mtx_* macros were also abolished in
favor of just calling the _{get,rel}_lock_* macros.  The tangled hairy mass
of macros calling macros is at least a bit more sane now.
2001-09-22 21:19:55 +00:00
Robert Watson
b4799065ef o vpaccess() -> vn_access() -- Peter reminds me that there is already
a convention for vnop helper routines of this sort.

Submitted by:	Mr Wemm <peter>
2001-09-22 03:07:41 +00:00
John Baldwin
ed01445d8f Use the passed in thread to selrecord() instead of curthread. 2001-09-21 22:46:54 +00:00
John Baldwin
456ca585db Use the passed in thread pointer instead of curthread in calls to
selrecord() in ptcpoll().  The pre-KSE code used the passed in proc pointer
rather than curproc, and an earlier seltrue() call uses the passed in
thread and not curthread.
2001-09-21 22:22:25 +00:00
John Baldwin
fea2ab833e The P_SELECT flag was moved from p->p_flag to td->td_flags, but p_flag
was locked by the proc lock and td_flags is locked by the sched_lock.
The places that read, set, and cleared TDF_SELECT weren't updated, so they
read and modified td_flags w/o holding the sched_lock, meaning that they
could corrupt the per-thread flags field.  As an immediate band-aid,
grab sched_lock while reading and manipulating td_flags in relation to
TDF_SELECT.  This will probably be cleaned up some later on.
2001-09-21 22:06:22 +00:00
John Baldwin
e649bcb506 Remove unneeded proc variables and fix comments. 2001-09-21 21:54:45 +00:00
Robert Watson
a90a3f2882 o Part two of eaccess(2) commit, rebuilt system call code.
Obtained from:	TrustedBSD Project
2001-09-21 21:34:06 +00:00
Robert Watson
9c94f7731e o Introduce eaccess(2), a version of access(2) that uses the effective
credentials rather than the real credentials.  This is useful for
  implementing GUI's which need to modify icons based on access rights,
  but where use of open(2) is too expensive, use of stat(2) doesn't
  reflect the file system's real protection model, and use of
  access() suffers from real/effective credential confusion.  This
  implementation provides the same semantics as the call of the same
  name on SCO OpenServer.  Note: using this call improperly can
  leave you subject to some of the same races present in the
  access(2) call.
o To implement this, break out the basic logic of access(2) into
  vpaccess(), which accepts a passed credential to perform the
  invocation of VOP_ACCESS().  Add eaccess(2) to invoke vpaccess(),
  and modify access(2) to use vpaccess().

Obtained from:	TrustedBSD Project
2001-09-21 21:33:22 +00:00
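The difference between checking with real and effective credentials can be sketched in userland. This is a minimal sketch, not the code from this commit: it assumes a POSIX.1-2008 system and uses faccessat() with the AT_EACCESS flag, which asks for an effective-credential check much like eaccess(2) does. The helper name can_read_effective() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>   /* AT_FDCWD, AT_EACCESS */
#include <unistd.h>  /* faccessat(), R_OK */

/*
 * Hypothetical helper: returns 1 if the process's effective credentials
 * may read 'path', 0 otherwise (including when the path does not exist).
 */
int
can_read_effective(const char *path)
{
	/* AT_EACCESS selects the effective IDs instead of the real IDs. */
	return (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0);
}
```

As the commit message warns, a check like this is only advisory: the answer can change between the check and a later open(2), so it suits cases such as choosing an icon, not security decisions.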
John Baldwin
278da5113f Remove a bogus comment. "atomic" doesn't mean that the operation is done
as a physical atomic operation.  That would require the code to use the
atomic API, which it does not.  Instead, the operation is made pseudo
atomic (hence the quotes) by use of the lock to protect clearing all of the
flags in question.
2001-09-21 19:26:57 +00:00
John Baldwin
21832b1ec0 GC some #if 0'd code. 2001-09-21 19:21:18 +00:00
John Baldwin
3226cbf43b Whitespace and spelling fixes. 2001-09-21 19:16:12 +00:00
Michael Reifenberger
896de692f8 Make msgseg, msgssz (->msgmax) and msgmni TUNABLE. 2001-09-21 09:25:17 +00:00
Peter Wemm
1114d18594 Add a pointer to kenv(1). 2001-09-21 02:25:53 +00:00
Jonathan Lemon
57ea1fa07f Revert last commit. The same functionality can be obtained through the
'kenv' command, which I obviously was unaware of.
2001-09-21 02:09:01 +00:00
Robert Watson
94088977c9 o Rename u_cansee() to cr_cansee(), making the name more comprehensible
in the face of a rename of ucred to cred, and possibly generally.

Obtained from:	TrustedBSD Project
2001-09-20 21:45:31 +00:00
Jonathan Lemon
e492f03505 Add a sysctl MIB 'kern.env', that dumps the contents of the kernel
environment from the loader, as well as the kernel's compiled in static
hints.
2001-09-20 20:09:37 +00:00
Peter Wemm
fbd7a9dd97 decrement the dumping variable after use so we can call it several times
if needed.
2001-09-20 06:08:53 +00:00
John Baldwin
a44f918bf9 Fix a bug in propagate priority: the kse group pointer wasn't being
updated in the loop so the new thread always seemed to have the same
priority as the original thread and no actual priorities were changed.
2001-09-19 22:52:59 +00:00
Robert Watson
288b789333 o Clarification of securelevel_{ge,gt} comment.
Submitted by:	dd
2001-09-19 14:09:13 +00:00
Peter Wemm
66f769fe39 Add missing ; in last commit
Pointy-hat-to: jhb
2001-09-19 02:53:59 +00:00
Peter Wemm
98cdde71e7 Regenerate 2001-09-18 23:33:33 +00:00
Peter Wemm
eb25edbda3 Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.
2001-09-18 23:32:09 +00:00
John Baldwin
9ef3a9855d Use a 'p' variable instead of repetitively indirecting td->td_proc for
signal things that are still per-process and won't be per-thread.
2001-09-18 23:27:06 +00:00
John Baldwin
8cc06751dd Don't initialize proc0's mutex twice. It is already done earlier on in the
MD startup code.
2001-09-18 22:09:47 +00:00
Robert Watson
3ca719f12e o Introduce two new calls, securelevel_gt() and securelevel_ge(), which
abstract the securelevel implementation details from the checking
  code.  The call in -CURRENT accepts a struct ucred--in -STABLE, it
  will accept struct proc.  This facilitates the upcoming commit of
  per-jail securelevel support.  The calls will also generate a
  kernel printf if the calls are made with NULL ucred/proc pointers:
  generally speaking, there are few instances of this, and they should
  be fixed.
o Update p_candebug() to use securelevel_gt(); future updates to the
  remainder of the kernel tree will be committed soon.

Obtained from:	TrustedBSD Project
2001-09-18 21:03:53 +00:00
Mark Peek
796ed2a6d0 Set debug information on the process being traced, not the current (debugger)
process. This should allow gdb to function correctly on post-KSE kernels.
2001-09-18 19:06:11 +00:00
Jonathan Lemon
6a494eeb34 Change p into ke->ke_proc, this was hidden behind INVARIANTS. 2001-09-18 03:36:21 +00:00
Peter Wemm
d2718e479a Fix a fatal type mismatch (char *static_env; vs char static_env[]).
Submitted by:	bde
2001-09-17 21:27:41 +00:00
Julian Elischer
fdd4e5c652 Replace line accidentally deleted during KSE additions.
Symptom.. Stopped program unable to be restarted if it was stopped
while already sleeping.
2001-09-17 20:42:25 +00:00
Robert Watson
9844fbc3b5 o Correct authorization check in CANSIGIO(), which suffered from incorrect
transcription during the (pcred,ucred) merge; this was not used for
  the kill() system call, so does not affect direct explicit process
  signalling.

Pointed out by:	fenner
2001-09-15 22:34:46 +00:00
Peter Wemm
b711616825 In the devfs case, have initproc attempt the easy cases of mounting /dev.
This works if /dev exists, or if / is read/write (nfsroot).  If it is
too hard, leave it up to init -d (which will probably fail if /dev does
not exist, but there isn't much else we can do short of making a union
mount on /).

This means we get a proper /dev if you boot a 5.x kernel on a 4.x world,
which I happen to do often (the ramdisks on our install netboot servers
have 4.x userland worlds on them).
2001-09-15 11:15:22 +00:00
Doug Rabson
de1792cbb8 The ia64 kernel is now linked dynamically so parse its _DYNAMIC structure. 2001-09-15 11:02:10 +00:00
John Baldwin
bce9841972 Fix locking on td_flags for TDF_DEADLKTREAT. If the comments in the code
are true that curthread can change during this function, then this flag
needs to become a KSE flag, not a thread flag.
2001-09-13 22:33:37 +00:00
Michael Reifenberger
d528be2bf3 PR: kern/29698 (part)
Reviewed by:	audit
Implement SEM_STAT (like IPC_STAT but treats semid as sema-index).
The linuxerator will need it.
2001-09-13 21:06:41 +00:00
Michael Reifenberger
b3a4bc4247 PR: kern/29698 (part)
Reviewed by:	audit
Add tunables for the sem* and shm* syscontrols for tuning on boottime
until they become dynamic.
SAP R/3 doesn't like the compiled in defaults.
2001-09-13 20:20:09 +00:00
Julian Elischer
9dbea9237c If an incoming struct proc could have been NULL before, then don't
automatically change the code to add

struct proc *p = td->td_proc;

because now 'td' is probably capable of being NULL too.
I expect to see more of this kind of error during the 'weeding'
process. It's too easy to make. (junior hacker project.. look for these :-)

Submitted by:	Mark Peek <mp@freebsd.org>
2001-09-12 20:26:57 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Peter Wemm
8ee6d9e90f Fix the kern.module_path issue that required the trailing '/' character
on each module path component.  Fix a one-byte buffer overflow at the
same time that got highlighted in the process.
2001-09-12 00:50:23 +00:00
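The two problems named above (a mandatory trailing '/' and a one-byte overflow) tend to appear together when a directory and a file name are joined by hand. The following is a userland sketch under stated assumptions, not the kernel code from this commit; join_path() is a hypothetical name. It adds the separator only when it is missing and counts both the separator and the terminating NUL when sizing the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc()ed "dir/name" string, or NULL
 * on allocation failure.  The caller frees the result.
 */
char *
join_path(const char *dir, const char *name)
{
	size_t dlen = strlen(dir);
	size_t nlen = strlen(name);
	/* Add a '/' only if 'dir' does not already end in one. */
	int need_sep = (dlen > 0 && dir[dlen - 1] != '/');
	/* Forgetting the separator or the NUL here is the classic off-by-one. */
	size_t total = dlen + (need_sep ? 1 : 0) + nlen + 1;
	char *out = malloc(total);

	if (out == NULL)
		return (NULL);
	snprintf(out, total, "%s%s%s", dir, need_sep ? "/" : "", name);
	return (out);
}
```

With this shape, "/boot/kernel" and "/boot/kernel/" produce the same joined path, so a path list no longer needs the trailing '/' on each component.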
Dima Dorfman
34d2276e63 Correct a debugging message. 2001-09-11 12:20:24 +00:00
Peter Wemm
505222d35f Implement the long-awaited module->file cache database. A userland
tool (kldxref(8)) keeps a cache of what modules and versions are inside
what .ko files.  I have tested this on both Alpha and i386.

Submitted by:	bp
2001-09-11 01:09:24 +00:00
John Baldwin
04b5a9bbd6 - Axe holding_giant as it is not used now anyways and was ok'd by
dillon in an earlier e-mail.
- We don't need to test the console right before we vfprintf() the panicstr
  message.  The printing of the panic message is a fine console test by
  itself and doesn't make useful messages scroll off the screen or tick
  developers off in quite the same way.

Requested by:	jlemon, imp, bmilekic, chris, gsutter, jake (2)
2001-09-10 21:04:49 +00:00
Peter Wemm
b03a0c9e5e Fix a warning on alpha (real problem) and make pstat -t work as a bonus.
'struct tty' was out of sync in user and kernel due to dev_t/udev_t
mixups.  This takes advantage of the fact that dev_t changes type in
userland, so it isn't too pretty.
2001-09-10 12:05:47 +00:00
Dima Dorfman
b40832162b Make the `nsops' variable in `semop' unsigned. This prevents an
overflow if uap->nsops (which is already unsigned) is over INT_MAX;
consequently, the bounds check below becomes valid.  Previously, if a
value over INT_MAX was passed in uap->nsops, the bounds check wouldn't
catch it, and the value would be used to compute copyin()'s third
argument.

Obtained from:	NetBSD
2001-09-10 11:36:08 +00:00
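Why the signed variable defeats the bounds check can be shown with a small userland sketch. This is not the kernel code: MAX_SOPS and both function names are hypothetical, and the narrowing conversion from unsigned int to int is implementation-defined in C (it wraps to a negative value on the usual two's-complement ABIs, which is what the sketch assumes).

```c
#include <limits.h>

#define MAX_SOPS 100	/* hypothetical limit standing in for the kernel's bound */

/* Buggy shape: the unsigned request is stored in an int before the check. */
int
check_signed(unsigned int requested)
{
	int nsops = (int)requested;	/* > INT_MAX becomes negative here */

	/* A negative nsops is "not greater than" the limit, so it passes. */
	return (nsops > MAX_SOPS ? -1 : 0);
}

/* Fixed shape: the count stays unsigned, so the check sees the real value. */
int
check_unsigned(unsigned int requested)
{
	unsigned int nsops = requested;

	return (nsops > MAX_SOPS ? -1 : 0);
}
```

A request of (unsigned int)INT_MAX + 1 is accepted by check_signed() and rejected by check_unsigned(). In the buggy version the accepted value would then feed a size computation, which is the copyin() problem the commit message describes.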
Kris Kennaway
bf61e26696 Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from:	NetBSD
Reviewed by:	peter
MFC after:	2 weeks
2001-09-10 11:28:07 +00:00
Peter Wemm
ed6c38886e Fix a warning. l_name is managed by us and is malloc/free'ed.
It is the userland declaration of l_name that is inconvenient for us.
2001-09-10 07:53:04 +00:00
Peter Wemm
e414d9aad7 Add on UPAGES to ki_rssize since it is there as a result of the process
and can be swapped out with the process.
2001-09-10 07:29:32 +00:00
Peter Wemm
eb30c1c0b9 Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area.  KSE touched cpu_wait() which had the same change
replicated five ways for each platform.  Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86.  The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by:	jake, tmm, dillon
2001-09-10 04:28:58 +00:00
Matthew Dillon
06ae1e91c4 This brings in a Yahoo coredump patch from Paul, with additional mods by
me (addition of vn_rdwr_inchunks).  The problem Yahoo is solving is that
if you have large process images core dumping, or you have a large number of
forked processes all core dumping at the same time, the original coredump code
would leave the vnode locked throughout.  This can cause the directory vnode
to get locked up, which can cause the parent directory vnode to get locked
up, and so on all the way to the root node, locking the entire machine up
for extremely long periods of time.

This patch solves the problem in two ways.  First it uses an advisory
non-blocking lock to abort multiple processes trying to core to the same
file.  Second (my contribution) it chunks up the writes and uses bwillwrite()
to avoid holding the vnode locked while blocking in the buffer cache.

Submitted by:	ps
Reviewed by:	dillon
MFC after:	2 weeks
2001-09-08 20:02:33 +00:00
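The chunking idea can be sketched in userland as follows. This illustrates only the "split one large write into bounded pieces" part. It is not vn_rdwr_inchunks() itself, it has no equivalent of bwillwrite() or vnode locking, and the name write_in_chunks() is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'len' bytes from 'buf' to 'fd' in pieces of
 * at most 'chunk' bytes.  Returns the number of bytes written, or -1 on
 * error.  EINTR handling is omitted to keep the sketch short.
 */
ssize_t
write_in_chunks(int fd, const char *buf, size_t len, size_t chunk)
{
	size_t done = 0;

	if (chunk == 0)
		return (-1);
	while (done < len) {
		size_t n = (len - done < chunk) ? len - done : chunk;
		ssize_t w = write(fd, buf + done, n);

		if (w < 0)
			return (-1);
		if (w == 0)
			break;		/* no progress; stop instead of spinning */
		done += (size_t)w;
		/*
		 * Between chunks, no single long operation is in flight; this
		 * is the point where a kernel version can let others proceed.
		 */
	}
	return ((ssize_t)done);
}
```

The design point is that the work is bounded per step, so whatever is held across one step is released at regular intervals instead of once at the very end.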
John Baldwin
df53e91c18 Call sendsig() with the proc lock held and return with it held. 2001-09-06 22:20:41 +00:00
Peter Wemm
fc8b64e494 Sigh. Dig up text from a signature in a 1994 Usenet post I made and redo
the ..uhh... ``console test'' to avoid another 50 emails about GPL issues.
2001-09-05 23:51:06 +00:00
David E. O'Brien
faf73940c6 Fix the definition generation code from rev 1.15 that generates non-style(9)
compliant structure definitions.
2001-09-05 01:27:53 +00:00
Ian Dowse
7476f7e87d Fix a memory leak in __getcwd() that can occur after a filesystem
has been forcibly unmounted. If the filesystem root vnode is reached
and it has no associated mountpoint (vp->v_mount == NULL), __getcwd
would return without freeing 'buf'. Add the missing free() call.

PR:		kern/30306
Submitted by:	Mike Potanin <potanin@mccme.ru>
MFC after:	1 week
2001-09-04 19:03:47 +00:00
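Leaks of this kind usually come from an early return that skips the cleanup. A common way to avoid them is a single exit path, sketched below in userland C. This is not the __getcwd() code: copy_root_path() and its has_mount parameter are hypothetical stand-ins for the "vnode has no mount point" case.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in: builds "/" in a temporary buffer and copies it
 * to 'dst'.  Returns 0 on success and -1 on error.  Every path after the
 * allocation leaves through 'done', so 'buf' is always freed.
 */
int
copy_root_path(int has_mount, char *dst, size_t dstlen)
{
	int error = 0;
	char *buf;

	if (dstlen < 2)
		return (-1);
	buf = malloc(dstlen);
	if (buf == NULL)
		return (-1);

	if (!has_mount) {
		/* The error case that previously returned without free(). */
		error = -1;
		goto done;
	}
	buf[0] = '/';
	buf[1] = '\0';
	memcpy(dst, buf, 2);
done:
	free(buf);
	return (error);
}
```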
Peter Wemm
c92c4c8f79 Unindent an if (1) { that was left behind in the last commit.
(commits were separated to not obscure the real change)
2001-09-03 04:39:38 +00:00
Peter Wemm
00dda5e82b Argh. Make the ia64 kernel work in all situations. For some reason,
and I still don't know why, this was not failing on the non-kse kernel.
It certainly should have since things were using linker_kernel_file
unconditionally.  This has highlighted a different problem though that
means that trying to do a kldload on a non-dynamic kernel will implode.
2001-09-03 04:37:55 +00:00
David E. O'Brien
6533ba2e33 Match the declaration in net/netisr.h.
Submitted by:	gcc 3.0.1
2001-09-03 03:24:31 +00:00
Peter Wemm
772121fd11 The !RESTARTABLE_PANICS code has some loose ends. 2001-09-02 12:24:38 +00:00
Peter Wemm
ef4181d98e For ia64, set the default elf brand to be FreeBSD. This is temporarily
necessary only for as long as we're using a linux toolchain.
2001-09-02 12:23:08 +00:00