Commit Graph

4107 Commits

Author SHA1 Message Date
Jake Burkholder
f74250ca46 Remove some code that appears to have endian problems with INVARIANTS.
This is #if BIG_ENDIAN, but is only necessary if malloc types are shorts,
not struct malloc_type * like they are now.
2001-08-03 03:31:45 +00:00
John Baldwin
b39bc3e160 Use 'p' instead of the potentially more expensive 'curproc' inside of
mi_switch().
2001-08-02 22:15:31 +00:00
Warner Losh
c7021493ba Make the fmt arguments to make_dev and make_dev_alias const char *.
Approved on IRC as long as it didn't cause a large number of warnings by: phk

MFC After: 700 hours
2001-08-02 20:35:35 +00:00
Peter Wemm
aa7a4dae6d Temporarily back out kern_sig.c rev 1.125 and kern_exit.c rev 1.131.
This paniced my one of my machines one time too many :-( and there is
no sign of a solution in the pipeline.  The deltas are still easily
available in cvs.  The problem is that if the parent has been swapped
out, the child process cannot grope around in the parent's UPAGES to
see the sigact[] array or it will fault.  This probably is a showstopper
for this implementation anyway.
2001-08-01 20:35:24 +00:00
Bosko Milekic
bb6f838c79 Move CPU_ABSENT() macro to smp.h, where it belongs anyway. It will be
defined to 0 in the non-SMP case, which very much makes sense as it
permits its usage in per-CPU initialization loops (for an example, check
out subr_mbuf.c).
  Further, on a UP system, make mb_alloc always use the first per-CPU
container, regardless of cpuid (i.e. remove reliability on cpuid in the
UP case).

Requested by: alfred
2001-08-01 00:54:00 +00:00
John Baldwin
36c2e9feb4 Apply the cluebat to myself and undo the await() -> mawait() rename. The
asleep() and await() functions split the functionality of msleep() up into
two halves.  Only the asleep() half (which is what puts the process on the
sleep queue) actually needs the lock usually passed to msleep() held to
prevent lost wakeups.  await() does not need the lock held, so the lock
can be released prior to calling await() and does not need to be passed in
to the await() function.  Typical usage of these functions would be as
follows:

        mtx_lock(&foo_mtx);
        ... do stuff ...
        asleep(&foo_cond, PRIxx, "foowt", hz);
        ...
        mtx_unlock&foo_mtx);
        ...
        await(-1, -1);

Inspired by:	dillon on the couch at Usenix
2001-07-31 22:06:56 +00:00
John Baldwin
e9121d0663 Add a safety belt to mawait() for the (cold || panicstr) case identical to
the one in msleep() such that we return immediately rather than blocking.

Submitted by:	peter
Prodded by:	sheldonh
2001-07-31 20:57:57 +00:00
John Baldwin
5cb0fbe47e If we have already panic'd then don't bother enforcing mutex asserts as
things are pretty much shot already and all panic'ing does is hurt our
chances of getting a dump.

Inspired by:	sheldonh
2001-07-31 17:45:50 +00:00
John Baldwin
32bca5fe03 - Fix panicstr checks to explicitly check against NULL.
- Add a few more panicstr checks so that we don't panic recursively.

Requested by:	sheldonh (2)
2001-07-31 17:44:57 +00:00
Robert Watson
e7f65fdcf9 o Modify p_candebug() such that there is no longer automatic acceptance
of debugging the current process when that is in conflict with other
  restrictions (such as jail, unprivileged_procdebug_permitted, etc).
o This corrects anomolies in the behavior of
  kern.security.unprivileged_procdebug_permitted when using truss and
  ktrace.  The theory goes that this is now safe to use.

Obtained from:	TrustedBSD Project
2001-07-31 17:25:12 +00:00
Robert Watson
0ef5652e27 o Introduce new kern.security sysctl tree for kernel security policy
MIB entries.
o Relocate kern.suser_permitted to kern.security.suser_permitted.
o Introduce new kern.security.unprivileged_procdebug_permitted, which
  (when set to 0) prevents processes without privilege from performing
  a variety of inter-process debugging activities.  The default is 1,
  to provide current behavior.

  This feature allows "hardened" systems to disable access to debugging
  facilities, which have been associated with a number of past security
  vulnerabilities.  Previously, while procfs could be unmounted, other
  in-kernel facilities (such as ptrace()) were still available.  This
  setting should not be modified on normal development systems, as it
  will result in frustration.  Some utilities respond poorly to
  failing to get the debugging access they require, and error response
  by these utilities may be improved in the future in the name of
  beautification.

  Note that there are currently some odd interactions with some
  facilities, which will need to be resolved before this should be used
  in production, including odd interactions with truss and ktrace.
  Note also that currently, tracing is permitted on the current process
  regardless of this flag, for compatibility with previous
  authorization code in various facilities, but that will probably
  change (and resolve the odd interactions).

Obtained from:	TrustedBSD Project
2001-07-31 15:48:21 +00:00
Jake Burkholder
146be906a1 Don't try to find an eventhandler list if the list of lists hasn't
been initialized yet.
2001-07-31 03:52:16 +00:00
Jake Burkholder
98b0e9d587 Don't try to print a field that doesn't exist; in usually commented
out debugging code.
2001-07-31 03:51:07 +00:00
Jake Burkholder
7e5102989e Use a machine dependent type, Elf_Hashelt, for the elements of the elf
dynamic symbol table buckets and chains.  The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.

Discussed with: dfr, jdp
2001-07-31 03:46:39 +00:00
Jeroen Ruigrok van der Werven
7b389f3335 Fix obsolete code.
FreeBSD _does_ define ENOMSG, so no need for checking if we support it.

Inspired by PR:		22470
Which was submitted by:	Bjorn Tornqvist <bjorn@west.se>
MFC after:	1 week
2001-07-30 19:28:02 +00:00
Peter Wemm
b219758f94 Revert previous accidental commit. FWIW, it was part of enabling
VM caching of disks through mmap() and stopping syncing of open files
that had their last reference in the fs removed (ie: their unsync'ed
pages get discarded on close already, so I made it stop syncing too).
2001-07-27 15:57:17 +00:00
Peter Wemm
24a590a074 Fix cut/paste blunder. Serves me right for doing a last minute tweak
to what I had for some time.

Submitted by:	bde
2001-07-27 15:52:49 +00:00
Peter Wemm
a03bd29498 Use the tunable maxusers rather than the compile-time one. Evaluate and
initialize in the right order to make derivative settings work right.
eg: at compile time, nmbufs was double nmbclusters.  For POLA this should
work the same at runtime.
2001-07-26 23:08:31 +00:00
Peter Wemm
ee342e1bf1 Move param.c out of the conf directory and make it fully dynamic.
Tunables are now derived at boot time from maxusers.  ie: change maxusers
via a tunable and all the derivative settings change.  You can change
the other tunables individually as well.  Even hz etc is tunable.
2001-07-26 23:04:03 +00:00
Bosko Milekic
49f854f926 - Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
  them as such, setting up containers only for CPUs activated during
  mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
  map, in accordance with the above.

This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob
2001-07-26 18:47:46 +00:00
Bill Fenner
c3cb7e5d7a Don't bother passing p to rtioctl just so it can fail to pass it to mrt_ioctl 2001-07-25 20:15:28 +00:00
Peter Pentchev
7ca4d05f34 Make dynamic sysctl entries start at 0x100, not decimal 100 - there are
static entries with oid's over 100, and defining enough dynamic entries
causes an overlap.

Move the "magic" value 0x100 into <sys/sysctl.h> where it belongs.

PR:		29131
Submitted by:	"Alexander N. Kabaev" <kabaev@mail.ru>
Reviewed by:	-arch, -audit
MFC after:	2 weeks
2001-07-25 17:21:18 +00:00
Peter Pentchev
107e7dc5c3 Style(9): function names on a separate line, max line length 80 chars.
Reviewed by:	-arch, -audit
MFC after:	2 weeks
2001-07-25 17:13:58 +00:00
Dima Dorfman
02bd5400fe sys/kern/tty_snoop.c is now sys/dev/snp/snp.c.
Repo-copy by:	jdp
2001-07-25 12:06:36 +00:00
Assar Westerlund
2b3dc41c15 correct description of `vpp' for mknod/symlink: they are actually
returned locked
2001-07-24 16:16:00 +00:00
Matthew Dillon
4fec48c6fe As per further discussions on hackers redo the SIGCHLD patch to not generate
an unexpected user-visible side effect with the sigaction flags.  Also cleanup
a minor union issue.

Submitted by: Rudolf Cejka <cejkar@dcse.fee.vutbr.cz>
MFC addendum: MFC will be combined w/ original commit
MFC after: 3 days
2001-07-22 18:47:31 +00:00
Assar Westerlund
17b65d5532 revert previous commit (bad style and not needed)
Noticed:	bde
2001-07-22 10:24:31 +00:00
Assar Westerlund
8cfdf32239 add prototype for dosetrlimit 2001-07-22 00:21:19 +00:00
Assar Westerlund
129a62d7c7 add <sys/cdefs.h> (for __unused and such) 2001-07-21 17:12:44 +00:00
John Baldwin
a5dd141db6 Add a missing ~ so that the LO_INITIALIZED flag actually gets turned off
in witness_destroy().
2001-07-20 23:29:25 +00:00
Jonathan Lemon
5f5c2e958f Introduce EVFILT_TIMER, which allows a process to establish an
arbitrary number of timers, both oneshot and periodic.

Repeatedly reminded to commit by: jayanth
Reviewed by: peter (a while back)
2001-07-19 18:34:40 +00:00
Kris Kennaway
2d075e994c Don't use kp->arg0 as a format string, grr.
MFC after:	1 week
2001-07-19 02:18:54 +00:00
Dima Dorfman
ac60b28d35 Keep track of all "struct snoop"'s so that snp_modevent can fail with
EBUSY if there's a device still open.
2001-07-18 13:39:43 +00:00
David E. O'Brien
b46ba8880c Increase NMBCLUSTERS by 4x.
This takes a GENERIC kernel (MAXUSERS=32) from 1536 to 3072.
2001-07-17 15:51:12 +00:00
Peter Wemm
2fc4762c60 Move the hints gunk to a seperate file. It isn't really part of the
newbus structure (no more than subr_rman.c is anyway).
2001-07-14 08:25:18 +00:00
Peter Wemm
9516fbd6d9 Go back to having either static OR dynamic hints, with fallback
support.  Trying to fix the merged set where dynamic overrode
static was getting more and more complicated by the day.

This should fix the duplicate atkbd, psm, fd* etc in GENERIC.  (which
paniced the alpha, but not the i386)
2001-07-14 00:23:10 +00:00
Dima Dorfman
b2c3fa70e3 Correct spelling in a comment and remove trailing newline from a
panic() call (panic() adds it itself).
2001-07-11 02:04:43 +00:00
Dag-Erling Smørgrav
f0cc1c6f81 Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).
2001-07-09 19:11:51 +00:00
Guido van Rooij
333ea48563 Don't share sig handlers after an exec
Reviewed by:	Alfred Perlstein
2001-07-09 19:01:42 +00:00
Guido van Rooij
9b956e9897 Get rid of useless bcopy (the next statement was equivalent) 2001-07-09 19:00:08 +00:00
Jake Burkholder
d652b3d918 Backout mwakeup, etc. 2001-07-06 01:16:43 +00:00
Robert Watson
a0f75161f9 o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
  awkward) abstraction.  The individual calls to p_canxxx() better
  reflect differences between the inter-process authorization checks,
  such as differing checks based on the type of signal.  This has
  a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
  invocation of p_candebug(), while maintaining the special case
  check of KTR_ROOT.  This allows ktrace() to "play more nicely"
  with new mandatory access control schemes, as well as making its
  authorization checks consistent with other "debugging class"
  checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
  caller to determine if privilege was required for successful
  evaluation of the access control check.  This primitive is currently
  unused, and as such, serves only to complicate the API.

Approved by:	({procfs,linprocfs} changes) des
Obtained from:	TrustedBSD Project
2001-07-05 17:10:46 +00:00
John Baldwin
f583b1d938 Spelling fix in a KASSERT: runq_chose -> runq_choose. 2001-07-04 20:00:48 +00:00
Matthew Dillon
7b9673fa28 cleanup: GIANT macros, rename DEPRECIATE to DEPRECATE
Move p_giant_optional to proc zero'd section
Remove (old) XXX zfree comment in pipe code
2001-07-04 17:11:03 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
Matthew Dillon
085be199c6 postsig() currently requires Giant to be held. Giant is held properly at
the first postsig() call, but not always held at the second place,
resulting in an occassional panic.
2001-07-04 15:36:30 +00:00
Jake Burkholder
9316aed2ef Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop.
These take an additional mutex argument, which is dropped before any
processes are made runnable.  This can avoid contention on the mutex
if the processes would immediately acquire it, and is done in such a
way that wakeups will not be lost.

Reviewed by:	jhb
2001-07-04 00:32:50 +00:00
Dag-Erling Smørgrav
2687c8741b Constify the format string.
Submitted by:	Mike Barcroft <mike@q9media.com>
2001-07-03 21:46:43 +00:00
Thomas Moestl
948d3d9484 Make the code to read the kernel message buffer via sysctl machine-
independent and rename the corresponding sysctls from machdep.msgbuf and
machdep.msgbuf_clear (i386 only) to kern.msgbuf and kern.msgbuf_clear.
2001-07-03 19:44:07 +00:00
John Baldwin
29905510e0 Remove spl's in uio_yield() that are covered by the sched_lock. 2001-07-03 15:58:37 +00:00
John Baldwin
d68a8cc0ab Remove commented-out garbage that skipped updating schedcpu() stats for
ithreads in SWAIT.
2001-07-03 08:03:56 +00:00
John Baldwin
97b4306f0f Just check p_oncpu when determining if a process is executing or not.
We already did this in the SMP case, and it is now maintained in the UP
case as well, and makes the code slightly more readable.  Note that
curproc is always executing, thus the p != curproc test does not need to
be performed if the p_oncpu check is made.
2001-07-03 08:00:57 +00:00
John Baldwin
9d36b83e2c Axe spl's that are covered by the sched_lock (and have been for quite
some time.)
2001-07-03 07:53:35 +00:00
John Baldwin
36f1548b96 Include the wait message and channel for msleep() in the KTR tracepoint. 2001-07-03 07:39:06 +00:00
John Baldwin
8f451b4114 Remove bogus need_resched() of the current CPU in roundrobin().
We don't actually need to force a context switch of the current process.
The act of firing the event triggers a context switch to softclock() and
then switching back out again which is equivalent to a preemption, thus
no further work is needed on the local CPU.
2001-07-03 05:33:09 +00:00
John Baldwin
64acb05b1c Grab Giant around postsig() since sendsig() can call into the vm to
grow the stack and we already needed Giant for KTRACE.
2001-07-03 05:27:53 +00:00
Robert Watson
e84b7987bc o Unfold p31b_proc() into the individual posix4 system calls so as to
allow call-specific authorization.
o Modify the authorization model so that p_can() is used to check
  scheduling get/set events, using P_CAN_SEE for gets, and P_CAN_SCHED
  for sets.  This brings the checks in line with get/setpriority().

Obtained from:	TrustedBSD Project
2001-06-30 07:55:19 +00:00
John Baldwin
aa3cefd06c Remove the p_spinlocks spin lock count that was obsoleted by the
per-CPU spinlocks list.
2001-06-30 03:35:22 +00:00
Robert Watson
1af55356f8 Replace some use of 'p' with 'targetp' so as to not scarily overload the
passed 'p' argument.  No functional change.

Obtained from:	USENIX Emporium, Cheap Tricks Department
2001-06-30 03:13:36 +00:00
John Baldwin
a300519d41 Make the schedlock saved critical section state a per-thread property. 2001-06-30 03:11:26 +00:00
John Baldwin
7aa7260e4a Move ast() and userret() to sys/kern/subr_trap.c now that they are MI. 2001-06-29 19:51:37 +00:00
John Baldwin
6be523bca7 Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by:	jake (in principle)
2001-06-29 11:10:41 +00:00
John Baldwin
92809bc001 Grab Giant around trap_pfault() for now. 2001-06-29 04:18:10 +00:00
Jonathan Lemon
84241bd0dc Fix up indentation. 2001-06-29 04:01:38 +00:00
Robert Watson
64e55bf47b Remove a fascinating but confusing construct involving chaining
conditional clauses in the following way:

	(0 || a || b);

No functional change.
2001-06-28 23:02:09 +00:00
Robert Watson
e8f7a95298 Add error checking for copyin() operations in posix4 scheduling code. 2001-06-28 22:53:42 +00:00
John Baldwin
ec178c1e4c Don't check witness assertions if the lock doesn't use witness or witness
is dead.
2001-06-28 22:22:20 +00:00
John Baldwin
cd2f721557 - Fix a mntvnode and vnode interlock reversal.
- Protect the mnt_vnode list with the mntvnode lock.
2001-06-28 04:05:54 +00:00
John Baldwin
5f36700a32 - Add trylock variants of shared and exclusive locks.
- The sx assertions don't actually need the internal sx mutex lock, so
  don't bother doing so.
- Add a new assertion SX_ASSERT_LOCKED() that asserts that either a
  shared or exclusive lock should be held.  This assertion should be used
  instead of SX_ASSERT_SLOCKED() in almost all cases.
- Adjust some KASSERT()'s to include file and line information.
- Use the new witness_assert() function in the WITNESS case for sx slock
  asserts to verify that the current thread actually owns a slock.
2001-06-27 06:39:37 +00:00
John Baldwin
04297fe609 - Add a new witness_assert() to perform arbitrary locking assertions.
- Clean up the KTR tracepoints to be slighlty more consistent and useful
- Fix a bug in WITNESS where we would recurse indefinitely and blow the
  stack when acquiring Giant after sleeping with a sleepable lock held.

Reported by:	tanimura (3)
2001-06-27 06:27:29 +00:00
John Baldwin
776e0b3693 - Always use the proc lock of the task leader to protect the peers list of
processes.
- Don't construct fake call args and then call kill().  psignal is not
  anymore complicated and is quicker and not prone to locking problems.
  Calling psignal() avoids having to do a pfind() since we already have a
  proc pointer and also allows us to keep the task leader locked while we
  kill all the peer processes so the list is kept coherent.
- When a kthread exits, do a wakeup() on its proc pointers.  This can be
  used by kernel modules that have kthreads and want to ensure they have
  safely exited before completely the MOD_UNLOAD event.

Connectivity provided by:	Usenix wireless
2001-06-27 06:15:44 +00:00
John Baldwin
b7e554f5d6 - Move the 'clk' spinlock below other spin locks since KTR trace events
may need the clock lock for nanotime().
- Add KTR trace events for lock list manipulations and other witness
  operations.
- Use a temporary variable instead of setting the lock list head directly
  and then setting up the links to add a new lock list entry to the lock
  list.  This small race could result in witness "forgetting" about all
  the locks held by this process temporarily during an interrupt.
- Close a more fatal race condition when removing a lock from a list.
  Removing a lock from the list entails both decrementing the count of
  items in this bucket as well as shuffling items in the current bucket up
  a notch to replace the gap left by the removed item.  Wrap these
  operations in a critical section.
2001-06-25 23:17:52 +00:00
John Baldwin
1715f07da3 - Replace the unused KTR_IDLELOOP trace class with a new KTR_WITNESS trace
class to trace witness events.
- Make the ktr_cpu field of ktr_entry be a standard field rather than one
  present only in the KTR_EXTEND case.
- Move the default definition of KTR_ENTRIES from sys/ktr.h to
  kern/kern_ktr.c.  It has not been needed in the header file since KTR
  was un-inlined.
- Minor include cleanup in kern/kern_ktr.c.
- Fiddle with the ktr_cpumask in ktr_tracepoint() to disable KTR events
  on the current CPU while we are processing an event.
- Set the current CPU inside of the critical section to ensure we don't
  migrate CPU's after the critical section but before we set the CPU.
2001-06-25 23:09:31 +00:00
John Baldwin
1d79f1bb9a - Sort includes.
- Count the context switches during shutdown when we give ithreads a chance
  to run as volutary context switches.

Submitted by:	bde (2)
2001-06-25 18:30:42 +00:00
John Baldwin
c4f7a18726 Count the context switch when blocking on a mutex as a voluntary context
switch.  Count the context switch when preempting the current thread to let
a higher priority thread blocked on a mutex we just released run as an
involuntary context switch.

Reported by:	bde
2001-06-25 18:29:32 +00:00
John Baldwin
84bbc4dbda Count the switch when an ithread goes idle as a voluntary context switch.
Submitted by:	bde
2001-06-25 18:27:33 +00:00
David Malone
db3cc2d09f Don't dereference a NULL pointer if we fail to get a sendfilebuf. 2001-06-24 12:27:30 +00:00
Matthew Dillon
c7503f60c4 After exhaustive discussions and some meandering and confusion, enough
people are on track with the cause and effect of this, and although
fixing this severely degenerate case appears to violate the letter of
POSIX.1-200x, Bruce and I (and enough others) agree that it should be
comitted.

So, this patch generates an ENOENT error for any attempt to do a path lookup
through an empty symlink (e.g. open(), stat()).

Submitted by: "Andrey A. Chernov" <ache@nagual.pp.ru>
Reviewed by: bde
Discussed exhaustively on: freebsd-current
Previously committed to: NetBSD 4 years ago
2001-06-24 05:24:41 +00:00
John Baldwin
1df95969b5 - Lock CURSIG() with the proc lock to close the signal race with psignal.
- Grab Giant around ktrace points.
- Clean up KTR_PROC tracepoints to not display the value of
  sched_lock.mtx_lock as it isn't really needed anymore and just obfuscates
  the messages.
- Add a few if conditions to replace gotos.
- Ensure that every msleep KTR event ends up with a matching msleep resume
  KTR event (this was broken when we didn't do a mi_switch()).
- Only note via ktrace that we resumed from a switch once rather than twice
  in several places in msleep().
- Remove spl's rom asleep and await as the proc lock and sched_lock provide
  all the needed locking.
- In mawait() add in a needed ktrace point for noting that we are about to
  switch out.
2001-06-22 23:11:26 +00:00
John Baldwin
87f9ffb805 - Lock CURSIG with the proc lock and don't release the proc lock until
after grabbing the sched lock to close a race.
- Lock ktrace points with Giant.
2001-06-22 23:06:38 +00:00
John Baldwin
06c836bbca - Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
  psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by:	tegge (signal race), bde (addupc_task a while back)
2001-06-22 23:05:11 +00:00
John Baldwin
2ad7d3049a - Change CURSIG() and postsig() to require that the proc lock is held
rather than grabbing it and releasing it themselves.  This allows callers
  of these functions to get the lock to close race conditions.
- Grab Giant around ktrace in postsig.
- Count the switches performed on SIGSTOP's as involuntary context switches
  in the resource usage stats.

Reported by:	tegge (signal race), bde (missing csw stats)
2001-06-22 23:02:37 +00:00
Matt Jacob
2f7f966cb8 int -> size_t fix 2001-06-22 19:54:38 +00:00
Matt Jacob
8f5a1742c2 Temporary fix at least- define NCPU_PRESENT which will be mp_npcus for
SMP kernels, one (1) for non-SMP.
2001-06-22 16:03:23 +00:00
Jim Pirzyk
f83ae79fbe changed hostid from long to unsigned long to be able to store values > 2GB
on i386 platforms.  Also changed SYSCTL type from INT to ULONG and removed
comment about it.

PR:		kern/21132
MFC after:	1 month
2001-06-22 16:03:14 +00:00
Bosko Milekic
08442f8a82 Introduce numerous SMP friendly changes to the mbuf allocator. Namely,
introduce a modified allocation mechanism for mbufs and mbuf clusters; one
which can scale under SMP and which offers the possibility of resource
reclamation to be implemented in the future. Notable advantages:

 o Reduce contention for SMP by offering per-CPU pools and locks.
 o Better use of data cache due to per-CPU pools.
 o Much less code cache pollution due to excessively large allocation macros.
 o Framework for `grouping' objects from same page together so as to be able
   to possibly free wired-down pages back to the system if they are no longer
   needed by the network stacks.

 Additional things changed with this addition:

  - Moved some mbuf specific declarations and initializations from
    sys/conf/param.c into mbuf-specific code where they belong.
  - m_getclr() has been renamed to m_get_clrd() because the old name is really
    confusing. m_getclr() HAS been preserved though and is defined to the new
    name. No tree sweep has been done "to change the interface," as the old
    name will continue to be supported and is not depracated. The change was
    merely done because m_getclr() sounds too much like "m_get a cluster."
  - TEMPORARILY disabled mbtypes statistics displaying in netstat(1) and
    systat(1) (see TODO below).
  - Fixed systat(1) to display number of "free mbufs" based on new per-CPU
    stat structures.
  - Fixed netstat(1) to display new per-CPU stats based on sysctl-exported
    per-CPU stat structures. All infos are fetched via sysctl.

 TODO (in order of priority):

  - Re-enable mbtypes statistics in both netstat(1) and systat(1) after
    introducing an SMP friendly way to collect the mbtypes stats under the
    already introduced per-CPU locks (i.e. hopefully don't use atomic() - it
    seems too costly for a mere stat update, especially when other locks are
    already present).
  - Optionally have systat(1) display not only "total free mbufs" but also
    "total free mbufs per CPU pool."
  - Fix minor length-fetching issues in netstat(1) related to recently
    re-enabled option to read mbuf stats from a core file.
  - Move reference counters at least for mbuf clusters into an unused portion
    of the cluster itself, to save space and need to allocate a counter.
  - Look into introducing resource freeing possibly from a kproc.

Reviewed by (in parts): jlemon, jake, silby, terry
Tested by: jlemon (Intel & Alpha), mjacob (Intel & Alpha)
Preliminary performance measurements: jlemon (and me, obviously)
URL: http://people.freebsd.org/~bmilekic/mb_alloc/
2001-06-22 06:35:32 +00:00
John Baldwin
fbd26f7594 Fix some lock order reversals where we called free() while holding a proc
lock.  We now use temporary variables to save the process argument pointer
and just update the pointer while holding the lock.  We then perform the
free on the cached pointer after releasing the lock.
2001-06-20 23:10:06 +00:00
Bosko Milekic
f5eece3fb9 Change m_devget()'s outdated and unused `offset' argument to actually mean
something: offset into the first mbuf of the target chain before copying
the source data over.

Make drivers using m_devget() with a first argument "data - ETHER_ALIGN"
to use the offset argument to pass ETHER_ALIGN in. The way it was previously
done is potentially dangerous if the source data was at the top of a page
and the offset caused the previous page to be copied (if the
previous page has not yet been appropriately mapped).

The old `offset' argument in m_devget() is not used anywhere (it's always
0) and dates back to ~1995 (and earlier?) when support for ethernet trailers
existed. With that support gone, it was merely collecting dust.

Tested on alpha by: jlemon
Partially submitted by: jlemon
Reviewed by: jlemon
MFC after: 3 weeks
2001-06-20 19:48:35 +00:00
John Baldwin
2e1aacccac Preemption by an interrupt thread is an involuntary switch, not a voluntary
one.

Pointy-hat to:	me
2001-06-20 18:26:41 +00:00
Dag-Erling Smørgrav
0e79fe6f0e Constify (silence warnings introduced by last commit to sys/module.h) 2001-06-20 16:08:45 +00:00
Garrett Wollman
37336173d3 After one too many PRs on the subject, bite the bullet and define IOV_MAX
and its associated constants.  Implement _SC_IOV_MAX in the usual way.
Be a bit sloppy about the namespace question; this should get cleared up
in time for 5.0.

MFC after:	1 month
2001-06-18 20:24:54 +00:00
John Baldwin
6fad32afc9 Lock Giant in postsig() for the KTRACE case as ktrpsig() needs Giant when
it writes out to the trace file.

Reported by:	peter, gallatin, and others
2001-06-18 19:23:43 +00:00
Brian Somers
09dbb40410 Add linker_reference_module().
This function loads a module if required, otherwise bumps the reference
count -- the opposite of linker_file_unload().
2001-06-18 15:09:33 +00:00
Brian Somers
21ff14e0f9 Don't remove the SI_CHEAPCLONE for unsupported minors 2001-06-18 09:22:30 +00:00
Peter Wemm
b85db19691 Move setugid() a little sooner to before we release tracing in case
crdup() or change_e*id() block on malloc() or mutex.
2001-06-16 23:34:23 +00:00
Peter Wemm
5a280d9cd1 Add INTR_TYPE_AV so that we can get to the PI_AV priority in the ithread
handlers.  This is beneficial since it means that pcm's MPSAFE handler
can get run before things that will block on Giant in the shared irq
case.
2001-06-16 22:42:19 +00:00
Jonathan Lemon
9fa416ca19 Fix warnings:
112: warning: cast to pointer from integer of different size
125: warning: cast to pointer from integer of different size
2001-06-16 07:02:47 +00:00
Jonathan Lemon
7b748f0a21 Correctly hook up the write kqfilter to pipes.
Submitted by:  Niels Provos <provos@citi.umich.edu>
2001-06-15 20:45:01 +00:00
Peter Wemm
b93c3c5ed6 Fix some warnings in kern_environment.c. Make the getenv*() family
take a const 'name', since they dont modify anything.
159: warning: passing arg 1 of `getenv_int' discards qualifiers...
167: warning: passing arg 1 of `getenv' discards qualifiers from pointer..
2001-06-15 07:29:17 +00:00
Peter Wemm
ee24290963 As per comments in sys/linker_set.h:
BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK!
<reload>
BANG! BANG! BANG! BANG! BANG! BANG! CLICK! CLICK! CLICK! CLICK! CLICK!
2001-06-14 01:28:56 +00:00
Peter Wemm
f41325db5f With this commit, I hereby pronounce gensetdefs past its use-by date.
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible.  <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set.  They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>).  Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated.  This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by:	eivind
2001-06-13 10:58:39 +00:00
Peter Wemm
db957588c9 Patch up a blunder I made a few days ago. nmbcnt was being initialized
too late.

Noted by:      bmilekic
Pointy-hat to: peter
2001-06-13 00:36:41 +00:00
Peter Wemm
2398f0cd1d Hints overhaul:
- Replace some very poorly thought out API hacks that should have been
  fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
  inconvenient temporary ioconf table from config().  We already had a
  fallback to using strings before malloc/vm was running anyway.
2001-06-12 09:40:04 +00:00
Dag-Erling Smørgrav
8f7e4eb568 Rename nextpid to lastpid and externalize it. 2001-06-11 21:54:19 +00:00
Dag-Erling Smørgrav
fe46349692 Blah, I cut out a tad too much in the previous commit. (thanks again, Jake!) 2001-06-11 18:43:32 +00:00
Dag-Erling Smørgrav
e3b373228c copyin(9) doesn't return ENAMETOOLONG. (thanks, Jake!) 2001-06-11 18:36:18 +00:00
Dag-Erling Smørgrav
b0def2b548 Add sbuf_copyin(). Also add 'b' variants of sbuf_{cat,copyin,cpy}() which
ignore NUL bytes in the source string.
2001-06-11 17:05:52 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
David Malone
c7fd62da6c Try to make the setting of the SIGCHLD handler the same as setting of
the NOCLDWAI flag. Susv2 seems to require this.

Submitted by:	Cejka Rudolf <cejkar@dcse.fee.vutbr.cz>
Reviewed by:	dillon
2001-06-11 09:15:41 +00:00
Dag-Erling Smørgrav
d647935801 sbuf_new(9) now returns a struct sbuf * instead of an int. If the caller
does not provide a struct sbuf, sbuf_new(9) will allocate one and return
a pointer to it.
2001-06-10 15:48:04 +00:00
Peter Wemm
0978669829 "Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment.  This saves duplicating the same function
over and over again.  This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.
2001-06-08 05:24:21 +00:00
Peter Wemm
4422746fdf Back out part of my previous commit. This was a last minute change
and I botched testing.  This is a perfect example of how NOT to do
this sort of thing. :-(
2001-06-07 03:17:26 +00:00
Thomas Moestl
c0a0fb85e2 Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by:	rwatson
2001-06-06 23:34:38 +00:00
Peter Wemm
81930014ef Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros.  TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.
2001-06-06 22:17:08 +00:00
John Baldwin
5beb572b41 We don't need to hold a lock just to test a flag. 2001-06-06 22:05:48 +00:00
Ruslan Ermilov
4589be70fe Unbreak setregid(2).
Spotted by:	Alexander Leidinger <Alexander@Leidinger.net>
2001-06-06 13:58:03 +00:00
John Baldwin
262c9f8a3b Don't hold sched_lock across addupc_task().
Reported by:	David Taylor <davidt@yadt.co.uk>
Submitted by:	bde
2001-06-06 00:57:24 +00:00
Dima Dorfman
ddf5b79683 Add a line discipline close routine which restores some functionality
I accidently nuked in rev. 1.54.  Also rework the error handling in
snplwrite a little.
2001-06-05 05:07:53 +00:00
Dima Dorfman
f09f49f136 Style and cosmetic cleanups. This driver is now reasonably stlye(9)
compliant.  All the variable definitions and function names are
reasonably consistent, and the functions which should be static (i.e.,
all of them) are.  Other assorted fixes were made.  The majority of
the delta is indentation fixes.

Partially reviewed by:	bde
2001-06-05 05:00:17 +00:00
Dima Dorfman
7fd72392d9 Use the l_nullioctl exported from tty_conf.c rather than rolling our own. 2001-06-04 23:31:21 +00:00
Dima Dorfman
22cf0fb34d Unstaticize l_nullioctl; it is needed elsewhere (like in tty_snoop.c).
Suggested by:	bde
2001-06-04 23:30:47 +00:00
Matthew Dillon
1b3e974a71 The pipe_write() code was locking the pipe without busying it first in
certain cases, and a close() by another process could potentially rip the
pipe out from under the (blocked) locking operation.

Reported-by: Alexander Viro <viro@math.psu.edu>
2001-06-04 04:04:45 +00:00
Dima Dorfman
87826386e0 Remove unused includes, use *min() inline functions rather than a
home-grown macro, rewrite a confusing conditional in snpdevtotty(),
and change ibuf to 512 bytes instead of 1024 bytes in dsnwrite().

Reviewed by:	bde
2001-06-03 05:17:39 +00:00
Dima Dorfman
b8edb44cc3 When tring to find out if this is a request for a write in
kernel_sysctl and userland_sysctl, check for whether new is NULL, not
whether newlen is 0.  This allows one to set a string sysctl to "".
2001-06-03 04:58:51 +00:00
Dima Dorfman
c0b824f97d Include sys/mutex.h to silence a warning. 2001-06-03 02:19:07 +00:00
Jesper Skriver
5b86eac4e5 Revert the last bits of my bogus move of NMBCLUSTERS
to <sys/param.h>
2001-06-01 21:47:34 +00:00
Thomas Moestl
d279178df7 Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
  the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
  from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by:	bde
Reviewed by:	bde (some time ago)
2001-06-01 13:23:28 +00:00
Ruslan Ermilov
0b381bf1fd Remove vestiges of MFS. 2001-06-01 10:07:28 +00:00
David E. O'Brien
240ef84277 Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile. 2001-06-01 09:51:14 +00:00
Jesper Skriver
e916d96e64 Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c
to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c,
and remove the now unneeded includes of "opt_param.h".

MFC after:	1 week
2001-05-31 21:56:44 +00:00
Dima Dorfman
a723c4e173 Export via sysctl:
* all members of msginfo from sysv_msg.c;
  * msqids from sysv_msg.c;
  * sema from sysv_sem.c; and
  * shmsegs from sysv_shm.c;

These will be used by ipcs(1) in non-kvm mode.

Reviewed by:	tmm
2001-05-30 03:28:59 +00:00
Poul-Henning Kamp
22628ccf96 Remove the hack-around for the slice/label code, it didn't
cover the hole.
2001-05-29 18:19:57 +00:00
Ian Dowse
5f558fa42f Since the netexport struct was centralised to 'struct mount',
attempting to remove nonexistant exports with MNT_DELEXPORT returns
an error; before this change it always succeeded. This caused
mountd(8) to log "can't delete exports for /whatever" warnings.

Change the error code from EINVAL to a more specific ENOENT, and
make mountd ignore this error when deleting the export list. I
could have just restored the previous behaviour of returning success,
but I think an error return is a useful diagnostic.

Reviewed by:	phk
2001-05-29 17:46:52 +00:00
Poul-Henning Kamp
b63436919d Remove a comment which was past its shelf life.
PR:		18750
Submitted by:	Tony Finch <dot@dotat.at>
2001-05-29 09:22:22 +00:00
Poul-Henning Kamp
c01a009dc5 With the new kernel dev_t conversions done at release 4.X,
it becomes possible to trap in ptsstop() in kern/tty_pty.c
     if the slave side has never been opened during the life of a kernel.

     What happens is that calls to ttyflush() done from ptyioctl() for the
     controlling side end up calling ptsstop() [via (*tp->t_stop)(tp, <X>)]
     which evaluates the following:

	     struct pt_ioctl *pti = tp->t_dev->si_drv1;

     In order for tp->t_dev to be set, the slave device must first be
     opened in ttyopen() [kern/tty.c].

     It appears that the only problem is calls to (*tp->t_stop)(tp, <n>),
     so this could also happen with other ioctls initiated by the
     controlling side before the slave has been opened.

PR:		27698
Submitted by:	David Bein bein@netapp.com
MFC after:	6 days
2001-05-28 20:22:12 +00:00
Poul-Henning Kamp
507fbee0ad The disklabel/slice code is more twisted than I thought. Revert to
calling the cdevsw_add() unconditionally.
2001-05-28 16:12:55 +00:00
Brian Somers
04bd20e31d Handle NULL struct device *s 2001-05-28 01:00:03 +00:00
Robert Watson
823c224e95 o uifree() the cr_ruidinfo in crfree() as well as cr_uidinfo now that the real uid
info is in the credential also.

Submitted by:	egge
2001-05-27 21:43:46 +00:00
Robert Watson
7cb8e4d277 o pcred-removal changes included modifications to optimize the setting of
the saved uid and gid during execve().  Unfortunately, the optimizations
  were incorrect in the case where the credential was updated, skipping
  the setting of the saved uid and gid when new credentials were generated.
  This change corrects that problem by handling the newcred!=NULL case
  correctly.

Reported/tested by:	David Malone <dwmalone@maths.tcd.ie>

Obtained from:	TrustedBSD Project
2001-05-26 19:59:44 +00:00
Poul-Henning Kamp
3344c5a17e Create a general facility for making dev_t's depend on another
dev_t.  The dev_depends(dev_t, dev_t) function is for tying them
to each other.

When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).

Rewrite the make_dev_alias() to use this dependency facility.

kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.

Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.

kern/subr_disklabel.c:
Remove some now unneeded variables.

kern/subr_diskmbr.c:
Remove some ancient, commented out code.

kern/subr_diskslice.c:
Minor cleanup.  Use name from dev_t instead of dsname()
2001-05-26 08:27:58 +00:00
John Baldwin
9d127f9ffb Add vm locking to sendfile(2) and sf_buf_free().
Reported by:	Tamiji Homma <thomma@BayNetworks.com>
Tested by:	Tamiji Homma <thomma@BayNetworks.com>
2001-05-25 19:23:04 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
Poul-Henning Kamp
5696db457d Make the PTY drivers cloning algorithm create "CHEAPCLONE" dev_t,
so that some twit cannot allocate all 256 PTY's with "ls -l".
2001-05-25 13:23:42 +00:00
Poul-Henning Kamp
2613d3fec9 Use the name given to the dev_t, rather than creating our own.
This makes it possible to give sensible information for /dev/fd.720
and similar "special" devices.
2001-05-25 09:06:52 +00:00
Ruslan Ermilov
1166fb516b - sys/msdosfs moved to sys/fs/msdosfs
- msdos.ko renamed to msdosfs.ko
- /usr/include/msdosfs moved to /usr/include/fs/msdosfs
2001-05-25 08:14:14 +00:00
Poul-Henning Kamp
25e0288d07 Don't rely on cdevsw_add() when we hack about with dev_t's. 2001-05-24 20:28:06 +00:00
Poul-Henning Kamp
8576c652b4 Don't take the detour around devsw() to find out if the proto-cdevsw
is already initialized.
2001-05-24 20:27:16 +00:00
Alfred Perlstein
0cea693084 whitespace/style 2001-05-24 18:06:22 +00:00
Matthew Dillon
ac8f990bde This patch implements O_DIRECT about 80% of the way. It takes a patchset
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.

Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%.  For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.

I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.

Submitted by:	tegge, dillon
2001-05-24 07:22:27 +00:00
Dima Dorfman
028f979d1d Correct style bugs with regards to long lines and comments.
Reviewed by:	bde
2001-05-23 23:38:05 +00:00
John Baldwin
0dfefe6829 Don't acquire Giant just to call trap_fatal(), we are about to panic
anyway so we'd rather see the printf's then block if the system is
hosed.
2001-05-23 22:58:09 +00:00
John Baldwin
bdc60f5bd3 Don't release Giant around vm_oject_page_clean() in fsync() as the pager
putpages called will need Giant.
2001-05-23 22:55:13 +00:00
John Baldwin
8aa66068ed - Always call bfreekva() w/o vm_mtx held.
- Always call vfs_setdirty() with vm_mtx held.
- Fix an old comment: vm_hold_unload_pages is called vm_hold_free_pages()
  nowadays.
- Always call vm_hold_free_pages() w/o vm_mtx held.
2001-05-23 22:24:49 +00:00
John Baldwin
1b2555b243 - Lock the VM when initializing the vmspace for proc0.
- Don't bother releasing Giant while doing a lookup on the vm_map of
  initproc while starting up init.  We have to grab it again right after
  the lookup anyways.
2001-05-23 22:06:47 +00:00
John Baldwin
613c83cbf1 Lock the VM while twiddling the vmspace. 2001-05-23 22:05:08 +00:00
Bosko Milekic
629db60492 Increment mbstat.m_mpfail, not mbstat.m_mcfail, when m_pullup() fails.
This slipped in accidently a few commits back.
2001-05-23 20:44:54 +00:00
John Baldwin
5bd57bc8b7 Don't release the vm lock just to turn around and grab it again. 2001-05-23 19:51:12 +00:00
John Baldwin
b516d2f5e1 Add in assertions to ensure that we always call msleep or mawait with
either a timeout or a held mutex to detect unprotected infinite sleeps
that can easily lead to deadlock.

Submitted by:	alfred
2001-05-23 19:38:26 +00:00
Poul-Henning Kamp
4787f91d6b syslogd gets kernel log messages only once every 30 seconds or
at the top of the minute, whichever comes first.  It seems
logtimeout() is only called once after the kernel log is opened
and then never again after that.  So I guess syslogd only gets
kernel log messages by virtue of syncer(4)'s flushes ...?

PR:		27361
Submitted by:	pkern@utcc.utoronto.ca
MFC after:	1 week
2001-05-23 19:02:50 +00:00
Alfred Perlstein
53240603ee aquire vm_mutex a little bit earlier to protect a pmap call. 2001-05-23 10:26:36 +00:00
Ruslan Ermilov
99d300a1ec - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
  fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
  FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
  Makefiles.
2001-05-23 09:42:29 +00:00
Dima Dorfman
0150c6e83d Unifdef DEV_SNP; snp(4) no longer requires these ugly hacks.
Silence by:	-hackers, -audit
2001-05-22 22:16:18 +00:00
Dima Dorfman
47eaa5f542 Convert this driver to (ab?)use line disciplines to get the input it
needs instead of relying on idiosyncratic hacks in the tty subsystem.
Also add module code since this can now be compiled as a module.

Silence by:	-hackers, -audit
2001-05-22 22:13:14 +00:00
Bruce Evans
1c1771cb5b Convert npx interrupts into traps instead of vice versa. This is much
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).

Submitted by:	original (pre-SMPng) version by luoqi
2001-05-22 21:20:49 +00:00
Dima Dorfman
a8dbafbe87 Correct the vm_mtx handling; specifically, don't acquire it in
shm_deallocate_segment because shmexit_myhook calls it, and the latter
should always be called with it already held.

Submitted by:	dwmalone, dd
Approved by:	alfred
2001-05-22 03:56:26 +00:00
Alfred Perlstein
a4d22b8035 Remove KASSERT test for sleeping on mv_mtx, instead let WITNESS catch
it.

Requested by: jhb
2001-05-22 00:58:20 +00:00
John Baldwin
9dceb26b23 Sort includes. 2001-05-21 18:52:02 +00:00
John Baldwin
270b041d95 - Assert that the vm mutex is held in pipe_free_kmem().
- Don't release the vm mutex early in pipespace() but instead hold it
  across vm_object_deallocate() if vm_map_find() returns an error and
  across pipe_free_kmem() if vm_map_find() succeeds.
- Add a XXX above a zfree() since zalloc already has its own locking,
  one would hope that zfree() wouldn't need the vm lock.
2001-05-21 18:47:17 +00:00
John Baldwin
d8aad40c88 Axe unneeded spl()'s. 2001-05-21 18:30:50 +00:00
Alfred Perlstein
67d1f21cbe Aquire vm mutex when releasing sysv shm segments.
Obtained from: Dima Dorfman <dima@unixfreak.org>
2001-05-20 20:37:47 +00:00
Jonathan Lemon
1890520a77 Add convenience function kernel_sysctlbyname() for kernel consumers,
so they don't have to roll their own sysctlbyname function.
2001-05-19 05:45:55 +00:00
Alfred Perlstein
5ee5c3aa1f remove my private assertions from tsleep.
add one assertion to ensure we don't sleep while holding vm.
2001-05-19 01:40:48 +00:00
Alfred Perlstein
2c3c846931 Regen syscalls that were made mpsafe via vm_mtx
obreak, getpagesize, sbrk, sstk, mmap, ovadvise, munmap, mprotect,
madvise, mincore, mmap, mlock, munlock, minherit, msync, mlockall,
munlockall
2001-05-19 01:37:12 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
John Baldwin
1ad5401134 - Don't panic on a try lock operation for a sleep lock if we hold a spin
lock.  Since we won't actually block on a try lock operation, it's not
  a problem.  Add a comment explaining why it is safe to skip lock order
  checking with try locks.
- Remove the ithread list lock spin lock from the order list.
2001-05-17 22:44:56 +00:00
John Baldwin
4d29cb2db9 - Remove the global ithread_list_lock spin lock in favor of per-ithread
sleep locks.
- Delay returning from ithread_remove_handler() until we are certain that
  the interrupt handler being removed has in fact been removed from the
  ithread.
- XXX: There is still a problem in that nothing protects the kernel from
  adding a new handler while the ithread is running, though with our
  current architectures this is not a problem.

Requested by:	gibbs (2)
2001-05-17 22:43:26 +00:00
John Baldwin
7a08bae6ec - Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT.
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
  toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
  afer the display of the copyright message instead of doing it by hand in
  three MD places.
2001-05-17 22:28:46 +00:00
Robert Watson
6bd1912df4 o Modify access control checks in p_candebug() such that the policy is as
follows: the effective uid of p1 (subject) must equal the real, saved,
  and effective uids of p2 (object), p2 must not have undergone a
  credential downgrade.  A subject with appropriate privilege may override
  these protections.

  In the future, we will extend these checks to require that p1 effective
  group membership must be a superset of p2 effective group membership.

Obtained from:	TrustedBSD Project
2001-05-17 21:48:44 +00:00
Alfred Perlstein
0fd061c0c4 Cleanup
Remove comment about setting error for reads on EOF, read returns 0 on
EOF so the code should be ok.

Remove non-effective priority boost, PRIO+1 doesn't do anything
(according to McKusick), if a real priority boost is needed it should
have been +4.

Style fixes:
.) return foo -> return (foo)
.) FLAG1|FlAG2 -> FLAG1 | FlAG2
.) wrap long lines
.) unwrap short lines
.) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++)
.) remove braces for some conditionals with a single statement
.) fix continuation lines.

md5 couldn't verify the binary because some code had to
be shuffled around to address the style issues.
2001-05-17 19:47:09 +00:00
Alfred Perlstein
2deb4a20c3 initialize pipe pointers 2001-05-17 18:22:58 +00:00
Alfred Perlstein
82a283fcf3 pipe_create has to zero out the select record earlier to avoid
returning a half-initialized pipe and causing pipeclose() to follow
a junk pointer.

Discovered by: "Nick S" <snicko@noid.org>
2001-05-17 17:59:28 +00:00
Ian Dowse
0864ef1e8a Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by:	phk, bp
2001-05-16 18:04:37 +00:00
Alfred Perlstein
a428c5ffef remove include of ipl.h because it no longer exists 2001-05-16 02:52:06 +00:00
John Baldwin
8bd57f8fc2 Remove unneeded includes of sys/ipl.h and machine/ipl.h. 2001-05-15 23:22:29 +00:00
John Baldwin
74fc745594 - Remove unneeded include of sys/ipl.h.
- Lock the process before calling killproc() to kill it for exceeding the
  maximum CPU limit.
2001-05-15 23:15:06 +00:00
John Baldwin
9081e5e826 - Remove unneeded include of sys/ipl.h.
- Require the proc lock be held for killproc() to allow for the vmdaemon to
  kill a process when memory is exhausted while holding the lock of the
  process to kill.
2001-05-15 23:13:58 +00:00
Brian Somers
eeee064735 Support /dev/ctty again
Submitted by:	peter
2001-05-15 18:12:38 +00:00
Seigo Tanimura
1b36970495 Back out scanning file descriptors with holding a process lock.
selrecord() requires allproc sx in pfind(), resulting in lock order
reversal between allproc and a process lock.
2001-05-15 10:19:57 +00:00
Jonathan Lemon
97f6754ff1 When calling poll() on a fd associated with a filesystem, let POLLIN/POLLOUT
behave identically to POLLRDNORM/POLLWRNORM.

Submitted by: bde
PR: 27287
merge after: 1 week
2001-05-14 14:37:25 +00:00
Poul-Henning Kamp
241e77c8a5 Use the new ability to avoid practically all the gunk in this file.
When people access /dev/tty, locate their controlling tty and return
the dev_t of it to them.  This basically makes /dev/tty act like
a variant symlink sort of thing which is much simpler than all the
mucking about with vnodes.
2001-05-14 08:22:56 +00:00
Seigo Tanimura
265fc98f36 - Convert msleep(9) in select(2) and poll(2) to cv_*wait*(9).
- Since polling should not involve sleeping, keep holding a
  process lock upon scanning file descriptors.

- Hold a reference to every file descriptor prior to entering
  polling loop in order to avoid lock order reversal between
  lockmgr and p_mtx upon calling fdrop() in fo_poll().
  (NOTE: this work has not been done for netncp and netsmb
  yet because a socket itself has no reference counts.)

Reviewed by:	jhb
2001-05-14 05:26:48 +00:00
John Baldwin
1efb92b7ca Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.
2001-05-11 23:50:08 +00:00
Ian Dowse
1feb7a6efa In vrele() and vput(), avoid triggering the confusing "missed vn_close"
KASSERT when vp->v_usecount is zero or negative. In this case, the
"v*: negative ref cnt" panic that follows is much more appropriate.

Reviewed by:	mckusick
2001-05-11 20:42:41 +00:00
John Baldwin
9e5620599e Check witness_dead in more functions to avoid panic'ing when assertions
fail due to witness exhausting its internal resources and shutting down.

Reported by:	Szilveszter Adam <sziszi@petra.hos.u-szeged.hu>
Tested by:	David Wolfskill <david@catwhisker.org>
2001-05-11 20:25:29 +00:00
Tor Egge
dd1c45f3ca Regenerate. 2001-05-11 17:05:47 +00:00
Tor Egge
b4b469e6bb gettimeofday() is MP safe on both -current and -stable. 2001-05-11 17:05:12 +00:00
John Baldwin
ba228f6d96 - Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by:	jake
2001-05-10 17:45:49 +00:00
Alfred Perlstein
97d4578662 Remove an 'optimization' I hope to never see again.
The pipe code could not handle running out of kva, it would panic
if that happened.  Instead return ENFILE to the application which
is an acceptable error return from pipe(2).

There was some slightly tricky things that needed to be worked on,
namely that the pipe code can 'realloc' the size of the buffer if
it detects that the pipe could use a bit more room.  However if it
failed the reallocation it could not cope and would panic.  Fix
this by attempting to grow the pipe while holding onto our old
resources.  If all goes well free the old resources and use the
new ones, otherwise continue to use the smaller buffer already
allocated.

While I'm here add a few blank lines for style(9) and remove
'register'.
2001-05-08 09:09:18 +00:00
Poul-Henning Kamp
e0e0b6610e Always initialize bio_resid from bio_bcount in the disk mini-layer so
that the drivers don't have to do it umpteen times.
2001-05-08 08:24:54 +00:00
Akinori MUSHA
3b26be6ae1 Properly copy the P_ALTSTACK flag in struct proc::p_flag to the child
process on fork(2).

It is the supposed behavior stated in the manpage of sigaction(2), and
Solaris, NetBSD and FreeBSD 3-STABLE correctly do so.

The previous fix against libc_r/uthread/uthread_fork.c fixed the
problem only for the programs linked with libc_r, so back it out and
fix fork(2) itself to help those not linked with libc_r as well.

PR:		kern/26705
Submitted by:	KUROSAWA Takahiro <fwkg7679@mb.infoweb.ne.jp>
Tested by:	knu, GOTOU Yuuzou <gotoyuzo@notwork.org>,
		and some other people
Not objected by:	hackers
MFC in:		3 days
2001-05-07 18:07:29 +00:00