Commit Graph

7471 Commits

Author SHA1 Message Date
Robert Watson
1c1ce9253f Push acquisition of Giant from fdrop_closed() into fo_close() so that
individual file object implementations can optionally acquire Giant if
they require it:

- soo_close(): depends on debug.mpsafenet
- pipe_close(): Giant not acquired
- kqueue_close(): Giant required
- vn_close(): Giant required
- cryptof_close(): Giant required (conservative)

Notes:

  Giant is still acquired in close() even when closing MPSAFE objects
  due to kqueue requiring Giant in the calling closef() code.
  Microbenchmarks indicate that this removal of Giant cuts 3%-3% off
  of pipe create/destroy pairs from user space with SMP compiled into
  the kernel.

  The cryptodev and opencrypto code appears MPSAFE, but I'm unable to
  test it extensively and so have left Giant over fo_close().  It can
  probably be removed given some testing and review.
2004-07-22 18:35:43 +00:00
Robert Watson
df04411ac4 suser() accepts a thread argument; as suser() dereferences td_ucred, a
thread-local pointer, in practice that thread needs to be curthread.  If
we're running with INVARIANTS, generate a warning if not.  If we have
KDB compiled in, generate a stack trace.  This doesn't fire at all in my
local test environment, but could be irritating if it fires frequently
for someone, so there will be motivation to fix things quickly when it
does.
2004-07-22 17:05:04 +00:00
Scott Long
9493183e77 Disable the PREEMPTION-enabled code in critical_exit() that encourages
switching to a different thread.  This is just a hack to try to improve
stability some more, but likely points closer to the real culprit.
2004-07-22 14:32:48 +00:00
Bosko Milekic
01e9ccbd9c Back out just a portion of Alfred's last commit. Remove the MBUF_CHECK
(WITNESS) for code paths that always call uma_zalloc_arg() shortly
after where the check was, because uma_zalloc_arg() already does
a similar check.

No objections from Alfred.  Thanks Alfred.
2004-07-21 21:03:01 +00:00
Robert Watson
46e38ce826 Don't sync the file system on panic by default. This seems to basically
work very infrequently, and often results in a compound panic which
confuses debugging; locking/SMP have made the layering violation (and
risks) of this more obvious over time.

Discussed with:	green, bde, et al.
2004-07-21 16:04:46 +00:00
Alfred Perlstein
05656b6e2b put several of the options for DEBUG_VFS_LOCKS under control of sysctls. 2004-07-21 07:13:14 +00:00
Alfred Perlstein
063d811465 Make sure we don't call mbuf allocation functions with mutexes held.
Discussed with: rwatson
2004-07-21 07:12:24 +00:00
Marcel Moolenaar
3d4f313695 Add kdb_thr_from_pid(), which given a PID returns the first thread
in the process. This is useful when working from or with a process.
2004-07-21 04:49:48 +00:00
Mike Silbersack
eb3d2c61b4 Fix a minor error in pipe_stat - st_size was always reported as 0
when direct writes kicked in.  Whether this affected any applications
is unknown.
2004-07-20 07:06:43 +00:00
Peter Wemm
b09cb1027b #ifdef __i386__ -> __i386__ || __amd64__ 2004-07-20 02:15:10 +00:00
Julian Elischer
3a63b92c12 You always spot the typos after you have committed.. Start sentence
with a Cap.
2004-07-19 18:06:12 +00:00
Julian Elischer
f6449d9d31 Allow the user who calls doadump() from the kernel debugger
to not get a page fault if he has not defined a dump device.
Panic can often not do a dump as it can hang forever in some cases.
 The original PR was for amd64 only. This is a generalised version of
that change.

PR:		amd64/67712
Submitted by:	wjw@withagen.nl <Willen Jan Withagen>
2004-07-19 18:03:02 +00:00
Brian Feldman
4362fada8f Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less.  The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block.  If this is not enough, the subsequent passes
page out any unwired memory.  To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed.  Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used.  Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month.  It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig().  These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats.  Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well.  This invalidates the BUGS section of the
contigmalloc(9) manpage.
2004-07-19 06:21:27 +00:00
Julian Elischer
55d44f79ea When calling scheduler entrypoints for creating new threads and processes,
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)

Reviewed by: peter
2004-07-18 23:36:13 +00:00
Pawel Jakub Dawidek
ece2d9891e Now we have NO_ADAPTIVE_MUTEXES option, so use it here too.
Missed by:	scottl
2004-07-18 23:27:14 +00:00
Marcel Moolenaar
1f7a1baa37 After maintaining previous behaviour in writing out the core notes, it's
time now to break with the past: do not write the PID in the first note.
Rationale:
1.  [impact of the breakage] Process IDs in core files serve no immediate
    purpose to the debugger itself. They are only useful to relate a core
    file to a process. This can provide context to the person looking at
    the core file, provided one keeps track of this. Overall, not having
    the PID in the core file is only in very rare occasions unfortunate.
2.  [reason of the breakage] Having one PRSTATUS note contain the PID,
    while all others contain the LWPID of the corresponding kernel thread
    creates an irregularity for the debugger that cannot easily be worked
    around. This is caused by libthread_db correlating user thread IDs to
    kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs.

Update comments accordingly.
2004-07-18 20:28:07 +00:00
David Malone
cdb71f7526 The recent changes to control message passing broke some things
that get certain types of control messages (ping6 and rtsol are
examples). This gets the new code closer to working:

	1) Collect control mbufs for processing in the controlp ==
	NULL case, so that they can be freed by externalize.

	2) Loop over the list of control mbufs, as the externalize
	function may not know how to deal with chains.

	3) In the case where there is no externalize function,
	remember to add the control mbuf to the controlp list so
	that it will be returned.

	4) After adding stuff to the controlp list, walk to the
	end of the list of stuff that was added, incase we added
	a chain.

This code can be further improved, but this is enough to get most
things working again.

Reviewed by:	rwatson
2004-07-18 19:10:36 +00:00
Doug Rabson
4c4392e791 Add doxygen doc comments for most of newbus and the BUS interface. 2004-07-18 16:30:31 +00:00
Scott Long
701f140800 Enable ADAPTIVE_MUTEXES by default by changing the sense of the option to
NO_ADAPTIVE_MUTEXES.  This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64.  It
shows measurable performance gains in many circumstances, and few negative
effects.  It would be nice in t he future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.
2004-07-18 15:59:03 +00:00
Alan Cox
d8582da660 Remove GIANT_REQUIRED from vmapbuf(). 2004-07-18 04:57:49 +00:00
Robert Watson
2260c03d77 Drop Giant and acquire the UNIX domain socket subsystem lock a bit
earlier in unp_connect() so that vp->v_socket can't change between
our copying its value to a local variable and later use of that
variable.  This may have been responsible for a panic during
shutdown that I experienced where simultaneous closing of a listen
socket by rpcbind and a new connection being made to rpcbind by
mountd.
2004-07-18 01:29:43 +00:00
David Xu
c3d88cbab8 Fix typo. 2004-07-17 23:15:41 +00:00
David Malone
e140eb430c Add a kern_setsockopt and kern_getsockopt which can read the option
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.
2004-07-17 21:06:36 +00:00
John Baldwin
52eb84641d - Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
  locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
  and directly into struct thread as td_profil_addr and td_profil_ticks
  as these variables are really per-thread.  (They are used to defer an
  addupc_intr() that was too "hard" until ast()).
2004-07-16 21:04:55 +00:00
John Baldwin
d3373e371b Whitespace fix. 2004-07-16 21:01:52 +00:00
John Baldwin
6dbc085016 Improve readability a bit by changing some code at the end of a function
that did:

	if (foo)
		return
	else
		blah

to just do the simpler

	if (!foo)
		blah

instead.
2004-07-16 21:00:50 +00:00
Colin Percival
24283cc01b Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to
check if the *real* user is the superuser (vs. the normal behaviour, which
checks the effective user).

Reviewed by:	rwatson
2004-07-16 15:57:16 +00:00
Robert Watson
dad7b41a9b When entering soclose(), assert that SS_NOFDREF is not already set. 2004-07-16 00:37:34 +00:00
Poul-Henning Kamp
672c05d49c Preparation commit for the tty cleanups that will follow in the near
future:

rename ttyopen() -> tty_open() and ttyclose() -> tty_close().

We need the ttyopen() and ttyclose() for the new generic cdevsw
functions for tty devices in order to have consistent naming.
2004-07-15 20:47:41 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Alfred Perlstein
bb5faea34f Cleanup shutdown output. 2004-07-15 08:01:00 +00:00
Alfred Perlstein
da6303bacc Tidy up system shutdown. 2004-07-15 04:29:48 +00:00
Alfred Perlstein
a88295bb83 Disable SIGIO for now, leave a comment as to why it's busted and hard
to fix.
2004-07-15 03:49:52 +00:00
Nate Lawson
8916adb1c9 Clean up the output on reboot by keeping completion messages on the same
line as the announcement.  Someone should probably update the "buffers
remaining" message since we now no longer should have any buffers remaining
at that point.
2004-07-15 03:20:08 +00:00
Poul-Henning Kamp
e2ad640e13 A module with no modevent function gets modevent_nop() as default.
Until now the function has just returned zero for any event, but
that is downright wrong for MOD_UNLOAD and not very useful for any
future events we add where it may be crucial to be able to tell
if the event was unhandled or successful.

Change the function to return as follows:

	MOD_LOAD -> 0
	MOD_UNLOAD -> EBUSY
	anything else -> EOPNOTSUPP
2004-07-14 22:37:36 +00:00
Christian S.J. Peron
ed6c545cf0 In addition to the real user ID check, do an explicit jail
check to ensure that the caller is not prison root.

The intention is to fix file descriptor creation so that
prison root can not use the last remaining file descriptors.
This privilege should be reserved for non-jailed root users.

Approved by:	bmilekic (mentor)
2004-07-14 19:04:31 +00:00
Alfred Perlstein
67543ab1e3 Make FIOASYNC, FIOSETOWN and FIOGETOWN work on kqueues. 2004-07-14 07:02:03 +00:00
John Baldwin
6942d4339e Set TDF_NEEDRESCHED when a higher priority thread is scheduled in
sched_add() rather than just doing it in sched_wakeup().  The old
ithread preemption code used to set NEEDRESCHED unconditionally if it
didn't preempt which masked this bug in SCHED_4BSD.

Noticed by:	jake
Reported by:	kensmith, marcel
2004-07-13 20:49:13 +00:00
Poul-Henning Kamp
65a311fcb2 Give kldunload a -f(orce) argument.
Add a MOD_QUIESCE event for modules.  This should return error (EBUSY)
of the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module.  Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.
2004-07-13 19:36:59 +00:00
Poul-Henning Kamp
1a946b9fef Add kldunloadf() system call. Stay tuned for follwing commit messages. 2004-07-13 19:35:11 +00:00
Poul-Henning Kamp
49bddf0c9f fix compilation. 2004-07-13 16:33:38 +00:00
Colin Percival
65bba83fef Replace "uid != 0" with "suser(td->td_ucred) != 0" when checking if we've
hit the maximum number of processes.  The last ten processes are reserved
for the *non-jailed* superuser.
2004-07-13 13:10:07 +00:00
David Xu
4d47dc5549 Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
     thread current user thread is running on. Add tm_dflags into
     kse_thr_mailbox, the flags is written by debugger, it tells
     UTS and kernel what should be done when the process is being
     debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

     TMDF_SSTEP is used to tell kernel to turn on single stepping,
     or turn off if it is not set.

     TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
     whenever possible, to UTS, it means do not run the user thread
     until debugger clears it, this behaviour is necessary because
     gdb wants to resume only one thread when the thread's pc is
     at a breakpoint, and thread needs to go forward, in order to
     avoid other threads sneak pass the breakpoints, it needs to remove
     breakpoint, only wants one thread to go. Also, add km_lwp to
     kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
     switch time when process is not being debugged, so when process
     is attached, debugger can map kernel thread to user thread.

  2. Add p_xthread to proc strcuture and td_xsig to thread structure.
     p_xthread is used by a thread when it wants to report event
     to debugger, every thread can set the pointer, especially, when
     it is used in ptracestop, it is the last thread reporting event
     will win the race. Every thread has a td_xsig to exchange signal
     with debugger, thread uses TDF_XSIG flag to indicate it is reporting
     signal to debugger, if the flag is not cleared, thread will keep
     retrying until it is cleared by debugger, p_xthread may be
     used by debugger to indicate CURRENT thread. The p_xstat is still
     in proc structure to keep wait() to work, in future, we may
     just use td_xsig.

  3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
     a thread. When process stops, debugger can set the flag for
     thread, thread will check the flag in thread_suspend_check,
     enters a loop, unless it is cleared by debugger, process is
     detached or process is existing. The flag is also checked in
     ptracestop, so debugger can temporarily suspend a thread even
     if the thread wants to exchange signal.

  4. Current, in ptrace, we always resume all threads, but if a thread
     has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:33:40 +00:00
David Xu
ef9457becb Implement following commands: PT_CLEARSTEP, PT_SETSTEP, PT_SUSPEND
PT_RESUME, PT_GETNUMLWPS, PT_GETLWPLIST.
2004-07-13 07:25:24 +00:00
David Xu
cbf4e354ec Add code to support debugging threaded process.
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
   thread current user thread is running on. Add tm_dflags into
   kse_thr_mailbox, the flags is written by debugger, it tells
   UTS and kernel what should be done when the process is being
   debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.

   TMDF_SSTEP is used to tell kernel to turn on single stepping,
   or turn off if it is not set.

   TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
   whenever possible, to UTS, it means do not run the user thread
   until debugger clears it, this behaviour is necessary because
   gdb wants to resume only one thread when the thread's pc is
   at a breakpoint, and thread needs to go forward, in order to
   avoid other threads sneak pass the breakpoints, it needs to remove
   breakpoint, only wants one thread to go. Also, add km_lwp to
   kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
   switch time when process is not being debugged, so when process
   is attached, debugger can map kernel thread to user thread.

2. Add p_xthread to proc strcuture and td_xsig to thread structure.
   p_xthread is used by a thread when it wants to report event
   to debugger, every thread can set the pointer, especially, when
   it is used in ptracestop, it is the last thread reporting event
   will win the race. Every thread has a td_xsig to exchange signal
   with debugger, thread uses TDF_XSIG flag to indicate it is reporting
   signal to debugger, if the flag is not cleared, thread will keep
   retrying until it is cleared by debugger, p_xthread may be
   used by debugger to indicate CURRENT thread. The p_xstat is still
   in proc structure to keep wait() to work, in future, we may
   just use td_xsig.

3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
   a thread. When process stops, debugger can set the flag for
   thread, thread will check the flag in thread_suspend_check,
   enters a loop, unless it is cleared by debugger, process is
   detached or process is existing. The flag is also checked in
   ptracestop, so debugger can temporarily suspend a thread even
   if the thread wants to exchange signal.

4. Current, in ptrace, we always resume all threads, but if a thread
   has already a TDF_DBSUSPEND flag set by debugger, it won't run.

Encouraged by: marcel, julian, deischen
2004-07-13 07:20:10 +00:00
Alan Cox
ce8da3091f Push down the acquisition and release of the page queues lock into
pmap_remove_pages().  (The implementation of pmap_remove_pages() is
optional.  If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
2004-07-13 02:49:22 +00:00
David Malone
dcee93dcf9 Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a
a better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these function will be quite different
from so_setsockopt.

Approved by:	alfred
2004-07-12 21:42:33 +00:00
Mike Makonnen
c21e3b38bd writers must hold both sched_lock and the process lock; therefore, readers
need only obtain the process lock.
2004-07-12 15:28:31 +00:00
Alfred Perlstein
f257b7a54b Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
2004-07-12 08:14:09 +00:00
David Xu
507b03186a Change kse_switchin to accept kse_thr_mailbox pointer, the syscall
will be used heavily in debugging KSE threads. This breaks libpthread
on IA64, but because libpthread was not in 5.2.1 release, I would like
to change it so we needn't to introduce another syscall.
2004-07-12 07:39:20 +00:00