7569 Commits

Author SHA1 Message Date
rwatson
7ac0169fa1 Always acquire the UNIX domain socket subsystem lock (UNP lock)
before dereferencing sotounpcb() and checking its value, as so_pcb
is protected by protocol locking, not subsystem locking.  This
prevents races during close() by one thread and use of ths socket
in another.

unp_bind() now assert the UNP lock, and uipc_bind() now acquires
the lock around calls to unp_bind().
2004-08-16 04:41:03 +00:00
green
1de6d6df05 Add the missing knote_fdclose(). 2004-08-16 03:09:01 +00:00
green
99deda206a Allocate the marker, when scanning a kqueue, from the "heap" instead of the
stack.  When swapped out, a process's kernel stack would be unavailable,
and we could get a page fault when scanning the same kqueue.

PR:	kern/61849
2004-08-16 03:08:38 +00:00
rwatson
2cd0dbc8de Annotate the current UNIX domain socket locking strategies, order,
strengths, and weaknesses in a comment.  Assert a copyright over the
changes made as part of the locking work.
2004-08-16 01:52:04 +00:00
silby
e3f2e32958 Major enhancements to pipe memory usage:
- pipespace is now able to resize non-empty pipes; this allows
  for many more resizing opportunities

- Backing is no longer pre-allocated for the reverse direction
  of pipes.  This direction is rarely (if ever) used, so this cuts the
  amount of map space allocated to a pipe in half.

- Pipe growth is now much more dynamic; a pipe will now grow when
  the total amount of data it contains and the size of the write are
  larger than the size of pipe.  Previously, only individual writes greater
  than the size of the pipe would cause growth.

- In low memory situations, pipes will now shrink during both read
  and write operations, where possible.  Once the memory shortage
  ends, the growth code will cause these pipes to grow back to an appropriate
  size.

- If the full PIPE_SIZE allocation fails when a new pipe is created, the
  allocation will be retried with SMALL_PIPE_SIZE.  This helps to deal
  with the situation of a fragmented map after a low memory period has
  ended.

- Minor documentation + code changes to support the above.

In total, these changes increase the total number of pipes that
can be allocated simultaneously, drastically reducing the chances that
pipe allocation will fail.

Performance appears unchanged due to dynamic resizing.
2004-08-16 01:27:24 +00:00
truckman
5facb5bcc2 Yet another tweak to the shutdown messages in boot():
Don't count busy buffers before the initial call to sync() and
  don't skip the initial sync() if no busy buffers were called.
  Always call sync() at least once if syncing is requested.  This
  defers the "Syncing disks, buffers remaining..." message until
  after the initial sync() call and the first count of busy
  buffers.  This backs out changes in kern_shutdown 1.162.

  Print a different message when there are no busy buffers after the
  initial sync(), which is now the expected situation.

  Print an additional message when syncing has completed successfully
  in the unusual situation where the work of syncing was done by
  boot().

  Uppercase one message to make it consistent with all of the other
  kernel shutdown messages.

Discussed with:	bde (in a much earlier form, prior to 1.162)
Reviewed by:	njl (in an earlier form)
2004-08-15 19:17:23 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
rwatson
b3113cfdfe Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
attempt to IPI other cpus when entering the debugger in order to stop
them while in the debugger.  The default remains to issue the stop;
however, that can result in a hang if another cpu has interrupts disabled
and is spinning, since the IPI won't be received and the KDB will wait
indefinitely.  We probably need to add a timeout, but this is a useful
stopgap in the mean time.

Reviewed by:	marcel
2004-08-15 02:06:27 +00:00
rwatson
c7e2313e86 Cause pfind() not to return processes in the PRS_NEW state. As a result,
threads consuming the result of pfind() will not need to check for a NULL
credential pointer or other signs of an incompletely created process.
However, this also means that pfind() cannot be used to test for the
existence or find such a process.  Annotate pfind() to indicate that this
is the case.  A review of curent consumers seems to indicate that this is
not a problem for any of them.  This closes a number of race conditions
that could result in NULL pointer dereferences and related failure modes.
Other related races continue to exist, especially during iteration of the
allproc list without due caution.

Discussed with:	tjr, green
2004-08-14 17:15:16 +00:00
phk
9595df2db1 Add some KASSERTS. 2004-08-14 08:33:49 +00:00
julian
ae4d7bb6b9 Whitespace nit. 2004-08-14 07:21:20 +00:00
rwatson
136013f29f After completing a name lookup for a target UNIX domain socket to
connect to, re-check that the local UNIX domain socket hasn't been
closed while we slept, and if so, return EINVAL.  This affects the
system running both with and without Giant over the network stack,
and recent ULE changes appear to cause it to trigger more
frequently than previously under load.  While here, improve catching
of possibly closed UNIX domain sockets in one or two additional
circumstances.  I have a much larger set of related changes in
Perforce, but they require more testing before they can be merged.

One debugging printf is left in place to indicate when such a race
takes place: this is typically triggered by a buggy application
that simultaenously connect()'s and close()'s a UNIX domain socket
file descriptor.  I'll remove this at some point in the future, but
am interested in seeing how frequently this is reported.  In the
case of Martin's reported problem, it appears to be a result of a
non-thread safe syslog() implementation in the C library, which
does not synchronize access to its logging file descriptor.

Reported by:	mbr
2004-08-14 03:43:49 +00:00
jmg
bea28d4a04 clean up whitespace... 2004-08-13 17:43:53 +00:00
jmg
d2ff11056b looks like rwatson forgot tabs... :) 2004-08-13 07:38:58 +00:00
julian
9ab7967d3c Don't keep evaluating our own cpu mask..
it's not likely to have changed....
2004-08-13 00:57:43 +00:00
rwatson
74889f1a20 Trim trailing white space. 2004-08-12 18:06:21 +00:00
imp
482740a238 Minor formatting fixes for lines > 80 characters 2004-08-12 17:26:22 +00:00
jeff
8745e98dd0 - Introduce a new flag KEF_HOLD that prevents sched_add() from doing a
migration.  Use this in sched_prio() and sched_switch() to stop us from
   migrating threads that are in short term sleeps or are runnable.  These
   extra migrations were added in the patches to support KSE.
 - Only set NEEDRESCHED if the thread we're adding in sched_add() is a
   lower priority and is being placed on the current queue.
 - Fix some minor whitespace problems.
2004-08-12 07:56:33 +00:00
julian
765ec5c83b Properly keep track of how many kses are on the system run queue(s). 2004-08-11 20:54:48 +00:00
rwatson
eed836416f Replace a reference to splnet() with a reference to locking in a comment. 2004-08-11 03:43:10 +00:00
marcel
fbbaea5f90 Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
2004-08-11 02:35:06 +00:00
rwatson
371cf09cf7 In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However,
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL.  Otherwise,
accept that another thread got there first, and release the local storage.

Discussed with:	jmg
2004-08-11 01:27:53 +00:00
alc
7210ecc993 Eliminate the acquisition and release of Giant within physio(). Remove
the spl calls.

Reviewed by: phk@
Discussed with: scottl@
2004-08-10 21:47:11 +00:00
jhb
15d4b7d989 Synchronize the extra SA threading checks and return value handling of
condition variables with that of msleep().

Reviewed by:	davidxu
2004-08-10 17:42:59 +00:00
jeff
b109ddffbc - Use a new flag, KEF_XFERABLE, to record with certainty that this kse had
contributed to the transferable load count.  This prevents any potential
   problems with sched_pin() being used around calls to setrunqueue().
 - Change the sched_add() load balancing algorithm to try to migrate on
   wakeup.  This attempts to place threads that communicate with each other
   on the same CPU.
 - Don't clear the idle counts in kseq_transfer(), let the cpus do that when
   they call sched_add() from kseq_assign().
 - Correct a few out of date comments.
 - Make sure the ke_cpu field is correct when we preempt.
 - Call kseq_assign() from sched_clock() to catch any assignments that were
   done without IPI.  Presently all assignments are done with an IPI, but I'm
   trying a patch that limits that.
 - Don't migrate a thread if it is still runnable in sched_add().  Previously,
   this could only happen for KSE threads, but due to changes to
   sched_switch() all threads went through this path.
 - Remove some code that was added with preemption but is not necessary.
2004-08-10 07:52:21 +00:00
njl
7e21ce666c Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by:	truckman (earlier version)
2004-08-10 01:32:05 +00:00
scottl
ab3ce7c4d9 Add a temporary debugging hack to detect a deadlock in setrunqueue(). This
is here so that we can gather stats on the nature of the recent rash of
hard lockups, and in this particular case panic the machine instead of
letting it deadlock forever.
2004-08-10 00:26:25 +00:00
julian
00a6534a31 Slight changes to comments and some whitespace changes. 2004-08-09 21:57:30 +00:00
julian
38d3d854fe Make kg->kg_runnable actually count runnable threads in the ksegrp run queue
instead of only doing it sometimes.. This is not used outdide of debugging code
in the current code, but that will probably change.
2004-08-09 20:36:03 +00:00
julian
ecbe8aa287 Remove typos on KASSERT messages. 2004-08-09 20:13:07 +00:00
green
fbabec2d12 Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and
unmap when done.  For whatever reason, SPARSE_MAPPING is not even a
config option, so this is dead code.
2004-08-09 18:46:13 +00:00
julian
61fada7840 Increase the amount of data exported by KTR in the KTR_RUNQ setting.
This extra data is needed to really follow what is going on in the
threaded case.
2004-08-09 18:21:12 +00:00
jmg
2c2b6c4ef7 add option to automaticly mark core dumps with the nodump flag
PR:		57065
Submitted by:	Walter C. Pelissero
2004-08-09 05:46:46 +00:00
davidxu
634d20a05e 1.Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound
thread, after the bound thread leaves critical region, the thread should
check debug flag may suspend itself by using the command.
2.Schedule upcall after thread is suspended by debugger
3.Wakeup upcall thread after process suspension.

Reviewed by: deischen
2004-08-08 22:32:20 +00:00
davidxu
f8c21c52ad Call thread_user_enter for M:N thread, ast() should be treated as another
entrance of kernel.
2004-08-08 22:28:33 +00:00
davidxu
6412ad5b2e Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND
indicate that a thread is in UTS critical region.

Reviewed by: deischen
Approved by: marcel
2004-08-08 22:26:11 +00:00
dfr
6a047f3d1e Make sure that AT_PHDR has a useful value even for static programs. 2004-08-08 09:48:10 +00:00
jmg
6967b9b093 rearange some code that handles the thread taskqueue so that it is more
generic.  Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a
single arg, which is the name of the queue.

Document these changes.
2004-08-08 02:37:22 +00:00
rwatson
656f433813 We're not yet ready to assert !Giant in kern_fcntl(), as it's called
with Giant from ABI wrappers such as Linux emulation.

Foot shoot off:	phk
2004-08-07 14:09:02 +00:00
rwatson
37eebe5058 Flag a broad range of VFS operations as GIANT_REQUIRED in order to
catch leaking into VFS without Giant.

Inch Giant a little lower in several file descriptor operations on
vnodes to cover only VFS operations that need it, rather than file
flag reading, etc.
2004-08-06 22:25:35 +00:00
rwatson
ee17f9503f In thread_exit(), include more information about the thread/process
context in the KTR trace record.  In particular, include the same
information as passed for mi_switch() and fork_exit() KTR trace
records.
2004-08-06 22:06:14 +00:00
rwatson
d6384e3daf Push UIDINFO_UNLOCK() slightly earlier in chgsbize(), as it's not
needed if we print the local variable version of the limit rather
than the shared version.
2004-08-06 22:04:33 +00:00
rwatson
8de3afda37 Avoid acquiring Giant for some common light-weight or already MPSAFE
fcntl() operations, including:

  F_DUPFD          dup() alias
  F_GETFD          retrieve close-on-exec flag
  F_SETFD          set close-on-exec flag
  F_GETFL          retrieve file descriptor flags

For the remaining fcntl() operations, do acquire Giant, especially
where we call into fo_ioctl() as a result.  We're not yet ready to
push Giant into fo_ioctl().  Once we do, this can all become quite a
bit prettier.
2004-08-06 22:00:55 +00:00
rwatson
36a8fef8a8 Cut a KTR record whenever a callout is invoked. Mark whether it runs
with Giant or not, and include the function point so it can be looked
up against the kernel symbol table during trace analysis.
2004-08-06 21:49:00 +00:00
jhb
d3254af40d Don't scare users with a warning about preemption being off when it isn't
yet safe to have on by default.
2004-08-06 15:49:44 +00:00
rwatson
6680706c2b In ithread_schedule(), when we plan to go harvest some entropy as
a result of scheduling an ithread, cut a KTR_INTR trace record so
that it's clear in tracing interrupt activity where and when the
entropy harvesting code is invoked.
2004-08-06 03:39:28 +00:00
cperciva
b4bae139fd When reseting a pending callout, perform the deregistration in
callout_reset rather than calling callout_stop.  This results in a few
lines of code duplication, but it provides a significant performance
improvement because it avoids recursing on callout_lock.

Requested by:	rwatson
2004-08-06 02:44:58 +00:00
jhb
73d1afd6fd Fix the code in rman that merges adjacent unallocated resources to use a
better check for 'adjacent'.  The old code assumed that if two resources
were adjacent in the linked list that they were also adjacent range wise.
This is not true when a resource manager has to manage disparate regions.
For example, the current interrupt code on i386/amd64 will instruct
irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC
case.  If IRQs 1 and 3 were allocated and then released, the old code
would coalesce across the 1 to 3 boundary because the resources were
adjacent in the linked list thus adding 2 to the area of resources that
irq_rman managed as a side effect.  The fix adds extra checks so that
adjacent unallocated resources are only merged with the resource being
freed if the start and end values of the resources also match up.  The
patch also consolidates the checks for adjacent resources being allocated.
2004-08-05 15:48:18 +00:00
jhb
fb7bd65f3f Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other.  With this change, only
one of the CPUs would do an IPI and spin-wait at a time.
2004-08-04 20:31:19 +00:00
jhb
f513ad537c Workaround a possible deadlock on SMP due to a spin lock LOR by disabling
the immediate awakening of proc0 (scheduler kproc, controls swapping
processes in and out).  The scheduler process periodically awakens already,
so this will not result in processes not being swapped in, there will just
be more latency in between a thread being made runnable and the scheduler
waking up to swap the affected process back in.
2004-08-04 20:24:40 +00:00