Commit Graph

8835 Commits

Author SHA1 Message Date
Robert Watson
923633b4b5 In closef(), remove the assumption that there is a thread associated
with the file descriptor.  When a file descriptor is closed as a result
of garbage collecting a UNIX domain socket, the file descriptor will
not have any associated thread, so the logic to identify advisory locks
held by that thread is not appropriate.  Check the thread for NULL to
avoid this scenario.  Expand an existing comment to say a bit more about
this.

MFC after:	1 week
2005-11-09 20:54:25 +00:00
Warner Losh
5d56add2ba General consensus is that it would be even better to run this in a
thread context.  While it doesn't matter too much at the moment, in
the future we could be back in the same boat if/when more restrictions
are placed (or enforced) in a SWI.

Suggested by: njl, bde, jhb, scottl
2005-11-09 16:22:56 +00:00
John Baldwin
26b7bd707c Use intptr_t casts to convert void * <--> int to make 64-bit archs happy. 2005-11-09 15:15:59 +00:00
Ruslan Ermilov
303989a2f3 Use sparse initializers for "struct domain" and "struct protosw",
so they are easier to follow for the human being.
2005-11-09 13:29:16 +00:00
David Xu
f4d8522334 WIFxxx macros requires an int type but p_xstat is short, convert it
to int before using the macros.

Bug reported by : Pyun YongHyeon pyunyh at gmail dot com
2005-11-09 07:58:16 +00:00
Warner Losh
161604d863 Kick off the suspend sequence from the keyboard in a SWI rather than
in the hardware interrupt context (even if it is likely just an
ithread).  We don't document that suspend/resume routines are run from
such a context and some of the things that happen in those routines
aren't interrupt safe.  Since there's no real need to run from that
context, this restores assumptions that suspend routines have made.

This fixes Thierry Herbelot's 'Trying to sleep while sleeping is
prohibited' problem.
2005-11-09 07:32:01 +00:00
Warner Losh
2002eaadb7 Clarify panic message, I parsed the old one 'trying to sleep while sleeping' 2005-11-09 07:28:52 +00:00
Craig Rodrigues
4560dfb5b1 For nmount(), allow a text string error message to be propagated back
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.

Discussed with:		phk, pjd
2005-11-09 02:26:38 +00:00
David Xu
323fe56580 In aio_waitcomplete, do not return EAGAIN if no other threads
have started aio, instead, initialize aio management structure
if it hasn't been done, the reason to adjust this behavior is
to make it a bit friendly for threaded program, consider two
threads, one submits aio_write, and another just calls
aio_waitcomplete to wait any I/O to be completed and recycle the
aio requests, before submitter doing any I/O, the recycler wants
to wait in kernel. This also fixes inconsistency with other aio
syscalls.
2005-11-08 23:48:32 +00:00
David Xu
c20cedbfc9 Make sure pending SIGCHLD is removed from previous parent when process
is attached or detached.
2005-11-08 23:28:12 +00:00
John Baldwin
2a522eb9d3 Various and sundry cleanups:
- Use curthread for calls to knlist_delete() and add a big comment
  explaining why as well as appropriate assertions.
- Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
- Use fget() family of functions to lookup file objects instead of
  grovelling around in file descriptor tables.
- Destroy the aio_freeproc mutex if we are unloaded.

Tested on:	i386
2005-11-08 17:43:05 +00:00
Christian S.J. Peron
576068804d Giant clean up for exit(2)
-Change unconditional aquisition of Giant to only pickup Giant if the vnode
 for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
 the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
 operations only if it resides on a non-mpsafe file system.

Discussed with:	jhb
Tested by:	kris
2005-11-08 17:11:03 +00:00
David Xu
ebceaf6dc7 Add support for queueing SIGCHLD same as other UNIX systems did.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.

The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.

There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.

Tested on: i386 (SMP and UP)
2005-11-08 09:09:26 +00:00
Craig Rodrigues
84e69560b6 Add utility function to propagate mount errors as text string messages.
Discussed with:		phk
2005-11-08 04:13:39 +00:00
Gleb Smirnoff
49d46b616e Fix panic string in last revision. 2005-11-06 16:47:59 +00:00
Andre Oppermann
cd5bb63b3d Free only those mbuf+clusters back to the packet zone that were allocated
from there.  All others get broken up and free'd individually to the mbuf
and cluster zones.

The packet zone is a secondary zone to the mbuf zone.  There is currently
a limitation in UMA which prevents decreasing the packet zone stock when
the mbuf and cluster zone are drained and all their members are part of
packets.  When this is fixed this change may be reverted.
2005-11-05 19:43:55 +00:00
Andre Oppermann
a5f7708723 Fix a logic error introduced with mandatory mbuf cluster refcounting and
freeing of mbufs+clusters back to the packet zone.
2005-11-04 17:20:53 +00:00
David Xu
8f0371f19d Fix name compatible problem with POSIX standard. the sigval_ptr and
sigval_int really should be sival_ptr and sival_int.
Also sigev_notify_function accepts a union sigval value but not a
pointer.
2005-11-04 09:41:00 +00:00
John Baldwin
091e8307d0 Add stoppcbs[] arrays on Alpha and sparc64 and have each CPU save its
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.

Tested on:	alpha, sparc64
2005-11-03 21:08:20 +00:00
John Baldwin
55de4dcab6 Fix 'show allpcpu' ddb command on non-x86. CPU IDs are in the range 0 ..
mp_maxid, not 0 .. mp_maxid - 1.  The result was that the highest numbered
CPU was skipped on Alpha and sparc64.

MFC after:	1 week
2005-11-03 21:06:29 +00:00
Pawel Jakub Dawidek
2a143d5bf5 Detect memory leaks when memory type is being destroyed.
This is very helpful for detecting kernel modules memory leaks on unload.

Discussed and reviewed by:	rwatson
2005-11-03 13:48:59 +00:00
David Xu
4c0fb2cfff Support sending realtime signal information via signal queue, realtime
signal memory is pre-allocated, so kernel can always notify user code.
2005-11-03 05:25:26 +00:00
David Xu
6d7b314b14 Cleanup some signal interfaces. Now the tdsignal function accepts
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
2005-11-03 04:49:16 +00:00
David Xu
d1f16b4d2e Oops, don't change tdsignal call. 2005-11-03 01:38:49 +00:00
David Xu
44355392b4 Add thread_find() function to search a thread by lwpid. 2005-11-03 01:34:08 +00:00
Paul Saab
1471f287e1 Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what should be capiable in a 32bit process, so we
must fixup the limits.

Reviewed by:	jhb
2005-11-02 21:18:07 +00:00
Andre Oppermann
56a4e45ab3 Mandatory mbuf cluster reference counting and groundwork for UMA
based jumbo 9k and jumbo 16k cluster support.

All mbuf's with external storage attached are mandatory reference
counted.  For clusters and jumbo clusters UMA provides the refcnt
storage directly.  It does not have to be separatly allocated.  Any
other type of external storage gets its own refcnt allocated from
an UMA mbuf refcnt zone instead of normal kernel malloc.

The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer
publically accessible.  The proper m_* functions have to be used.

mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well
as 9k and 16k clusters.

Clusters and jumbo clusters may be obtained without attaching it
immideatly to an mbuf.  This is for high performance cluster
allocation in network drivers where mbufs are attached after the
cluster has been filled.

Tested by:	rwatson
Sponsored by:	TCP/IP Optimizations Fundraise 2005
2005-11-02 16:20:36 +00:00
Andre Oppermann
34333b16cd Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it.  It only adds a layer of confusion.  The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit.  For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-02 13:46:32 +00:00
John Baldwin
68a17869c1 Push down Giant into fdfree() and remove it from two of the callers.
Other callers such as some rfork() cases weren't locking Giant anyway.

Reviewed by:	csjp
MFC after:	1 week
2005-11-01 17:13:05 +00:00
Robert Watson
2bdeb3f9f2 Reuse ktr_unused field in ktr_header structure as ktr_tid; populate
ktr_tid as part of gathering of ktr header data for new ktrace
records.  The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.

MFC after:	1 month
Reviewed by:	davidxu
2005-11-01 14:46:37 +00:00
Robert Watson
d977a58307 Replace ktr_buffer pointer in struct ktr_header with a ktr_unused
intptr_t.  The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer.  This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).

MFC after:		2 weeks
Space requested by:	davidxu
2005-11-01 12:36:19 +00:00
Paul Saab
ecc44de7a2 Reformat socket control messages on input/output for 32bit compatibility
on 64bit systems.

Submitted by:	ps, ups
Reviewed by:	jhb
2005-10-31 21:09:56 +00:00
John Baldwin
f6494f2e13 Check to see if the hash table is present in link_elf_lookup_symbol()
before dereferencing it.  Certain corrupt kernel modules might not have
a valid hash table, and would cause a kernel panic when they were loaded.
Instead of panic'ing, the kernel now prints out a warning that it is
missing the symbol hash table.

Tested by:	Benjamin Close Benjamin dot Close at clearchain dot com
MFC after:	1 week
2005-10-31 19:17:32 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Robert Watson
d374e81efd Push the assignment of a new or updated so_qlimit from solisten()
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN.  This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit.  This continues a move towards the socket layer
acting as a library for the protocol.

Bump __FreeBSD_version due to a change in the in-kernel protocol
interface.  This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
2005-10-30 19:44:40 +00:00
David Xu
56c06c4b67 Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.

Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.
2005-10-30 02:56:08 +00:00
David Xu
01790d850d Regen. 2005-10-30 02:14:37 +00:00
David Xu
0972628aff Fix sigevent's POSIX incompatible problem by adding member fields
sigev_notify_function and sigev_notify_attributes. AIO syscalls
use sigevent, so they have to be adjusted.

Reviewed by:	alc
2005-10-30 02:12:49 +00:00
Ed Maste
5d89e1d0af In watchdog_config enable the software watchdog iff the WD_ACTIVE flag is
set.  When watchdogd(1) is terminated intentionally it clears the bit,
which should then disable it in the kernel.

PR:		kern/74386
Submitted by:	Alex Hoff <ahoff at sandvine dot com>
Approved by:	phk, rwatson (mentor)
2005-10-27 17:22:47 +00:00
John Baldwin
2851f51eb1 Revert most of revision 1.235 and fix the problem a different way. We
can't acquire an sx lock in ttyinfo() because ttyinfo() can be called
from interrupt handlers (such as atkbd_intr()).  Instead, go back to
locking the process group while we pick a thread to display information for
and hold that lock until after we drop sched_lock to make sure the
process doesn't exit out from under us.  sched_lock ensures that the
specific thread from that process doesn't go away.  To protect against
the process exiting after we drop the proc lock but before we dereference
it to lookup the pid and p_comm in the call to ttyprintf(), we now copy
the pid and p_comm to local variables while holding the proc lock.

This problem was found by the recently added TD_NO_SLEEPING assertions for
interrupt handlers.

Tested by:	emaste
MFC after:	1 week
2005-10-27 16:47:28 +00:00
Paul Saab
53f5742d33 Allow 32bit get/setsockopt with SO_SNDTIMEO or SO_RECVTIMEO to work. 2005-10-27 04:26:35 +00:00
Peter Wemm
40c9966a37 Commit something we found useful at work at one point. Add sysctls for
debug.kdb.panic and debug.kdb.trap alongside the existing debug.kdb.enter
sysctl.  'panic' causes a panic, and 'trap' causes a page fault.  We used
these to ensure that crash dumps succeed from those two common failure
modes.  This avoids the need for creating a 'panic' kld module.
2005-10-26 22:40:07 +00:00
John Baldwin
fe486a370a Add a swi_remove() function to teardown software interrupt handlers. For
now it just calls intr_event_remove_handler(), but at some point it might
also be responsible for tearing down interrupt events created via swi_add.
2005-10-26 15:51:05 +00:00
Gleb Smirnoff
c0bc2867c1 - Fix leak of struct nlminfo on process exit.
- Fix malloc type collision, that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00
David Xu
4938faa635 do umtx_wake at userland thread exit address, so that others userland
threads can wait for a thread to exit, and safely assume that the thread
has left userland and is no longer using its userland stack, this is
necessary for pthread_join when a thread is waiting for another thread
to exit which has user customized stack, after pthread_join returns,
the userland stack can be reused for other purposes, without this change,
the joiner thread has to spin at the address to ensure the thread is really
exited.
2005-10-26 06:55:46 +00:00
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
John Baldwin
6caf758e7c Use shorter names for the Giant and fast taskqueues so that their names
actually fit.
2005-10-25 19:29:02 +00:00
John Baldwin
58553b9925 Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
John Baldwin
6b1e0d75b0 - Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
  places.
2005-10-24 20:31:04 +00:00
John Baldwin
f55ab99409 Document in #ifdef notnow code the actions that proc_fini would need to
take if struct procs were actually freed.
2005-10-24 20:15:23 +00:00