Commit Graph

8864 Commits

Author SHA1 Message Date
David Xu
4c0fb2cfff Support sending realtime signal information via the signal queue. Realtime
signal memory is pre-allocated, so the kernel can always notify user code.
2005-11-03 05:25:26 +00:00
David Xu
6d7b314b14 Clean up some signal interfaces. The tdsignal function now accepts
both a proc pointer and a thread pointer; if the thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends the signal
to the given thread.
Add a utility function, psignal_event(), to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
2005-11-03 04:49:16 +00:00
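As a rough illustration of the interface described in the commit above, here is a minimal userland-style sketch of a sigevent carrying the delivery requirement that psignal_event() acts on; the SIGEV_SIGNAL constants are standard POSIX, while the helper function itself is purely illustrative.

    #include <signal.h>
    #include <string.h>

    /* Illustrative only: build a sigevent asking that SIGRTMIN be
     * delivered with an attached integer value.  psignal_event()
     * consumes a structure of this shape on the kernel side. */
    static void
    fill_sigevent(struct sigevent *ev)
    {
            memset(ev, 0, sizeof(*ev));
            ev->sigev_notify = SIGEV_SIGNAL;    /* deliver as a signal */
            ev->sigev_signo = SIGRTMIN;         /* realtime signal number */
            ev->sigev_value.sival_int = 42;     /* value queued with it */
    }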
David Xu
d1f16b4d2e Oops, don't change tdsignal call. 2005-11-03 01:38:49 +00:00
David Xu
44355392b4 Add thread_find() function to search a thread by lwpid. 2005-11-03 01:34:08 +00:00
Paul Saab
1471f287e1 Calling setrlimit from 32bit apps could potentially increase certain
limits beyond what a 32bit process should be capable of, so we
must fix up the limits.

Reviewed by:	jhb
2005-11-02 21:18:07 +00:00
Andre Oppermann
56a4e45ab3 Mandatory mbuf cluster reference counting and groundwork for UMA
based jumbo 9k and jumbo 16k cluster support.

All mbufs with external storage attached are now mandatorily reference
counted.  For clusters and jumbo clusters UMA provides the refcnt
storage directly.  It does not have to be separately allocated.  Any
other type of external storage gets its own refcnt allocated from
a UMA mbuf refcnt zone instead of normal kernel malloc.

The refcount API, MEXT_ADD_REF() and MEXT_REM_REF(), is no longer
publicly accessible.  The proper m_* functions have to be used.

mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well
as 9k and 16k clusters.

Clusters and jumbo clusters may be obtained without attaching them
immediately to an mbuf.  This is for high performance cluster
allocation in network drivers where mbufs are attached after the
cluster has been filled.

Tested by:	rwatson
Sponsored by:	TCP/IP Optimizations Fundraise 2005
2005-11-02 16:20:36 +00:00
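A hedged sketch of the allocation path the commit refers to, using the long-standing mbuf(9) helper m_getcl(): since MEXT_ADD_REF()/MEXT_REM_REF() are no longer public, callers go through m_* functions like this. The function name alloc_pkt() and the omitted fill-in step are illustrative.

    #include <sys/param.h>
    #include <sys/mbuf.h>

    /* Illustrative sketch: allocate a packet-header mbuf with a 2K
     * cluster attached; the cluster's reference count is handled by
     * the mbuf code (via UMA), not by the caller. */
    static struct mbuf *
    alloc_pkt(void)
    {
            struct mbuf *m;

            m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
            if (m == NULL)
                    return (NULL);
            /* ...fill m->m_data, set m->m_len and m->m_pkthdr.len... */
            return (m);
    }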
Andre Oppermann
34333b16cd Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it.  It only adds a layer of confusion.  The
distinction between header mbufs and data mbufs is made solely
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit.  For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-02 13:46:32 +00:00
John Baldwin
68a17869c1 Push down Giant into fdfree() and remove it from two of the callers.
Other callers such as some rfork() cases weren't locking Giant anyway.

Reviewed by:	csjp
MFC after:	1 week
2005-11-01 17:13:05 +00:00
Robert Watson
2bdeb3f9f2 Reuse the ktr_unused field in the ktr_header structure as ktr_tid; populate
ktr_tid as part of gathering the ktr header data for new ktrace
records.  The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.

MFC after:	1 month
Reviewed by:	davidxu
2005-11-01 14:46:37 +00:00
Robert Watson
d977a58307 Replace ktr_buffer pointer in struct ktr_header with a ktr_unused
intptr_t.  The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer.  This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).

MFC after:		2 weeks
Space requested by:	davidxu
2005-11-01 12:36:19 +00:00
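Taken together with the previous commit, the record header roughly takes the shape sketched below. This is an illustrative simplification (the exact layout, field order, and sizes are not reproduced here), meant only to show where the reclaimed intptr_t slot now carries the thread id.

    #include <sys/types.h>
    #include <sys/time.h>

    /* Rough, illustrative sketch of the on-disk ktrace record header:
     * the slot that used to hold the in-kernel buffer pointer is now an
     * intptr_t carrying the thread id, while the buffer pointer itself
     * lives only in the kernel-only ktrace request structure. */
    struct ktr_header_sketch {
            int             ktr_len;        /* length of the payload */
            short           ktr_type;       /* trace record type */
            pid_t           ktr_pid;        /* process id */
            char            ktr_comm[20];   /* command name (size illustrative) */
            struct timeval  ktr_time;       /* timestamp */
            intptr_t        ktr_tid;        /* formerly ktr_buffer / ktr_unused */
    };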
Paul Saab
ecc44de7a2 Reformat socket control messages on input/output for 32bit compatibility
on 64bit systems.

Submitted by:	ps, ups
Reviewed by:	jhb
2005-10-31 21:09:56 +00:00
John Baldwin
f6494f2e13 Check to see if the hash table is present in link_elf_lookup_symbol()
before dereferencing it.  Certain corrupt kernel modules might not have
a valid hash table, and would cause a kernel panic when they were loaded.
Instead of panic'ing, the kernel now prints out a warning that it is
missing the symbol hash table.

Tested by:	Benjamin Close Benjamin dot Close at clearchain dot com
MFC after:	1 week
2005-10-31 19:17:32 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Robert Watson
d374e81efd Push the assignment of a new or updated so_qlimit from solisten()
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN.  This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit.  This continues a move towards the socket layer
acting as a library for the protocol.

Bump __FreeBSD_version due to a change in the in-kernel protocol
interface.  This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
2005-10-30 19:44:40 +00:00
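A hedged sketch of how a protocol's listen entry point might use the changed interface: the backlog travels down to solisten_proto(), which sets SO_ACCEPTCONN and so_qlimit under one socket-lock acquisition. The function foo_listen() and its exact error handling are illustrative, not taken from any particular protocol.

    /* Hypothetical pru_listen implementation (kernel-side sketch). */
    static int
    foo_listen(struct socket *so, int backlog, struct thread *td)
    {
            int error;

            SOCK_LOCK(so);
            error = solisten_proto_check(so);
            if (error == 0)
                    solisten_proto(so, backlog); /* SO_ACCEPTCONN + so_qlimit */
            SOCK_UNLOCK(so);
            return (error);
    }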
David Xu
56c06c4b67 Let itimer store itimerspec instead of itimerval, so I don't have to
convert to or from timeval frequently.

Introduce the function itimer_accept() to acknowledge a timer signal in the
signal acceptance code; this allows us to return a fresher overrun counter
than at signal-generation time.  While POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.

Introduce the SIGEV_THREAD_ID notification mode.  It is used by the thread
library to request that the kernel deliver a signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD, which is required by POSIX.

Timer signals are managed by the timer code, so they can not fail even if
the signal queue has been filled up by the sigqueue syscall.
2005-10-30 02:56:08 +00:00
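A hedged userland sketch of the SIGEV_THREAD_ID mode described above, as a thread library might use it to have the timer signal delivered to one specific thread; the field spelling sigev_notify_thread_id and the way the target lwpid is obtained are illustrative.

    #include <signal.h>
    #include <string.h>
    #include <time.h>

    /* Illustrative: create a CLOCK_REALTIME timer whose expiration
     * signal is delivered to the thread identified by lwpid. */
    static int
    create_thread_timer(long lwpid, timer_t *tid)
    {
            struct sigevent ev;

            memset(&ev, 0, sizeof(ev));
            ev.sigev_notify = SIGEV_THREAD_ID;  /* target one thread */
            ev.sigev_notify_thread_id = lwpid;  /* target thread id */
            ev.sigev_signo = SIGUSR1;
            return (timer_create(CLOCK_REALTIME, &ev, tid));
    }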
David Xu
01790d850d Regen. 2005-10-30 02:14:37 +00:00
David Xu
0972628aff Fix sigevent's POSIX incompatibility by adding the member fields
sigev_notify_function and sigev_notify_attributes. AIO syscalls
use sigevent, so they have to be adjusted.

Reviewed by:	alc
2005-10-30 02:12:49 +00:00
Ed Maste
5d89e1d0af In watchdog_config enable the software watchdog iff the WD_ACTIVE flag is
set.  When watchdogd(1) is terminated intentionally it clears the bit,
which should then disable it in the kernel.

PR:		kern/74386
Submitted by:	Alex Hoff <ahoff at sandvine dot com>
Approved by:	phk, rwatson (mentor)
2005-10-27 17:22:47 +00:00
John Baldwin
2851f51eb1 Revert most of revision 1.235 and fix the problem a different way. We
can't acquire an sx lock in ttyinfo() because ttyinfo() can be called
from interrupt handlers (such as atkbd_intr()).  Instead, go back to
locking the process group while we pick a thread to display information for
and hold that lock until after we drop sched_lock to make sure the
process doesn't exit out from under us.  sched_lock ensures that the
specific thread from that process doesn't go away.  To protect against
the process exiting after we drop the proc lock but before we dereference
it to lookup the pid and p_comm in the call to ttyprintf(), we now copy
the pid and p_comm to local variables while holding the proc lock.

This problem was found by the recently added TD_NO_SLEEPING assertions for
interrupt handlers.

Tested by:	emaste
MFC after:	1 week
2005-10-27 16:47:28 +00:00
Paul Saab
53f5742d33 Allow 32bit get/setsockopt with SO_SNDTIMEO or SO_RECVTIMEO to work. 2005-10-27 04:26:35 +00:00
Peter Wemm
40c9966a37 Commit something we found useful at work at one point. Add sysctls for
debug.kdb.panic and debug.kdb.trap alongside the existing debug.kdb.enter
sysctl.  'panic' causes a panic, and 'trap' causes a page fault.  We used
these to ensure that crash dumps succeed from those two common failure
modes.  This avoids the need for creating a 'panic' kld module.
2005-10-26 22:40:07 +00:00
John Baldwin
fe486a370a Add a swi_remove() function to teardown software interrupt handlers. For
now it just calls intr_event_remove_handler(), but at some point it might
also be responsible for tearing down interrupt events created via swi_add.
2005-10-26 15:51:05 +00:00
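A hedged sketch of the pairing this introduces: a software interrupt handler registered with swi_add() is later torn down with swi_remove(). The event pointer, names, and priority below are illustrative.

    /* Kernel-side sketch; the foo_* names are hypothetical. */
    static struct intr_event *foo_swi_event;
    static void *foo_swi_cookie;

    static void
    foo_swi_handler(void *arg)
    {
            /* deferred work would go here */
    }

    static int
    foo_swi_attach(void)
    {
            return (swi_add(&foo_swi_event, "foo", foo_swi_handler, NULL,
                SWI_CLOCK, 0, &foo_swi_cookie));
    }

    static void
    foo_swi_detach(void)
    {
            swi_remove(foo_swi_cookie); /* wraps intr_event_remove_handler() */
    }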
Gleb Smirnoff
c0bc2867c1 - Fix leak of struct nlminfo on process exit.
- Fix a malloc type collision that made the above problem
  difficult to understand.

Reported by:	Vladimir Sharun <sharun ukr.net>
2005-10-26 07:18:37 +00:00
David Xu
4938faa635 Do umtx_wake at the userland thread exit address, so that other userland
threads can wait for a thread to exit and safely assume that the thread
has left userland and is no longer using its userland stack.  This is
necessary for pthread_join when a thread is waiting for another thread
with a user-customized stack to exit: after pthread_join returns,
the userland stack can be reused for other purposes.  Without this change,
the joiner thread has to spin at the address to ensure the thread has really
exited.
2005-10-26 06:55:46 +00:00
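A heavily simplified sketch of the join-side idea: instead of spinning, the joiner sleeps on the exiting thread's tid slot, which the kernel clears and wakes (via umtx_wake) only once the thread can no longer touch its userland stack. The umtx_wait() wrapper and the field layout are hypothetical stand-ins for the real umtx interface.

    /* Hypothetical wrapper around the kernel's umtx wait primitive. */
    extern int umtx_wait(volatile long *addr, long expected);

    static void
    join_wait(volatile long *tid_slot)
    {
            while (*tid_slot != 0)              /* cleared by kernel at exit */
                    umtx_wait(tid_slot, *tid_slot);
            /* Safe to reuse or unmap the dead thread's stack from here. */
    }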
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
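A heavily abbreviated sketch of the event/handler split described in the first bullet above; the *_sketch names and field set are illustrative, not the real kernel definitions.

    #include <sys/queue.h>

    struct intr_thread;                     /* private to kern_intr.c */

    /* Illustrative: one handler attached to an interrupt source. */
    struct intr_handler_sketch {
            void    (*ih_handler)(void *);  /* handler function */
            void    *ih_argument;
            int     ih_flags;               /* e.g. INTR_FAST, INTR_EXCL */
            TAILQ_ENTRY(intr_handler_sketch) ih_next;
    };

    /* Illustrative: the event owns the handler list; a thread exists
     * only if at least one non-INTR_FAST handler is attached. */
    struct intr_event_sketch {
            TAILQ_HEAD(, intr_handler_sketch) ie_handlers;
            struct intr_thread *ie_thread;  /* NULL for FAST-only events */
    };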
John Baldwin
6caf758e7c Use shorter names for the Giant and fast taskqueues so that their names
actually fit.
2005-10-25 19:29:02 +00:00
John Baldwin
58553b9925 Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
John Baldwin
6b1e0d75b0 - Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
  places.
2005-10-24 20:31:04 +00:00
John Baldwin
f55ab99409 Document in #ifdef notnow code the actions that proc_fini would need to
take if struct procs were actually freed.
2005-10-24 20:15:23 +00:00
John Baldwin
cf23efc12a Don't panic if a spin lock is initialized that isn't in our static order
list.  Just warn about it instead.

Requested by:	scottl
MFC after:	1 day
2005-10-24 20:14:24 +00:00
John Baldwin
8d2a5b8c34 Revert previous change to this file. I accidentally committed while
fixing spelling in a comment.
2005-10-24 15:58:21 +00:00
John Baldwin
971d0ad835 Spell hierarchy correctly in comments.
Submitted by:	Wojciech A. Koszek dunstan at freebsd dot czest dot pl
2005-10-24 15:57:27 +00:00
Stephan Uphoff
198b0a3b71 Only set B_RAM (read-ahead mark) on an incore buffer if we can lock it.
This fixes a race condition caused by the unlocked write access to the
b_flags field.

MFC after:	3 days
2005-10-24 14:23:04 +00:00
David Xu
fe80a39034 Don't touch last overrun if signal was already on queue. 2005-10-23 22:59:33 +00:00
David Xu
60354683d9 Make p_itimers a pointer, so that sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
Alan Cox
5b5908005a Previously, nothing prevented the page that was returned by pmap_extract()
from being reclaimed before it was wired.  Use pmap_extract_and_hold()
instead of pmap_extract() and retain the hold on the page until it has been
wired.
2005-10-23 07:41:56 +00:00
David Xu
e706ee8a16 Regen for POSIX timer syscalls. 2005-10-23 04:26:10 +00:00
David Xu
86857b368d Implement POSIX timers. Currently only the CLOCK_REALTIME and CLOCK_MONOTONIC
clocks are supported. I plan to merge the XSI timer ITIMER_REAL and the other
two CPU timers into the new code; currently three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks two member fields:
sigev_notify_function
sigev_notify_attributes
I have found that sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
David Xu
5da49fcb8a 1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
   to be managed by outside code; KSI_INS indicates that sigqueue_add should
   directly insert the passed ksiginfo into the queue rather than copy it.
2005-10-23 04:12:26 +00:00
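A hedged sketch of the KSI_EXT/KSI_INS idea: a subsystem that owns its own ksiginfo asks the queueing code to link that very object in rather than copy it. The wrapper function, the allocation argument, and the fields filled in are illustrative.

    /* Kernel-side sketch; make_ext_ksi() is hypothetical. */
    static ksiginfo_t *
    make_ext_ksi(int signo)
    {
            ksiginfo_t *ksi;

            ksi = ksiginfo_alloc(1);        /* now public; arg meaning illustrative */
            if (ksi == NULL)
                    return (NULL);
            ksi->ksi_flags |= KSI_EXT | KSI_INS;    /* caller-managed; insert as-is */
            ksi->ksi_signo = signo;
            return (ksi);   /* the owner later calls ksiginfo_free(ksi) */
    }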
Alan Cox
0a5a219830 Verify that access to the given address is allowed from user-space.
Discussed with: rwatson@
2005-10-22 20:02:59 +00:00
Alan Cox
52ad48b69f Eliminate spl* calls. 2005-10-21 05:48:38 +00:00
Robert Watson
64a266f9e8 Change format string for u_int64_t to %ju from %llu, in order to use the
correct format string on 64-bit systems.

Pointed out by:	pjd
2005-10-20 21:28:31 +00:00
Robert Watson
909ed16c2b Add a "show malloc" command to DDB, which prints out the current stats for
available kernel malloc types.  Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.

MFC after:	2 weeks
2005-10-20 17:41:47 +00:00
John Baldwin
2a2b58faa4 Add entry for the spin mutex used by the hptmv(4) driver.
MFC after: 	1 day
Tested by:	Philip Kizer pckizer at nostrum dot com
2005-10-20 14:49:59 +00:00
John Polstra
135c43dc52 Fix a bug in the kernel module runtime linker that made it impossible
to unload the usb.ko module after boot if it was originally preloaded
from "/boot/loader.conf".  When processing preloaded modules, the
linker erroneously added self-dependencies to each module's reference
count.  That prevented usb.ko's reference count from ever going to 0,
so it could not be unloaded.

Sponsored by Isilon Systems.

Reviewed by:	pjd, peter
MFC after:	1 week
2005-10-19 20:40:30 +00:00
John Baldwin
8c4b6380c7 Move the initialization of the devmtx into the mutex_init() function
called during early init before cninit().

Tested on:	i386, alpha, sparc64
Reviewed by:	phk, imp
Reported by:	Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
MFC after:	1 week
2005-10-18 18:27:44 +00:00
Stefan Farfeleder
d60e86c86e Const-qualify ksem_timedwait's parameter abstime as it's only passed in. 2005-10-18 11:46:24 +00:00
Peter Wemm
1a330eb01d Add support for kernel modules with a single PT_LOAD section.
While here, support up to four sections because it was trivial to do
and cheap. (One pointer per section).

For amd64 with "-fpic -shared" format .ko files, using a single PT_LOAD
section is important to avoid wasting about 1MB of KVM and physical ram
for the 'gap' between the two PT_LOAD sections.  amd64 normally uses
.o format kld files and isn't affected normally.  But -fpic -shared modules
are actually possible to produce and load...  (And with a bugfix to
binutils, we can build and use plain -shared .ko files without -fpic)

i386 only wastes 4K per .ko file, so that isn't such a big deal there.
2005-10-17 23:21:55 +00:00
Poul-Henning Kamp
5ef5ee7b62 Use new functions to call into drivers methods. 2005-10-16 21:07:31 +00:00
Poul-Henning Kamp
7423b2b40c Make ttyconsolemode() call ttsetwater() so that drivers don't have to. 2005-10-16 20:58:22 +00:00
Poul-Henning Kamp
51514bc484 Make ttsetcompat() static 2005-10-16 20:40:40 +00:00
Poul-Henning Kamp
733634738e Eliminate two unused arguments to ttycreate(). 2005-10-16 20:22:56 +00:00
Paul Saab
a372f8224c Implement the 32bit versions of recvmsg, recvfrom, sendmsg
Partially obtained from:	jhb
2005-10-15 05:57:06 +00:00
Paul Saab
f0b479cd75 Implement 32bit wrappers for clock_gettime, clock_settime, and
clock_getres.
2005-10-15 02:54:18 +00:00
Kris Kennaway
14cdc36456 mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months,
so we are ready for mpsafevfs=1 by default on sparc64 too.  I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.

MFC after:	1 week
2005-10-14 23:56:13 +00:00
Kris Kennaway
f098dcded5 Partially revert revision 1.66, which contained a change that did not
correspond to the commit log.  It changed the maxswzone and maxbcache
parameters from int to long, without changing the extern definitions
in <sys/buf.h>.

In fact it's a good thing it did not, because other parts of the system
are not yet ready for this, and on large-memory sparc machines it causes
severe filesystem damage if you try.

The worst effect of the change was that the tunables controlling the
above variables stopped working.  These were necessary to allow such
large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not
set a hard-coded upper limit on these parameters and they ended
up overflowing an int, causing an infinite loop at boot in bufinit().

Reviewed by:	mlaier
2005-10-14 19:15:10 +00:00
David Xu
823acd70b6 Regen for sigqueue syscall. 2005-10-14 12:56:28 +00:00
David Xu
9104847f21 1. Change the prototypes of trapsignal and sendsig to use ksiginfo_t *; most
   changes in MD code are trivial. Before this change, trapsignal and
   sendsig used discrete parameters; now they use member fields of the
   ksiginfo_t structure. For sendsig, this change allows us to pass the
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo; it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to the proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to the thread structure to hold all signals delivered to
   a thread.

5. i386 and amd64 now return the POSIX standard si_code; other arches will
   be fixed.

6. In this sigqueue implementation, the pending signal set is kept as before;
   an extra siginfo list holds additional siginfo_t data for signals.
   Kernel code still uses psignal(), which behaves as before and won't fail
   even under memory pressure.  The only exception is when deleting a signal:
   we should call sigqueue_delete to remove the signal from the sigqueue
   rather than SIGDELSET.  Currently no kernel code delivers a signal
   with additional data, so the kernel should be as stable as before;
   a ksiginfo can carry more information, and may for example allow a signal
   to be delivered but throw away the siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to a target
   process; if resources are unavailable, EAGAIN will be returned as
   the specification says.
   Just before a thread exits, its signal queue memory is freed by
   sigqueue_flush.
   Currently, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
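A minimal userland sketch of the sigqueue() syscall mentioned in point 6, including the EAGAIN case the specification allows when queue resources run out; the notify() wrapper is illustrative.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>

    static int
    notify(pid_t pid)
    {
            union sigval sv;

            sv.sival_int = 42;              /* value carried in siginfo */
            if (sigqueue(pid, SIGUSR1, sv) == -1) {
                    if (errno == EAGAIN)
                            fprintf(stderr, "signal queue full, retry later\n");
                    return (-1);
            }
            return (0);
    }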
Doug Ambrisko
db43cd0417 Fix the tinderbox by removing incomplete/bad spl usage. Proper Giant-free
locking is required for aio.

Pointed out by:	imp
2005-10-12 22:33:22 +00:00
Doug Ambrisko
69cd28dacb Add kqueue support to LIO event notification and fix how it handled
notifications when LIO operations completed.  These were the problems
with LIO event complete notification:
      - Move all LIO/AIO event notification into one general function
        so we don't have bugs in different data paths.  This unification
        got rid of several notification bugs, one of which could send a
        SIGILL to the process when kqueue was used.
      - Change the LIO event accounting to count all AIO requests that
        could have been split across the fast path and daemon mode.
        The prior accounting only kept track of AIO ops in that
        mode and not the entire list of operations.  This could cause
        a bogus LIO event complete notification to occur when all of
        the fast path AIO ops completed and not the AIO ops that
        ended up queued for the daemon.

Suggestions from:	alc
2005-10-12 17:51:31 +00:00
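A hedged userland sketch of LIO completion notification through a kqueue, the path this commit fixes; SIGEV_KEVENT and sigev_notify_kqueue are the FreeBSD-specific pieces, while the wrapper function and the omitted aiocb setup are illustrative.

    #include <aio.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/event.h>

    static int
    submit_lio(int kq, struct aiocb *list[], int nent)
    {
            struct sigevent ev;

            memset(&ev, 0, sizeof(ev));
            ev.sigev_notify = SIGEV_KEVENT;     /* notify via a kqueue */
            ev.sigev_notify_kqueue = kq;        /* which kqueue to post to */
            ev.sigev_value.sival_ptr = list;    /* shows up in kevent udata */
            return (lio_listio(LIO_NOWAIT, list, nent, &ev));
    }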
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Tor Egge
8272da3106 Release a clean buffer with the wrong size and no dependencies in the
non-VMIO case as well.
2005-10-09 22:41:25 +00:00
Marcel Moolenaar
125fbd3cdc Add parse_uuid(), which creates a binary representation of a UUID from
a string representation.
2005-10-07 13:37:10 +00:00
Poul-Henning Kamp
0694506637 Eliminate __RMAN_RESOURCE_VISIBLE hack entirely by moving the struct
resource_ to subr_rman.c where it belongs.
2005-10-06 21:49:31 +00:00
Gleb Smirnoff
f0796cd26c - Don't pollute opt_global.h with DEVICE_POLLING; introduce
opt_device_polling.h instead.
- Include opt_device_polling.h in the appropriate files.
- Wrap the include in HAVE_KERNEL_OPTION_HEADERS in the files that
  can be compiled as loadable modules.

Reviewed by:	bde
2005-10-05 10:09:17 +00:00
Warner Losh
2f624c21c5 When data passed into devctl_notify is NULL, don't print (null). Instead
don't print anything at all.

# this fixes a problem that I noticed with devd.pipe not terminating lines
# with \n correctly sometimes.
2005-10-04 22:25:14 +00:00
Robert Watson
7723d5ed12 Re-order MAC and DAC checks in shmget() in order to give precedence to
the MAC result, as well as avoid losing the DAC check result when MAC
is enabled.

MFC after:	3 days
Reported by:	Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>
2005-10-04 16:40:20 +00:00
Roman Kurakin
826cf005ed Use FILEDESC_UNLOCK(fdp) after FILE_UNLOCK(p), not before to avoid LOR.
Slightly discussed on current@.

LOR #055

MFC after:	14 days
2005-10-04 16:27:54 +00:00
Christian S.J. Peron
9eea3d85cc Standard Giant push down operations for the Mandatory Access Control (MAC)
framework. This makes Giant protection around MAC operations which
interact with VFS conditional, based on the MPSAFE status of the file system.

Affected the following syscalls:

o __mac_get_fd
o __mac_get_file
o __mac_get_link
o __mac_set_fd
o __mac_set_file
o __mac_set_link

-Drop Giant altogether in __mac_set_proc because the
 mac_cred_mmapped_drop_perms_recurse routine no longer requires it.
-Move conditional Giant acquisitions to after the label allocation routines.
-Move the conditional release of Giant to before the label de-allocation
 routines.

Discussed with:	rwatson
2005-10-04 14:32:58 +00:00
Don Lewis
34ea500bea Add missing word to comment. 2005-10-04 04:02:33 +00:00
Gleb Smirnoff
e113edf30a o Move a lot of parameter checking from netisr_poll() to
dedicated sysctl handlers. Protect manipulations with
  poll_mtx. The affected sysctls are:
  - kern.polling.burst_max
  - kern.polling.each_burst
  - kern.polling.user_frac
  - kern.polling.reg_frac
o Use CTLFLAG_RD on MIBs that are supposed to be read-only.
o u_int32t -> uint32_t
o Remove unneeded locking from poll_switch().
2005-10-03 14:15:26 +00:00
Colin Percival
33812c066d If sufficiently bad things happen during a call to kern_execve(), it is
possible for do_execve() to call exit1() rather than returning.  As a
result, the sequence "allocate memory; call kern_execve; free memory"
can end up leaking memory.

This commit documents this astonishing behaviour and adds a call to
exec_free_args() before the exit1() call in do_execve().  Since all
the users of kern_execve() in the tree use exec_free_args() to free
the command-line arguments after kern_execve() returns, this should
be safe, and it fixes the memory leak which can otherwise occur.

Submitted by:	Peter Holm
MFC after:	3 days
Security:	Local denial of service
2005-10-03 12:49:54 +00:00
Hajimu UMEMOTO
56e5a87a55 make saved cpu level stackable. 2005-10-03 06:57:29 +00:00
Don Lewis
5032ff8197 Always wire the sysctl output buffer in sysctl_kern_proc() before
calling sysctl_out_proc().  -- fix from jhb

Move the code in fill_kinfo_thread() that gathers data from struct proc
into the new function fill_kinfo_proc_only().

Change all callers of fill_kinfo_thread() to call both
fill_kinfo_proc_only() and fill_kinfo_thread().  When gathering
data from a multi-threaded process, fill_kinfo_proc_only() only needs
to be called once.

Grab sched_lock before accessing the process thread list or calling
fill_kinfo_thread().

PR:		kern/84684
MFC after:	3 days
2005-10-02 23:27:56 +00:00
Robert Watson
c30bf5c317 Include kdb.h so that kdb_active is declared regardless of KDB being
included in the kernel.

MFC after:	0 days
2005-10-02 10:03:51 +00:00
Poul-Henning Kamp
7bbb3a2690 Make sure the clone lists are sorted in the right order.
Explosion triggered by:	pjd
MFC:	3 days
2005-10-01 19:21:03 +00:00
Gleb Smirnoff
4092996774 Big polling(4) cleanup.
o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
  poll_mtx is not dropped during the call to the polling handler.
  This fixes a problem with idle polling.

o Make registration and deregistration from polling happen in a
  functional way, instead of at the next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
  with ifconfig.

Detailed kern_poll.c changes:
  - Remove polling handler flags, introduced in 1.21. They are not
    needed now.
  - Forget and do not check if_flags, if_capenable and if_drv_flags.
  - Call all registered polling handlers unconditionally.
  - Do not drop poll_mtx, when entering polling handlers.
  - In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
  - In netisr_poll() axe the block, where polling code asks drivers
    to unregister.
  - In netisr_poll() and ether_poll() do polling always, if any
    handlers are present.
  - In ether_poll_[de]register() remove a lot of error hiding code. Assert
    that arguments are correct, instead.
  - In ether_poll_[de]register() use standard return values in case of
    error or success.
  - Introduce poll_switch(), which is a sysctl handler for kern.polling.enable.
    poll_switch() goes through the interface list and enables/disables polling.
    A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
  - On attach the driver announces IFCAP_POLLING in if_capabilities, but
    not in if_capenable.
  - On detach the driver calls ether_poll_deregister() if polling is enabled.
  - In the polling handler the driver obtains its lock and checks the
    IFF_DRV_RUNNING flag. If it is not set, the handler unlocks and returns.
  - In the ioctl handler the driver checks for the IFCAP_POLLING flag being
    requested to be set or cleared. The driver first calls
    ether_poll_[de]register(), then obtains the driver lock and
    [dis/en]ables interrupts.
  - In the interrupt handler the driver checks the IFCAP_POLLING flag in
    if_capenable. If it is present, the handler returns. This is important
    to protect against spurious interrupts.

Reviewed by:	ru, sam, jhb
2005-10-01 18:56:19 +00:00
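A hedged fragment, in the shape of the driver convention listed above, showing how a hypothetical driver's ioctl routine toggles IFCAP_POLLING: register or deregister first, then flip interrupts under the driver lock. foo_poll(), FOO_LOCK(), and the interrupt helpers are illustrative names.

    /* Fragment of a hypothetical foo(4) SIOCSIFCAP ioctl handler. */
    case SIOCSIFCAP:
            mask = ifr->ifr_reqcap ^ ifp->if_capenable;
            if (mask & IFCAP_POLLING) {
                    if (ifr->ifr_reqcap & IFCAP_POLLING) {
                            error = ether_poll_register(foo_poll, ifp);
                            if (error == 0) {
                                    FOO_LOCK(sc);
                                    foo_disable_intr(sc); /* polling replaces interrupts */
                                    ifp->if_capenable |= IFCAP_POLLING;
                                    FOO_UNLOCK(sc);
                            }
                    } else {
                            error = ether_poll_deregister(ifp);
                            FOO_LOCK(sc);
                            foo_enable_intr(sc);
                            ifp->if_capenable &= ~IFCAP_POLLING;
                            FOO_UNLOCK(sc);
                    }
            }
            break;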
Don Lewis
5997cae9a4 Copy new process argument list in do_execve() before grabbing PROC_LOCK
to avoid touching pageable memory while holding a mutex.

Simplify argument list replacement logic.

PR:		kern/84935
Submitted by:	"Antoine Pelisse" apelisse AT gmail.com (in a different form)
MFC after:	3 days
2005-10-01 08:33:56 +00:00
Don Lewis
bd3c2d867d Un-staticize waitrunningbufspace() and call it before returning from
ffs_copyonwrite() if any async writes were launched.

Restore the threads previous TDP_NORUNNINGBUF state before returning
from ffs_copyonwrite().
2005-09-30 18:07:41 +00:00
David Xu
763a429571 Fix a LOR between sleep and sched_lock by using a timeout wait
when a process reaches the maximum number of threads.

MFC after: 3 days
2005-09-30 06:09:41 +00:00
Don Lewis
6c8b634f1d Un-staticize runningbufwakeup() and staticize updateproc.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.

Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than
checking the proc pointer vs. the known proc pointers for these two
threads.  A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.

Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk.  This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk.  The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.

Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.

Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.

See <http://www.holm.cc/stress/log/cons143.html>
2005-09-30 01:30:01 +00:00
John Baldwin
b65089ccb5 Trim a couple of unneeded includes. 2005-09-29 19:13:52 +00:00
Peter Edwards
d41c4674c2 Close a race in biodone(), whereby the bio_done field of the passed
bio may have been freed and reassigned by the wakeup before being
tested after releasing the bdonelock.

There's a non-zero chance this is the cause of a few of the crashes
knocking around with biodone() sitting in the stack backtrace.

Reviewed By: phk@
2005-09-29 10:37:20 +00:00
Poul-Henning Kamp
64fd97df54 puc(4) does strange things to resources in order to fool the
subdrivers into hooking up.

It should probably be rewritten to implement a simple bus to which
the sub drivers attach using some kind of hint.

Until then, provide a couple of crutch functions with big warning
signs so it can survive the recent changes to struct resource.
2005-09-28 18:06:25 +00:00
Robert Watson
5f419982c2 Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer needs to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by:	bde
2005-09-28 07:03:03 +00:00
Christian S.J. Peron
453f7d5369 Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track
of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT
macros around vrele as it's possible that this can call into the VOP_INACTIVE
filesystem-specific code. Also, while we are here, remove the Giant assertion
from the sysctl handler; we do not actually require Giant here, so we
shouldn't assert it. Doing so will just complicate things when Giant is removed
from the sysctl framework.
2005-09-28 00:30:56 +00:00
Robert Watson
667285c4e3 If KDB_STOP_NMI is compiled into the kernel, default
debug.kdb.stop_cpus_with_nmi to 1 rather than 0.

MFC after:	3 days
2005-09-27 21:12:05 +00:00
Robert Watson
2b59d50cfb In lockstatus(), don't lock and unlock the interlock when testing the
sleep lock status while kdb_active, or we risk contending with the
mutex on another CPU, resulting in a panic when using "show
lockedvnods" while in DDB.

MFC after:	3 days
Reviewed by:	jhb
Reported by:	kris
2005-09-27 21:02:59 +00:00
Robert Watson
32a6bd9510 No longer maintain mbstat statistics for the mbuf allocator; UMA
statistics and libmemstat(3) are now used to track mbuf statistics.

MFC after:	1 month
2005-09-27 20:28:43 +00:00
John Baldwin
7e9e371f2d Use the refcount API to manage the reference count for user credentials
rather than using pool mutexes.

Tested on:	i386, alpha, sparc64
2005-09-27 18:09:42 +00:00
John Baldwin
b2149bde1f Use the reference count API to manage the reference counts for process
limit structures rather than using pool mutexes to protect the reference
counts.

Tested on:	i386, alpha, sparc64
2005-09-27 18:07:05 +00:00
John Baldwin
55b4a5ae0d Use the refcount API to implement reference counts on process argument
structures rather than using a global mutex to protect the reference
counts.

Tested on:	i386, alpha, sparc64
2005-09-27 18:03:15 +00:00
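Together with the two commits just above, this moves these counters onto the refcount(9) helpers; a hedged sketch of the pattern follows, with struct foo, foo_hold(), foo_drop(), and foo_free() as illustrative names.

    #include <sys/param.h>
    #include <sys/refcount.h>

    struct foo {
            u_int   f_refcnt;       /* set up with refcount_init(&..., 1) */
            /* ... */
    };

    static void     foo_free(struct foo *);        /* illustrative destructor */

    static void
    foo_hold(struct foo *f)
    {
            refcount_acquire(&f->f_refcnt);
    }

    static void
    foo_drop(struct foo *f)
    {
            if (refcount_release(&f->f_refcnt))     /* true when it hits zero */
                    foo_free(f);
    }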
Christian S.J. Peron
6acd4b6189 Update the "created from" section to reflect the most recent version of
syscalls.master

Requested by:	jhb
2005-09-27 14:36:59 +00:00
Christian S.J. Peron
7f300b47dd Mark the extended attribute syscalls as being MP safe.
Requested by:	jhb
2005-09-27 14:32:04 +00:00
John Baldwin
d27acf445e Add the spin lock used by the binary nvidia driver to the static lock
order list so that WITNESS and the driver play together nicely.

Tested by:	Harald Schmalzbauer
MFC after:	3 days
2005-09-26 18:30:12 +00:00
Robert Watson
9b7915859d Add "show allpcpu" to DDB, which prints the current CPU id followed by
the per-cpu data for all CPUs.  This is easier to ask users to do than
"figure out how many CPUs you have, now run show pcpu, then run it
once for each CPU you have".

MFC after:	3 days
2005-09-26 16:55:11 +00:00
David Xu
2b7182c6b7 Reorder statements to avoid accessing unknown memory.
In theory, invoking kenv with a very long string can panic
the kernel.
2005-09-26 14:14:55 +00:00
Robert Watson
329c75a730 Acquire Giant in uprintf() and tprintf() rather than asserting it. In
the vast majority of cases, these functions are called without mutexes
held, meaning that in all but two cases, there will be no ordering
issues with doing this, and it will eliminate the need for changes in
the caller.  In two cases, mutexes are held, so Giant must be acquired
before those mutexes such that uprintf() and tprintf() recurse Giant
rather than generating a lock order reversal.

Suggested by:	bde
2005-09-26 08:02:24 +00:00
Poul-Henning Kamp
2b35175c8a Add rman_is_region_manager() for the benefit of an alpha hack. 2005-09-25 20:10:10 +00:00
Christian S.J. Peron
c47a4d1c9f Implement the new world order in VFS locking for extended attributes. This
removes the unconditional acquisition of Giant for extended-attribute-related
operations. If the file system is marked MP safe and debug.mpsafevfs is
1, do not pick up Giant.

Mark the following system calls as being MP safe so we no longer pick up Giant
in the system call handler:

o extattrctl
o extattr_set_file
o extattr_get_file
o extattr_delete_file
o extattr_set_fd
o extattr_get_fd
o extattr_delete_fd
o extattr_set_link
o extattr_get_link
o extattr_delete_link
o extattr_list_file
o extattr_list_link
o extattr_list_fd

-Pass the MPSAFE flag to the namei(9) lookup and introduce a vfslocked variable
 which keeps track of any Giant acquisitions.
-Wrap any fd operations which manipulate vnodes in VFS_{UN}LOCK_GIANT.
-Drop VFS_ASSERT_GIANT into functions which operate on vnodes to ensure that
 we are sufficiently protected.

I've tested these changes with various TrustedBSD MAC policies which use
extended attributes a lot on SMP and UP systems (thanks to Scott Long for
making some SMP hardware available to me for testing).

Discussed with:	jeff
Requested by:	jhb, rwatson
2005-09-24 23:47:04 +00:00
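A hedged sketch of the namei(9) pattern described above: ask for an MPSAFE lookup, remember whether Giant was actually taken, and release it only if it was. The surrounding function, the FOLLOW flag choice, and the commented-out extattr call are illustrative.

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/namei.h>
    #include <sys/vnode.h>

    static int
    get_attr(struct thread *td, char *path)
    {
            struct nameidata nd;
            int error, vfslocked;

            NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW, UIO_USERSPACE, path, td);
            if ((error = namei(&nd)) != 0)
                    return (error);
            vfslocked = NDHASGIANT(&nd);    /* did the lookup pick up Giant? */
            NDFREE(&nd, NDF_ONLY_PNBUF);
            /* ... VOP_GETEXTATTR() or similar on nd.ni_vp ... */
            vrele(nd.ni_vp);
            VFS_UNLOCK_GIANT(vfslocked);    /* no-op if Giant was not taken */
            return (0);
    }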