Commit Graph

8790 Commits

Author SHA1 Message Date
John Baldwin
e0f66ef861 Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces.  struct intr_event holds the list
  of interrupt handlers associated with interrupt sources.
  struct intr_thread contains the data relative to an interrupt thread.
  Currently we still provide a 1:1 relationship of events to threads
  with the exception that events only have an associated thread if there
  is at least one threaded interrupt handler attached to the event.  This
  means that on x86 we no longer have 4 bazillion interrupt threads with
  no handlers.  It also means that interrupt events with only INTR_FAST
  handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
  intr_foo naming convention.  This did require renaming the powerpc
  MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
  powerpc.  This means that multiple INTR_FAST handlers can attach to the
  same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
  to the same interrupt.  Sharing INTR_FAST handlers may not always be
  desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
  either.  Drivers can always still use INTR_EXCL to ask for an interrupt
  exclusively.  The way this sharing works is that when an interrupt
  comes in, all the INTR_FAST handlers are executed first, and if any
  threaded handlers exist, the interrupt thread is scheduled afterwards.
  This type of layout also makes it possible to investigate using interrupt
  filters ala OS X where the filter determines whether or not its companion
  threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
  is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
  dumping their state.  It also has a '/v' verbose switch which dumps
  info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
  handler is removed because it would suck for things like ppbus(8)'s
  braindead behavior.  The code is present, though, it is just under
  #if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
  event into a separate function so that ithread_loop() becomes more
  readable.  Previously this code was all in the middle of ithread_loop()
  and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
  with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
  curthread's proc against idlethread's proc. (Not really related to intr
  changes)

Tested on:	alpha, amd64, i386, sparc64
Tested on:	arm, ia64 (older version of patch by cognet and marcel)
2005-10-25 19:48:48 +00:00
John Baldwin
6caf758e7c Use shorter names for the Giant and fast taskqueues so that their names
actually fit.
2005-10-25 19:29:02 +00:00
John Baldwin
58553b9925 Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
  enabled if an attempt is made to send an IPI_STOP IPI.  If the kernel
  option is enabled, there is also a sysctl to change the behavior at
  runtime (debug.stop_cpus_with_nmi which defaults to enabled).  This
  includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
  private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
  properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
  enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
  that is restarted making use of atomic_readandclear() rather than
  assuming that the BSP is always included in the set of restarted CPUs.
  Also, the NMI handler didn't clear the function pointer meaning that
  subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
  of stoppedpcbs[] and always enable it for i386 and amd64 instead of
  being dependent on KDB_STOP_NMI.  It works fine in both the NMI and
  non-NMI cases.
2005-10-24 21:04:19 +00:00
John Baldwin
6b1e0d75b0 - Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
  places.
2005-10-24 20:31:04 +00:00
John Baldwin
f55ab99409 Document in #ifdef notnow code the actions that proc_fini would need to
take if struct procs were actually freed.
2005-10-24 20:15:23 +00:00
John Baldwin
cf23efc12a Don't panic if a spin lock is initialized that isn't in our static order
list.  Just warn about it instead.

Requested by:	scottl
MFC after:	1 day
2005-10-24 20:14:24 +00:00
John Baldwin
8d2a5b8c34 Revert previous change to this file. I accidentally committed while
fixing spelling in a comment.
2005-10-24 15:58:21 +00:00
John Baldwin
971d0ad835 Spell hierarchy correctly in comments.
Submitted by:	Wojciech A. Koszek dunstan at freebsd dot czest dot pl
2005-10-24 15:57:27 +00:00
Stephan Uphoff
198b0a3b71 Only set B_RAM (Read ahead mark) on an incore buffers if we can lock it.
This fixes a race condition caused by the unlocked write access to the
b_flags field.

MFC after:	3 days
2005-10-24 14:23:04 +00:00
David Xu
fe80a39034 Don't touch last overrun if signal was already on queue. 2005-10-23 22:59:33 +00:00
David Xu
60354683d9 Make p_itimers as a pointer, so file sys/proc.h does not need to include
sys/timers.h.
2005-10-23 12:19:08 +00:00
Alan Cox
5b5908005a Previously, nothing prevented the page that was returned by pmap_extract()
from being reclaimed before it was wired.  Use pmap_extract_and_hold()
instead of pmap_extract() and retain the hold on the page until it has been
wired.
2005-10-23 07:41:56 +00:00
David Xu
e706ee8a16 Regen for POSIX timer syscalls. 2005-10-23 04:26:10 +00:00
David Xu
86857b368d Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2005-10-23 04:22:56 +00:00
David Xu
5da49fcb8a 1. Make ksiginfo_alloc and ksiginfo_free public.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
   to be managed by outside code, the KSI_INS indicates sigqueue_add should
   directly insert passed ksiginfo into queue other than copy it.
2005-10-23 04:12:26 +00:00
Alan Cox
0a5a219830 Verify that access to the given address is allowed from user-space.
Discussed with: rwatson@
2005-10-22 20:02:59 +00:00
Alan Cox
52ad48b69f Eliminate spl* calls. 2005-10-21 05:48:38 +00:00
Robert Watson
64a266f9e8 Change format string for u_int64_t to %ju from %llu, in order to use the
correct format string on 64-bit systems.

Pointed out by:	pjd
2005-10-20 21:28:31 +00:00
Robert Watson
909ed16c2b Add a "show malloc" command to DDB, which prints out the current stats for
available kernel malloc types.  Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.

MFC after:	2 weeks
2005-10-20 17:41:47 +00:00
John Baldwin
2a2b58faa4 Add entry for the spin mutex used by the hptmv(4) driver.
MFC after: 	1 day
Tested by:	Philip Kizer pckizer at nostrum dot com
2005-10-20 14:49:59 +00:00
John Polstra
135c43dc52 Fix a bug in the kernel module runtime linker that made it impossible
to unload the usb.ko module after boot if it was originally preloaded
from "/boot/loader.conf".  When processing preloaded modules, the
linker erroneously added self-dependencies the each module's reference
count.  That prevented usb.ko's reference count from ever going to 0,
so it could not be unloaded.

Sponsored by Isilon Systems.

Reviewed by:	pjd, peter
MFC after:	1 week
2005-10-19 20:40:30 +00:00
John Baldwin
8c4b6380c7 Move the initialization of the devmtx into the mutex_init() function
called during early init before cninit().

Tested on:	i386, alpha, sparc64
Reviewed by:	phk, imp
Reported by:	Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
MFC after:	1 week
2005-10-18 18:27:44 +00:00
Stefan Farfeleder
d60e86c86e Const-qualify ksem_timedwait's parameter abstime as it's only passed in. 2005-10-18 11:46:24 +00:00
Peter Wemm
1a330eb01d Add support for kernel modules with a single PT_LOAD section.
While here, support up to four sections because it was trivial to do
and cheap. (One pointer per section).

For amd64 with "-fpic -shared" format .ko files, using a single PT_LOAD
section is important to avoid wasting about 1MB of KVM and physical ram
for the 'gap' between the two PT_LOAD sections.  amd64 normally uses
.o format kld files and isn't affected normally.  But -fpic -shared modules
are actually possible to produce and load...  (And with a bugfix to
binutils, we can build and use plain -shared .ko files without -fpic)

i386 only wastes 4K per .ko file, so that isn't such a big deal there.
2005-10-17 23:21:55 +00:00
Poul-Henning Kamp
5ef5ee7b62 Use new functions to call into drivers methods. 2005-10-16 21:07:31 +00:00
Poul-Henning Kamp
7423b2b40c Make ttyconsolemode() call ttsetwater() so that drivers don't have to. 2005-10-16 20:58:22 +00:00
Poul-Henning Kamp
51514bc484 Make ttsetcompat() static 2005-10-16 20:40:40 +00:00
Poul-Henning Kamp
733634738e Eliminate two unused arguments to ttycreate(). 2005-10-16 20:22:56 +00:00
Paul Saab
a372f8224c Implement the 32bit versions of recvmsg, recvfrom, sendmsg
Partially obtained from:	jhb
2005-10-15 05:57:06 +00:00
Paul Saab
f0b479cd75 Implement 32bit wrappers for clock_gettime, clock_settime, and
clock_getres.
2005-10-15 02:54:18 +00:00
Kris Kennaway
14cdc36456 mpsafevm has been stable and defaulted to 1 on sparc64 for over 6 months,
so we are ready for mpsafevfs=1 by default on sparc64 too.  I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.

MFC after:	1 week
2005-10-14 23:56:13 +00:00
Kris Kennaway
f098dcded5 Partially revert revision 1.66, which contained a change that did not
correspond to the commit log.  It changed the maxswzone and maxbcache
parameters from int to long, without changing the extern definitions
in <sys/buf.h>.

In fact it's a good thing it did not, because other parts of the system
are not yet ready for this, and on large-memory sparc machines it causes
severe filesystem damage if you try.

The worst effect of the change was that the tunables controlling the
above variables stopped working.  These were necessary to allow such
large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not
set a hard-coded upper limit on these parameters and they ended
up overflowing an int, causing an infinite loop at boot in bufinit().

Reviewed by:	mlaier
2005-10-14 19:15:10 +00:00
David Xu
823acd70b6 Regen for sigqueue syscall. 2005-10-14 12:56:28 +00:00
David Xu
9104847f21 1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
   sendsig use discrete parameters, now they uses member fields of
   ksiginfo_t structure. For sendsig, this change allows us to pass
   POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
   generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
   blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
   thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
   be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
   an extra siginfo list holds additional siginfo_t data for signals.
   kernel code uses psignal() still behavior as before, it won't be failed
   even under memory pressure, only exception is when deleting a signal,
   we should call sigqueue_delete to remove signal from sigqueue but
   not SIGDELSET. Current there is no kernel code will deliver a signal
   with additional data, so kernel should be as stable as before,
   a ksiginfo can carry more information, for example, allow signal to
   be delivered but throw away siginfo data if memory is not enough.
   SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
   not be caught or masked.
   The sigqueue() syscall allows user code to queue a signal to target
   process, if resource is unavailable, EAGAIN will be returned as
   specification said.
   Just before thread exits, signal queue memory will be freed by
   sigqueue_flush.
   Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
2005-10-14 12:43:47 +00:00
Doug Ambrisko
db43cd0417 Fix tinderbox box by removing incomplete/bad spl usage. Proper giant free
locking is required in for aio.

Pointed out by:	imp
2005-10-12 22:33:22 +00:00
Doug Ambrisko
69cd28dacb Add in kqueue support to LIO event notification and fix how it handled
notifications when LIO operations completed.  These were the problems
with LIO event complete notification:
      -	Move all LIO/AIO event notification into one general function
	so we don't have bugs in different data paths.  This unification
	got rid of several notification bugs one of which if kqueue was
	used a SIGILL could get sent to the process.
      -	Change the LIO event accounting to count all AIO request that
	could have been split across the fast path and daemon mode.
	The prior accounting only kept track of AIO op's in that
	mode and not the entire list of operations.  This could cause
	a bogus LIO event complete notification to occur when all of
	the fast path AIO op's completed and not the AIO op's that
	ended up queued for the daemon.

Suggestions from:	alc
2005-10-12 17:51:31 +00:00
Diomidis Spinellis
9f5c1d1955 Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
Tor Egge
8272da3106 Release clean buffer with wrong size and no dependencies also for non-VMIO
case.
2005-10-09 22:41:25 +00:00
Marcel Moolenaar
125fbd3cdc Add parse_uuid() that creates a binary representation of an UUID from
a string representation.
2005-10-07 13:37:10 +00:00
Poul-Henning Kamp
0694506637 Eliminate __RMAN_RESOURCE_VISIBLE hack entirely by moving the struct
resource_ to subr_rman.c where it belongs.
2005-10-06 21:49:31 +00:00
Gleb Smirnoff
f0796cd26c - Don't pollute opt_global.h with DEVICE_POLLING and introduce
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
  can be compiled as loadable modules.

Reviewed by:	bde
2005-10-05 10:09:17 +00:00
Warner Losh
2f624c21c5 When data passed into devctl_notify is NULL, don't print (null). Instead
don't print anything at all.

# this fixes a problem that I noticed with devd.pipe not terminating lines
# with \n correctly sometimes.
2005-10-04 22:25:14 +00:00
Robert Watson
7723d5ed12 Re-order MAC and DAC checks in shmget() in order to give precedence to
the MAC result, as well as avoid losing the DAC check result when MAC
is enabled.

MFC after:	3 days
Reported by:	Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>
2005-10-04 16:40:20 +00:00
Roman Kurakin
826cf005ed Use FILEDESC_UNLOCK(fdp) after FILE_UNLOCK(p), not before to avoid LOR.
Slightly discussed on current@.

LOR #055

MFC after:	14 days
2005-10-04 16:27:54 +00:00
Christian S.J. Peron
9eea3d85cc Standard Giant push down operations for the Mandatory Access Control (MAC)
framework. This makes Giant protection around MAC operations which inter-
act with VFS conditional, based on the MPSAFE status of the file system.

Affected the following syscalls:

o __mac_get_fd
o __mac_get_file
o __mac_get_link
o __mac_set_fd
o __mac_set_file
o __mac_set_link

-Drop Giant all together in __mac_set_proc because the
 mac_cred_mmapped_drop_perms_recurse routine no longer requires it.
-Move conditional Giant aquisitions to after label allocation routines.
-Move the conditional release of Giant to before label de-allocation
 routines.

Discussed with:	rwatson
2005-10-04 14:32:58 +00:00
Don Lewis
34ea500bea Add missing word to comment. 2005-10-04 04:02:33 +00:00
Gleb Smirnoff
e113edf30a o Move a lot of parameter checking from netisr_poll() to
dedicated sysctl handlers. Protect manipulations with
  poll_mtx. The affected sysctls are:
  - kern.polling.burst_max
  - kern.polling.each_burst
  - kern.polling.user_frac
  - kern.polling.reg_frac
o Use CTLFLAG_RD on MIBs that supposed to be read-only.
o u_int32t -> uint32_t
o Remove unneeded locking from poll_switch().
2005-10-03 14:15:26 +00:00
Colin Percival
33812c066d If sufficiently bad things happen during a call to kern_execve(), it is
possible for do_execve() to call exit1() rather than returning.  As a
result, the sequence "allocate memory; call kern_execve; free memory"
can end up leaking memory.

This commit documents this astonishing behaviour and adds a call to
exec_free_args() before the exit1() call in do_execve().  Since all
the users of kern_execve() in the tree use exec_free_args() to free
the command-line arguments after kern_execve() returns, this should
be safe, and it fixes the memory leak which can otherwise occur.

Submitted by:	Peter Holm
MFC after:	3 days
Security:	Local denial of service
2005-10-03 12:49:54 +00:00
Hajimu UMEMOTO
56e5a87a55 make saved cpu level stackable. 2005-10-03 06:57:29 +00:00
Don Lewis
5032ff8197 Always wire the sysctl output buffer in sysctl_kern_proc() before
calling sysctl_out_proc().  -- fix from jhb

Move the code in fill_kinfo_thread() that gathers data from struct proc
into the new function fill_kinfo_proc_only().

Change all callers of fill_kinfo_thread() to call both
fill_kinfo_proc_only() and fill_kinfo() thread.  When gathering
data from a multi-threaded process, fill_kinfo_proc_only() only needs
to be called once.

Grab sched_lock before accessing the process thread list or calling
fill_kinfo_thread().

PR:		kern/84684
MFC after:	3 days
2005-10-02 23:27:56 +00:00