Commit Graph

7937 Commits

Author SHA1 Message Date
rwatson
2b329ffd2f Rework sofree() logic to take into account a possible race with accept().
Sockets in the listen queues have reference counts of 0, so if the
protocol decides to disconnect the pcb and try to free the socket, this
triggered a race with accept() wherein accept() would bump the reference
count before sofree() had removed the socket from the listen queues,
resulting in a panic in sofree() when it discovered it was freeing a
referenced socket.  This might happen if a RST came in prior to accept()
on a TCP connection.

The fix is two-fold: to expand the coverage of the accept mutex earlier
in sofree() to prevent accept() from grabbing the socket after the "is it
really safe to free" tests, and to expand the logic of the "is it really
safe to free" tests to check that the refcount is still 0 (i.e., we
didn't race).

RELENG_5 candidate.

Much discussion with and work by:	green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-11 08:11:26 +00:00
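
The two-fold fix described in 2b329ffd2f can be illustrated with a small userland sketch. This is not the kernel code: struct sock, accept_mtx, sock_accept() and sock_try_free() are hypothetical names, and a pthread mutex stands in for the accept mutex. It only shows the idea of running the "is it really safe to free" test, including the refcount check, under the same lock that accept() takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; not the kernel's struct socket. */
struct sock {
    int  refcount;     /* 0 while the socket sits on a listen queue */
    bool on_listen_q;  /* still reachable by accept() */
    bool freed;        /* set once the free path has won */
};

/* Stand-in for the accept mutex. */
static pthread_mutex_t accept_mtx = PTHREAD_MUTEX_INITIALIZER;

/* accept() side: take a reference only while the socket is still queued. */
static bool sock_accept(struct sock *so)
{
    bool ok = false;

    pthread_mutex_lock(&accept_mtx);
    if (so->on_listen_q) {
        so->on_listen_q = false;
        so->refcount++;
        ok = true;
    }
    pthread_mutex_unlock(&accept_mtx);
    return ok;
}

/* Free side: the safety tests and the dequeue happen under accept_mtx. */
static bool sock_try_free(struct sock *so)
{
    assert(!so->freed);           /* a double free is a caller bug */

    pthread_mutex_lock(&accept_mtx);
    if (so->refcount != 0) {      /* we raced with accept(): back off */
        pthread_mutex_unlock(&accept_mtx);
        return false;
    }
    so->on_listen_q = false;      /* dequeue before dropping the mutex */
    pthread_mutex_unlock(&accept_mtx);

    so->freed = true;
    return true;
}
```

Because both paths serialize on the same mutex, either accept() takes its reference first and the free path backs off, or the free path dequeues the socket first and accept() no longer finds it.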
glebius
0dda31b4f9 Revert last commit since it breaks API.
Requested by:	sam
2004-10-10 09:16:48 +00:00
julian
30d2ba06b9 Don't release the slot twice.. sched_rem() has already done it.
Submitted by:	stephan uphoff (ups at tree dot com)
MFC after:	3 days
2004-10-10 05:19:22 +00:00
julian
8c3d54b9e4 Remove duplicate line. 2004-10-10 05:07:43 +00:00
glebius
0c7bb9f633 Remove inlined m_tag_free(). Rename _m_tag_free() to m_tag_free()
and make it visible (same way as in OpenBSD). Describe usage in manpage.

This change is useful for creating custom free methods, which
call default free method at their end.

While here, make malloc declaration for mbuf tags more informative.

Approved by:	julian (mentor), sam
MFC after:	1 month
2004-10-09 13:25:19 +00:00
green
3a482df790 Don't "implicitly order all sleep locks before spin locks" in witness
when the spin lock in question isn't -- it's the critical_enter() that
KDB set.  No more panic in DDB for console -> syscons -> tty -> knote
operations.
2004-10-09 08:16:37 +00:00
davidxu
94500a0336 Add an execve command for kse_thr_interrupt to allow libpthread to
restore signal mask correctly, this is required by POSIX.

Reviewed by: deischen
2004-10-07 13:50:10 +00:00
davidxu
e85209d12c Regen to unbreak world.
Pointy hat to: mtm
2004-10-07 01:09:46 +00:00
das
35b6f981ab Back out rev 1.240; it is unnecessary. In particular,
p1 == curthread, so _PHOLD(p1) will not have to block
to swap in p1.

Noticed by:	jhb
2004-10-06 23:53:49 +00:00
mtm
0a21f474dc Close a race between a thread exiting and the freeing of its stack.
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.

Discussed with: davidxu, deischen, marcel
MFC after: 3 days
2004-10-06 14:23:00 +00:00
davidxu
e1ce006b64 Close a race between thr_create and sysctl -w: thr_scope_sys could
be changed while thr_create is running, and we tested it several times.
2004-10-06 02:29:19 +00:00
grog
152055d94b vtryrecycle: Don't rely on type VBAD alone to mean that we don't need
to clean the vnode.  If v_data is set, we still need to
	     clean it.  This code change should catch all incidents of
	     the previous commit (INVARIANTS only).
2004-10-06 02:09:59 +00:00
grog
882d69104e getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning.
Discussion: this panic (or warning) only occurs when the kernel is
  compiled with INVARIANTS.  Otherwise the problem (which means that
  the vp->v_data field isn't NULL, and represents a coding error and
  possibly a memory leak) is silently ignored by setting it to NULL
  later on.

  Panicking here isn't very helpful: by this time, we can only find
  the symptoms.  The panic occurs long after the reason for "not
  cleaning" has been forgotten; in the case in point, it was the
  result of severe file system corruption which left the v_type field
  set to VBAD.  That issue will be addressed by a separate commit.
2004-10-06 02:06:11 +00:00
davidxu
7acde29a24 Restore some code removed in revisions 1.193 and 1.194; julian said
he'd like to keep this code.
2004-10-06 00:49:41 +00:00
davidxu
793ea9317e The original kern_execve() code forced all other threads to suicide
at the start of the function. The problem is that execve() can fail, and
a failed execve() would change a threaded process into an unthreaded one;
this side effect is unexpected.
The new code introduces a new single-threading mode, SINGLE_BOUNDARY. In
this mode, all threads except the singler suspend themselves at the user
boundary. We cannot use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful; suspending other threads at an
unknown point, later resuming them from there, and forcing them to exit at
the user boundary may cause the process to start from a dirty state. If
execve() is successful, the current thread upgrades to SINGLE_EXIT mode and
forces the other threads to suicide at the user boundary; otherwise, the
other threads are resumed and their interrupted syscalls are restarted.

Reviewed by: julian
2004-10-06 00:40:41 +00:00
julian
d5dfe59f9e Fix whitespace botch that only showed up in the commit message diff :-/
MFC after:	4 days
2004-10-05 22:14:02 +00:00
julian
b4640b18f7 Slight cleanup in the single threading code.
MFC after:	4 days
2004-10-05 22:05:25 +00:00
julian
57fb03da54 When preempting a thread, put it back on the HEAD of its run queue.
(Only really implemented in 4bsd)

MFC after:	4 days
2004-10-05 22:03:10 +00:00
julian
7d0504ed38 Oops. left out part of the diff.
MFC after:	4 days
2004-10-05 21:26:27 +00:00
julian
7b170fd9fa Use some macros to track available scheduler slots to allow
easier debugging.

MFC after:	4 days
2004-10-05 21:10:44 +00:00
julian
8587c9806d light rearrangement of some code to get some locking
more correct

MFC after:	4 days
2004-10-05 20:48:16 +00:00
julian
2094122f86 Break out to a separate function, the code to revert a multithreaded
process back to officially being a non-threaded program.

MFC after:	4 days
2004-10-05 20:39:26 +00:00
jhb
ce2d3f89af Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
jhb
9536269a6d Add a critical section in turnstile_unpend() from before dropping the
turnstile chain lock until after making all the awakened threads
runnable.  First, this fixes a priority inversion race.  Second, this
attempts to finish waking up all of the threads waiting on a turnstile
before doing a preemption.

Reviewed by:	Stephan Uphoff (who found the priority inversion race)
2004-10-05 18:00:30 +00:00
pjd
c944ef39d6 Back out changes which were introduced to delay mounting root file system.
Those changes were made for gmirror's needs, but now gmirror handles this
by itself.
2004-10-05 11:26:43 +00:00
davidxu
aa22b44625 Use scheduler api to adjust thread priority. 2004-10-05 09:10:30 +00:00
imp
cf32c9fe79 Add taskqueue_drain. This waits for the specified task to finish, if
running, or returns.  The calling program is responsible for making sure
that nothing new is enqueued.

# man page coming soon.
2004-10-05 04:16:01 +00:00
phk
bd3b1af9a6 Change the perfectly precise message
printf("No buffers busy after final sync");
to
       printf("All buffers synced.");
in order to not leave the users wondering if there should be.
2004-10-04 13:13:23 +00:00
julian
395c906e95 Another case where we need to guard against a partially
constructed process.

Submitted by: Stephan Uphoff ( ups at tree.com	)
MFC after:	3 days
2004-10-04 06:45:48 +00:00
julian
96dbdb17db Always start out with an initialised ksegrp structure.
MFC after:	3 days
2004-10-03 20:06:11 +00:00
davidxu
33faeb8a73 Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.
2004-10-03 13:23:49 +00:00
alc
ad2a4ca3e0 Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile(). 2004-10-02 05:37:47 +00:00
alfred
0efc91b067 Clear a process's procfs trace points upon delivery of SIGKILL.
MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")
2004-10-01 14:15:20 +00:00
phk
2e5b8b9883 Fix a LOR relating to freeing cdevs. 2004-10-01 06:33:39 +00:00
alfred
a72e384f52 cover soreadable and sowriteable with the corresponding socketbuffer locks. 2004-10-01 05:54:06 +00:00
das
e399d76f1b Avoid calling _PHOLD(p1) with p2's lock held, since _PHOLD()
may block to swap in p1.  Instead, call _PHOLD earlier, at a
point where the only lock held happens to be p1's.
2004-10-01 05:01:29 +00:00
jhb
59af2fcb61 Fix a typo to fix the !DIAGNOSTIC build.
Submitted by:	many
2004-09-30 18:13:18 +00:00
phk
9743317a02 Assign a global unit number for the tty slave devices (init/lock) using
the new subr_unit.c code.

For now assert Giant in ttycreate() and ttyfree().  It is not obvious that
it will ever pay off to lock these with anything else.
2004-09-30 10:38:48 +00:00
phk
fbd7b98a6a Add a new API for allocating unit number (-like) resources.
Allocation is always lowest free unit number.

A mixed range/bitmap strategy for maximum memory efficiency.  In
the typical case where no unit numbers are freed total memory usage
is 56 bytes on i386.

malloc is called M_WAITOK but no locking is provided (yet).  A bit of
experience will be necessary to determine the best strategy.  Hopefully
a "caller provides locking" strategy can be maintained, but that may
require use of M_NOWAIT allocation and failure handling.

A userland test driver is included.
2004-09-30 07:04:03 +00:00
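
The "always lowest free unit number" policy from fbd7b98a6a can be sketched as below. This is a deliberately simplified, assumed illustration: it uses a single fixed 64-bit bitmap rather than the mixed range/bitmap strategy the commit describes, and struct unit_pool, unit_alloc() and unit_free() are hypothetical names, not the subr_unit.c API. Like the commit, it leaves locking to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define UNIT_MAX 64   /* capacity of this sketch: one 64-bit bitmap */

/* Hypothetical pool type; bit u set means unit u is in use. */
struct unit_pool {
    uint64_t used;
};

/* Return the lowest free unit number, or -1 if the pool is exhausted. */
static int unit_alloc(struct unit_pool *p)
{
    for (int u = 0; u < UNIT_MAX; u++) {
        uint64_t bit = (uint64_t)1 << u;

        if ((p->used & bit) == 0) {
            p->used |= bit;
            return u;
        }
    }
    return -1;
}

/* Release a unit so that a later unit_alloc() can hand it out again. */
static void unit_free(struct unit_pool *p, int u)
{
    assert(u >= 0 && u < UNIT_MAX);          /* range check before shifting */
    assert(p->used & ((uint64_t)1 << u));    /* freeing a free unit is a bug */
    p->used &= ~((uint64_t)1 << u);
}
```

A freed unit is reused before any higher, never-used unit, which is what "lowest free" implies for callers that expect compact numbering.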
green
70acfe3e4f Account for alias devices when tearing them down in destroy_dev() so we
don't panic on a NULL cdev->si_devsw.
2004-09-29 16:38:38 +00:00
des
f665f60342 Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables.
MFC after:	3 days
2004-09-29 14:21:40 +00:00
phk
d08ddc3f6b Add functions to create and free the "tty-ness" of a serial port in a
generic way.  This code will allow a similar amount of code to be
removed from most if not all serial port drivers.

	Add generic cdevsw for tty devices.

	Add generic slave cdevsw for init/lock devices.

	Add ttypurge function which wakes up all known generic sleep
	points in the tty code, and calls into the hw-driver if it
	provides a method.

	Add ttycreate function which creates tty device and optionally
	cua device.  In both cases .init/.lock devices are created
	as well.

	Change ttygone() slightly to also call the hw driver provided
	purge routine.

	Add ttyfree() which will purge and destroy the cdevs.

	Add ttyconsole mode for setting console friendly termios
	on a port.
2004-09-28 19:33:49 +00:00
jmg
0d1f936e78 improve the mbuf m_print function.. Only pull length from pkthdr if there
is one, detect mbuf loops and stop, add an extra arg so you can only print
the first x bytes of the data per mbuf (print all if arg is -1), print
flags using %b (bitmask)...

No code in the tree appears to use m_print, and it's just a matter of adding
-1 as an additional arg to m_print to restore original behavior..

MFC after:	4 days
2004-09-28 18:40:18 +00:00
phk
6e31d065d3 Give cluster_write() an explicit vnode argument.
In the future a struct buf will not automatically point out a vnode for us.
2004-09-27 19:14:10 +00:00
phk
3234741a00 Used cached cdevsw pointer. 2004-09-27 06:34:30 +00:00
phk
27fb35d0b1 Add cdevsw->d_purge() support.
This device method shall wake up any threads sleeping in the device driver
and make them depart the driver's code for good.
2004-09-27 06:18:25 +00:00
marcel
266d410b93 Fix a bug introduced in the previous commit: kdb_cpu_trap() gets to
the trapframe via kdb_frame, but kdb_frame was not initialized until
after the call to kdb_cpu_trap(). Ergo: kdb_cpu_trap() was moved too
far up.

Pointy hat: marcel
2004-09-26 06:48:59 +00:00
julian
01b7ff330e Use the universal 'threaded process' flag rather than the
specific tests for different threading systems.

MFC after:	1 week
2004-09-25 00:53:46 +00:00
jhb
3bd2c1b67b Some more whitespace, style, and comment fixes.
Submitted by:	bde (mostly)
2004-09-24 20:27:04 +00:00
pjd
db95a45215 Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits
a bit better to our current naming scheme.

Discussed with:	ru
2004-09-24 09:19:03 +00:00
phk
c67c50f50e Remove the cdevsw() function which is now unused. 2004-09-24 08:30:57 +00:00
phk
6ee26d135f Hold threadcount while throbbing cdevsw in our underlying driver.
This is a bit heavyhanded, and will be simplified once the tty code
learns to properly deal with disappearing hw and drivers.
2004-09-24 08:26:03 +00:00
phk
14a7813f86 Hold threadcount reference when we call into the underlying console
driver.
2004-09-24 07:16:56 +00:00
phk
615a2ebb57 Eliminate devsw() call, we are not dereferencing the pointer. 2004-09-24 07:11:02 +00:00
phk
88cf2bf7b8 Hold threadref while we throb cdevsw in devtoname() 2004-09-24 06:29:23 +00:00
phk
19aa7ffe99 Use vn_isdisk() to check if vnode is a disk.
(repeat, CVS core dumped on me)
2004-09-24 06:23:31 +00:00
phk
d04b40f97c use vn_isdisk() to see if vnode is a disk. 2004-09-24 06:21:43 +00:00
phk
ee853f3efd Hold dev_lock and check for NULL devsw pointer when we service FIODTYPE ioctl. 2004-09-24 06:16:48 +00:00
phk
5536d5757b Hold dev_lock and check for NULL devsw pointer when we determine
if a vnode is a disk.
2004-09-24 06:16:08 +00:00
phk
18b8697aaa use dev_re[fl]thread() rather than home rolled versions. 2004-09-24 05:55:03 +00:00
phk
7ede6d888c Introduce dev_re[lf]thread() functions.
dev_refthread() will return the cdevsw pointer or NULL.  If the
return value is non-NULL a threadcount is held which must be released
with dev_relthread().  If the returned cdevsw is NULL no threadcount
is held on the device.
2004-09-24 05:54:32 +00:00
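
The contract stated in 7ede6d888c (a non-NULL return holds a threadcount that must be released, a NULL return holds nothing) can be sketched in userland as follows. The types and the names device_refthread() and device_relthread() are hypothetical stand-ins chosen for illustration, and a pthread mutex stands in for the device lock; the kernel functions may differ in detail.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver switch and the device. */
struct drv_switch {
    const char *name;
};

struct device {
    struct drv_switch *devsw;   /* NULL once the driver has gone away */
    int threadcount;            /* threads currently inside the driver */
};

/* Stand-in for the device lock. */
static pthread_mutex_t dev_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Return the switch and hold a threadcount, or return NULL and hold nothing. */
static struct drv_switch *device_refthread(struct device *dev)
{
    struct drv_switch *csw;

    pthread_mutex_lock(&dev_mtx);
    csw = dev->devsw;
    if (csw != NULL)
        dev->threadcount++;
    pthread_mutex_unlock(&dev_mtx);
    return csw;
}

/* Drop a threadcount taken by a successful device_refthread(). */
static void device_relthread(struct device *dev)
{
    pthread_mutex_lock(&dev_mtx);
    assert(dev->threadcount > 0);
    dev->threadcount--;
    pthread_mutex_unlock(&dev_mtx);
}
```

A caller would pair the two only on the non-NULL path: check the return value, call into the driver, then release.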
jhb
7b666b4137 A modest collection of various and sundry style, spelling, and whitespace
fixes.

Submitted by:	bde (mostly)
2004-09-24 00:38:15 +00:00
cognet
49654e152d On arm, set the default elf brand to FreeBSD, until the binutils do it for us. 2004-09-23 23:29:24 +00:00
jhb
d0df115aaa Don't try to protect td_sticks with sched_lock. It doesn't need it as it
is only accessed by curthread.
2004-09-23 21:03:58 +00:00
jhb
f6dc0c3d5f - Assert sched_lock in upcall_remove() since it is needed there and all
callers already lock it there.
- Lock sched_lock slightly earlier in kse_create() so that it covers
  kg_numupcalls.
2004-09-23 21:03:16 +00:00
jhb
1f2758a712 - Don't try to unlock Giant if single threading fails since we don't have
it locked.
- Unlock Giant before calling exit1() since exit1() does not require Giant.
2004-09-23 21:01:50 +00:00
phk
eb5eea42df Split the ioctl function in control and slave side, this eliminated
a troublesome devsw() call.
2004-09-23 16:13:46 +00:00
phk
1d992e18ec Eliminate DEV_STRATEGY() macro: call dev_strategy() directly.
Make dev_strategy() handle errors and departing devices properly.
2004-09-23 14:45:04 +00:00
pjd
99b0ffd3c0 Introduce new /boot/loader.conf variable: root_mount_delay.
It can be used to delay mounting the root partition to give GEOM
providers a chance to show up.
Now, when the needed provider is missing, the vfs_rootmount() function will
look for it every second, and if it can't be found in the defined time, it'll
ask for the root device name (before this change it was done immediately).

This allows booting from a gmirror device in degraded mode.
2004-09-23 10:13:18 +00:00
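
The retry behaviour described in 99b0ffd3c0 (look for the provider every second, give up after the configured delay) amounts to a bounded polling loop. The sketch below is an assumption about the shape of such a loop, not the vfs_rootmount() code: wait_for_root(), the probe and wait callbacks, and the example helpers are all hypothetical, and the one-second sleep is injected so the logic can be exercised without real delays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*probe_fn)(void *arg);   /* true once the provider exists */
typedef void (*wait_fn)(void);         /* waits roughly one second */

/*
 * Probe immediately, then once per second, for at most delay_secs seconds.
 * Returns true if the provider appeared, false if the caller should fall
 * back to asking for the root device name.
 */
static bool wait_for_root(probe_fn probe, void *arg, int delay_secs,
                          wait_fn wait_one_second)
{
    assert(probe != NULL && wait_one_second != NULL);

    for (int elapsed = 0; ; elapsed++) {
        if (probe(arg))
            return true;
        if (elapsed >= delay_secs)
            return false;
        wait_one_second();
    }
}

/* Example probe: fails *remaining times, then succeeds. */
static bool countdown_probe(void *arg)
{
    int *remaining = arg;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

/* Example wait that does nothing, so the sketch runs instantly. */
static void no_wait(void)
{
}
```

With a delay of 0 the loop degenerates to a single probe, which matches the pre-change behaviour of asking for the root device name immediately.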
phk
3947e54e89 Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().
2004-09-23 07:17:41 +00:00
jhb
3956303607 Various small style fixes. 2004-09-22 15:24:33 +00:00
julian
5015e1ce1f Revert the last change..
Better to kill all other threads than to panic the system if 2 threads call
execve() at the same time. A better fix will be committed later.

Note that this only affects the case where the execve fails.
2004-09-22 01:30:23 +00:00
julian
fca3e8d0f0 In a threaded process, don't kill off all the other threads until we have a
reasonable chance that the execve() is going to succeed. I.e.
wait until we've done the permission checks etc.

MFC after:	1 week
2004-09-21 21:05:13 +00:00
phk
3441ee7248 If a vnode has no v_rdev we cannot hope to answer FIODTYPE ioctl. 2004-09-21 08:33:05 +00:00
jhb
777f907276 Remove unused macro. 2004-09-20 19:01:44 +00:00
brian
255869f387 CTASSERT that MSIZE is a power of 2 (otherwise dtom() breaks)
Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks)

As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE
anyway, but that was an implementation side-effect....

KASSERT -> CTASSERT suggested by: dd@
Approved by:	silence on -net
2004-09-20 08:52:04 +00:00
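
Why both conditions in 255869f387 matter can be seen in a small sketch. It assumes that dtom() recovers the start of an mbuf by masking off the low bits of a pointer into its data, which only works if MSIZE is a power of 2 and every mbuf starts on an MSIZE boundary. The value 256, struct mbuf_s, dtom_sketch() and mbuf_alloc_aligned() are illustrative assumptions; C11 _Static_assert and aligned_alloc() stand in for CTASSERT and the UMA alignment request.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MSIZE 256   /* assumed value for this sketch */

/* Compile-time check, playing the role of CTASSERT. */
_Static_assert((MSIZE & (MSIZE - 1)) == 0, "MSIZE must be a power of 2");

/* Hypothetical stand-in for an mbuf: exactly MSIZE bytes. */
struct mbuf_s {
    unsigned char bytes[MSIZE];
};

/* Map a pointer anywhere inside an mbuf back to the mbuf's first byte. */
static struct mbuf_s *dtom_sketch(void *p)
{
    return (struct mbuf_s *)((uintptr_t)p & ~((uintptr_t)MSIZE - 1));
}

/* Allocate one mbuf on an MSIZE boundary; the caller releases it with free(). */
static struct mbuf_s *mbuf_alloc_aligned(void)
{
    struct mbuf_s *m = aligned_alloc(MSIZE, sizeof(struct mbuf_s));

    /* The masking trick depends on this alignment. */
    assert(m == NULL || ((uintptr_t)m & (MSIZE - 1)) == 0);
    return m;
}
```

If the allocator returned an mbuf that was not MSIZE-aligned, the mask would round down to an address before the mbuf, which is the breakage the commit guards against.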
das
8b64b8f028 The zone from which proc structures are allocated is marked
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called.  Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code.  I have retained
vm_proc_dispose(), since I consider its disuse a bug.
2004-09-19 18:34:17 +00:00
phk
37dd5c7419 Initialize new ttys a bit more.
Check TS_GONE flag for gone-ness.
2004-09-18 17:02:18 +00:00
marcel
f2afc2cd7a Move makectx() after kdb_cpu_trap(), so the PCB will have possible MD
corrections made to the trapframe. This is more logical.
2004-09-17 22:27:23 +00:00
phk
1b1b81c0fc Add ttyopen and ttyclose functions which will do the right stuff for
most if not all of our tty drivers in the future.

Centralizing this stuff enables us to remove about 100 lines of
almost but not quite perfectly copy&paste code from each tty driver.
2004-09-17 11:43:35 +00:00
phk
f8ef366cb9 Add ttyalloc() which in due time will be the successor to ttymalloc(),
but without the "struct tty *" argument.
2004-09-17 06:13:47 +00:00
phk
a64f45cafb Use the tty->t_sc field to find our softc. 2004-09-16 12:07:25 +00:00
julian
6461286b21 clean up thread runq accounting a bit.
MFC after:	3 days
2004-09-16 07:12:59 +00:00
julian
b4933d4405 Use specific code to revert a partial add to the run queue, not
remrunqueue() which can't handle a partially added thread.

MFC after:	1 week
2004-09-16 05:37:40 +00:00
phk
02df7323ee Remove unused B_WRITEINPROG flag 2004-09-15 21:49:22 +00:00
phk
43f0dbec3c Simplify initialization of va_null a little bit. 2004-09-15 21:42:03 +00:00
phk
7bf1722b65 undent some functions a bit. 2004-09-15 21:08:58 +00:00
phk
b2c5cf5b2a stylistic polishing. 2004-09-15 20:54:23 +00:00
julian
2e88fd3281 Try harder to get back to being a non threaded process.
Submitted by:	DavidXu
MFC after:	3 days
2004-09-15 18:39:09 +00:00
julian
d7dd18c6b5 Oops accidentally removed #ifdef SCHED_4BSD
as part of another commit
This function is not yet used in ULE
2004-09-15 03:51:51 +00:00
jmg
ab70754605 unlock global lock in kqueue_scan before msleep'ing to prevent
deadlock..  we didn't unlock global lock earlier to prevent just having
to reacquire it again..

Found by:	peter
Reviewed by:	ps
MFC after:	3 days
2004-09-14 18:38:16 +00:00
julian
2e10eab995 Commit a fix for some panics we've been seeing with preemption.
MFC after:	2 days
2004-09-13 23:06:39 +00:00
julian
0b88c839d5 Add some kasserts 2004-09-13 23:02:52 +00:00
julian
29732c6fb7 make some of these conditions apply equally to both threading systems. 2004-09-13 22:10:04 +00:00
phk
a915c8947e Create struct snapdata which contains the snapshot fields from cdev
and the previously malloc'ed snapshot lock.

Malloc struct snapdata instead of just the lock.

Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes).

While here, give the private readblock() function a vnode argument
in preparation for moving UFS to access GEOM directly.
2004-09-13 07:29:45 +00:00
phk
2806321da1 Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork.  In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly.  Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
2004-09-13 06:50:42 +00:00
scottl
1e56230631 Revert the previous round of changes to td_pinned. The scheduler isn't
fully initialized when the pmap layer tries to call sched_pin() early in the
boot and results in a quick panic.  Use ke_pinned instead as was originally
done with Tor's patch.

Approved by: julian
2004-09-11 10:07:22 +00:00
julian
4b041e0e33 Try committing from the right tree this time
MFC after:	2 days
2004-09-11 00:11:09 +00:00
julian
7cae3c9d5b Make up my mind if cpu pinning is stored in the thread structure or the
scheduler specific extension to it. Put it in the extension as
the implementation details of how the pinning is done needn't be visible
outside the scheduler.

Submitted by:	tegge  (of course!)   (with changes)
MFC after:	3 days
2004-09-10 22:28:33 +00:00
julian
9993c65718 Add some code to allow threads to nominate a sibling to run if they are going to sleep.
MFC after:	1 week
2004-09-10 21:04:38 +00:00
jmg
08f545c4a5 remove giant required from kqueue_close..
Reported by:	kuriyama
MFC after:	3 days
2004-09-10 03:14:32 +00:00
rwatson
d4e6ebd0c9 Hard code witness lock order for BPF locks. 2004-09-09 05:01:37 +00:00
phk
1912367ebb Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization of va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
2004-09-07 09:17:05 +00:00
julian
ae5a11b8d4 fix typo
MFC after:	2 days
2004-09-07 07:04:47 +00:00
julian
35060cd448 Make debug printf less threatening and make it only print out once.
MFC after:	2 days
2004-09-07 06:38:22 +00:00
julian
de0e7f8937 Give libthr a choice (per system) of scope_system or scope_thread
scheduling.

MFC after:	4 days
2004-09-07 06:33:39 +00:00
jmg
ff58a59f8f make witness its own sysctl branch instead of using _ to do this. I have
left the old tunables in to give people a few days to transition their
loader.conf and sysctl.conf's over to the new names..

MFC after:	5 days
2004-09-06 23:27:28 +00:00
jmg
b29998067a don't call f_detach if the filter has already removed the knote.. This
happens when a proc exits, but needs to inform the user that this has
happened..  This also means we can remove the check for detached from
proc and sig f_detach functions as this is done in kqueue now...

MFC after:	5 days
2004-09-06 19:02:42 +00:00
julian
91180c0a8c Don't do IPIs on behalf of interrupt threads.
just punt straight on through to the preemption code.

Make a KASSERT out of a condition that can no longer occur.
MFC after:	1 week
2004-09-06 07:23:14 +00:00
julian
daf0815c1d slight code cleanup
MFC after:	1 week
2004-09-05 23:23:58 +00:00
alfred
a91f587457 It's too easy to panic the machine when INVARIANTS are turned on
and you botch a call to nmount(2).

This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL.  The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.

Fix the code to honor the INVARIANT.

Silence on: fs@
2004-09-05 22:24:28 +00:00
rwatson
0d9965ce27 Expand the scope of the socket buffer locks in sopoll() to include the
state test as well as set, or we risk a race between a socket wakeup
and registering for select() or poll() on the socket.  This does
increase the cost of the poll operation, but can probably be optimized
some in the future.

This appears to correct poll() "wedges" experienced with X11 on SMP
systems with highly interactive applications, and might affect a plethora
of other select() driven applications.

RELENG_5 candidate.

Problem reported by:	Maxim Maximov <mcsi at mcsi dot pp dot ru>
Debugged with help of:	dwhite
2004-09-05 14:33:21 +00:00
julian
e291fa7714 turn on IPIs for 4bsd scheduler by default.
MFC after:	1 week
2004-09-05 02:19:53 +00:00
julian
5813d27029 Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is  now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies the scheduler that no more than N threads from that ksegrp
should be allowed to be concurrently scheduled. This is also
used to enforce 'fairness' at this time so that a ksegrp with
10000 threads can not swamp the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventually develop
its own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
libkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by:	scottl, peter
MFC after:	1 week
2004-09-05 02:09:54 +00:00
julian
42bfd75cfe Don't declare a function we are not defining. 2004-09-03 09:19:49 +00:00
julian
3bc1a1327b fix compile for UP 2004-09-03 09:15:10 +00:00
julian
e2d37a7c26 ooops finish last commit.
moved the variables but not the declarations.
2004-09-03 08:19:31 +00:00
julian
373bbfc184 Move 4bsd specific experimental IPI code into the 4bsd file.
Move the sysctls into kern.sched
2004-09-03 07:42:31 +00:00
alc
82e55fdf76 Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).
2004-09-03 05:11:32 +00:00
rwatson
fd3f91ddf3 Tag AIO as requiring Giant over the network stack using
NET_NEEDS_GIANT().

RELENG_5 candidate.
2004-09-03 03:19:14 +00:00
julian
46d0945926 remove unused code
MFC after:	 2 days
2004-09-02 23:37:41 +00:00
scottl
d9af98161a Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
julian
7f91bb5d9a *Blush* forgot to test non SMP builds.. oddly enough some UP code (particularly
in the acpi code) seems to want this in a UP build. (I guess so you can have
a single kernel module that works for both)
2004-09-01 18:05:43 +00:00
julian
8354ba9e3a Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after:	5 days
2004-09-01 06:42:02 +00:00
julian
e9d9514975 Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after:	2 days
2004-09-01 02:11:28 +00:00
davidxu
21ee614ff9 Remove TDP_USTATCLOCK, we no longer need it because we now always
update tick count for userland in thread_userret. This change
also removes a "no upcall owned" panic because fuword() schedules
an upcall under heavy load, and the code assumes that no upcall
can occur.

Reported and Tested by: Peter Holm <peter@holm.cc>
2004-08-31 11:52:05 +00:00
julian
2782d4b3fc Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some systems so leave it as an argument.
Having both proc and thread as arguments just gives an opportunity for
them to get out of sync.

MFC after:	3 days
2004-08-31 07:34:54 +00:00
julian
ee753ed190 Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after:	2 days
2004-08-31 06:12:13 +00:00
imp
7387033363 Fix BUS_DEBUG case 2004-08-30 05:48:49 +00:00
pjd
676a87c9ac Add a missing '\n'. 2004-08-30 01:10:20 +00:00
davidxu
650fed99d4 Only test return_instead if P_SINGLE_EXIT is set, otherwise a fork()
syscall can interrupt other thread's syscall in sleepq_catch_signals().
Currently, all callers know thread_suspend_check may suspend the thread
itself, so we needn't check return_instead for normal suspension
flags (no P_SINGLE_EXIT set).

Tested by: deischen
Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>
2004-08-29 23:10:02 +00:00
imp
5bc031c873 Initial support (disabled) for rebidding devices. I've been running
this in my tree for a while and in its disabled state there are no
issues.  It isn't enabled yet because some drivers (in acpi) have side
effects in their probe routines that need to be resolved in some
manner before this can be turned on.  The consensus at the last
developer's summit was to provide a static method for each driver
class that will return characteristics of the driver, one of which is
whether it can be reprobed idempotently.
2004-08-29 18:25:21 +00:00
imp
890b511bba MFp4: Merge in the patches, submitted long ago by someone whose email
address I've lost, that move the location information to the attach
routine as well.  While one could use devinfo to get this data, that
is difficult and error prone and subject to races for short lived
devices.

Would make a good MT5 candidate.
2004-08-29 18:11:10 +00:00
des
431e20a6fe Remove the HW_WDOG option; it serves no purpose.
MFC after:	3 days
2004-08-29 11:10:09 +00:00
iedowse
23b0458914 Add support for completing the installation of ELF relocatable
object format modules that were read in by the loader. Loading
modules via the loader should now work on the amd64 platform.
2004-08-29 01:21:51 +00:00
davidxu
a6ba819750 1. try to use existing mailbox address in thread_update_usr_ticks.
2. remove '\n' in KASSERT.
2004-08-28 04:16:32 +00:00
davidxu
96f0feb1d4 Move TDF_CAN_UNBIND to thread private flags td_pflags, this eliminates
need of sched_lock in some places. Also in thread_userret, remove
spare thread allocation code, it is already done in thread_user_enter.

Reviewed by: julian
2004-08-28 04:08:05 +00:00
peter
9e60f4336e Backout the previous backout (with scott's ok). sched_ule.c:1.122 is
believed to fix the problem with ULE that this change triggered.
2004-08-28 01:04:44 +00:00
obrien
0fe47008f6 s/smp_rv_mtx/smp_ipi_mtx/g
Requested by:	jhb
2004-08-28 00:49:55 +00:00
peter
587d1d74f3 Commit Jeff's suggested changes for avoiding a bug that is exposed by
preemption and/or the rev 1.79 kern_switch.c change that was backed out.

The thread was being assigned to a runq without adding in the load, which
would cause the counter to hit -1.
2004-08-28 00:49:22 +00:00
andre
bae83b7595 Poll() uses the array smallbits that is big enough to hold 32 struct
pollfd's to avoid calling malloc() on small numbers of fd's.  Because
smallbits' members have type char, its address might be misaligned
for a struct pollfd.  Change the array of char to an array of struct
pollfd.
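
A minimal userland sketch of the alignment issue described above, assuming
a C11 compiler.  The names old_buf, new_buf, SMALL_NFDS and
offset_is_pollfd_aligned() are illustrative and are not taken from the
kernel source; the leading char member is only there to force the
worst-case placement of the array.

```c
#include <assert.h>
#include <poll.h>
#include <stdalign.h>
#include <stddef.h>

#define SMALL_NFDS 32	/* illustrative: room for 32 struct pollfd */

/* Old shape: raw bytes, so the compiler only guarantees char alignment. */
struct old_buf {
	char pad;
	char smallbits[SMALL_NFDS * sizeof(struct pollfd)];
};

/* New shape: the element type carries struct pollfd's alignment. */
struct new_buf {
	char pad;
	struct pollfd smallbits[SMALL_NFDS];
};

/* Both shapes hold the same number of bytes; only the alignment differs. */
static_assert(sizeof(((struct old_buf *)0)->smallbits) ==
    sizeof(((struct new_buf *)0)->smallbits), "same capacity");

/* Returns nonzero if a buffer at this offset may be used as struct pollfd[]. */
static int
offset_is_pollfd_aligned(size_t off)
{
	return (off % alignof(struct pollfd) == 0);
}
```

With the char array the buffer can land on an odd offset; with the typed
array the compiler inserts whatever padding struct pollfd requires.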

PR:		kern/58214
Submitted by:	Stefan Farfeleder <stefan@fafoe.narf.at>
Reviewed by:	bde (a long time ago)
MFC after:	3 days
2004-08-27 21:23:50 +00:00
kan
a060608a2d Reintroduce slightly modified patch from kern/69964. Check for
LK_HAVE_EXL in both acquire invocations.

MFC after:	5 days
2004-08-27 01:41:28 +00:00
iedowse
1268768c3d When trying each linker class in turn with a preloaded module, exit
the loop if the preload was successful. Previously a successful
preload was ignored if the linker class was not the last in the
list.
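
A minimal sketch of the loop shape this fix implies, assuming each linker
class can be modelled as a function returning 0 on success.  The names
preload_fn, class_fail, class_ok and preload_with_classes are hypothetical
stand-ins, not the kernel linker's interfaces, and the overwritten-result
failure mode shown is only one plausible reading of the old behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a linker class: returns 0 on success. */
typedef int (*preload_fn)(const char *name);

static int calls;	/* counts attempts, for illustration only */

static int
class_fail(const char *name)
{
	(void)name;
	calls++;
	return (-1);
}

static int
class_ok(const char *name)
{
	(void)name;
	calls++;
	return (0);
}

/* Try each class in order; stop at the first successful preload. */
static int
preload_with_classes(const preload_fn *classes, size_t n, const char *name)
{
	int error = -1;

	assert(classes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		error = classes[i](name);
		if (error == 0)
			break;	/* else a later failure would overwrite the success */
	}
	return (error);
}
```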
2004-08-27 01:20:26 +00:00
rwatson
4dfef2e0e5 Don't hold the UNIX domain socket subsystem lock over the body of the
UNIX domain socket garbage collection implementation, as that risks
holding the mutex over potentially sleeping operations (as well as
introducing some nasty lock order issues, etc).  unp_gc() will hold
the lock long enough to do necessary deferral checks and set that it's
running, but then release it until it needs to reset the gc state.

RELENG_5 candidate.

Discussed with:	alfred
2004-08-25 21:24:36 +00:00
rwatson
d168fd3606 Conditional acquisition of socket buffer mutexes when testing socket
buffers with kqueue filters is no longer required: the kqueue framework
will guarantee that the mutex is held on entering the filter, either
due to a call from the socket code already holding the mutex, or by
explicitly acquiring it.  This removes the last of the conditional
socket locking.
2004-08-24 05:28:18 +00:00
imp
9b0c1e7ac1 Set the description to NULL in the right detach routine. This should
keep dangling pointers to strings in loaded modules from hanging
around after the drivers are unloaded.
2004-08-24 05:19:15 +00:00
davidxu
cf0e9470a8 Remove checking of single exit flag in thread_user_enter(), this is
generic code for threaded process, should not be here.
2004-08-23 22:54:37 +00:00
peter
326b7f663e Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
acquisition) and spin waiting for delivery.  For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery.  Having the single mutex means that
the spinloop to acquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by:   jhb
2004-08-23 21:39:29 +00:00
kan
edf5a7b07f Temporarily back out r1.74 as it seems to cause a number of regressions
according to numerous reports. It might get reintroduced some time later
when an exact failure mode is understood better.
2004-08-23 02:39:45 +00:00
rwatson
b3e3a32317 Make debug.kdb.stop_cpus also a TUNABLE() so it can be set prior to boot
to help debug early nasty hangs.
2004-08-22 15:10:52 +00:00
julian
9349236b6f diff reduction for upcoming patch. Use a macro that masks
some of the odd goings on with sub-structures, because they will
go away anyhow.
2004-08-22 05:21:41 +00:00
truckman
f36627bd56 Don't bother calling the module event handlers from module_shutdown()
in the shutdown_final state if the RB_NOSYNC flag is set.

The specific motivation in this case is that a system panic in an
interrupt context results in a call to module_shutdown(), which
calls g_modevent(), which calls g_malloc(..., M_WAITOK), which
results in a second panic.   While g_modevent() could be fixed to
not call malloc() for MOD_SHUTDOWN events (which it doesn't handle
in any case), it is probably also a good idea to entirely skip the
execution of the module shutdown handlers after a panic.

This may be a MFC candidate for RELENG_5.
2004-08-20 21:47:48 +00:00
truckman
54d23a34f6 Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set.  This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.

This is a MFC candidate for RELENG_5.

MFC after:	3 days
2004-08-20 19:21:47 +00:00
jhb
fc631187fd Remove some dead code under a straggling APIC_IO #ifdef that I missed
back before 5.2.
2004-08-20 17:24:52 +00:00
rwatson
5c80f32b93 Back out uipc_socket.c:1.208, as it incorrectly assumes that all
sockets are connection-oriented for the purposes of kqueue
registration.  Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (i.e., NFS).  Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.

Apologies to:	jmg
Discussed with:	imp, scottl
2004-08-20 16:24:23 +00:00
scottl
30583f7adf Revert the previous change. It works great for 4BSD but causes major
problems for ULE.  The reason is quite unknown and worrisome.
2004-08-20 05:58:38 +00:00
scottl
b336a56514 In maybe_preempt(), ignore threads that are in an inconsistent state. This
is an effective band-aid for at least some of the scheduler corruption seen
recently.  The real fix will involve protecting threads while they are
inconsistent, and will come later.

Submitted by: julian
2004-08-20 05:18:50 +00:00
jmg
b0492852c8 make sure that the socket is either accepting connections or is connected
when attaching a knote to it...  otherwise return EINVAL...

Pointed out by:	benno
2004-08-20 04:15:30 +00:00
njl
7a83d1fca4 Add a newline. 2004-08-19 20:16:09 +00:00
phk
59d327838d Add bioq_takefirst().
If the bioq is empty, NULL is returned.  Otherwise the front element
is removed and returned.

This can simplify locking in many drivers from:

	lock()
	bp = bioq_first(bq);
	if (bp == NULL) {
		unlock()
		return
	}
	bioq_remove(bq, bp);
	unlock()
to:
	lock()
	bp = bioq_takefirst(bq);
	unlock()
	if (bp == NULL)
		return;
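
A minimal userland sketch of the take-first pattern, assuming a simple
singly linked FIFO.  The names req, reqq, reqq_insert_tail() and
reqq_takefirst() are hypothetical and only mirror the shape of the bioq
interface; locking is left to the caller, who would hold the lock just
around the reqq_takefirst() call.

```c
#include <assert.h>
#include <stddef.h>

struct req {
	struct req *next;
	int id;
};

struct reqq {
	struct req *head;
	struct req *tail;
};

/* Append a request to the end of the queue. */
static void
reqq_insert_tail(struct reqq *q, struct req *r)
{
	assert(q != NULL && r != NULL);
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Remove and return the front request, or NULL if the queue is empty. */
static struct req *
reqq_takefirst(struct reqq *q)
{
	struct req *r;

	assert(q != NULL);
	r = q->head;
	if (r != NULL) {
		q->head = r->next;
		if (q->head == NULL)
			q->tail = NULL;
		r->next = NULL;
	}
	return (r);
}
```

Because the empty check and the removal happen in one call, the caller
needs only one unlock path instead of two.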
2004-08-19 19:51:51 +00:00
njl
5aee38d321 Add debugging to rman_manage_region() as well. This is useful since we
manage subregions in ACPI.

MFC after:	3 days
2004-08-19 16:41:12 +00:00
rwatson
477ea1ed67 Remove GIANT_REQUIRED from setugidsafety() as knote_fdclose() no longer
requires Giant.
2004-08-19 14:59:51 +00:00
jhb
9e08178eb7 Now that the return value semantics of cv's for multithreaded processes
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for threaded processes into a
  thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
  Specifically, if a thread is awakened by something other than a signal
  while checking for signals before going to sleep, clear TDF_SINTR in
  sleepq_catch_signals().  This removes a sched_lock lock/unlock combo in
  that edge case during an interruptible sleep.  Also, fix
  sleepq_check_signals() to properly handle the condition if TDF_SINTR is
  clear rather than requiring the callers of the sleepq API to notice
  this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
  sleepq_broadcast() by creating an explicit submask for sleepq types.
  Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
  0.  Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
  move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
  than sleepq_catch_signals().  Note that it is the caller's responsibility
  to ensure that sleepq_catch_signals() is called if and only if this flag
  is passed to the preceding sleepq_add().  Note that this also removes a
  sched_lock lock/unlock pair from sleepq_catch_signals().  It also ensures
  that for an interruptible sleep, TDF_SINTR is always set when
  TD_ON_SLEEPQ() is true.
2004-08-19 11:31:42 +00:00
jmg
bead871bc0 add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of
the mutex profiling buffers.  Document them in the man page and in NOTES.
Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.
2004-08-19 06:38:26 +00:00
rwatson
3d9f38d578 Add UNP_UNLOCK_ASSERT() to assert that the UNIX domain socket subsystem
lock is not held.

Rather than annotating that the lock is released after calls to
unp_detach() with a comment, annotate with an assertion.

Assert that the UNIX domain socket subsystem lock is not held when
unp_externalize() and unp_internalize() are called.
2004-08-19 01:45:16 +00:00
rwatson
6dcf7eb45e Annotate call to DELAY() in interrupt storm mitigation as being
something to revisit.

Approved by:	re (scottl)
2004-08-17 04:09:09 +00:00
kan
2c6402abdd Upgrading a lock does not play well together with acquiring an exclusive lock
and can lead to two threads being granted exclusive access. Check that no one
has the same lock in exclusive  mode before proceeding to acquire it.

The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block
other threads.  Normally this is not a problem since the mini locks are
upgraded to full locks and the release of the locks will unblock the other
threads.  However if a thread reset the bits without obtaining a full lock
other threads are not awoken. Add missing wakeups for these cases.

PR:		kern/69964
Submitted by:	Stephan Uphoff <ups at tree dot com>
Very good catch by: Stephan Uphoff <ups at tree dot com>
2004-08-16 15:01:22 +00:00
obrien
8f41b4e870 s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g 2004-08-16 08:33:37 +00:00
rwatson
7ac0169fa1 Always acquire the UNIX domain socket subsystem lock (UNP lock)
before dereferencing sotounpcb() and checking its value, as so_pcb
is protected by protocol locking, not subsystem locking.  This
prevents races during close() by one thread and use of the socket
in another.

unp_bind() now asserts the UNP lock, and uipc_bind() now acquires
the lock around calls to unp_bind().
2004-08-16 04:41:03 +00:00
green
1de6d6df05 Add the missing knote_fdclose(). 2004-08-16 03:09:01 +00:00
green
99deda206a Allocate the marker, when scanning a kqueue, from the "heap" instead of the
stack.  When swapped out, a process's kernel stack would be unavailable,
and we could get a page fault when scanning the same kqueue.

PR:	kern/61849
2004-08-16 03:08:38 +00:00
rwatson
2cd0dbc8de Annotate the current UNIX domain socket locking strategies, order,
strengths, and weaknesses in a comment.  Assert a copyright over the
changes made as part of the locking work.
2004-08-16 01:52:04 +00:00
silby
e3f2e32958 Major enhancements to pipe memory usage:
- pipespace is now able to resize non-empty pipes; this allows
  for many more resizing opportunities

- Backing is no longer pre-allocated for the reverse direction
  of pipes.  This direction is rarely (if ever) used, so this cuts the
  amount of map space allocated to a pipe in half.

- Pipe growth is now much more dynamic; a pipe will now grow when
  the total amount of data it contains and the size of the write are
  larger than the size of pipe.  Previously, only individual writes greater
  than the size of the pipe would cause growth.

- In low memory situations, pipes will now shrink during both read
  and write operations, where possible.  Once the memory shortage
  ends, the growth code will cause these pipes to grow back to an appropriate
  size.

- If the full PIPE_SIZE allocation fails when a new pipe is created, the
  allocation will be retried with SMALL_PIPE_SIZE.  This helps to deal
  with the situation of a fragmented map after a low memory period has
  ended.

- Minor documentation + code changes to support the above.

In total, these changes increase the total number of pipes that
can be allocated simultaneously, drastically reducing the chances that
pipe allocation will fail.

Performance appears unchanged due to dynamic resizing.
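
A minimal sketch of the growth rule from the third item above, comparing
the old and new conditions.  The function names and the byte counts used
with them are illustrative assumptions, not the pipe code's actual
identifiers or sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old rule: only a single write larger than the pipe triggers growth. */
static bool
should_grow_old(size_t pipe_size, size_t buffered, size_t write_len)
{
	(void)buffered;
	return (write_len > pipe_size);
}

/* New rule: buffered data plus the incoming write exceeds the pipe size. */
static bool
should_grow_new(size_t pipe_size, size_t buffered, size_t write_len)
{
	assert(buffered <= pipe_size);
	/* Written as a subtraction so buffered + write_len cannot overflow. */
	return (write_len > pipe_size - buffered);
}
```

For example, an 8192-byte write into a 16384-byte pipe that already holds
12288 bytes grows the pipe under the new rule but not under the old one.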
2004-08-16 01:27:24 +00:00
truckman
5facb5bcc2 Yet another tweak to the shutdown messages in boot():
Don't count busy buffers before the initial call to sync() and
  don't skip the initial sync() if no busy buffers were called.
  Always call sync() at least once if syncing is requested.  This
  defers the "Syncing disks, buffers remaining..." message until
  after the initial sync() call and the first count of busy
  buffers.  This backs out changes in kern_shutdown 1.162.

  Print a different message when there are no busy buffers after the
  initial sync(), which is now the expected situation.

  Print an additional message when syncing has completed successfully
  in the unusual situation where the work of syncing was done by
  boot().

  Uppercase one message to make it consistent with all of the other
  kernel shutdown messages.

Discussed with:	bde (in a much earlier form, prior to 1.162)
Reviewed by:	njl (in an earlier form)
2004-08-15 19:17:23 +00:00
jmg
bc1805c6e8 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
acquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
rwatson
b3113cfdfe Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
attempt to IPI other cpus when entering the debugger in order to stop
them while in the debugger.  The default remains to issue the stop;
however, that can result in a hang if another cpu has interrupts disabled
and is spinning, since the IPI won't be received and the KDB will wait
indefinitely.  We probably need to add a timeout, but this is a useful
stopgap in the mean time.

Reviewed by:	marcel
2004-08-15 02:06:27 +00:00
rwatson
c7e2313e86 Cause pfind() not to return processes in the PRS_NEW state. As a result,
threads consuming the result of pfind() will not need to check for a NULL
credential pointer or other signs of an incompletely created process.
However, this also means that pfind() cannot be used to test for the
existence of, or to find, such a process.  Annotate pfind() to indicate that this
is the case.  A review of current consumers seems to indicate that this is
not a problem for any of them.  This closes a number of race conditions
that could result in NULL pointer dereferences and related failure modes.
Other related races continue to exist, especially during iteration of the
allproc list without due caution.

Discussed with:	tjr, green
2004-08-14 17:15:16 +00:00
phk
9595df2db1 Add some KASSERTS. 2004-08-14 08:33:49 +00:00
julian
ae4d7bb6b9 Whitespace nit. 2004-08-14 07:21:20 +00:00
rwatson
136013f29f After completing a name lookup for a target UNIX domain socket to
connect to, re-check that the local UNIX domain socket hasn't been
closed while we slept, and if so, return EINVAL.  This affects the
system running both with and without Giant over the network stack,
and recent ULE changes appear to cause it to trigger more
frequently than previously under load.  While here, improve catching
of possibly closed UNIX domain sockets in one or two additional
circumstances.  I have a much larger set of related changes in
Perforce, but they require more testing before they can be merged.

One debugging printf is left in place to indicate when such a race
takes place: this is typically triggered by a buggy application
that simultaneously connect()'s and close()'s a UNIX domain socket
file descriptor.  I'll remove this at some point in the future, but
am interested in seeing how frequently this is reported.  In the
case of Martin's reported problem, it appears to be a result of a
non-thread safe syslog() implementation in the C library, which
does not synchronize access to its logging file descriptor.

Reported by:	mbr
2004-08-14 03:43:49 +00:00
jmg
bea28d4a04 clean up whitespace... 2004-08-13 17:43:53 +00:00
jmg
d2ff11056b looks like rwatson forgot tabs... :) 2004-08-13 07:38:58 +00:00
julian
9ab7967d3c Don't keep evaluating our own cpu mask..
it's not likely to have changed....
2004-08-13 00:57:43 +00:00
rwatson
74889f1a20 Trim trailing white space. 2004-08-12 18:06:21 +00:00
imp
482740a238 Minor formatting fixes for lines > 80 characters 2004-08-12 17:26:22 +00:00
jeff
8745e98dd0 - Introduce a new flag KEF_HOLD that prevents sched_add() from doing a
   migration.  Use this in sched_prio() and sched_switch() to stop us from
   migrating threads that are in short term sleeps or are runnable.  These
   extra migrations were added in the patches to support KSE.
 - Only set NEEDRESCHED if the thread we're adding in sched_add() is a
   lower priority and is being placed on the current queue.
 - Fix some minor whitespace problems.
2004-08-12 07:56:33 +00:00
julian
765ec5c83b Properly keep track of how many kses are on the system run queue(s). 2004-08-11 20:54:48 +00:00
rwatson
eed836416f Replace a reference to splnet() with a reference to locking in a comment. 2004-08-11 03:43:10 +00:00
marcel
fbbaea5f90 Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
2004-08-11 02:35:06 +00:00
rwatson
371cf09cf7 In v_addpollinfo(), we allocate storage to back vp->v_pollinfo. However,
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL.  Otherwise,
accept that another thread got there first, and release the local storage.
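
A minimal userland sketch of the allocate-then-recheck pattern described
above, assuming a pthread mutex in place of the kernel's locking and
calloc() in place of a sleeping kernel allocation.  The names node,
pollinfo and node_addpollinfo() are hypothetical and do not reproduce the
vnode code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct pollinfo {
	int events;
};

struct node {
	pthread_mutex_t lock;
	struct pollinfo *pollinfo;	/* NULL until first needed */
};

/* Returns 0 once n->pollinfo is set (by us or by a racing thread). */
static int
node_addpollinfo(struct node *n)
{
	struct pollinfo *pi;

	assert(n != NULL);
	/* Allocate into a local pointer first; this step may block. */
	pi = calloc(1, sizeof(*pi));
	if (pi == NULL)
		return (-1);

	pthread_mutex_lock(&n->lock);
	if (n->pollinfo == NULL) {
		n->pollinfo = pi;	/* we won: publish our storage */
		pi = NULL;
	}
	pthread_mutex_unlock(&n->lock);

	free(pi);	/* we lost the race: release the local storage */
	return (0);
}
```

A second call leaves the existing pointer untouched and frees its own
allocation, which is the "another thread got there first" case.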

Discussed with:	jmg
2004-08-11 01:27:53 +00:00
alc
7210ecc993 Eliminate the acquisition and release of Giant within physio(). Remove
the spl calls.

Reviewed by: phk@
Discussed with: scottl@
2004-08-10 21:47:11 +00:00
jhb
15d4b7d989 Synchronize the extra SA threading checks and return value handling of
condition variables with that of msleep().

Reviewed by:	davidxu
2004-08-10 17:42:59 +00:00
jeff
b109ddffbc - Use a new flag, KEF_XFERABLE, to record with certainty that this kse had
   contributed to the transferable load count.  This prevents any potential
   problems with sched_pin() being used around calls to setrunqueue().
 - Change the sched_add() load balancing algorithm to try to migrate on
   wakeup.  This attempts to place threads that communicate with each other
   on the same CPU.
 - Don't clear the idle counts in kseq_transfer(), let the cpus do that when
   they call sched_add() from kseq_assign().
 - Correct a few out of date comments.
 - Make sure the ke_cpu field is correct when we preempt.
 - Call kseq_assign() from sched_clock() to catch any assignments that were
   done without IPI.  Presently all assignments are done with an IPI, but I'm
   trying a patch that limits that.
 - Don't migrate a thread if it is still runnable in sched_add().  Previously,
   this could only happen for KSE threads, but due to changes to
   sched_switch() all threads went through this path.
 - Remove some code that was added with preemption but is not necessary.
2004-08-10 07:52:21 +00:00
njl
7e21ce666c Skip the syncing disks loop if there are no dirty buffers. Remove a
variable used to flag the initial printf.

Submitted by:	truckman (earlier version)
2004-08-10 01:32:05 +00:00
scottl
ab3ce7c4d9 Add a temporary debugging hack to detect a deadlock in setrunqueue(). This
is here so that we can gather stats on the nature of the recent rash of
hard lockups, and in this particular case panic the machine instead of
letting it deadlock forever.
2004-08-10 00:26:25 +00:00
julian
00a6534a31 Slight changes to comments and some whitespace changes. 2004-08-09 21:57:30 +00:00
julian
38d3d854fe Make kg->kg_runnable actually count runnable threads in the ksegrp run queue
instead of only doing it sometimes.. This is not used outside of debugging code
in the current code, but that will probably change.
2004-08-09 20:36:03 +00:00
julian
ecbe8aa287 Remove typos on KASSERT messages. 2004-08-09 20:13:07 +00:00
green
fbabec2d12 Normalize the VM wiring done with SPARSE_MAPPING: check for errors, and
unmap when done.  For whatever reason, SPARSE_MAPPING is not even a
config option, so this is dead code.
2004-08-09 18:46:13 +00:00
julian
61fada7840 Increase the amount of data exported by KTR in the KTR_RUNQ setting.
This extra data is needed to really follow what is going on in the
threaded case.
2004-08-09 18:21:12 +00:00
jmg
2c2b6c4ef7 add option to automatically mark core dumps with the nodump flag
PR:		57065
Submitted by:	Walter C. Pelissero
2004-08-09 05:46:46 +00:00
davidxu
634d20a05e 1.Add KSE_INTR_DBSUSPEND command for kse_thr_interrupt to suspend a bound
thread. After the bound thread leaves the critical region, it should
check the debug flag and may suspend itself by using the command.
2.Schedule upcall after thread is suspended by debugger
3.Wakeup upcall thread after process suspension.

Reviewed by: deischen
2004-08-08 22:32:20 +00:00
davidxu
f8c21c52ad Call thread_user_enter for M:N thread, ast() should be treated as another
entrance of kernel.
2004-08-08 22:28:33 +00:00
davidxu
6412ad5b2e Add pl_flags to ptrace_lwpinfo, two flags PL_FLAG_SA and PL_FLAG_BOUND
indicate that a thread is in UTS critical region.

Reviewed by: deischen
Approved by: marcel
2004-08-08 22:26:11 +00:00
dfr
6a047f3d1e Make sure that AT_PHDR has a useful value even for static programs. 2004-08-08 09:48:10 +00:00
jmg
6967b9b093 rearrange some code that handles the thread taskqueue so that it is more
generic.  Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a
single arg, which is the name of the queue.

Document these changes.
2004-08-08 02:37:22 +00:00
rwatson
656f433813 We're not yet ready to assert !Giant in kern_fcntl(), as it's called
with Giant from ABI wrappers such as Linux emulation.

Foot shoot off:	phk
2004-08-07 14:09:02 +00:00
rwatson
37eebe5058 Flag a broad range of VFS operations as GIANT_REQUIRED in order to
catch leaking into VFS without Giant.

Inch Giant a little lower in several file descriptor operations on
vnodes to cover only VFS operations that need it, rather than file
flag reading, etc.
2004-08-06 22:25:35 +00:00
rwatson
ee17f9503f In thread_exit(), include more information about the thread/process
context in the KTR trace record.  In particular, include the same
information as passed for mi_switch() and fork_exit() KTR trace
records.
2004-08-06 22:06:14 +00:00
rwatson
d6384e3daf Push UIDINFO_UNLOCK() slightly earlier in chgsbsize(), as it's not
needed if we print the local variable version of the limit rather
than the shared version.
2004-08-06 22:04:33 +00:00
rwatson
8de3afda37 Avoid acquiring Giant for some common light-weight or already MPSAFE
fcntl() operations, including:

  F_DUPFD          dup() alias
  F_GETFD          retrieve close-on-exec flag
  F_SETFD          set close-on-exec flag
  F_GETFL          retrieve file descriptor flags

For the remaining fcntl() operations, do acquire Giant, especially
where we call into fo_ioctl() as a result.  We're not yet ready to
push Giant into fo_ioctl().  Once we do, this can all become quite a
bit prettier.
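
A minimal sketch of the command split listed above, assuming a userland
<fcntl.h> for the command constants.  The helper name
fcntl_cmd_needs_giant() is hypothetical; the actual kernel code takes and
drops the lock inline rather than through a predicate like this.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Returns false for the light-weight commands listed in the message above. */
static bool
fcntl_cmd_needs_giant(int cmd)
{
	switch (cmd) {
	case F_DUPFD:	/* dup() alias */
	case F_GETFD:	/* retrieve close-on-exec flag */
	case F_SETFD:	/* set close-on-exec flag */
	case F_GETFL:	/* retrieve file descriptor flags */
		return (false);
	default:	/* everything else, e.g. paths reaching fo_ioctl() */
		return (true);
	}
}
```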
2004-08-06 22:00:55 +00:00
rwatson
36a8fef8a8 Cut a KTR record whenever a callout is invoked. Mark whether it runs
with Giant or not, and include the function pointer so it can be looked
up against the kernel symbol table during trace analysis.
2004-08-06 21:49:00 +00:00
jhb
d3254af40d Don't scare users with a warning about preemption being off when it isn't
yet safe to have on by default.
2004-08-06 15:49:44 +00:00
rwatson
6680706c2b In ithread_schedule(), when we plan to go harvest some entropy as
a result of scheduling an ithread, cut a KTR_INTR trace record so
that it's clear in tracing interrupt activity where and when the
entropy harvesting code is invoked.
2004-08-06 03:39:28 +00:00
cperciva
b4bae139fd When resetting a pending callout, perform the deregistration in
callout_reset rather than calling callout_stop.  This results in a few
lines of code duplication, but it provides a significant performance
improvement because it avoids recursing on callout_lock.

Requested by:	rwatson
2004-08-06 02:44:58 +00:00
jhb
73d1afd6fd Fix the code in rman that merges adjacent unallocated resources to use a
better check for 'adjacent'.  The old code assumed that if two resources
were adjacent in the linked list that they were also adjacent range wise.
This is not true when a resource manager has to manage disparate regions.
For example, the current interrupt code on i386/amd64 will instruct
irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC
case.  If IRQs 1 and 3 were allocated and then released, the old code
would coalesce across the 1 to 3 boundary because the resources were
adjacent in the linked list thus adding 2 to the area of resources that
irq_rman managed as a side effect.  The fix adds extra checks so that
adjacent unallocated resources are only merged with the resource being
freed if the start and end values of the resources also match up.  The
patch also consolidates the checks for adjacent resources being allocated.
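
A minimal sketch of the stricter adjacency test described above, assuming
inclusive start/end values.  The names res_range and ranges_adjacent() are
illustrative and are not rman's own structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

struct res_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* inclusive */
};

/*
 * Two list neighbours may be merged only if hi begins exactly one past
 * the end of lo.  Being next to each other in the list is not enough.
 */
static bool
ranges_adjacent(const struct res_range *lo, const struct res_range *hi)
{
	assert(lo->start <= lo->end && hi->start <= hi->end);
	/* Compare first so that the subtraction cannot wrap around. */
	return (lo->end < hi->start && hi->start - lo->end == 1);
}
```

With the IRQ example from the message, the ranges for IRQ 1 and IRQ 3 are
list neighbours but fail this test, so the hole at 2 is never absorbed.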
2004-08-05 15:48:18 +00:00
jhb
fb7bd65f3f Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other.  With this change, only
one of the CPUs would do an IPI and spin-wait at a time.
2004-08-04 20:31:19 +00:00
jhb
f513ad537c Workaround a possible deadlock on SMP due to a spin lock LOR by disabling
the immediate awakening of proc0 (scheduler kproc, controls swapping
processes in and out).  The scheduler process periodically awakens already,
so this will not result in processes not being swapped in, there will just
be more latency in between a thread being made runnable and the scheduler
waking up to swap the affected process back in.
2004-08-04 20:24:40 +00:00
jhb
c75eeac1df Cache the value of curthread in the _get_sleep_lock() and _get_spin_lock()
macros and pass the value to the associated _mtx_*() functions to avoid
more curthread dereferences in the function implementations.  This provided
a very modest perf improvement in some benchmarks.

Suggested by:	rwatson
Tested by:	scottl
2004-08-04 20:18:45 +00:00
rwatson
5d6fea3b71 Assert Giant in namei(). Bugs have been reported in which, following
a sleep() call waking up in namei(), a later assertion triggers that
Giant is not held.  By asserting Giant at the start of namei(), we can
know that if that assertion triggers, Giant is lost during the call to
namei(), and not before.
2004-08-04 18:39:07 +00:00
rwatson
243f24944e Assert Giant in the following file descriptor-related functions:
Function             Reason
--------             ------
fdfree()             VFS
setugidsafety()      KQueue
fdcheckstd()         VFS
_fgetvp()            VFS
fgetsock()           Conditional assertion based on debug.mpsafenet
2004-08-04 18:35:33 +00:00
rwatson
76535adbaa Remove spl's from kern_resource.c. 2004-08-04 18:19:09 +00:00
mux
35780dc21a Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait().  It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by:	jhb
Reviewed by:	jhb
2004-08-03 18:44:27 +00:00
pjd
7a05d0a3cd Don't skip permission checks when sending signals to zombie processes.
Pointed out by:	bde
Reviewed by:	rwatson
2004-08-03 15:39:23 +00:00
silby
e327e6bd59 Standardize pipe locking, ensuring that everything is locked via
pipelock(), not via a mixture of mutexes and pipelock().  Additionally,
add a few KASSERTS, and change some statements that should have been
KASSERTS into KASSERTS.

As a result of these cleanups, some segments of code have become
significantly shorter and/or easier to read.
2004-08-03 02:59:15 +00:00
davidxu
6f2afa324d s/TMDF_DONOTRUNUSER/TMDF_SUSPEND/g
Discussed with: deischen
2004-08-03 02:23:06 +00:00
julian
6121fa3e4d Repeat after me:
"Do not apply your tested patches to your commit tree by hand"
2004-08-03 01:43:29 +00:00
julian
f1c5d06daf Remove an argument that is never used. 2004-08-02 23:48:43 +00:00
obrien
47f728c0bc Put a cap on the auto-tuning of kern.maxvnodes.
Cap value chosen by:	scottl
2004-08-02 21:52:43 +00:00
rwatson
a21d9ff09b Add what appears to be a missing '*/' at the end of a comment. 2004-08-02 01:38:27 +00:00
green
9532ab7116 * Add a "how" argument to uma_zone constructors and initialization functions
  so that they know whether the allocation is supposed to be able to sleep
  or not.
* Allow uma_zone constructors and initialization functions to return either
  success or error.  Almost all of the ones in the tree currently return
  success unconditionally, but mbuf is a notable exception: the packet
  zone constructor wants to be able to fail if it cannot suballocate an
  mbuf cluster, and the mbuf allocators want to be able to fail in general
  in a MAC kernel if the MAC mbuf initializer fails.  This fixes the
  panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
  the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
2004-08-02 00:18:36 +00:00
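The failable-constructor pattern described in 9532ab7116 can be sketched in
userland as below. Every name here (sk_zone, sk_zalloc, sk_pkt_ctor,
SK_NOWAIT, SK_WAITOK) is hypothetical and only mimics the shape of the
change; it is not the UMA API. The "clusters left" counter stands in for
the mbuf cluster suballocation that may fail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "how" values: may the allocation sleep or not? */
#define	SK_NOWAIT	0x1
#define	SK_WAITOK	0x2

/* A constructor now receives "how" and reports success (0) or an errno. */
typedef int (*sk_ctor_t)(void *mem, size_t size, void *arg, int how);

struct sk_zone {
	size_t		size;
	sk_ctor_t	ctor;
};

struct sk_pkt {
	int	has_cluster;
};

/* Allocate an item; undo the allocation if the constructor fails. */
static void *
sk_zalloc(struct sk_zone *zone, void *arg, int how)
{
	void *mem = malloc(zone->size);

	if (mem == NULL)
		return (NULL);
	if (zone->ctor != NULL && zone->ctor(mem, zone->size, arg, how) != 0) {
		free(mem);
		return (NULL);
	}
	return (mem);
}

/* Packet constructor: fails when no cluster can be suballocated. */
static int
sk_pkt_ctor(void *mem, size_t size, void *arg, int how)
{
	struct sk_pkt *pkt = mem;
	int *clusters_left = arg;

	(void)size;
	(void)how;	/* a real constructor would pass this further down */
	if (*clusters_left == 0)
		return (ENOMEM);
	(*clusters_left)--;
	pkt->has_cluster = 1;
	return (0);
}
```

With this shape, running out of clusters surfaces as a NULL return to the
caller instead of a half-constructed object.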
julian
b0892abf37 Comment kse_create() and make a few minor code cleanups
Reviewed by:	davidxu
2004-08-01 23:02:00 +00:00
phk
2d868d02cf Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version.  This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems' mount functions consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space.  A few places abused
it to get hold of some credentials to pass around.  Effectively
it is unused.

Reorganize the root filesystem selection code.
2004-07-30 22:08:52 +00:00
alc
6aaed2f8ea Giant is no longer required by vm_waitproc() and vmspace_exitfree().
Eliminate its acquisition and release around vm_waitproc() in kern_wait().
2004-07-30 20:31:02 +00:00
njl
774b91783e Minor message cleanup. 2004-07-30 01:30:05 +00:00
pjd
809d561dd5 Syscall kill(2) called for a zombie process should return 0.
Obtained from:	Darwin
2004-07-29 20:38:19 +00:00
pjd
7e5db42c7a Fill in some information about zombie processes as well.
Before this change, every zombie process was reported as the owner of PID 0 in
ps(1) output.

Reviewed by:	julian
2004-07-29 20:27:59 +00:00
phk
075684f5fd Remove global variables rootdevs and rootvp; they are unused as such.
Add local rootvp variables as needed.

Remove checks for miniroots in the swap partition.  We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
2004-07-28 20:21:04 +00:00
kan
48b2ea77a3 Avoid casts as lvalues. 2004-07-28 06:42:41 +00:00
davidxu
5610a7e068 Use P_SINGLE_EXIT to check the single-threading case; P_WEXIT is not for that
purpose.
2004-07-28 06:30:52 +00:00
phk
8c9258b82e Convert the vfsconf list to a TAILQ.
Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.
2004-07-27 22:32:01 +00:00
rwatson
4ab080249a Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread.  It is called only from critical_{enter,exit}(),
which already dereferences curthread.  This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding:	jhb, bmilekic
2004-07-27 16:41:01 +00:00
rwatson
1fea905f48 Add "options ADAPTIVE_GIANT" which causes Giant to also be treated in
an adaptive fashion when adaptive mutexes are enabled.  The theory
behind non-adaptive Giant is that Giant will be held for long periods
of time, and therefore spinning waiting on it is wasteful.  However,
in MySQL benchmarks which are relatively Giant-free, running Giant
adaptive makes an observable difference on SMP (5% transaction rate
improvement).  As such, make adaptive behavior on Giant an option so
it can be more widely benchmarked.
2004-07-27 16:34:48 +00:00
alc
8a38bc6b2c - Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit().  (Giant is acquired only if the vmspace
  contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
  address space.
2004-07-27 03:53:41 +00:00
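The first item of 8a38bc6b2c, updating a reference count with atomic
operations instead of under a lock, might look like this in portable C11.
The struct and function names are hypothetical, and the kernel uses its own
atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address-space object with a refcount. */
struct sk_vmspace {
	atomic_int	refcnt;
};

/* Take an additional reference without holding any lock. */
static void
sk_vmspace_hold(struct sk_vmspace *vm)
{
	atomic_fetch_add(&vm->refcnt, 1);
}

/*
 * Drop a reference.  atomic_fetch_sub() returns the previous value, so
 * exactly one caller observes 1 and learns it released the last
 * reference and may tear the object down.
 */
static bool
sk_vmspace_rele(struct sk_vmspace *vm)
{
	return (atomic_fetch_sub(&vm->refcnt, 1) == 1);
}
```

Because the decrement and the "was I last?" test are a single atomic step,
no global lock is needed to decide who frees the object.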
bmilekic
1c3958ce88 Move the schedlock owner state update following the context
switch in fork_exit() to before anything else is done (but keep
schedlock for the deadthread check).  This means one less
nasty bug if ever in the future whatever might have been called
before the update played with schedlock or critical sections.

Discussed with: tjr
2004-07-27 03:46:31 +00:00
cperciva
c009fddfd6 In revision 1.228, I accidentally broke the "total number of processes in
the system" resource limit code: When checking if the caller has superuser
privileges, we should be checking the *real* user, not the *effective*
user.  (In general, resource limiting is done based on the real user, in
order to avoid resource-exhaustion-by-setuid-program attacks.)

Now that a SUSER_RUID flag to suser_cred exists, use it here to return
this code to its correct behaviour.

Pointed out by:	rwatson
2004-07-26 07:54:39 +00:00
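The distinction drawn in c009fddfd6 between the real and the effective user
can be illustrated with a small sketch. The struct, flag value, and function
below are hypothetical simplifications named after the commit's terms; they
are not the kernel's suser_cred() implementation.

```c
#include <errno.h>

/* Hypothetical credential: effective and real user IDs only. */
struct sk_cred {
	unsigned int	cr_uid;		/* effective user ID */
	unsigned int	cr_ruid;	/* real user ID */
};

/* Hypothetical flag value: test the real UID instead of the effective. */
#define	SK_SUSER_RUID	0x2

/* Returns 0 if the selected UID is root, EPERM otherwise. */
static int
sk_suser_cred(const struct sk_cred *cred, int flags)
{
	unsigned int uid;

	uid = (flags & SK_SUSER_RUID) ? cred->cr_ruid : cred->cr_uid;
	return (uid == 0 ? 0 : EPERM);
}
```

For a setuid-root program started by an ordinary user (effective UID 0, real
UID non-zero), the default check grants privilege while the real-UID check
refuses it, which is the behaviour a per-user process limit needs.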
cperciva
d9fecc83c8 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
rwatson
ec34d4330f Revert modification of subr_turnstile.c accidentally included in the
last commit; this assertion was provided by jhb for local debugging
and not intended for broader consumption.
2004-07-25 23:32:32 +00:00
rwatson
4c9acdbfaf In uipc_connect(), assert that the passed thread is curthread, and pass
td into unp_connect() instead of reading curthread.
2004-07-25 23:30:43 +00:00
rwatson
0e43e3b1b4 Do some initial locking on accept filter registration and attach. While
here, close some races that existed in the pre-locking world during low
memory conditions.  This locking isn't perfect, but it's closer than
before.
2004-07-25 23:29:47 +00:00