Commit Graph

8649 Commits

Author SHA1 Message Date
Don Lewis
4053cae340 Track all lock relationships instead of pruning direct relationships
if an indirect relationship exists (keep both A->B->C and A->C).
This allows witness_checkorder() to use isitmychild() instead of
the much more expensive isitmydescendant() to check for valid lock
ordering.

Don't do an expensive tree walk to update the w_level values when
the tree is updated.  Only update the w_level values when using the
debugger to display the tree.

Nuke the experimental "witness_watch > 1" mode that only compared
w_level for the two locks.  This information is no longer maintained
at run time, and the use of isitmychild() in witness_checkorder
should bring performance close enough to the acceptable level that
this hack is not needed.

Report witness data structure allocation statistics under the
debug.witness sysctl.

Reviewed by:	jhb
MFC after:	30 days
2005-08-25 03:47:37 +00:00
Don Lewis
ad9f180121 Back out the removal of LK_NOWAIT from the VOP_LOCK() call in
vlrureclaim() in vfs_subr.c 1.636  because waiting for the vnode
lock aggravates an existing race condition.  It is also undesirable
according to the commit log for 1.631.

Fix the tiny race condition that remains by rechecking the vnode
state after grabbing the vnode lock and grabbing the vnode interlock.

Fix the problem of other threads being starved (which 1.636 attempted
to fix by removing LK_NOWAIT) by calling uio_yield() periodically
in vlrureclaim().  This should be more deterministic than hoping
that VOP_LOCK() without LK_NOWAIT will block, which may not happen
in this loop.

Reviewed by:	kan
MFC after:	5 days
2005-08-23 03:44:06 +00:00
Pawel Jakub Dawidek
4e4aa37e75 mp_ncpus is always (properly) initialized, even on UP kernels, so just use it. 2005-08-21 18:03:31 +00:00
Robert Watson
6cd8dee3c5 Silence "busy" warnings when unmounting devfs at system shutdown. This
is a workaround for non-symetric teardown of the file systems at
shutdown with respect to the mount order at boot.  The proper long term
fix is to properly detach devfs from the root mount before unmounting
each, and should be implemented, but since the problem is non-harmful,
this temporary band-aid will prevent false positive bug reports and
unnecessary error output for 6.0-RELEASE.

MFC after:	3 days
Tested by:	pav, pjd
2005-08-20 17:12:47 +00:00
Poul-Henning Kamp
1d45c50ec3 Properly un-giant-trick the cdevsw in fini_cdevsw()
Tripped over by:	Huang wen hui <huang@gddsn.org.cn>
2005-08-20 12:13:51 +00:00
David Xu
86ef8e2671 Add missing brackets.
Noticed by: stefanf@
2005-08-19 22:30:13 +00:00
David Xu
8c6d7a8db8 Fix a LOR between sched_lock and sleep queue lock. 2005-08-19 13:35:34 +00:00
David Xu
f8ec133ed0 Move up code for testing KEF_HOLD to avoid ke_cpu being changed unexpectly
for PRI_ITHD and PRI_REALTIME threads.
2005-08-19 11:51:41 +00:00
Hajimu UMEMOTO
1fea6ce7dd - don't forget to save freqency when priority is raised.
- nuke redundant variable initialization.
2005-08-18 16:41:25 +00:00
Hajimu UMEMOTO
5f36393468 don't forget to update curr_priority. even when frequency is
not changed, priority may be changed.
2005-08-18 16:08:56 +00:00
Poul-Henning Kamp
516ad423b1 Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers:  Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant
2005-08-17 08:19:52 +00:00
Poul-Henning Kamp
a07b0febaa In vop_stdpathconf(ap) also default for _PC_NAME_MAX and _PC_PATH_MAX. 2005-08-17 06:59:23 +00:00
Hajimu UMEMOTO
961f7f911f Save cpu level only when priority is greater than PRIO_USER
to make CPUFREQ_SET(NULL, prio) work.
TODO: implement saved_level as stack.

Reviewed by:	njl
2005-08-16 20:03:08 +00:00
Poul-Henning Kamp
b3740d656f Remove stale comment. 2005-08-16 19:47:42 +00:00
Poul-Henning Kamp
31cc57cdbd Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
Poul-Henning Kamp
9c0af1310c Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.
2005-08-16 19:08:01 +00:00
Alexander Kabaev
0c207975f2 Do not keep parent directory locked while calling VFS_ROOT to traverse mount
points in lookup(). The lock can be dropped safely around VFS_ROOT because
LOCKPARENT semantics with child and perent vnodes coming from different FSes
does not really have any meaningful use. On the other hard, this prevents
easily triggered deadlock on systems using automounter daemon.
2005-08-14 18:10:04 +00:00
Alexander Kabaev
857b66d505 Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variable in the same way.

Reported by:	ade
Obtained from:	alc
MFC after:	2 days
2005-08-13 20:21:33 +00:00
Marcel Moolenaar
fd65baf8e2 Make mpsafe_vfs=1 the default on ia64. 2005-08-13 20:07:50 +00:00
Nate Lawson
da8a77c1f1 The "lowest" sysctl setting makes more sense as the lowest one to use, so
discard all levels less than this setting, not less than/equal to.

MFC after:	1 day
2005-08-11 18:40:58 +00:00
Alexander Kabaev
45a0d1ed7a Do not drop the vnode interlock if vdropl is called on already doomed vnode.
vdropl callers expect it to return with interlock still being held.

MFC after:	2 days
2005-08-10 11:46:03 +00:00
Robert Watson
ae018704a1 Add an order between UDP inpcb locks and the IPv4 multicast address
list lock, as there has been a report that an alternative lock order
is getting introduced.  This should help ferret it out.

Reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
2005-08-09 13:27:50 +00:00
Robert Watson
13f4c340ae Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
Christian S.J. Peron
d8339a2616 Drop in a WITNESS_WARN into SYSCTL_IN to make sure that we are
not holding any non-sleep-able-locks locks when copyin is called.
This gets executed un-conditionally since we have no function
to wire the buffer in this direction.

Pointed out by:	truckman
MFC after:	1 week
2005-08-08 21:06:42 +00:00
Robert Watson
6a113b3de7 Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do.  This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by:	phk
MFC after:	3 days
2005-08-08 19:55:32 +00:00
Christian S.J. Peron
417ab24f78 Check to see if we wired the user-supplied buffers in SYSCTL_OUT, if
the buffer has not been wired and we are holding any non-sleep-able locks,
drop a witness warning. If the buffer has not been wired, it is possible
that the writing of the data can sleep, especially if the page is not in
memory. This can result in a number of different locking issues, including
dead locks.

MFC after:	1 week
Discussed with:	rwatson
Reviewed by:	jhb
2005-08-08 18:54:35 +00:00
David Xu
1278181c6c Try best to keep a preempted thread at front of run queue, this seems
improved performance a bit for some workloads, but still seeing interactive
lagging unless cpu idling race is fixed.
2005-08-08 14:20:10 +00:00
Peter Grehan
e000e00118 Export a routine, kobj_machdep_init(), that allows platforms
to use the kobj subsystem as soon at mutex_init() has been called
instead of having to wait for the SI_SUB_LOCK sysinit.

Reviewed by:	dfr
2005-08-07 02:20:35 +00:00
Christian S.J. Peron
9baea4b4b4 Change the data type of the upper shared memory limits from a signed
integer to an unsigned long. This lifts variables like the maximum
number of pages available for shared memory from 2^31 to 2^32 on 32
bit architectures, and from 2^31 to 2^64 on 64 bit architectures.

It should be noted that this changes breaks ABI on 64 bit architectures
because the size of the shmmax, shmmin, shmmni, shmseg and shmall members
of the shminfo structure has changed.

Silence on:	current@
2005-08-06 07:20:18 +00:00
Suleiman Souhlal
34cc826ae8 Holding a vnode doesn't prevent v_mount from disappearing (when the
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.

Reviewed by:	kris@
MFC after:	3 days
2005-08-06 01:42:04 +00:00
Robert Watson
dd5a318ba3 Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
Jeff Roberson
40a495853a - Unlock before we call mac_destroy_vnode to prevent a lock order reversal.
Found by:	trhodes
2005-08-03 05:36:50 +00:00
Jeff Roberson
9e2aaec1e3 - Use lockmgr_printinfo rather than rolling our own. This introduces a
slight problem by using printf instead of db_printf however
   'show lockedvnods' does the same so I believe it is ok for now.
2005-08-03 05:02:08 +00:00
Jeff Roberson
7499fd8de9 - Fix a problem that slipped through review; the stack member of the lockmgr
structure should have the lk_ prefix.
 - Add stack_print(lkp->lk_stack) to the information printed with
   lockmgr_printinfo().
2005-08-03 04:59:07 +00:00
Jeff Roberson
e8ddb61d38 - Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock
caller by saving the stack of the last locker/unlocker in lockmgr.  We
   also put the stack in KTR at the moment.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
2005-08-03 04:48:22 +00:00
Jeff Roberson
8d511e2a05 - Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by:		Antoine Brodin <antoine.brodin@laposte.net>
Concept code from:	Neal Fachan <neal@isilon.com>
2005-08-03 04:27:40 +00:00
David Xu
3c424d1447 In adjustrunqueue(), add code to handle thread migrating case for
ULE scheduler. In original code, local run queue of threaded ksegrp
is corrupted if adjustrunqueue() is called while thread is migrating.
2005-08-03 01:23:45 +00:00
Ruslan Ermilov
2319835713 Long overdue, keep up with mbuf.h,v 1.148. 2005-08-02 20:03:23 +00:00
Kelly Yancey
dcb5fef5db Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std
1003.1 (POSIX).
2005-08-01 21:15:09 +00:00
David Xu
3d16f519b6 If a thread was removed from system run queue, kse_assign shouldn't
add it again.
2005-07-31 15:11:21 +00:00
Alexander Leidinger
32069af652 The resource_xxx routines in subr_hints.c are called before and after the
kenv environment in kern_environment.c switches to dynamic kenv. The prior
call sets the static variable hintp to the static hints in subr_hints.c
(hintmode==0).

However, changes to the environment are not detected by the resource_xxx
lookups after the change to dynamic kernel environment, so the lookup
routines only report the old stuff of hintmode==0, even after the change to
the dynamic kenv. This causes kenv users to see a different environment than
the kernel routines.

This is a problem in the mixer.c code that looks up initial mixer volume
settings from the hints: If the hints are dynamic and not from the
device.hints file, mixer.c doesn't see them, but kenv does.

The patch from the PR (modified to comply to the style of the function)
solves this.

PR:		83686
Submitted by:	Harry Coin <harrycoin@qconline.com>
2005-07-31 10:46:55 +00:00
Alexander Leidinger
3904769ba8 Add bounds checking to the setenv part of the kernel environment.
This has no security implications since only root is allowed to use
kenv(1) (and corrupt the kernel memory after adding too much variables
previous to this commit).

This is based upon the PR [1] mentioned below, but extended to check both
bounds (in case of an overflow of the counting variable) and to comply
to the style of the function. An overflow of the counting variable
shouldn't happen after adding the check for the upper bound, but better
safe than sorry (in case some other function in the kernel overwrites
random memory).

An interested soul may want to add a printf to notify root in case the
bounds are hit.

Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since
the comment for KENV_SIZE says it's the maximum number of environment
strings. [2]

PR:		83687 [1]
Submitted by:	Harry Coin <harrycoin@qconline.com> [1]
Submitted by:	Ariff Abdullah <skywizard@MyBSD.org.my> [2]
2005-07-31 10:28:35 +00:00
Joseph Koshy
fadcc6e201 Fail the module loading process if the currently executing kernel
was not compiled with 'options HWPMC_HOOKS' or if the compiled-in
version numbers of the kernel and module are out of sync.

Reported by:	cracauer
MFC after:	3 days
2005-07-30 09:02:42 +00:00
Paul Saab
1126349ae7 Ignore mutex asserts when we're dumping as well. This allows me
to panic a system from DDB when INVARIANTS is compiled into the
kernel on a scsi system.
2005-07-30 05:54:30 +00:00
Sam Leffler
ab8ab90c5b add m_align, a function to align any type of mbuf (i.e. it
is a superset of M_ALIGN and MH_ALIGN)

Reviewed by:	several
2005-07-30 01:32:16 +00:00
R. Imura
080e3a63b3 Change API of mb_copy_t in libmchain so that netsmb can handle
multibyte character share name correctly.

Reviewed by:	bp
2005-07-29 13:22:37 +00:00
George V. Neville-Neil
0d52d7b01a Fix for PR 83885.
Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from:	Patch suggested in the PR
MFC after: 1 week
2005-07-28 10:10:01 +00:00
Pawel Jakub Dawidek
73864adbd4 Fix the way how "InUse" column in 'vmstat -m' output works:
- increase number of allocations count only on successfull malloc(9),
  so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
  under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
  malloc_type_allocated() as there will be nothing to do.

OK'ed by:	rwatson
MFC after:	2 days
2005-07-27 23:17:31 +00:00
Xin LI
05a6b7ad62 Cast to uintptr_t when the compiler complains. This unbreaks ULE
scheduler breakage accompanied by the recent atomic_ptr() change.
2005-07-25 10:21:49 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00