Commit Graph

9680 Commits

Author SHA1 Message Date
rrs
c427816562 The prepend function did not handle non-pkthdr's correctly.
It always called MH_ALIGN for small lengths being
prepended (less than MHLEN). This meant that if you did
a prepend on a non M_PKTHDR the system would panic with
the KASSERT in MH_ALIGN. Instead we are not aware of
this and do a MH_ALIGN or M_ALIGN as appropriate.

Reviewed by:	andre
Approved by:	gnn
2006-12-21 19:58:04 +00:00
rwatson
6fa1425be4 Remove mac_enforce_subsystem debugging sysctls. Enforcement on
subsystems will be a property of policy modules, which may require
access control check entry points to be invoked even when not actively
enforcing (i.e., to track information flow without providing
protection).

Obtained from:	TrustedBSD Project
Suggested by:	Christopher dot Vance at sparta dot com
2006-12-21 09:51:34 +00:00
rwatson
5749ecccba Expand commenting on label slots, justification for the MAC Framework locking
model, interactions between locking and policy init/destroy methods.

Rewrap some comments to 77 character line wrap.

Obtained from:	TrustedBSD Project
2006-12-20 20:38:44 +00:00
jkim
0099defbac MFP4: (part of) 110058
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and
it is done from its wrapper functions to support 32-bit emulations.  After I
implemented this, I have briefly referenced NetBSD and Darwin.  NetBSD passes
copyin()/copyout() function pointers from wrappers.  Darwin passes size of
message type as an argument, which is actually similar to my first
implementation (P4 109706).  We may revisit these implementations later.
2006-12-20 19:26:30 +00:00
kib
9311fcbc5d In rev. 1.514, iodone on async buffer may happen before code checks the
vnode v_flag. For cluster buffers this would result in dereferencing NULL
b_vp. To prevent the panic, cache relevant vnode flag before calling
bstrategy.

Reported by:	Peter Holm, kris
Tested by:	Peter Holm
Reviewed by: tegge
Pointy hat to:	kib
2006-12-20 09:22:31 +00:00
davidxu
5a984630fa Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
    kern.threads.umtx_dflt_spins
        if the sysctl value is non-zero, a zero umutex.m_spincount will
        cause the sysctl value to be used a spin cycle count.
    kern.threads.umtx_max_spins
        the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
2006-12-20 04:40:39 +00:00
mbr
a2c03bf6cb Back out rev. 1.266. The real cause for the recent panics has been fixed
in rev. 1.267 and there is no need to keep this test.
2006-12-20 02:49:59 +00:00
mbr
37965664cc Giant might have been temporarily dropped while waiting for proctree_lock, allowing for an
intervening tty_close() that cleared tp->t_session.

Submitted by:	tegge
MFC:		1 day
2006-12-19 22:34:32 +00:00
mbr
ccabbc6486 Add the tp->t_refcnt validity check back. There are still some race
conditions where tp->t_refcnt can go to zero.
2006-12-19 16:46:13 +00:00
davidxu
b0b74f9bd3 Remove unused sysctls. 2006-12-19 13:06:01 +00:00
pjd
6cc6a8d100 Use pipe_direct_write() optimization only if the data is in process' memory.
This fixes sending data through pipe from the kernel.

Fix suggested by:	rwatson
2006-12-19 12:52:22 +00:00
kmacy
af645e118f ktrace_cv is no longer used - remove
Submitted by: Attilio Rao
2006-12-17 00:16:09 +00:00
kmacy
bb69932355 Cleaner fix for handling declaration of loop variable under INVARIANTS
- in trying to avoid nested brackets and #ifdef INVARIANTS around i at the
  top, I broke booting for INVARIANTS all together :-(
- the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled
- tested both with and without INVARIANTS :-/
2006-12-17 00:14:20 +00:00
ache
aebc61a22f Don't intermix assignments and variable declarations in prev. commit 2006-12-16 21:17:27 +00:00
ache
84d03f55f7 Fix NULL pointer reference for INVARIANTS case
Submitted by:   Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
2006-12-16 20:33:26 +00:00
rodrigc
10e4664552 In vfs_export(), if we specify MNT_DELEXPORT in the struct export_args,
after we perform the operations to delete the export,
call vfs_deleteopt() to delete the "export" mount option from
the linked list of mount options associated with that mount point.

This fixes one scenario:
- put a filesystem in /etc/exports to export it
- remove the filesystem from /etc/exports to delete the export and restart
  mountd
- try to do a "mount -u -o ro" or "mount -u -o rw" on that filesystem
  now that it is no  longer exported.
2006-12-16 15:50:36 +00:00
rodrigc
ccbffdda2c Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.
2006-12-16 15:44:03 +00:00
rodrigc
2aade96a59 Convert to ANSI-style function prototypes. 2006-12-16 12:06:59 +00:00
rwatson
4d1fe0d425 For now, back out sysv_ipc.c:1.30, which caused shmget() with odd mode
arguments to fail.  The mode field for shmget() appears to have undefined
meaning in the context of an already-present IPC object, but applications
appear to assume any arbitrary passed value will be ignored.  I had hoped
to revisit this more quickly, but am removing the change for now to
prevent toe-stubbing.

Reported by:	JAroslav Suchanek <jarda at grisoft dot cz>
PR:		kern/106078
2006-12-16 11:30:54 +00:00
kmacy
4e5f5353fb correct name of number of sleep queues 2006-12-16 07:50:39 +00:00
kmacy
7327d346fc Add second sleep queue so that sx and lockmgr can have separate sleep
queues for shared and exclusive acquisitions

Submitted by: Attilio Rao
Approved by: jhb
2006-12-16 06:54:09 +00:00
kmacy
ad3abace0c - Fix some gcc warnings in lock_profile.h
- add cnt_hold cnt_lock support for spin mutexes
- make sure contested is initialized to zero to only bump contested when appropriate
- move initialization function to kern_mutex.c to avoid cyclic dependency between
  mutex.h and lock_profile.h
2006-12-16 02:37:58 +00:00
n_hibma
c98f016084 Align the interfaces for the various watchdogs and make the interface
behave as expected.

Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
  WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
  explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
  lost your sense of humor, than don't add a define.

Specific changes:

i80321_wdog.c
  Don't roll your own passive watchdog tickle as this would defeat the
  purpose of an active (userland) watchdog tickle.

ichwd.c / ipmi.c:
  WD_ACTIVE means active patting of the watchdog by a userland process,
  not whether the watchdog is active. See sys/watchdog.h.

kern_clock.c:
  (software watchdog) Remove a check for WD_ACTIVE as this does not make
  sense here. This reverts r1.181.
2006-12-15 21:44:49 +00:00
kib
e7cdcb3240 Resolve two deadlocks that could be caused by busy md device backed
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.

Tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2006-12-14 11:34:07 +00:00
jhb
65d8bd30a0 Add a function to return the MD interrupt source cookie associated with
an interrupt event.  Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.
2006-12-12 19:20:19 +00:00
jhb
7106027433 Add a comment and fix a whitespace nit. 2006-12-12 19:19:22 +00:00
julian
541d02c2d4 Fix a potential point of confusion. Art Ironport we've seen this end up
with an infinite loop in and out of the kernel during process shutdown.
2006-12-12 08:01:55 +00:00
rodrigc
fbf224913d Use vfs_mount_error() to log mount errors in a few places with human
readable strings which can be retrieved if an "errmsg" parameter is
passed into nmount().
2006-12-07 02:57:00 +00:00
julian
948c671f4a Changes to try fix sched_ule.c courtesy of David Xu. 2006-12-06 06:55:59 +00:00
julian
396ed947f6 Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
kmacy
ff9c89ea11 Bug fix for obscenely large wait times on uncontested locks
if waittime was zero (the lock was uncontested) l->lpo_waittime
in the hash table would not get initialized.

Inspection prompted by questions from: Attilio Rao
2006-12-04 22:15:50 +00:00
jhb
12956f1daa Fix an edge case in rman_manage_region() where it didn't handle a resource
ending at ULONG_MAX properly.  While here, use TAILQ_FOREACH_SAFE().

Tested by:	"Stephane E. Potvin" <sepotvin at videotron-ca>
MFC after:	1 week
2006-12-04 16:45:23 +00:00
davidxu
22a81fc246 if a thread blocked on userland condition variable is
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(), therefor, we use thr_wake()
to mark a flag, the flag tells a thread calling do_cv_wait()
in umtx code to not block on a condition variable.
Thread library is expected that once a thread detected itself
is in pthread_cond_wait, it will call the thr_wake() for itself
in its SIGCANCEL handler.
2006-12-04 14:15:12 +00:00
davidxu
f6f7f2a20e Introduce userspace condition variable, since we have already POSIX
priority mutex implemented, it is the time to introduce this stuff,
now we can use umutex and ucond together to implement pthread's
condition wait/signal.
2006-12-03 01:49:22 +00:00
kib
29a6b4a0eb Linker set support depends on the magic __start_<section> and
__stop_<section> symbols generated by the static linker for elf
sections. This is done only for the final link, and not for ld -r.
Augment elf_obj in-kernel linker by recognizing such special symbols,
and resolving them to the start and end of the section automatically.

As result, linker sets on amd64 could be used in the same way as on
other architectures, without explicit calls to linker_file_lookup_set().

Requested by:	rdivacky
No objections from:	peter, jhb
2006-11-30 10:50:29 +00:00
phk
b911f6e6f0 Only grab the sched_lock if we actually need to modify the thread priority.
During a buildworld only 2/3 of the calls to msleep actually changed
the priority.
2006-11-30 08:27:38 +00:00
jb
01bbbf558e Flushing the buffer is conditional on actually using the buffer. Oops. 2006-11-30 07:25:52 +00:00
jb
da35e3e55f Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
2006-11-30 04:17:05 +00:00
davidxu
0c2ab2f920 - Remove third parameter of itimer_find, the parameter is always zero.
- Call callout_drain on deleting POSIX timer.
- Use kern_timer_delete in exiting hook.
2006-11-28 03:24:34 +00:00
mohans
38630c101d Fix a race in soclose() where connections could be queued to the
listening socket after the pass that cleans those queues. This
results in these connections being orphaned (and leaked). The fix
is to clean up the so queues after detaching the socket from the
protocol. Thanks to ups and jhb for discussions and a thorough code
review.
2006-11-22 23:54:29 +00:00
jhb
80327896bd Save exit status of an exiting process in kn_data in the knote.
Submitted by:	Jared Yanovich ^phirerunner at comcast.net^
MFC after:	2 weeks
2006-11-20 22:17:50 +00:00
julian
14dac92354 whitespace fix only 2006-11-20 16:13:02 +00:00
davidxu
a25887447d Use scheduler API sched_user_prio() to adjust thread's userland priority,
use td_base_user_prio to get real userland priority since POSIX priority
mutex may adjust td_user_pri which is an effective priority.
2006-11-20 05:50:59 +00:00
alc
d93a445ea9 Add vm map and object locking to each_writable_segment().
Noticed by: jhb@
MFC after: 3 weeks
2006-11-19 23:38:59 +00:00
jkim
edc10a6695 Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by timing out
msgsnd and rechecking resources.  This problem was found while I was running
Linux Test Project test suite (test cases: msgctl08, msgctl09).
Change `msgwait' to `msgsnd' and `msgrcv' to distinguish its sleeping
conditions.  Few cosmetic changes to debugging messages.
2006-11-17 20:43:01 +00:00
pjd
63d82b700d Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *',
which allows to use it with different kinds of locks. For example it allows
to implement Solaris conditions variables which will be used in ZFS port on
top of sx(9) locks.

Reviewed by:	jhb
2006-11-16 01:02:00 +00:00
jhb
fa8eeee427 Adjust assertions to allow for magical properties of the 'lbolt' wait
channel for tsleep():
- Allow tsleep() on &lbolt without Giant with a timeout 0 since &lbolt has
  an implied timeout.
- If &lbolt is used with msleep() pass NULL to sleepq_add() for the lock
  object.  Unlike other sleepq channels, &lbolt doesn't have an associated
  owning lock.
2006-11-15 20:44:07 +00:00
davidxu
c3c0231226 Fix a copy-paste bug in NON-KSE case. 2006-11-14 05:48:27 +00:00
kmacy
0c00ea16db change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
kmacy
ec9503cd04 track lock class name in a way that doesn't break WITNESS 2006-11-13 05:41:46 +00:00