Commit Graph

9653 Commits

Author SHA1 Message Date
John Baldwin
c304531851 Add a function to return the MD interrupt source cookie associated with
an interrupt event.  Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.
2006-12-12 19:20:19 +00:00
John Baldwin
bc17acb2ad Add a comment and fix a whitespace nit. 2006-12-12 19:19:22 +00:00
Julian Elischer
0c17ece676 Fix a potential point of confusion. Art Ironport we've seen this end up
with an infinite loop in and out of the kernel during process shutdown.
2006-12-12 08:01:55 +00:00
Craig Rodrigues
3a13c9cc28 Use vfs_mount_error() to log mount errors in a few places with human
readable strings which can be retrieved if an "errmsg" parameter is
passed into nmount().
2006-12-07 02:57:00 +00:00
Julian Elischer
fc6c30f6c6 Changes to try fix sched_ule.c courtesy of David Xu. 2006-12-06 06:55:59 +00:00
Julian Elischer
ad1e7d285a Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs.  Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.
2006-12-06 06:34:57 +00:00
Kip Macy
aa077979f6 Bug fix for obscenely large wait times on uncontested locks
if waittime was zero (the lock was uncontested) l->lpo_waittime
in the hash table would not get initialized.

Inspection prompted by questions from: Attilio Rao
2006-12-04 22:15:50 +00:00
John Baldwin
5505470e4a Fix an edge case in rman_manage_region() where it didn't handle a resource
ending at ULONG_MAX properly.  While here, use TAILQ_FOREACH_SAFE().

Tested by:	"Stephane E. Potvin" <sepotvin at videotron-ca>
MFC after:	1 week
2006-12-04 16:45:23 +00:00
David Xu
745fbd3a72 if a thread blocked on userland condition variable is
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(), therefor, we use thr_wake()
to mark a flag, the flag tells a thread calling do_cv_wait()
in umtx code to not block on a condition variable.
Thread library is expected that once a thread detected itself
is in pthread_cond_wait, it will call the thr_wake() for itself
in its SIGCANCEL handler.
2006-12-04 14:15:12 +00:00
David Xu
a6abdf322d Introduce userspace condition variable, since we have already POSIX
priority mutex implemented, it is the time to introduce this stuff,
now we can use umutex and ucond together to implement pthread's
condition wait/signal.
2006-12-03 01:49:22 +00:00
Konstantin Belousov
7226306ed5 Linker set support depends on the magic __start_<section> and
__stop_<section> symbols generated by the static linker for elf
sections. This is done only for the final link, and not for ld -r.
Augment elf_obj in-kernel linker by recognizing such special symbols,
and resolving them to the start and end of the section automatically.

As result, linker sets on amd64 could be used in the same way as on
other architectures, without explicit calls to linker_file_lookup_set().

Requested by:	rdivacky
No objections from:	peter, jhb
2006-11-30 10:50:29 +00:00
Poul-Henning Kamp
a4dcb4f627 Only grab the sched_lock if we actually need to modify the thread priority.
During a buildworld only 2/3 of the calls to msleep actually changed
the priority.
2006-11-30 08:27:38 +00:00
John Birrell
d4fbc81d99 Flushing the buffer is conditional on actually using the buffer. Oops. 2006-11-30 07:25:52 +00:00
John Birrell
e0b651251d Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
2006-11-30 04:17:05 +00:00
David Xu
843b99c6f7 - Remove third parameter of itimer_find, the parameter is always zero.
- Call callout_drain on deleting POSIX timer.
- Use kern_timer_delete in exiting hook.
2006-11-28 03:24:34 +00:00
Mohan Srinivasan
84eab9ad73 Fix a race in soclose() where connections could be queued to the
listening socket after the pass that cleans those queues. This
results in these connections being orphaned (and leaked). The fix
is to clean up the so queues after detaching the socket from the
protocol. Thanks to ups and jhb for discussions and a thorough code
review.
2006-11-22 23:54:29 +00:00
John Baldwin
6600b45d88 Save exit status of an exiting process in kn_data in the knote.
Submitted by:	Jared Yanovich ^phirerunner at comcast.net^
MFC after:	2 weeks
2006-11-20 22:17:50 +00:00
Julian Elischer
de38cd9d8b whitespace fix only 2006-11-20 16:13:02 +00:00
David Xu
fa0d3a327a Use scheduler API sched_user_prio() to adjust thread's userland priority,
use td_base_user_prio to get real userland priority since POSIX priority
mutex may adjust td_user_pri which is an effective priority.
2006-11-20 05:50:59 +00:00
Alan Cox
976a87a284 Add vm map and object locking to each_writable_segment().
Noticed by: jhb@
MFC after: 3 weeks
2006-11-19 23:38:59 +00:00
Jung-uk Kim
e22291430e Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by timing out
msgsnd and rechecking resources.  This problem was found while I was running
Linux Test Project test suite (test cases: msgctl08, msgctl09).
Change `msgwait' to `msgsnd' and `msgrcv' to distinguish its sleeping
conditions.  Few cosmetic changes to debugging messages.
2006-11-17 20:43:01 +00:00
Pawel Jakub Dawidek
7ee07175af Change sleepq_add(9) argument from 'struct mtx *' to 'struct lock_object *',
which allows to use it with different kinds of locks. For example it allows
to implement Solaris conditions variables which will be used in ZFS port on
top of sx(9) locks.

Reviewed by:	jhb
2006-11-16 01:02:00 +00:00
John Baldwin
7eefbf10c8 Adjust assertions to allow for magical properties of the 'lbolt' wait
channel for tsleep():
- Allow tsleep() on &lbolt without Giant with a timeout 0 since &lbolt has
  an implied timeout.
- If &lbolt is used with msleep() pass NULL to sleepq_add() for the lock
  object.  Unlike other sleepq channels, &lbolt doesn't have an associated
  owning lock.
2006-11-15 20:44:07 +00:00
David Xu
653385756c Fix a copy-paste bug in NON-KSE case. 2006-11-14 05:48:27 +00:00
Kip Macy
2f6a774be4 change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)
2006-11-13 05:51:22 +00:00
Kip Macy
61bd5e21b3 track lock class name in a way that doesn't break WITNESS 2006-11-13 05:41:46 +00:00
Kip Macy
44a96b46bd Unbreak witness 2006-11-12 23:23:38 +00:00
Andre Oppermann
3e932ca715 In kern_sendfile() fix the calculation of sbytes (the total number of bytes
written to the socket).  The rewrite in revision 1.240 got confused by the
FreeBSD 4.x bug compatibility code.

For some reason lighttpd, that was used for testing the new sendfile code,
was not affected by the problem but apache and others using headers/trailers
in the sendfile call received incorrect sbytes values after return from non-
blocking sockets.  This then lead to restarts with wrong offsets and thus
mixed up file contents when the socket was writeable again.  All programs
not using headers/trailers, like ftpd, were not affected by the bug.

Reported by:	Pawel Worach <pawel.worach-at-gmail.com>
Tested by:	Pawel Worach <pawel.worach-at-gmail.com>
2006-11-12 20:57:00 +00:00
David Xu
60d4823594 Copy base user priority in NO_KSE case. 2006-11-12 11:48:37 +00:00
Tom Rhodes
bedc1c9c96 Fix mispatch of includes list; allows my kernel to build successfully. 2006-11-12 03:34:03 +00:00
Kip Macy
54e57f7613 show lock class in profiling output for default case where type is not specified when initializing the lock
Approved by: scottl (standing in for mentor rwatson)
2006-11-12 03:30:01 +00:00
David Xu
812fb4a89f Use mi_switch, this should fix loadavg calculation problem in NO_KSE case. 2006-11-12 03:18:22 +00:00
Tom Rhodes
c4f7f0fd4a Update includes for sys/posix4 move.
Approved by:	silence on -arch and -standards
2006-11-11 16:46:31 +00:00
Tom Rhodes
6aeb05d7be Merge posix4/* into normal kernel hierarchy.
Reviewed by:	glanced at by jhb
Approved by:	silence on -arch@ and -standards@
2006-11-11 16:26:58 +00:00
Tom Rhodes
bdd04ab184 Update #includes list. 2006-11-11 16:19:12 +00:00
David Xu
5a21514727 Unbreak userland priority inheriting in NO_KSE case. 2006-11-11 13:11:29 +00:00
Kip Macy
ed6a7c42f6 tinderbox fix 2006-11-11 07:38:48 +00:00
Kip Macy
cf2c39e7a2 remove lingering call to rd(tick) 2006-11-11 07:28:45 +00:00
Kip Macy
83b72e3e25 missed nits replacing mutex with lock 2006-11-11 06:28:47 +00:00
Kip Macy
7c0435b933 MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line, the overhead measured on the T1 for when
it is compiled in but not enabled is < 1%.

Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb
2006-11-11 03:18:07 +00:00
Maxim Konovalov
f645b5da88 o Fix a couple of obvious typos. 2006-11-08 09:09:07 +00:00
Andre Oppermann
62b36a7fc2 Style cleanups to the sctp_* syscall functions. 2006-11-07 21:28:12 +00:00
John Baldwin
6b8de13ab4 Simplify operations with sync_mtx in sched_sync():
- Don't drop the lock just to reacquire it again to check rushjob, this
  only wastes time.
- Use msleep() to drop the mutex while sleeping instead of explicitly
  unlocking around tsleep.

Reviewed by:	pjd
2006-11-07 19:45:05 +00:00
John Baldwin
8064e5d71f Fix comment typo and function declaration. 2006-11-07 19:07:33 +00:00
Tor Egge
40dee3da29 Don't drop reference to tty in tty_close() if TS_ISOPEN is already cleared.
Reviewed by:	bde
2006-11-06 22:12:43 +00:00
Andre Oppermann
bda8b1f3b8 Handle early errors in kern_sendfile() by introducing a new goto 'out'
label after the sbunlock() part.

This correctly handles calls to sendfile(2) without valid parameters
that was broken in rev. 1.240.

Coverity error:	272162
2006-11-06 21:53:19 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
800c940832 Add a new priv(9) kernel interface for checking the availability of
privilege for threads and credentials.  Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed.  Two interfaces are provided, replacing the
existing suser(9) interface:

suser(td)                 ->   priv_check(td, priv)
suser_cred(cred, flags)   ->   priv_check_cred(cred, priv, flags)

A comprehensive list of currently available kernel privileges may be
found in priv.h.  New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.

The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag.  For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail.  As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.

The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.

The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated.  The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.

This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.

Sponsored by:		nCircle Network Security, Inc.
Obtained from:		TrustedBSD Project
Discussed on:		arch@
Reviewed (at least in part) by:	mlaier, jmg, pjd, bde, ceri,
			Alex Lyashkov <umka at sevcity dot net>,
			Skip Ford <skip dot ford at verizon dot net>,
			Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:37:19 +00:00
Pawel Jakub Dawidek
a2ca03b3ad Typo, 'from' vnode is locked here, not 'to' vnode. 2006-11-04 23:57:02 +00:00
Randall Stewart
af99851047 This commits the remake in kern/ make sysent to get
the correct syscalls.master's $FreeBSD$ tag record and
a make sysent in sys/compat/freebsd32. Thanks Ruslan
for pointing out the steps I missed :-0
Approved by:	gnn
2006-11-03 18:57:49 +00:00