Commit Graph

9784 Commits

Author SHA1 Message Date
John Baldwin
1a4435ee0e Print tid's rather than thread pointers in KTR_PROC traces. 2007-02-27 18:46:07 +00:00
John Baldwin
4d70511ac3 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
John Baldwin
84d37a463a Use pause() rather than tsleep() on explicit global dummy variables. 2007-02-27 17:22:30 +00:00
Paolo Pisati
f2d619c8b1 Do not execute filter only handlers in ithread_execute_handlers():
this fixes the panics when filter only and ithread only handlers where
sharing the same irq .
2007-02-27 17:09:20 +00:00
Kip Macy
f183910b97 Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to increasing system time by 20%-25% which on
   "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this to enabling lock profiling without
   LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
Robert Watson
e7c33e29ed Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
John Baldwin
c0e767f9dd Use NULL rather than 0 for various pointer constants. 2007-02-26 19:28:18 +00:00
Robert Watson
8525230afd Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
John Baldwin
59800afcb5 Mark the kernel linker file as linked so that it is visible to the various
kld*() syscalls.

Tested by:	piso
2007-02-26 16:48:14 +00:00
John Baldwin
4a0f58d25b Fix a comment. 2007-02-26 16:36:48 +00:00
Ruslan Ermilov
fac61393b9 Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks
2007-02-26 10:45:21 +00:00
Kip Macy
fe68a91631 general LOCK_PROFILING cleanup
- only collect timestamps when a lock is contested - this reduces the overhead
  of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always has)
  someone familiar with that should review
2007-02-26 08:26:44 +00:00
Xin LI
1ad9ee8603 Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.

In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx.  Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.

The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process.  By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value.  Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area.  On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.

This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock.  This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized.  In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.

Other potential solutions are still under evaluation.

Discussed with:	davidxu, jhb, rwatson
PR:		kern/108071
MFC after:	2 weeks
2007-02-26 03:38:09 +00:00
Scott Long
04f0ce213f Fix a case in rman_manage_region() where the resource list would get missorted.
This would in turn confuse rman_reserve_resource().  This was only seen for
MSI resources that can get allocated and deallocated after boot.
2007-02-23 22:53:56 +00:00
John Baldwin
498eccc919 Drop the global kernel linker lock while executing the sysinit's for a
freshly-loaded kernel module.  To avoid various unload races, hide linker
files whose sysinit's are being run from userland so that they can't be
kldunloaded until after all the sysinit's have finished.

Tested by:	gallatin
2007-02-23 19:46:59 +00:00
John Baldwin
37e80fcac2 Add a new kernel sleep function pause(9). pause(9) is for places that
want an equivalent of DELAY(9) that sleeps instead of spins.  It accepts
a wmesg and a timeout and is not interrupted by signals.  It uses a private
wait channel that should never be woken up by wakeup(9) or wakeup_one(9).

Glanced at by:	phk
2007-02-23 16:22:09 +00:00
Paolo Pisati
ef544f6312 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
Xin LI
74f094f6a4 Use LIST_EMPTY() instead of unrolled version (LIST_FIRST() [!=]= NULL) 2007-02-22 14:52:59 +00:00
Robert Watson
6fac927ccc Add an additional MAC check to the UNIX domain socket connect path:
check that the subject has read/write access to the vnode using the
vnode MAC check.

MFC after:	3 weeks
Submitted by:	Spencer Minear <spencer_minear at securecomputing dot com>
Obtained from:	TrustedBSD Project
2007-02-22 09:37:44 +00:00
Robert Watson
7ee76f9d4e Remove unnecessary privilege and privilege check for WITNESS sysctl.
Head nod:	jhb
2007-02-20 23:49:31 +00:00
Robert Watson
5b950deabc Break introductory comment into two paragraphs to separate material on the
garbage collection complications from general discussion of UNIX domain
sockets.

Staticize unp_addsockcred().

Remove XXX comment regarding Giant and v_socket -- v_socket is protected
by the global UNIX domain socket lock.
2007-02-20 10:50:02 +00:00
Robert Watson
95420afea4 Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege. 2007-02-20 00:12:52 +00:00
Robert Watson
2390d78f74 Sync up PRIV_IPC_{ADMIN,READ,WRITE} priv checks in ipcperm() with
kern_jail.c: allow jailed root these privileges.  This only has an
effect if System V IPC is administratively enabled for the jail.
2007-02-20 00:06:59 +00:00
Robert Watson
b12c55ab92 Restore sysv_ipc.c:1.30, which was backed out due to interactions with
System V shared memory, now believed fixed in sysv_shm.c:1.109:

  date: 2006/11/06 13:42:01;  author: rwatson;  state: Exp;  lines: +65 -37
  Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
  specific privilege names to a broad range of privileges.  These may
  require some future tweaking.

  Sponsored by:           nCircle Network Security, Inc.
  Obtained from:          TrustedBSD Project
  Discussed on:           arch@
  Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                          Alex Lyashkov <umka at sevcity dot net>,
                          Skip Ford <skip dot ford at verizon dot net>,
                          Antoine Brodin <antoine dot brodin at laposte dot net>

This restores fine-grained privilege support to System V IPC.

PR:	106078
2007-02-19 22:59:23 +00:00
Robert Watson
3d50b06b8e Remove call to ipcperm() in shmget_existing(). The flags argument is
ignored on other systems I investigated when accessing an existing
memory segment rather than creating a new one.  This call to ipcperm()
is the only one to pass in a complete mode flag to the permission
checks rather than a simple access request mask, and caused problems
for the revised ipcperm() based on the priv(9) interface, which can
now be restored.

PR:	106078
2007-02-19 22:56:10 +00:00
Robert Watson
95b091d2f2 Rename three quota privileges from the UFS privilege namespace to the
VFS privilege namespace: exceedquota, getquota, and setquota.  Leave
UFS-specific quota configuration privileges in the UFS name space.

This renumbers VFS and UFS privileges, so requires rebuilding modules
if you are using security policies aware of privilege identifiers.
This is likely no one at this point since none of the committed MAC
policies use the privilege checks.
2007-02-19 13:33:10 +00:00
Robert Watson
e82d0201bd Limit quota privileges in jail to PRIV_UFS_GETQUOTA and
PRIV_UFS_SETQUOTA.
2007-02-19 13:26:39 +00:00
Robert Watson
ea04d82da8 Do allow privilege to create over-sized messages on System V IPC
message queues in jail.
2007-02-19 13:23:45 +00:00
Robert Watson
86138fc742 Use priv_check(9) instead of suser(9) for checking the privilege to
set real-time priority on a thread.  It looks like this suser(9)
call was introduced after my first pass through replacing superuser
checks with named privilege checks.
2007-02-19 13:22:36 +00:00
Robert Watson
c3c1b5e62a For now, reflect practical reality that Audit system calls aren't
allowed in Jail: return a privilege error.
2007-02-19 13:10:29 +00:00
Konstantin Belousov
9b2f1a0740 Remove union_dircheckp hook, it is not needed by new unionfs code anymore.
As consequence, getdirentries() no longer needs to drop/reacquire
directory vnode lock, that would allow it to be reclaimed in between.

Reported and tested by:	Peter Holm
Approved by:		rodrigc (unionfs)
MFC after:		1 week
2007-02-19 10:56:09 +00:00
Pawel Jakub Dawidek
2c7b0f41ec Remove VFS_VPTOFH entirely. API is already broken and it is good time to
do it.

Suggested by:	rwatson
2007-02-16 17:32:41 +00:00
Pawel Jakub Dawidek
10bcafe9ab Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by:	mckusick
Discussed with:	many (on IRC)
Tested with:	ufs, msdosfs, cd9660, nullfs and zfs
2007-02-15 22:08:35 +00:00
Luigi Rizzo
33d5497079 Cleanup and document the implementation of firmware(9) based on
a version that i posted earlier on the -current mailing list,
and subsequent feedback received.

The core of the change is just in sys/firmware.h and kern/subr_firmware.c,
while other files are just adaptation of the clients to the ABI change
(const-ification of some parameters and hiding of internal info,
so this is fully compatible at the binary level).

In detail:
- reduce the amount of information exported to clients in struct firmware,
  and constify the pointer;

- internally, document and simplify the implementation of the various
  functions, and make sure error conditions are dealt with properly.

The diffs are large, but the code is really straightforward now (i hope).

Note also that there is a subtle issue with the implementation of
firmware_register(): currently, as in the previous version, we just
store a reference to the 'imagename' argument, but we should rather
copy it because there is no guarantee that this is a static string.
I realised this while testing this code, but i prefer to fix it in
a later commit -- there is no regression with respect to the past.

Note, too, that the version in RELENG_6 has various bugs including
missing locks around the module release calls, mishandling of modules
loaded by /boot/loader, and so on, so an MFC is absolutely necessary
there.  I was just postponing it until this cleanup to avoid doing
things twice.

MFC after: 1 week
2007-02-15 17:21:31 +00:00
Robert Watson
780a98ad1f Catch up file descriptor printing function in DDB to the addition of kqueues
and POSIX message queues.
2007-02-15 10:55:43 +00:00
Robert Watson
442f65e958 Break file descriptor printing logic out of db_show_files() into
db_print_file(), and add a new "show file <ptr>" DDB command, which can
be used to print out file descriptors referenced in stack traces.
2007-02-15 10:50:48 +00:00
Robert Watson
f58dd47091 Rename somaxconn_sysctl() to sysctl_somaxconn() so that I will be able to
claim that sofoo() functions all accept a socket as their first argument.
2007-02-15 10:11:00 +00:00
Konstantin Belousov
478a8db4ce If both ISDOTDOT and NOCROSSMOUNT are set then lookup() might breaks out
of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP()
for a filesystem root vnode. Handle this case inside lookup().

Submitted by:	tegge
PR:		92785
MFC after:	1 week
2007-02-15 09:53:49 +00:00
Robert Watson
c3b162d54e Teach DDB how to print sockets, socket buffers, protosw's, and domain
structures given pointers to them.
2007-02-15 01:28:22 +00:00
Robert Watson
aea52f1bf8 Minor rearrangement of global variables, comments, etc, in UNIX domain
sockets.
2007-02-14 15:05:40 +00:00
Robert Watson
46a1d9bfe8 Change unp_mtx to supporting recursion, and do not drop the unp_mtx over
sonewconn() in unp_connect().  This avoids a race that occurs due to
v_socket being an uncounted reference, as the lock was being released in
order to call sonewconn(), which otherwise recurses into the UNIX domain
socket code via pru_attach, as well as holding the lock over a sleeping
memory allocation in uipc_attach().  Switch to a non-sleeping memory
allocation during UNIX domain socket attach.

This fix non-ideal in that it requires enabling recursion, but is a much
smaller change than moving to using true references for v_socket.  The
reported panic occurs in unp_connect() following the return of
sonewconn().

Update copyright year.

Panic reported by:      jhb
2007-02-14 12:22:11 +00:00
Robert Watson
05102f04d5 Set UNP_CONNECTING when committing to moving ahead in unp_connect().
This logic was lost when merging the remainder of these changes in
1.178.
2007-02-13 21:00:57 +00:00
Olivier Houchard
38cc2a5caa Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check for both error and the return value (some of
them actually don't do it).

MFC After:	1 week
2007-02-13 01:28:48 +00:00
Mike Pritchard
51fd6380c5 Do not do a vn_close for all references to the ktraced file if we are
doing a CLEARFILE option.  Do a vrele instead.  This prevents
a panic later due to v_writecount being negative when the vnode
is taken off the freelist.

Submitted by:	jhb
2007-02-13 00:20:13 +00:00
Mike Pritchard
87aabdc126 Add a VNASSERT to vn_close to detect if v_writecount is going
to become negative.  This will detect the underflow when it
happens, instead of having it discovered when the vnode is
taken off the freelist, long after the offending process is long
gone.
2007-02-12 22:53:01 +00:00
Craig Rodrigues
d139ce67c0 Makefile changes to reflect moving sys/isofs/cd9660 to sys/fs/cd9660.
Continue to install userland include files in /usr/include/isofs/cd9660
so as not to break userland applications such as libstand.
2007-02-11 14:01:32 +00:00
Xin LI
d60226bd43 Give which signal caller has attempted to deliver when panicking. 2007-02-09 17:48:28 +00:00
Jeff Roberson
ed0e8f2fe9 - Change types for necent runq additions to u_char rather than int.
- Fix these types in ULE as well.  This fixes bugs in priority index
   calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by:	kkenn using pho's stress2.
2007-02-08 01:52:25 +00:00
Alan Cox
0e2056ee7f Remove the vm page queue free mutex from the CDEV order. 2007-02-07 05:43:31 +00:00
Robert Watson
1f837c4753 Push UNIX domain socket locking further into uipc_ctloutput() in order to
avoid holding the UNIX domain socket subsystem lock over soooptcopyin()
and sooptcopyout().  This problem was introduced when LOCAL_CREDS, and
LOCAL_CONNWAIT support were added.

Reviewed by:	mdodd
2007-02-06 14:31:37 +00:00