Commit Graph

9823 Commits

Author SHA1 Message Date
Andre Oppermann
4e02375908 Maintain a pointer and offset pair into the socket buffer mbuf chain to
avoid traversal of the entire socket buffer for larger offsets on stream
sockets.

Adjust tcp_output() make use of it.

Tested by:	gallatin
2007-03-19 18:35:13 +00:00
Pawel Jakub Dawidek
9a2fd584b4 Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

This change is simlar to the change of PRIV_VFS_MOUNT in previous revision.
2007-03-18 02:39:19 +00:00
Jeff Roberson
52bc574cc7 - Handle the case where slptime == runtime.
Submitted by:	Atoine Brodin
2007-03-17 23:32:48 +00:00
Jeff Roberson
4499aff6ec - Cast the intermediate value in priority computtion back down to
unsigned char.  Weirdly, casting the 1 constant to u_char still produces
   a signed integer result that is then used in the % computation.  This
   avoids that mess all together and causes a 0 pri to turn into 255 % 64
   as we expect.

Reported by:	kkenn (about 4 times, thanks)
2007-03-17 18:13:32 +00:00
John Baldwin
3076ca6720 Just use 'fdrop()' instead of 'FILE_LOCK(); fdrop_locked()' in
dupfdopen().  While I'm at it, move the second fdrop() out from under the
filedesc lock.
2007-03-15 21:19:21 +00:00
Pawel Jakub Dawidek
7533652025 Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

Reviewed by:	rwatson
2007-03-14 13:09:59 +00:00
Pawel Jakub Dawidek
f7d4e990c7 White space nits. 2007-03-14 12:54:10 +00:00
Konstantin Belousov
71d49316cc Busy filesystem around call of VFS_QUOTACTL() vfs op.
Tested by:	Peter Holm
Reviewed by:	tegge
Approved by:	re (kensmith)
2007-03-14 08:45:55 +00:00
John Baldwin
c1f2a5334d Print readers count as unsigned in ddb 'show lock'.
Submitted by:	attilio
2007-03-13 16:51:27 +00:00
Tor Egge
61b9d89ff0 Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount).  On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque().  Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode.  The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup.  Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by:	re (kensmith)
Reviewed by:	kib
In collaboration with:	kib
2007-03-13 01:50:27 +00:00
John Baldwin
4b493b1a6d Fix a typo. 2007-03-12 20:10:29 +00:00
John Baldwin
7568503421 - Use m_gethdr(), m_get(), and m_clget() instead of the macros in
sosend_copyin().
- Use M_WAITOK instead of M_TRYWAIT in sosend_copyin().
- Don't check for NULL from M_WAITOK and return ENOBUFS.
  M_WAITOK/M_TRYWAIT allocations don't fail with NULL.

Reviewed by:	andre
Requested by:	andre (2)
2007-03-12 19:27:36 +00:00
Robert Watson
6e2faa2444 In uipc_close(), we no longer always free the unpcb, as the last reference
may be dropped later.  In this case, always unlock the unpcb so as not to
leak the lock.

Found by:	kris (BugMagnet)
2007-03-12 14:52:00 +00:00
John Baldwin
6caa5f40a2 Use sx_sleep() in the main loop of the accounting kthread. 2007-03-09 23:29:31 +00:00
John Baldwin
e7573e7ad7 Allow threads to atomically release rw and sx locks while waiting for an
event.  Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change it's 'struct mtx' object to a
  'struct lock_object' pointer.  _sleep() uses the recently added
  lc_unlock() and lc_lock() function pointers for the lock class of the
  specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
  (rw_sleep()), and sx locks (sx_sleep()).  msleep() still exists and
  is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
  section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
  warn if you try to pass a NULL wait channel.  The functions already have
  a KASSERT to that effect.
2007-03-09 22:41:01 +00:00
John Baldwin
6e21afd40c Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing an sleep such as msleep(9).  Both functions accept a
'struct lock_object *' as their first parameter.  The 'lc_unlock' function
returns an integer that is then passed as the second paramter to the
subsequent 'lc_lock' function.  This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
2007-03-09 16:27:11 +00:00
John Baldwin
3ff6d22988 Use C99-style struct member initialization for lock classes. 2007-03-09 16:19:34 +00:00
John Baldwin
ae8dde30c2 Use C99-style struct member initialization for lock classes. 2007-03-09 16:04:44 +00:00
Pawel Jakub Dawidek
2709e8904f Minor simplification. 2007-03-09 05:22:10 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
Julian Elischer
486a941418 Instead of doing comparisons using the pcpu area to see if
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permenent and never chenges for the life of the
system so it doesn't need locking.
2007-03-08 06:44:34 +00:00
Pawel Jakub Dawidek
9e5dcf7b21 White space nits. 2007-03-07 21:24:51 +00:00
John Baldwin
ddb38a1f3d Fix some nits in lock profiling for rwlocks:
- Properly note when a read lock is released.
- Always note when we contest on a read lock.
- Only note success of obtaining read locks for the first reader to match
  the behavior of sx(9).

Reviewed by:	kmacy
2007-03-07 20:48:48 +00:00
Julian Elischer
1d820a15b8 After the last change to KSE threading a bug was introduced where
all threads were counted against the count of upcall capable threads.
this changes the way we do this accounting.
2007-03-07 20:17:41 +00:00
Olivier Houchard
aed12d5ff8 Backout rev 1.17, msleep() can't be used with a spinlock.
Pointy hat to:	cognet
2007-03-06 12:08:38 +00:00
Robert Watson
b5368498b5 Replay minor system call comment cleanup applied to kern_acl.c in a race
with repo-copy of kern_acl.c to vfs_acl.c.
2007-03-05 13:26:07 +00:00
Robert Watson
e6f5470468 Recognize repo-copy of kern_acl.c to vfs_acl.c, remove kern_acl.c,
remove kern_acl.c from the build, connect vfs_acl.c to the build.

Thanks to:	joe
2007-03-05 13:24:01 +00:00
Robert Watson
873fbcd776 Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
  "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
2007-03-05 13:10:58 +00:00
Wojciech A. Koszek
59f65a4ba6 Change these descriptions of memory types used in malloc(9), as their
current, rather long strings make output from vmstat -m look unpleasant.

Approved by:	cognet (mentor)
2007-03-05 00:21:40 +00:00
Wojciech A. Koszek
d348f4d384 Use msleep(9) instead of tsleep(9) surrounded by lock acquisition and
release.

Approved by:	cognet (mentor)
2007-03-04 23:40:35 +00:00
Robert Watson
0c14ff0eb5 Remove 'MPSAFE' annotations from the comments above most system calls: all
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.

Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
2007-03-04 22:36:48 +00:00
Robert Watson
1a5d072b76 Move to ANSI C function headers. Re-wrap some comments. 2007-03-04 17:50:46 +00:00
John Baldwin
e41bcf3cfc - Don't do the interrupt storm protection stuff for software interrupt
handlers.
- Use pause() when throtting during an interrupt storm.

Reported by:	kris (1)
2007-03-02 17:01:45 +00:00
Kip Macy
c66d760608 lock stats updates need to be protected by the lock 2007-03-02 07:21:20 +00:00
Pawel Jakub Dawidek
bb531912ff Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by:	rwatson
2007-03-01 20:47:42 +00:00
Bruce M Simpson
6f7ca813c4 Do not dispatch SIGPIPE from the generic write path for a socket; with
this patch the code behaves according to the comment on the line above.

Without this patch, a socket could cause SIGPIPE to be delivered to its
process, once with SO_NOSIGPIPE set, and twice without.

With this patch, the kernel now passes the sigpipe regression test.

Tested by:	Anton Yuzhaninov
MFC after:	1 week
2007-03-01 19:20:25 +00:00
Kip Macy
a5bceb77f2 Evidently I've overestimated gcc's ability to peak inside inline functions
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently has a measurable performance impact on certain workloads.
2007-03-01 09:35:48 +00:00
Robert Watson
ede6e136f8 Remove two simultaneous acquisitions of multiple unpcb locks from
uipc_send in cases where only a global read lock is held by breaking
them out and avoiding the unpcb lock acquire in the common case.  This
avoids deadlocks which manifested with X11, and should also marginally
further improve performance.

Reported by:	sepotvin, brooks
2007-03-01 09:00:42 +00:00
Robert Watson
3592fd4de5 Lock unp2 after checking for a non-NULL unp2 pointer in uipc_send() on
datagram UNIX domain sockets, not before.
2007-02-28 08:08:50 +00:00
John Baldwin
1a4435ee0e Print tid's rather than thread pointers in KTR_PROC traces. 2007-02-27 18:46:07 +00:00
John Baldwin
4d70511ac3 Use pause() rather than tsleep() on stack variables and function pointers. 2007-02-27 17:23:29 +00:00
John Baldwin
84d37a463a Use pause() rather than tsleep() on explicit global dummy variables. 2007-02-27 17:22:30 +00:00
Paolo Pisati
f2d619c8b1 Do not execute filter only handlers in ithread_execute_handlers():
this fixes the panics when filter only and ithread only handlers where
sharing the same irq .
2007-02-27 17:09:20 +00:00
Kip Macy
f183910b97 Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
 - Move updates to the lock hash to after the lock is released for spin mutexes,
   sleep mutexes, and sx locks
 - Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
   statistics when an acquisition is contended. This reduces the overhead of
   LOCK_PROFILING to increasing system time by 20%-25% which on
   "make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
   of wall-clock time. Contrast this to enabling lock profiling without
   LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
2007-02-27 06:42:05 +00:00
Robert Watson
e7c33e29ed Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:

- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.

- Replace global UNP mutex with a global UNP rwlock, which will protect the
  UNIX domain socket connection topology, v_socket, and be acquired
  exclusively before acquiring more than per-unpcb at a time in order to
  avoid lock order issues.

In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.

Much testing by:	kris
Approved by:		re (kensmith)
2007-02-26 20:47:52 +00:00
John Baldwin
c0e767f9dd Use NULL rather than 0 for various pointer constants. 2007-02-26 19:28:18 +00:00
Robert Watson
8525230afd Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not.  This is
non-ideal, but recursion scenarios in the network stack currently
require it.

Approved by:	jhb
2007-02-26 19:05:13 +00:00
John Baldwin
59800afcb5 Mark the kernel linker file as linked so that it is visible to the various
kld*() syscalls.

Tested by:	piso
2007-02-26 16:48:14 +00:00
John Baldwin
4a0f58d25b Fix a comment. 2007-02-26 16:36:48 +00:00
Ruslan Ermilov
fac61393b9 Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks
2007-02-26 10:45:21 +00:00