Commit Graph

8577 Commits

Author SHA1 Message Date
pjd
333a175a13 Close another information leak in ktrace(2): one was able to find active
process groups outside a jail, etc. by using ktrace(2).

OK'ed by:	rwatson
Approved by:	re (scottl)
MFC after:	1 week
2005-06-24 12:05:24 +00:00
peter
c9db8c3eb4 Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by:  re
2005-06-24 00:16:57 +00:00
pjd
a99a8a69bd Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by:	re (scottl)
2005-06-23 22:13:29 +00:00
jhb
d3f77097c8 Fix a typo in a comment.
Approved by:	re (scottl)
2005-06-23 21:55:43 +00:00
silby
cbb0f23931 Change the mbuf, mbuf cluster, and mbuf packet allocation routines so that
the UMA "trash" allocator is used - this ensures that any writes to a freed
mbuf should provoke a panic.

Only enabled under INVARIANTS, of course.

Approved by:	re (scottl)
2005-06-23 04:33:39 +00:00
pjd
01c87fdee6 Add missing unlock.
Pointy hat to:	pjd
Approved by:	re (dwhite)
2005-06-21 21:17:02 +00:00
jhb
a5e3094bcf Simplify the storming logic and remove a variable as a result.
Approved by:	re (dwhite)
2005-06-20 19:32:23 +00:00
gad
585dde8a03 Fix a panic which could occur parsing #!-lines in a shell-script. If the
#!-line had multiple whitespace characters after the interpreter name, and
it did not have any options, then the code would do nasty things trying to
process a (non-existent) option-string which "ended before it began"...

Submitted by:	Morten Johansen
Approved by:	re (dwhite)
2005-06-19 02:21:03 +00:00
jeff
5970417966 - Try to catch the wrong bufobj panics a little earlier. I believe they
are actually caused by a buf with both VNCLEAN and VNDIRTY set.  In
   the traces it is clear that the buf is removed from the dirty queue while
   it is actually on the clean queue which leaves the tail pointer set.
   Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-18 18:17:03 +00:00
jeff
9e1f35189b - Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and
cache_zap() to clear the v_dd pointers when a directory vnode is forcibly
   discarded.  For this to work, all vnodes with v_dd pointers to a directory
   must also have name cache entries linked via v_cache_dst to that dvp
   otherwise we could not find them at cache_purge() time.  The following
   code snipit could break this guarantee by unlinking a directory before
   fetching it's dotdot.  The dotdot lookup would initialize the v_dd field
   of the unlinked directory which could never be cleared.  To fix this
   we don't initialize v_dd for orphaned vnodes.
        printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */
        printf("chdir: %d\n", chdir(".."));
        printf("%s\n", getwd(NULL));

Sponsored by:	Isilon Systems, Inc.
Discovered by:	kkenn
Approved by:	re (blanket vfs)
2005-06-17 01:05:13 +00:00
kensmith
82cf72b8bb Remove a variable that became unused as a result of changes made
in v1.139.  This was only exposed if MALLOC_PROFILE was defined.

Submitted by:	Gary Jennejohn
Pointy hat:	rwatson
Approved by:	re (scottl)
2005-06-16 16:01:46 +00:00
jeff
ca07a9f012 - Change holdcnt use around vnode recycling. We now always keep a holdcnt
ref while we're calling vgone().  This prevents transient refs from
   re-adding us to the free list.  Previously, a vfree() triggered via
   vinvalbuf() getting rid of all of a vnode's pages could place a partially
   destructed vnode on the free list where vtryrecycle() could find it.  The
   first call to vtryrecycle would hang up on the vnode lock, but when it
   failed it would place a now dead vnode onto the free list, and another
   call to vtryrecycle() would free an already free vnode.  There were many
   complications of having a zero ref count while freeing which can now go
   away.
 - Change vdropl() to release the interlock before returning.  All callers
   now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
   last ref is dropped.  This means that we'll never have VI_DOOMED vnodes
   on the free list.
 - Seperate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
   v_decr_useonly().  The incr/decr split is so that incr usecount can
   return with the interlock still held while decr drops the interlock so
   it can call vdropl() which will potentially free the vnode.  The calling
   function can't drop the lock of an already free'd node.  v_decr_useonly()
   drops a usecount without droping the hold count.  This is done so the
   usecount reaches zero in vput() before we recycle, however the holdcount
   is still 1 which prevents any new references from placing the vnode
   back on the free list.
 - Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget().  We
   wouldn't want vnlrureclaim() to bump the usecount since this has
   different semantics.  Also change vnlrureclaim() to do a NOWAIT on the
   vn_lock.  When this function runs we're usually in a desperate situation
   and we wouldn't want to wait for any specific vnode to be released.
 - Fix a bunch of misc comments to reflect the new behavior.
 - Add vhold() and vdrop() to vflush() for the same reasons that we do in
   vlrureclaim().  Previously we held no reference and a vnode could have
   been freed while we were waiting on the lock.
 - Get rid of vlruvp() and vfreehead().  Neither are used.  vlruvp() should
   really be rethought before it's reintroduced.
 - vgonel() always returns with the vnode locked now and never puts the
   vnode back on a free list.  The vnode will be freed as soon as the last
   reference is released.

Sponsored by:	Isilon Systems, Inc.
Debugging help from:	Kris Kennaway, Peter Holm
Approved by:	re (blanket vfs)
2005-06-16 04:41:42 +00:00
jeff
dec8b83b9d - Fix insertions of bios which represent data earlier than anything else
in the queue.  The insertion sort assumed this had already been taken
   care of.

Spotted by:	Antoine Brodin
Approved by:	re (scottl)
2005-06-15 23:32:07 +00:00
jeff
78308b0fd3 - Add and enhance asserts related to the wrong bufobj panic.
Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:32:27 +00:00
jeff
909b5b7c58 - In reassignbuf() add many asserts to validate the head and tail pointers
of the clean and dirty lists.  This is in an attempt to catch the wrong
   bufobj problem sooner.
 - In vgonel() don't acquire an extra reference in the active case, the
   vnode lock and VI_DOOMED protect us from recursively cleaning.
 - Also in vgonel() clean up some stale comments.

Sponsored by:	Isilon Systems, Inc.
Approved by:	re (blanket vfs)
2005-06-14 20:31:53 +00:00
jeff
1792584f2a - Remove vnode lock asserts at the end of vfs syscalls. These asserts were
used to ensure that we weren't exiting the syscall with a lock still
   held.  This wasn't safe, however, because we'd already executed a vput()
   and on a loaded system the vnode may have been free'd by the time we
   assert.  This functionality is also handled by the td_locks assert in
   userret, which doesn't tell you what the syscall was, but will at least
   panic before you deadlock.

Sponsored by:   Isilon Systems, Inc.
Discovred by:   Peter Holm
Approved by:	re (blanket vfs)
2005-06-14 01:14:40 +00:00
jeff
7a825fb457 - Don't make vgonel() globally visible, we want to change its prototype
anyway and it's not used outside of vfs_subr.c.
 - Change vgonel() to accept a parameter which determines whether or not
   we'll put the vnode on the free list when we're done.
 - Use the new vgonel() parameter rather than VI_DOOMED to signal our
   intentions in vtryrecycle().
 - In vgonel() return if VI_DOOMED is already set, this vnode has already
   been reclaimed.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 06:26:55 +00:00
jeff
22b5cc07b0 - Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may
not be called in all cases where we free the cnp.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 05:59:59 +00:00
jeff
df170ebc61 - It has long been my suspicion that we don't actually need a loop in
vn_lock().  Add an assert that will help me gain more confidence that this
   is correct.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:47:29 +00:00
jeff
2ef7df2a1a - Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:37 +00:00
jeff
bf15cf7167 - Add KTR_VFS messages for various name cache related events.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:03 +00:00
jeff
c92b8a6f78 - Split one KASSERT in bremfree() into two to aid in debugging.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:45:05 +00:00
jeff
2599199edf - Dramatically simplify bioqdisksort(). We no longer do ordered bios so
most of the code to deal with them has been dead for sometime.  Simplify
   the code by doing an insert sort hinted by the current head position.

Met with apathy by:	arch@
2005-06-12 22:32:29 +00:00
pjd
d2fe610c90 Do not allocate memory while holding a mutex.
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-12 07:03:23 +00:00
pjd
be79126844 Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-11 14:58:20 +00:00
maxim
e5e29d142d o setsockopt(2) cannot remove accept filter. [1]
o getsockopt(SO_ACCEPTFILTER) always returns success on listen socket
  even we didn't install accept filter on the socket.
o Fix these bugs and add regression tests for them.

Submitted by:	Igor Sysoev [1]
Reviewed by:	alfred
MFC after:	2 weeks
2005-06-11 11:59:48 +00:00
jeff
306b180d66 - Assert that we're not in the name cache anymore in vdestroy().
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:48:09 +00:00
jeff
8a4fe36603 - Assert that we're not adding a doomed vnode to the name cache.
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:47:30 +00:00
jeff
3625e8746b - Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS
events could be added to cover other interesting details.
 - Add some VNASSERTs to discover places where we access vnodes after
   they have been uma_zfree'd before we try to free them again.
 - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
   really unused.

Sponsored by:	Isilon Systems, Inc.
2005-06-11 01:16:46 +00:00
green
ff904ffb64 Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
jeff
d372186b52 - Add curthread to the state that ktr is saving. The extra information is
well worth the bloat.
 - Change the formatting of 'show ktr' slightly to accommodate the
   additional field.  Remove a tab from the verbose output and place the
   actual trace data after a : so it is more easy to understand which
   part is the event and which is part of the record.
2005-06-10 23:21:29 +00:00
jkoshy
b195d18520 Fix typo.
Reviewed by:	rwatson, sam
2005-06-10 18:06:59 +00:00
brooks
567ba9b00a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
ups
5273b0bf9f Restore preemption of idle threads.
Submitted by:	jhb
2005-06-10 03:00:29 +00:00
ssouhlal
0835f7b4a9 Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
scottl
7a9b003ce5 Drat! Committed from the wrong branch. Restore HEAD to its previous goodness. 2005-06-09 19:59:09 +00:00
scottl
6be4cb00a4 Back out 1.68.2.26. It was a mis-guided change that was already backed out
of HEAD and should not have been MFC'd.  This will restore UDP socket
functionality, which will correct the recent NFS problems.

Submitted by: rwatson
2005-06-09 19:56:38 +00:00
jkoshy
1d3209ab83 MFP4:
- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
  PMC implementations across different architectures.
  Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
  every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
  in the future.  Add more 'alias' names for commonly used events.

- bug fixes & documentation.
2005-06-09 19:45:09 +00:00
ups
4421a08742 Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by:	nate@
2005-06-09 19:43:08 +00:00
pjd
47f442bcb9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00
pjd
5269cbb9cd Remove process information leak from inside a jail, when
security.bsd.see_other_uids is set to 0, etc.
One can check if invisible process is active, by doing:

	# ktrace -p <pid>

If ktrace returns 'Operation not permitted' the process is alive and
if returns 'No such process' there is no such process.

MFC after:	1 week
2005-06-09 18:33:21 +00:00
ups
d9753fcc91 Fix some race conditions for pinned threads that may cause them to run
on the wrong CPU.

Add IPI support for preempting a thread on another CPU.

MFC after:3 weeks
2005-06-09 18:26:31 +00:00
pjd
3af857a21a Avoid code duplication in serval places by introducing universal
kern_getfsstat() function.

Obtained from:	jhb
2005-06-09 17:44:46 +00:00
imp
6bc1b07ae1 Simplify the code a bit after the bzero(). 2005-06-09 05:50:01 +00:00
jeff
f637381b78 - My sub-par public school education has been exposed. s/sentinal/sentinel/
Noticed by:	Emil Mikulic
2005-06-09 04:40:20 +00:00
gad
d916eb91e6 Remove the previous parsing-logic for arguments on the '#!'-line of shell
scripts.  As far as I know, no one has needed the '#!#<' kludge to get at
the behavior implemented by the historical parsing.
2005-06-09 00:27:02 +00:00
jeff
b53b83993c - Under heavy IO load the buf daemon can run for many hundereds of
milliseconds due to what is essentially n^2 algorithmic complexity.  This
   change makes the algorithm N*2 instead.  This heavy processing manifested
   itself as skipping in audio and video playback due to the long scheduling
   latencies and contention on giant by pcm.
 - flushbufqueues() is now responsible for flushing multiple buffers
   rather than one at a time.  This allows us to save our progress in the
   list by using a sentinal.  We must do the numdirtywakeup() and
   waitrunningbufspace() here now rather than in buf_daemon().
 - Also add a uio_yield() after we have processed the list once for bufs
   without deps and again for bufs with deps.  This is to release Giant
   and allow any other giant locked code to proceed.

Tested by:	Many users on current@
Revealed by:	schedgraph traces sent by Emil Mikulic & Anthony Ginepro
2005-06-08 20:26:05 +00:00
rodrigc
b2d9df7a8b Initialize uio_iovcnt to 1 in extattr_list_vp() and extattr_get_vp()
PR:		kern/79357
Approved by:	rwatson
2005-06-08 13:22:10 +00:00
rwatson
1a25bf9ccd In sem_forkhook(), don't attempt to generate a copy of the process semaphore
list on fork() if the process doesn't actually have references to any
semaphores.  This avoids extra work, as well as potentially asking to
allocate storage for 0 references.

Found by:	avatar
MFC after:	1 week
2005-06-08 07:29:22 +00:00
jeff
4a9af33a3f - Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility
of a vget causing another call to INACTIVE before we're finished.
2005-06-07 22:05:32 +00:00