8710 Commits

Author SHA1 Message Date
alc
0a10c2b5cd Use the proc mtx to prevent simultaneous changes to p_aioinfo. 2005-05-30 19:33:33 +00:00
alc
3ecc8d1129 Eliminate unnecessary calls to wakeup(); no one sleeps on &aio_freeproc.
Eliminate an unused flag, AIOP_SCHED; it's cleared but never set.
2005-05-30 18:02:00 +00:00
rwatson
5010364761 Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:20:21 +00:00
rwatson
370e72b242 Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used.  The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:09:18 +00:00
jeff
33b78c31e9 - Add bufobj_wrefl() to add a write ref to a bufobj that is already locked. 2005-05-30 07:01:18 +00:00
jkoshy
ad86ac4ba4 Kernel hooks to support PMC sampling modes.
Reviewed by:	alc
2005-05-30 06:29:29 +00:00
alc
f570134192 Eliminate aio_activeproc; it's unused. 2005-05-30 05:25:10 +00:00
alc
404d37a14d Eliminate aio_bufjobs; it's unused. 2005-05-29 21:29:15 +00:00
rwatson
aaf5c1d3e8 Normalize white space in syscalls.master: try to use tabs before system
call types.
2005-05-29 20:20:16 +00:00
rwatson
7035f9f56a Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations.  In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m.  Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections.  It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics.  To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
  the owner of the malloc type structure, allocated statically using
  MALLOC_DEFINE().  This embedded the definition of the malloc_type
  structure into all kernel modules.  Move to a model in which a pointer
  within struct malloc_type points at a UMA-allocated
  malloc_type_internal data structure owned and maintained by
  kern_malloc.c, and not part of the exported ABI/API to the rest of
  the kernel.  For the purposes of easing a possible MFC, re-use an
  existing pointer in 'struct malloc_type', and maintain the current
  malloc_type structure size, as well as layout with respect to the
  fields reused outside of the malloc subsystem (such as ks_shortdesc).
  There are several unused fields as a result of no longer requiring
  the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
  of size MAXCPU.  The structure defined above avoids hard-coding a
  kernel compile-time value of MAXCPU into kernel modules that interact
  with malloc.

- When accessing per-CPU statistics for a malloc type, surround
  read-modify-update requests with critical_enter()/critical_exit() in
  order to avoid races during write.  The per-CPU fields are written
  only from the CPU that owns them.

- Per-CPU stats now maintain "allocated" and "freed" counters for
  number of allocations/frees and bytes allocated/freed, since there is
  no longer a coherent global notion of the totals.  When coalescing
  malloc stats, accept a slight race between reading stats across CPUs,
  and avoid showing the user a negative allocation count for the type
  in the event of a race.  The global high watermark is no longer
  maintained for a malloc type, as there is no global notion of the
  number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs.  The
  current "export as text" sysctl format is retained with the same
  syntax.  We may want to change this in the future to export more
  per-CPU information, such as how allocations and frees are balanced
  across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations.  There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there.  The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change.  However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs.  The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from:		bde
Statistics approach discussed with:	ups
Tested by:				scottl, others
2005-05-29 13:38:07 +00:00
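
The per-CPU statistics scheme described in the commit above can be modeled in user space. What follows is a minimal Python sketch, not the kernel code: the names `CpuSlot`, `PerCpuStats`, `allocs`, and `frees`, and the value of `MAXCPU`, are illustrative assumptions. Each CPU writes only its own slot, and the reader sums all slots without synchronization, clamping at zero so that a racy read never reports a negative count.

```python
from dataclasses import dataclass, field

MAXCPU = 4  # illustrative only; the kernel has its own compile-time MAXCPU


@dataclass
class CpuSlot:
    allocs: int = 0  # allocations recorded on this CPU
    frees: int = 0   # frees recorded on this CPU


@dataclass
class PerCpuStats:
    slots: list = field(
        default_factory=lambda: [CpuSlot() for _ in range(MAXCPU)]
    )

    def record_alloc(self, cpu: int) -> None:
        # In the kernel, this update would sit between critical_enter()
        # and critical_exit(); a plain increment stands in for it here.
        self.slots[cpu].allocs += 1

    def record_free(self, cpu: int) -> None:
        self.slots[cpu].frees += 1

    def in_use(self) -> int:
        # Memory may be freed on a different CPU than the one that
        # allocated it, so only the coalesced totals are meaningful.
        allocs = sum(s.allocs for s in self.slots)
        frees = sum(s.frees for s in self.slots)
        return max(allocs - frees, 0)  # never report a negative count
```

Because no single counter holds the running total, this model has nowhere to track a global high watermark, which matches the trade-off the commit message describes.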
pjd
58d2b4c193 Fix a panic when a module is compiled into the kernel and is also loaded from loader.conf.
Only the panic is fixed; the module will still be listed in kldstat(8) output.
It is not clear what the correct fix is, because adding unloading code for the
failure case to linker_init_kernel_modules() doesn't work.
2005-05-28 23:20:05 +00:00
gad
2add4b872d Change the way options are parsed on the `#!'-line of a shell-script. Instead
of having the kernel parse that line and add an entry to the argument list for
each 'separate word' it finds, have it add only one entry which holds all
the words found on that line.  The old behavior is useful in some situations,
but it does not match the way any other operating system will parse that line.

This has been discussed in the thread "Bug in #! processing - One More Time"
on the freebsd-arch mailing list (starting back on Feb 24, 2005).  The first
few messages in that thread provide the background in much detail.

PR:		16393
Reviewed by:	freebsd-arch
2005-05-28 22:42:41 +00:00
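
The difference between the old and new `#!`-line handling described above can be illustrated with a small user-space model. This is a hedged Python sketch, not the kernel parser: the function names are hypothetical, the whitespace trimming is an assumption, and the script path that is appended to the argument list afterwards is left out.

```python
def interp_args_old(line: str) -> list:
    """Old behavior (model): one argument entry per separate word."""
    return line[2:].split()


def interp_args_new(line: str) -> list:
    """New behavior (model): the interpreter, then at most one entry
    holding all remaining words on the line."""
    return line[2:].strip().split(None, 1)
```

For a line such as `#!/usr/bin/awk -f -v x=1`, the old model yields four entries, while the new model yields the interpreter path followed by the single entry `-f -v x=1`.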
pjd
7543a23525 Prevent loading modules which are compiled into the kernel.
PR:		kern/48759
Submitted by:	Paweł Małachowski <pawmal@unia.3lo.lublin.pl>
Patch from:	demon
MFC after:	2 weeks
2005-05-28 22:29:44 +00:00
rwatson
067b94d2d9 Regenerate from syscalls.master. 2005-05-28 14:35:43 +00:00
rwatson
c0001c0613 Mark ntp_gettime() as MSTD, since its system call path will acquire
Giant if required.
2005-05-28 14:35:05 +00:00
rwatson
f1dfea9d61 Explicitly acquire Giant around the ntp_gettime() and assert it in the
sysctl path.  While this code is close to MPSAFE, it may require some
additional locking.  Mark ntp_gettime1() as GIANT_REQUIRED for now.

Suggested by:	phk
2005-05-28 14:34:41 +00:00
rwatson
fb931ae00a Regenerate for updated syscalls.master. 2005-05-28 13:24:05 +00:00
rwatson
ceb26b4c48 Mark the following compatibility system calls as MCOMPAT or MCOMPAT4 based
on their simply wrapping MPSAFE implementations of existing MPSAFE
system calls:

  getfsstat()
  lseek()
  stat()
  lstat()
  truncate()
  ftruncate()
  statfs()
  fstatfs()

Note that ogetdirentries() is not marked MPSAFE because it does not share
the MPSAFE implementation used for getdirentries(), and requires separate
locking to be implemented.
2005-05-28 13:23:42 +00:00
rwatson
c060f4b949 Regenerate from syscalls.master. 2005-05-28 13:13:01 +00:00
rwatson
0439e13c01 Mark quotactl() as MSTD. 2005-05-28 13:12:04 +00:00
rwatson
527c640ad3 Acquire Giant explicitly in quotactl() so that the syscalls.master
entry can become MSTD.
2005-05-28 13:11:35 +00:00
rwatson
ff36d1a493 Regenerate from updated syscalls.master. 2005-05-28 13:09:56 +00:00
rwatson
35ffa17830 Mark kenv(2) as MPSAFE, since it appears to be properly locked down. 2005-05-28 13:09:41 +00:00
rwatson
ea08d61a73 Regenerate system call tables from syscalls.master. 2005-05-28 13:08:26 +00:00
rwatson
acb673063c Also mark the COMPAT4 version of fhstatfs() as MPSAFE. 2005-05-28 13:07:43 +00:00
rwatson
fa7cf37c72 Mark fhopen(), fhstat(), and fhstatfs() as MSTD, since they now
acquire Giant themselves.
2005-05-28 12:59:33 +00:00
rwatson
66d882141f Acquire Giant explicitly in fhopen(), fhstat(), and kern_fhstatfs(),
so that we can start to eliminate the presence of non-MPSAFE system
call entries in syscalls.master.
2005-05-28 12:58:54 +00:00
pjd
ac435fbb13 Remove (now) unused argument 'td' from cvtstatfs(). 2005-05-27 19:23:48 +00:00
pjd
788f75ddb2 Sync locking in freebsd4_getfsstat() with getfsstat().
Giant is probably also needed in kern_fhstatfs().
2005-05-27 19:21:08 +00:00
pjd
2fc56b12a9 Use consistent style in functions I want to modify in the near future. 2005-05-27 19:15:46 +00:00
rwatson
ac1a365e2d In the current world order, each socket has two mutexes: a mutex
that protects socket and receive socket buffer state, and a second
mutex to protect send socket buffer state.  In some places, the
mutex shared between the socket and receive socket buffer will be
acquired twice, once by each layer, resulting in some
inconsistency, but providing the abstraction benefit of being able
to more easily separate the two mutexes in the future if desired.

When transitioning a socket to the SS_ISDISCONNECTING or
SS_ISDISCONNECTED states, grab the socket/receive socket buffer lock
once rather than grabbing it as the socket lock, modifying socket
state, then grabbing a second time as the receive lock in order to
modify the socket buffer state to indicate no further data can be
read.  This change is believed to close a race between the change in
socket state and the change in socket buffer state, which for a
remotely initiated close on a UNIX domain socket, resulted in
soreceive() returning ENOTCONN rather than an EOF condition.

A similar race still exists in the case of send, however, and is
harder to fix as the socket and send socket buffer mutexes are not
the same, and we would like to avoid holding combinations of socket
mutexes over sb_upcall until we've finished clarifying the locking
protocol for upcalls.

This change has the side effect of reducing the number of mutex
operations to initiate disconnect or perform disconnect on a
socket by two.

PR:		78824
Reported by:	Marc Olzheim <marcolz@stack.nl>
MFC after:	2 weeks
2005-05-27 17:16:43 +00:00
davidxu
5a8d3af0d6 Remove thread_upcall_check; it was used to avoid a race bug in the earlier
sleep queue code, and that bug no longer exists.
Please see the 04/25/2004 freebsd-threads@ mailing list archive.
2005-05-27 15:57:27 +00:00
davidxu
3fbc6983fa Remove sleep queue hack, it is no longer needed with current sleep queue.
Actually, it causes process to hang when it is being debugged.

PR: gnu/77818
2005-05-27 04:27:22 +00:00
jmg
07e93041c6 make stat return a zero'd struct, and be a FIFO again... This is only
to fix libc_r since it requires stat to close fd's, and so commented in
the code...

PR:		threads/75795
Reviewed by:	ps
MFC after:	1 week
2005-05-24 23:42:50 +00:00
cognet
9bcd47137c Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as
binutils now do the job for us
2005-05-24 22:21:44 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
pjd
0b89469bda Protect fsid in freebsd4_getfsstat() in a similar way as it is done in
getfsstat().
2005-05-22 23:05:27 +00:00
pjd
a6e0e217b2 If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
2005-05-22 21:52:30 +00:00
njl
9ab8d98ce5 Document that the returned pointer should be freed even if the number
of items returned is 0.
2005-05-20 05:04:22 +00:00
ups
c8d93020ce Fix a bug that caused preemption to happen for a thread in the same
ksegrp with the same priority as the currently running thread.
This can cause propagate_priority() to panic.

Pointy hat to: ups
2005-05-19 01:08:30 +00:00
pjd
4c810f35cd devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
alc
bcfd7ad6a6 Revert revision 1.164: pmap_qremove() does not require protection by
VM_LOCK_GIANT.

Discussed with: jeff
2005-05-14 05:09:11 +00:00
jhb
6772446cb8 Actually use the iterating variable in the for loop when trying to avoid
overflow.

Reported by:	Vladislav Shabanov vs at rambler-co dot ru
MFC after:	1 week
Glanced at:	alfred
2005-05-12 20:04:48 +00:00
pjd
c6e5e8f446 We don't use 'mp' variable, but we do want to mount devfs, ehh. 2005-05-12 01:49:51 +00:00
pjd
91b47597be Remove unused variable introduced by accident in rev 1.168.
Found by:	Coverity Prevent analysis tool
2005-05-11 19:50:34 +00:00
pjd
f66a55ffcd Plug memory leaks.
Found by:		Coverity Prevent analysis tool
2005-05-11 19:27:38 +00:00
kan
4085840a33 Handle theoretical case of vfs_export being called with both MNT_DELEXPORT and
MNT_EXPORT flags set. Do not reuse the memory that has just been freed.
2005-05-11 18:25:42 +00:00
cperciva
a199a4f74b Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
cperciva
e513415af9 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
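
The information leak that the commit above guards against can be shown with a user-space model. This is a minimal Python sketch under stated assumptions: `BUFLEN` and the function names are hypothetical, a `bytearray` stands in for the fixed-length kernel buffer, and returning the whole buffer stands in for the copyout to userland.

```python
BUFLEN = 16  # illustrative buffer size


def fill_unsafe(buf: bytearray, name: bytes) -> bytes:
    # Copy the string and its terminator; bytes past the terminator keep
    # whatever the buffer held before.
    data = name[: len(buf) - 1] + b"\x00"
    buf[: len(data)] = data
    return bytes(buf)  # models copying the entire buffer out


def fill_safe(buf: bytearray, name: bytes) -> bytes:
    buf[:] = bytes(len(buf))  # step 0: zero the entire buffer first
    return fill_unsafe(buf, name)
```

With a buffer that previously held other data, `fill_unsafe` returns the new string followed by the stale tail, while `fill_safe` returns the string followed only by zero bytes.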
davidxu
af64c19b3b Only check signal event, single threading event shouldn't be reported. 2005-05-05 06:42:02 +00:00
emax
a52b6c9ce3 Change m_uiotombuf so it will accept offset at which data should be copied
to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to
fix Ethernet header alignment problem on alpha and sparc64. Also change all
users of m_uiotombuf to pass proper offset.

Reviewed by:	jmg, sam
Tested by:	Sten Spans "sten AT blinkenlights DOT nl"
MFC after:	1 week
2005-05-04 18:55:03 +00:00
rwatson
2197ab2d93 Introduce MAC Framework and MAC Policy entry points to label and control
access to POSIX Semaphores:

mac_init_posix_sem()            Initialize label for POSIX semaphore
mac_create_posix_sem()          Create POSIX semaphore
mac_destroy_posix_sem()         Destroy POSIX semaphore
mac_check_posix_sem_destroy()   Check whether semaphore may be destroyed
mac_check_posix_sem_getvalue()  Check whether semaphore may be queried
mac_check_posix_sem_open()      Check whether semaphore may be opened
mac_check_posix_sem_post()      Check whether semaphore may be posted to
mac_check_posix_sem_unlink()    Check whether semaphore may be unlinked
mac_check_posix_sem_wait()      Check whether may wait on semaphore

Update Biba, MLS, Stub, and Test policies to implement these entry points.
For information flow policies, most semaphore operations are effectively
read/write.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, McAfee, SPARTA
Obtained from:	TrustedBSD Project
2005-05-04 10:39:15 +00:00
rwatson
182429e8d0 Move definitions of 'struct kuser' and 'struct ksem' from uipc_sem.c
to ksem.h so that they are accessible from the MAC Framework for the
purposes of labeling and enforcing additional protections.  #error
if these are included without _KERNEL, since they are not intended
(nor installed) for user application use.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, SPARTA
Obtained from:	TrustedBSD Project
2005-05-03 20:21:24 +00:00
jeff
33ac8108e2 - Initialize vfslocked correctly early enough for MAC to compile.
- Fix one place where we explicitly drop Giant!

Pointy hat to:	me
Submitted by:	Max Laier
Warned by:	Tinderbox
2005-05-03 16:24:59 +00:00
jeff
79452537e3 - Remove two mtx_asserts that can incorrectly trigger if
devstat_end_transaction is called from a fast interrupt.  Presently
   there is no way for mtx_assert to determine that we're not executing
   in a real thread context.

Submitted by:	jhusted@isilon.com
2005-05-03 10:58:05 +00:00
jeff
92f17d1e6a - A vnode may have made its way onto the free list while it was being
vgone'd.  We must remove it from the freelist before returning in
   vtryrecycle() or we may get a duplicate free.

Reported by:	kkenn
2005-05-03 10:56:00 +00:00
jeff
ab437d7b1d - Use namei to acquire Giant for VFS if it is necessary. Drop the explicit
Giant acquisition.
 - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have
   both been locked.
2005-05-03 10:55:05 +00:00
jeff
451e14446f - Use NAMEI to pickup Giant if we need it in fpcheckstd(). 2005-05-03 10:52:22 +00:00
jeff
617ce99006 - Neither of our image formats require Giant now that the vm and vfs have
been locked.
2005-05-03 10:51:38 +00:00
csjp
431f1afe8c Since it is not possible for curthread to be NULL in this context,
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.

Discussed with:	rwatson, peter
MFC after:	1 week
2005-05-02 02:07:55 +00:00
jeff
dd41538cd8 - All buffers should either be clean or dirty. If neither of these flags
are set when we attempt to remove a buffer from a queue we should panic.
   Hopefully this will catch the source of the wrong bufobj panics.

Sponsored by:	Isilon Systems, Inc.
2005-05-01 12:00:36 +00:00
jeff
ff4a7a72e9 - Remove spls and comments relating to them. 2005-05-01 01:01:17 +00:00
jeff
22004a9723 - Remove an old splcam hack. 2005-05-01 00:59:55 +00:00
jeff
1bc61f8f0f - Remove unnecessary spls. 2005-05-01 00:59:34 +00:00
jeff
80bb41c921 - Return EACCES if we're trying to exec on a vp with no object.
Errno supplied by:	cperciva
2005-05-01 00:58:19 +00:00
sam
17d6060ac9 o enable shutdown of taskqueue threads; the thread servicing the queue checks
a new entry in the taskqueue struct each time it wakes up to see if it
  should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
  the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
  queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL

Reviewed by:	dfr, njl
2005-05-01 00:38:11 +00:00
dwhite
c8fa809967 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
jeff
cb9dfadd87 - Remove long dead splbio() calls and comments relating to the old
synchronization mechanism.
2005-04-30 12:18:50 +00:00
jeff
116d72569a - Don't acquire Giant before calling b_biodone, individual consumers are
now required to do so themselves.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:44:22 +00:00
jeff
32c015f463 - Acquire Giant in AIO's iodone routine. VFS will no longer do it for us
soon.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:27:31 +00:00
jeff
f9172cb275 - Call VM_LOCK_GIANT in cluster_callback() to protect some pmap calls. VFS
will not be acquiring Giant before calling this function anymore.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:26:58 +00:00
jeff
7354fc5e28 - In vnlru_free() remove the vnode from the free list before we call
vtryrecycle().  We could sometimes get into situations where two threads
   could try to recycle the same vnode before this.
 - vtryrecycle() is now responsible for returning the vnode to the free list
   if it fails and someone else hasn't done it.
 - Make a new function vfreehead() which moves a vnode to the head of the
   free list and use it in vgone() to clean up that code a bit.

Sponsored by:	Isilon Systems, Inc.
Reported by:	pho, kkenn
2005-04-30 11:22:40 +00:00
jeff
0e56b01ed6 - Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed.
This fixes forced unmounts via nullfs.

Reported by:	kkenn
Sponsored by:	Isilon Systems, Inc.
2005-04-27 10:03:21 +00:00
jeff
a80bbe799e - Stop setting vxthread, we've asserted that it was useless for several
weeks now.
2005-04-27 09:17:33 +00:00
jeff
18cd3a36d3 - Stop checking vxthread, we've asserted that it was useless for several
weeks.
2005-04-27 09:17:11 +00:00
jeff
f869be5c72 - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
mdodd
56c42039a5 Add missing break.
Found by:	marcus
2005-04-25 00:48:04 +00:00
sam
0f63abff2a o eliminate modification of task structures after their run to avoid
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
  avoid fix) and combining ta_pending and ta_priority

Reviewed by:	dwhite, dfr
MFC after:	4 days
2005-04-24 16:52:45 +00:00
davidxu
50a5bbcbfd Wake up swapper process if needed.
PR: kern/78474
Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
2005-04-23 05:06:44 +00:00
davidxu
a247de6aeb Regen. 2005-04-23 02:38:17 +00:00
davidxu
1b8f9e10e1 Add new syscall thr_new to create a thread atomically; it will
inherit the signal mask from the parent thread, and set up TLS, stack, and
user entry address.
Also support POSIX thread's PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM;
a sysctl is also provided to control the scheduler scope.
2005-04-23 02:36:07 +00:00
davidxu
2155a04472 Change cpu_set_kse_upcall to a more generic style, so we can reuse it
in other code. Add cpu_set_user_tls, and use it to tweak the user register
and set up user TLS. I had wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads, which may
not need this feature, I wrote a separate cpu_set_user_tls.
2005-04-23 02:32:32 +00:00
jeff
4eaa5ebe1b - Define the real lock order with cdev and a few vm/vfs related locks. This
can be removed once cdev no longer calls free() with the cdev lock held.
2005-04-22 22:43:31 +00:00
jeff
b29bfc6efa - Check LO_DUPOK as well as LOP_DUPOK when determining whether we should
warn about duplicate acquires.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 22:39:46 +00:00
trhodes
f02068c038 Get the directory structure correct in a comment.
Submitted by:	Samy Al Bahra
2005-04-22 19:09:12 +00:00
jeff
31cfb7f242 - Disable code which allows getnewvnode() to fail. Many ffs_vget() callers
do not correctly deal with failures.  This presently risks deadlock
   problems if dependency processing is held up by failures to allocate
   a vnode, however, this is better than the situation with the failures.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:57:05 +00:00
jeff
d8b31a35ea - Add two KASSERTs to prevent us from recycling a buf that is still on a
bufobj list.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:53:20 +00:00
marcel
dd5b3be596 Do not conditionally compile the contents of this file upon whether
HWPMC_HOOKS is defined. The pmc_cpu_is_*() functions in this file
are referenced unconditionally by hwpmc(4).

This is mostly a stop-gap. The pmc_cpu_is*() function should
probably be declared inline in <sys/pmc.h> or <sys/pmckern.h> and
the function pointers with corresponding SX lock should probably
be moved to another file and compiled conditionally upon HWPMC_HOOKS.

Ok'd by: jkoshy@
2005-04-20 20:30:59 +00:00
davidxu
0719b14efb Inherit signal mask for child process in fork1(), RELENG_4 and other
*BSD have this behaviour, also it is required by POSIX.

PR: kern/80130
Submitted by: Kostik Belousov konstantin.belousov at zoral dot com dot ua
2005-04-20 13:14:52 +00:00
mdodd
7826c585d5 Check sopt_level in uipc_ctloutput() and return early if it is non-zero.
This prevents unintended consequences when an application calls things like
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, ...) on a Unix domain socket.
2005-04-20 02:57:56 +00:00
pjd
db9ce4609f Call g_waitidle() before every check of whether the list of holds is empty.
Suggested by:	phk
2005-04-19 21:44:44 +00:00
davidxu
9452a25d2d Clear P_STATCHILD earlier to avoid unnecessary retrying. 2005-04-19 12:31:15 +00:00
davidxu
913d50be4f Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:11:28 +00:00
davidxu
02615ff23a Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:07:28 +00:00
phk
ed5a7da798 Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.
2005-04-19 06:23:59 +00:00
jkoshy
dc3444cd91 Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by:	alc, jhb (kernel changes)
2005-04-19 04:01:25 +00:00
phk
b7f29c0fc0 Add a named reference-count KPI to hold off mounting of the root filesystem.
While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
2005-04-18 21:21:26 +00:00
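
The named reference-count idea in the commit above can be sketched in user space. This is a hedged Python model, not the kernel KPI: the class `RootMountHolds` and its method names are hypothetical, and integer tokens stand in for whatever handle the real interface returns. Each holder registers under a name, and mounting root is allowed only once every hold has been released; the list of names is what would be printed while waiting.

```python
class RootMountHolds:
    """Model of named holds that delay mounting the root filesystem."""

    def __init__(self) -> None:
        self._holds = {}     # token -> holder name
        self._next_token = 0

    def hold(self, name: str) -> int:
        token = self._next_token
        self._next_token += 1
        self._holds[token] = name
        return token

    def release(self, token: int) -> None:
        del self._holds[token]

    def holders(self) -> list:
        # Names of everyone still holding back the root mount.
        return sorted(self._holds.values())

    def ready(self) -> bool:
        return not self._holds
```

Keeping a name with each hold, rather than a bare counter, is what makes it possible to report who is still holding the mount back.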
phk
4bd811c8dd Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
rwatson
75030e30f6 Introduce p_canwait() and MAC Framework and MAC Policy entry points
mac_check_proc_wait(), which control the ability to wait4() specific
processes.  This permits MAC policies to limit information flow from
children that have changed label, although has to be handled carefully
due to common programming expectations regarding the behavior of
wait4().  The cr_seeotheruids() check in p_canwait() is #if 0'd for
this reason.

The mac_stub and mac_test policies are updated to reflect these new
entry points.

Sponsored by:	SPAWAR, SPARTA
Obtained from:	TrustedBSD Project
2005-04-18 13:36:57 +00:00
rwatson
997d8772c4 Remove end-of-line tabs.
MFC after:	3 days
2005-04-18 11:51:10 +00:00
das
5aec008257 Add a sysctl that returns the full path of a process' text file.
This information is needed by things like `gdb -p' and Sun's javac,
and previously it could only be obtained via procfs
2005-04-18 02:10:37 +00:00
rwatson
155bfd8789 Introduce three additional MAC Framework and MAC Policy entry points to
control socket poll() (select()), fstat(), and accept() operations,
required for some policies:

        poll()          mac_check_socket_poll()
        fstat()         mac_check_socket_stat()
        accept()        mac_check_socket_accept()

Update mac_stub and mac_test policies to be aware of these entry points.
While here, add missing entry point implementations for:

        mac_stub.c      stub_check_socket_receive()
        mac_stub.c      stub_check_socket_send()
        mac_test.c      mac_test_check_socket_send()
        mac_test.c      mac_test_check_socket_visible()

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-04-16 18:46:29 +00:00
rwatson
76ab38319d In mac_get_fd(), remove unconditional acquisition of Giant around copying
of the socket label to thread-local storage, and replace it with
conditional acquisition based on debug.mpsafenet.  Acquire the socket
lock around the copy operation.

In mac_set_fd(), replace the unconditional acquisition of Giant with
the conditional acquisition of Giant based on debug.mpsafenet.  The socket
lock is acquired in mac_socket_label_set() so doesn't have to be
acquired here.

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-04-16 18:33:13 +00:00
marius
dfe8329b58 Increase default HZ for sparc64 to 1000. 2005-04-16 15:07:41 +00:00
rwatson
51183f0f84 Introduce new MAC Framework and MAC Policy entry points to control the use
of system calls to manipulate elements of the process credential,
including:

        setuid()                mac_check_proc_setuid()
        seteuid()               mac_check_proc_seteuid()
        setgid()                mac_check_proc_setgid()
        setegid()               mac_check_proc_setegid()
        setgroups()             mac_check_proc_setgroups()
        setreuid()              mac_check_proc_setreuid()
        setregid()              mac_check_proc_setregid()
        setresuid()             mac_check_proc_setresuid()
        setresgid()             mac_check_proc_setresgid()

MAC checks are performed before other existing security checks; both
current credential and intended modifications are passed as arguments
to the entry points.  The mac_test and mac_stub policies are updated.

Submitted by:	Samy Al Bahra <samy@kerneled.org>
Obtained from:	TrustedBSD Project
2005-04-16 13:29:15 +00:00
rwatson
04a7b2d379 Modify the alq(9) alq_open() API to accept a file creation mode, rather
than defaulting the cmode argument to vn_open() to 0.  Supply a default
argument of ALQ_DEFAULT_CMODE (0600) in current callers.

Discussed with/pointed out by:	hmp
Reviewed by:	jeff, hmp
MFC after:	3 days
2005-04-16 12:12:27 +00:00
maxim
2e89e331eb Fix a typo in the comment.
Noticed by:	Samy Al Bahra
2005-04-15 14:01:43 +00:00
jhb
249414d2cc Close a race between sleepq_broadcast() and sleepq_catch_signals().
Specifically, sleepq_broadcast() uses td_slpq for its private pending
queue of threads that it is going to wake up after it takes them off the
sleep queue.  The problem is that if one of the threads is actually not
asleep yet, then we can end up with td_slpq being corrupted and/or the
thread being made runnable at the wrong time resulting in the td_sleepqueue
== NULL assertion failures occasionally reported under heavy load.

The fix is to stop being so fancy and ditch the whole pending queue bit.
Instead, sleepq_remove_thread() and sleepq_resume_thread() were merged
into one function that requires the caller to hold sched_lock.  This
fixes several places that unlocked sched_lock only to call a function
that then locked sched_lock, so even though sched_lock is now held
slightly longer, removing the extra lock acquires (1 pair instead of 3
in some cases) probably makes it an overall win if you don't include the
fact that it closes a race.  This is definitely a 5.4 candidate.

PR:		kern/79693
Submitted by:	Steven Sears stevenjsears at yahoo dot com
MFC after:	4 days
2005-04-14 06:30:32 +00:00
jeff
b38ffce1e7 - Remove a debugging printf that slipped in.
Spotted by:	Peter Wemm
2005-04-13 23:36:28 +00:00
avatar
e496403c65 According to the comment in struct tty, t_modem is optional; hence we should
guard against a NULL t_modem entry. Otherwise, a driver that doesn't implement
the t_modem callback (such as sys/dev/usb/ucycom.c) would panic when
someone opens the driver's associated tty device.

Reviewed by:	phk, sam (mentor)
2005-04-13 13:56:17 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  See vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
jeff
5642885b84 - Change vop_lookup_post assertions to reflect recent vfs_lookup changes.
Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:57:53 +00:00
jeff
ee55350db0 - Further simplify lookup; Force all filesystems to relock in the DOTDOT
case.  There are bugs in some which didn't unlock in the ISDOTDOT case
   to begin with that need to be addressed separately.  This simplifies
   things anyway.
 - Fix relookup() to prevent it from vrele()'ing the dvp while the vp
   is locked.  Catch up to other lookup changes.

Sponsored by:	Isilon Systems, Inc.
Reported by:	Peter Wemm
2005-04-13 10:57:13 +00:00
mdodd
bdcac6ad82 Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.
- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
  PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)

Obtained from:	 NetBSD (with some changes)
2005-04-13 00:01:46 +00:00
rwatson
005d2e3810 Consistently style function declarations in kern_malloc.c.
MFC after:	3 days
2005-04-12 23:54:34 +00:00
jhb
f9da7305b5 Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.
2005-04-12 23:18:54 +00:00
vkashyap
363d709ec6 The latest release of the FreeBSD driver (twa) for
3ware's 9xxx series controllers.  This corresponds to
the 9.2 release (for FreeBSD 5.2.1) on the 3ware website.

Highlights of this release are:

1. The driver has been re-architected to use a "Common Layer"
    (all tw_cl* files), which is a consolidation of all OS-independent
    parts of the driver.  The FreeBSD OS specific portions of the
    driver go into an "OS Layer" (all tw_osl* files).
    This re-architecture is to achieve better maintainability, consistency
    of behavior across OS's, and better portability to new OS's (drivers
    for new OS's can be written by just adding an OS Layer that's specific
    to the OS, by complying with a "Common Layer Programming Interface" API).

2. The driver takes advantage of multiple processors.

3. The driver has a new firmware image bundled, the new features of which
   include Online Capacity Expansion and multi-lun support, among others.
   More details about 3ware's 9.2 release can be found here:
   http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf

Since the Common Layer is used across OS's, the FreeBSD specific include
path for header files (/sys/dev/twa) is not part of the #include pre-processor
directive in any of the source files.  For being able to integrate twa into
the kernel despite this, Makefile.<arch> has been changed to add the include
path to CFLAGS.

Reviewed by: scottl
2005-04-12 22:07:11 +00:00
imp
a404b5c683 resource_list_purge: release the resources in this list, and purge the
elements of this list (eg, reset it).

Man page to follow
2005-04-12 15:20:36 +00:00
imp
8d4424715d rman_set_device() seems to have been omitted by mistake. Implement it. 2005-04-12 06:21:59 +00:00
jeff
802f97f119 - Remove unused include. 2005-04-12 05:45:58 +00:00
jeff
b53ca6ad0a - Differentiate two UPGRADE panics so I have a better idea of what's going
on here.
2005-04-12 05:43:03 +00:00
imp
02a4d4e044 Return the resource created/found in resource_list_add to avoid an extra
resource_list_find in some places.

Suggested by: sam
Found by: Coverity analysis tool.
2005-04-12 04:22:17 +00:00
jeff
a6b9b8b072 - Mark the VOPs that require exclusive locks. Those that aren't marked
with E may be called with a shared lock held.  This list really could
   be made per filesystem if we had any filesystems which differed from
   ffs in locking guarantees.  VFS itself is not sensitive to this except
   where vgone() etc. are concerned.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:19:29 +00:00
jeff
b391d2675b - Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk
uses it.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:17:06 +00:00
jeff
17be4cbfa0 - Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a
potential lock order reversal.  Also, don't unlock the vnode if this
   fails, lockmgr has already unlocked it for us.
 - Restructure vget() now that vn_lock() does all of VI_DOOMED checking
   for us and also handles the case where there is no real lock type.
 - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE
   so that we can call inactive.  It's not legal to vget a vnode that hasn't
   had INACTIVE called yet.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:28:32 +00:00
jeff
1b29b37e26 - Assert that we're no longer doing recursive vn_locks in inactive/reclaim
as I'd like to get rid of the vxthread.
 - Handle lock requests which don't actually want a lock as this is a
   much more convenient place to handle this condition than in vget().
   These requests simply want to know that VI_DOOMED isn't set.
 - Correct a test at the end of vn_lock, if error !=0 should be
   if error == 0, this has been broken since I committed the VI_DOOMED
   changes, but no one ran into it because vget() duplicated this
   functionality.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:23:56 +00:00
jeff
fbc2ade92c - vput(tvp) before vrele(tdvp) in kern_rename() to avoid lock order issues. 2005-04-11 09:19:08 +00:00
njl
46ef074ef7 Add debugging prints to all the methods in case there are problems with
managing levels.  This can be enabled with the debug.cpufreq.verbose
tunable and sysctl.
2005-04-10 19:11:23 +00:00
das
7a8e8974c6 Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.
2005-04-10 02:31:24 +00:00
pjd
853bddab29 CDEV lock should be before 'system map' lock.
Hardcode this order to help track down reported LOR.

LOR reported by:	Thierry Herbelot <thierry@herbelot.com>
LOR info:		http://sources.zabbadoz.net/freebsd/lor.html#080
2005-04-09 13:32:01 +00:00
jeff
4048451472 - Remove the namei NOOBJ flag. It is meaningless now.
Sponsored by:	Isilon Systems, Inc.
2005-04-09 12:04:36 +00:00
jeff
46c812596a - If we vrele() a dvp while the child is locked we can potentially deadlock
when vrele() acquires the directory lock in the wrong order.  Fix this
   via the following changes:
 - Keep the directory locked after VOP_LOOKUP() until we've determined
   what we're going to do with the child.  This allows us to remove the
   complicated post LOOKUP code which determines whether we should lock or
   unlock the parent.  This means we may have to vput() in the appropriate
   cases later, rather than doing an unsafe vrele.
 - in NDFREE() keep two flags to indicate whether we need to unlock vp or
   dvp.  This allows us to vput rather than vrele in the appropriate
   cases without rechecking the flags.  Move the code to handle dvp after
   we handle vp.
 - Remove some dead code from namei() that was the result of changes to
   VFS_LOCK_GIANT().

Sponsored by:	Isilon Systems, Inc.
2005-04-09 11:53:16 +00:00
pjd
1a889a3a93 Add a missing terminator.
Confirmed by:	rwatson
2005-04-09 11:31:31 +00:00
glebius
f81c6c3eb6 Add additional newline to debug.mutex.prof.stats header, so that
column names are printed exactly above the columns.
2005-04-08 14:14:09 +00:00
ups
7bac02c146 Sprinkle some volatile magic and rearrange things a bit to avoid race
conditions in critical_exit now that it no longer blocks interrupts.

Reviewed by:	jhb
2005-04-08 03:37:53 +00:00
phk
b7a4869221 Fix bug in vfs_hash_rehash(): use correct bucket. This only affected
msdosfs which is broken in other ways too.
2005-04-07 07:54:08 +00:00
phk
46c303a78b Constify hexdump() harder. 2005-04-06 10:14:13 +00:00
jeff
a3d1d94c5a - Remove dead code. 2005-04-06 10:11:14 +00:00
jeff
60d07eec30 - Assert that the bufobj matches in flushbuflists. I still haven't gotten
to root cause on exactly how this happens.
 - If the assert is disabled, we presently try to handle this case, but the
   BUF_UNLOCK was missing.  Thus, if this condition ever hit we would leak
   a buf lock.

Many thanks to Peter Holm for all his help in finding this bug.  He really
put more effort into it than I did.
2005-04-06 06:49:46 +00:00
jeff
d42252c158 - Move NDFREE() from vfs_subr to vfs_lookup where namei() is. 2005-04-05 08:58:49 +00:00
jeff
01f6ce3a70 - Use taskqueue_thread rather than taskqueue_swi since our task is going
to vrele, which may vop lock.  This is not safe in a software interrupt
   context.
2005-04-05 08:51:45 +00:00
csjp
8a050fbf3b Assert that the vnode is locked. This is meant to catch bugs or
mis-use of the vnode API in conditions where IO_NODELOCKED has been
used without the vnode actually being locked.
2005-04-05 01:11:43 +00:00
jhb
41cadaa11e Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions.  They no longer have any effect on
interrupts.  This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit().  This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock.  For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections.  Note that I've also taken this
opportunity to push a few things into MD code rather than MI.  For example,
critical_fork_exit() no longer exists.  Instead, MD code ensures that new
threads have the correct state when they are created.  Also, we no longer
try to fixup the idlethreads for APs in MI code.  Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by:	grehan, cognet, arch@, others
Tested on:	i386, alpha, sparc64, powerpc, arm, possibly more
2005-04-04 21:53:56 +00:00
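The counter-only behaviour described in 41cadaa11e can be sketched as a small user-space model. This is an illustration under stated assumptions, not the kernel implementation: `struct thread_model`, its fields, and the `model_*` functions are hypothetical names, and "preemption" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; the field names are illustrative only. */
struct thread_model {
	int	critnest;	/* critical section nesting depth */
	bool	owepreempt;	/* a preemption was requested while nested */
	int	preemptions;	/* preemptions actually delivered */
};

/* Entering a section is a plain increment; interrupts are not touched. */
void
model_critical_enter(struct thread_model *td)
{
	td->critnest++;
}

/* Called when the scheduler would like to preempt this thread. */
void
model_request_preempt(struct thread_model *td)
{
	if (td->critnest > 0)
		td->owepreempt = true;	/* defer until the outermost exit */
	else
		td->preemptions++;	/* not in a section: preempt now */
}

/* Leaving the outermost section delivers any deferred preemption. */
void
model_critical_exit(struct thread_model *td)
{
	assert(td->critnest > 0);
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt) {
		td->owepreempt = false;
		td->preemptions++;
	}
}
```

In this model a preemption requested inside two nested sections is only delivered when the second model_critical_exit() brings the depth back to zero.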
njl
f9c3ff58ce Document that devclass_get_maxunit(9) returns one greater than the current
highest unit.

Reviewed by:	dfr
MFC after:	2 weeks
2005-04-04 15:37:59 +00:00
njl
8a0ce01e92 Add devclass_get_drivers(9) which provides an array of pointers to driver
instances in a given devclass.  This is useful for systems that want to
call code in driver static methods, similar to device_identify().

Reviewed by:	dfr
MFC after:	2 weeks
2005-04-04 15:26:51 +00:00
jeff
d8b17b2eac - Add a missing unlock of the vnode_free_list_mtx.
Spotted by:	Antoine Brodin
2005-04-04 12:07:16 +00:00
jeff
b6f8b968c2 - Instead of waiting forever to get a vnode in getnewvnode() wait for
one to become available for one second and then return ENFILE.  We
   can run out of vnodes, and there must be a hard limit because without
   one we can quickly run out of KVA on x86.  Presently the system can
   deadlock if there are maxvnodes directories in the namecache.  The
   original 4.x BSD behavior was to return ENFILE if we reached the max,
   but 4.x BSD did not have the vnlru proc so it was less profitable to
   wait.
2005-04-04 11:43:44 +00:00
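The bounded wait described in b6f8b968c2 can be modelled roughly as follows. This is a sketch under assumptions, not the code in getnewvnode(): `struct vnode_pool_model` and `model_getnewvnode()` are hypothetical, and the one second sleep is replaced by a field that says how many vnodes were reclaimed during the wait.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the vnode pool; none of these are kernel symbols. */
struct vnode_pool_model {
	int	numvnodes;		/* vnodes currently allocated */
	int	maxvnodes;		/* hard limit */
	int	freed_while_waiting;	/* vnodes reclaimed during the wait */
};

/* Returns 0 on success, or ENFILE if the limit still holds after one wait. */
int
model_getnewvnode(struct vnode_pool_model *p)
{
	assert(p->maxvnodes > 0);
	if (p->numvnodes >= p->maxvnodes) {
		/* Stand-in for sleeping up to one second for a free vnode. */
		p->numvnodes -= p->freed_while_waiting;
		p->freed_while_waiting = 0;
		if (p->numvnodes >= p->maxvnodes)
			return (ENFILE);	/* give up instead of waiting forever */
	}
	p->numvnodes++;
	return (0);
}
```

The point of the design is that the caller gets an error it can report rather than a thread that sleeps indefinitely when nothing can be reclaimed.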
jeff
d9338897d1 - Include opt_vfs.h for LOOKUP_SHARED.
- Control the behavior of shared lookups with the lookup_shared sysctl
   which has its default behavior set via the LOOKUP_SHARED option.
2005-04-03 23:50:20 +00:00
njl
d126ccab5a maxunit is actually one higher than the greatest currently-allocated unit
in a devclass.  All the other uses of maxunit are correct and this one was
safe since it checks the return value of devclass_get_device(), which would
always say that the highest unit device doesn't exist.

Reviewed by:	dfr
MFC after:	3 days
2005-04-03 22:23:18 +00:00
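The off-by-one that d126ccab5a describes comes down to the loop bound. A minimal sketch, assuming a hypothetical `struct devclass_model` in which slot i holds unit i (or NULL) and `maxunit` is one greater than the highest allocated unit; it is not the real devclass structure:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a devclass: slot i holds unit i, or NULL if unused. */
struct devclass_model {
	const char	**devices;
	int		maxunit;	/* one greater than the highest unit */
};

/* Count attached units; valid unit numbers are 0 .. maxunit - 1. */
int
model_count_units(const struct devclass_model *dc)
{
	int i, n = 0;

	assert(dc->maxunit >= 0);
	for (i = 0; i < dc->maxunit; i++)	/* '<', not '<=' */
		if (dc->devices[i] != NULL)
			n++;
	return (n);
}
```

A loop written with `<=` would probe one slot past the last unit, which is why the original caller only stayed safe by checking the lookup result.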
jeff
62c728e499 - Slightly restructure acquire() so I can add more ktr information and
an assert to help find two strange bugs.
 - Remove some nearby spls.
2005-04-03 11:49:02 +00:00
jeff
3c5b12f96d - Now that writes to character devices supporting softupdates can
generate dirty bufs even with a locked vnode, 100 retries is not that
   many.  This should probably change from a retry count to an abort when
   we are no longer cleaning any buffers.
 - Don't call vprint() while we still hold the vnode locked.  Move the call
   to later in the function.
 - Clean up a comment.
2005-04-03 10:24:03 +00:00
alc
b3364f5e66 Remove GIANT_REQUIRED from elfN_load_section(). 2005-04-03 07:57:47 +00:00
jhb
a3c6b782c3 - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
jhb
76cb101294 Actually commit the code for kern_sched_rr_get_interval(). 2005-03-31 22:54:48 +00:00
jhb
72d1a40cb6 Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(),
kern_settimeofday(), and kern_writev() to allow for further stackgap
reduction in the compat ABIs.
2005-03-31 22:51:18 +00:00
jhb
31fedfa192 - Denote a few places where kobj class references are manipulated without
holding the appropriate lock.
- Add a comment explaining why we bump a driver's kobj class reference
  when loading a module.
2005-03-31 22:49:31 +00:00
jhb
0deac201a8 Drop a bogus mp_fixme(). Adding a lock would do nothing to reduce userland
races regarding changing of jail-related sysctls.
2005-03-31 22:47:57 +00:00
jhb
c5e6b72803 Don't recursively panic when we call mi_switch() in a critical section,
even though calling mi_switch() after a panic is likely a bug anyway as
the recursive panic only serves to make things worse.
2005-03-31 20:36:44 +00:00
njl
749e1a55dd Add a check for cpufreq_unregister() being called with no cpufreq device
active.  Note that the logic indicates this should not be possible so
generate a warning if this ever happens.

Found by:	Coverity Prevent (via sam)
2005-03-31 18:56:54 +00:00
phk
7af1e31761 Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.
2005-03-31 12:19:44 +00:00
phk
2379f61770 cdev (still) needs per instance uid/gid/mode
Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h
2005-03-31 10:29:57 +00:00
phk
b83adaf8e5 Rename dev_ref() to dev_refl() 2005-03-31 06:51:54 +00:00
jeff
e4d4b610ba - Disable vfs shared locks by default. They must be specifically enabled
on filesystems which safely support them.  It appears that many
   network filesystems specifically are not shared lock safe.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:22:45 +00:00
jeff
27112458d3 - Add a LK_NOSHARE flag which forces all shared lock requests to be
treated as exclusive lock requests.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 05:18:19 +00:00
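The effect of the LK_NOSHARE flag added in 27112458d3 can be pictured as a rewrite of the request type. The sketch below is an assumption-laden model, not lockmgr code: the `MODEL_*` constants and `model_effective_request()` are hypothetical and their values are arbitrary.

```c
#include <assert.h>

/* Hypothetical request types and lock flag; values are arbitrary. */
#define	MODEL_LK_SHARED		1
#define	MODEL_LK_EXCLUSIVE	2
#define	MODEL_LK_NOSHARE	0x10

/* Return the request type that is actually honoured for this lock. */
int
model_effective_request(int lock_flags, int request)
{
	assert(request == MODEL_LK_SHARED || request == MODEL_LK_EXCLUSIVE);
	if ((lock_flags & MODEL_LK_NOSHARE) != 0 && request == MODEL_LK_SHARED)
		return (MODEL_LK_EXCLUSIVE);	/* shared is silently upgraded */
	return (request);
}
```

Callers keep asking for shared locks; only locks marked with the flag change behaviour, so filesystems that are not shared-lock safe need no caller-side changes.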
jeff
97c40ebd49 - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
jeff
2210f285f1 - Remove apause(). It makes no sense with our present mutex implementation
since simply unlocking a mutex does not ensure that one of the waiters
   will run and acquire it.  We're more likely to reacquire the mutex
   before anyone else has a chance.  It has also bit me three times now, as
   it's not safe to drop the interlock before sleeping in many cases.

Sponsored by:	Isilon Systems, Inc.
2005-03-31 04:25:59 +00:00
das
d3e0f098be Eliminate v_id and v_ddid. The name cache now holds references to
vnodes whose names it caches, so we no longer need a `generation
number' to tell us if a referenced vnode is invalid.  Replace the use
of the parent's v_id in the hash function with the address of the
parent vnode.

Tested by:	Peter Holm
Glanced at by:	jeff, phk
2005-03-30 03:01:36 +00:00
das
0fa243e9cd Merge kern___cwd() and vn_fullpath(), which were virtually identical,
except for places where people forget to update one of them.  We now
collect only one set of stats for both of these routines.  Other
changes in this commit include:

- Start acquiring Giant again in vn_fullpath(), since it is required
  when crossing a mount point.

- Expand the scope of the cache lock to avoid dropping it and
  picking it up again for every pathname component.  This also
  makes it trivial to avoid races in stats collection.

- Assert that nc_dvp == v_dd for directories instead of returning
  an error to userland when this is not true.  AFAIK, it should
  always be true when v_dd is non-null.

- For vn_fullpath(), handle the first (non-directory) vnode
  separately.

Glanced at by:  jeff, phk
2005-03-30 02:59:32 +00:00
jeff
39f5b5cc72 - Move the logic that locks and refs the new vnode from vfs_cache_lookup()
to cache_lookup().  This allows us to acquire the vnode interlock before
   dropping the cache lock.  This protects the vnode's identity until we
   have locked it.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 12:59:06 +00:00
phk
9a44fd008d Remove the global cdev hash and use the cdevsw list instead.
Don't remove the now unused element from cdev yet, wait until
we have a better reason to bump the version.

There is now no longer any upper limit on how many device drivers
a FreeBSD kernel can have.
2005-03-29 11:15:54 +00:00
jeff
767230ce78 - Get rid of the old LOOKUP_SHARED code. namei() now supplies the
proper lock flags via cn_lkflag.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:08:23 +00:00
jeff
4a949bda2b - Set cn_lkflags to LK_SHARED in the LOOKUP_SHARED case so that we only
acquire shared locks on intermediate directories.
 - For the LASTCN, we may have to LK_UPGRADE the parent directory before
   we lookup the last component.
 - Acquire VFS_ROOT and dp locks based on the cn_lkflag.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:07:15 +00:00
jeff
b82462f008 - Don't clear OWEINACT in vbusy(), we still owe an inactive call if someone
vhold()s us.
 - Avoid an extra mutex acquire and release in the common case of vgonel()
   by checking for OWEINACT at the start of the function.
 - Fix the case where we set OWEINACT in vput().  LK_EXCLUPGRADE drops our
   shared lock if it fails.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 10:02:48 +00:00
jeff
2059b48294 - Don't initialize v_dd here, let cache_purge() do it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-29 09:59:34 +00:00
jeff
16b7e8d14e - Invalidate the children's v_dd pointers when we cache_purge() a directory.
Otherwise the stale pointer may be accessed after a vnode is freed.

Sponsored by:	Isilon Systems, Inc.
2005-03-29 09:58:41 +00:00
phk
a018db09b4 Remove the global cdev hash and use the cdevsw list instead.
Don't remove the now unused element from cdev yet, wait until
we have a better reason to bump the version.
2005-03-29 09:56:21 +00:00
phk
88156f4e55 Privatize major(). 2005-03-29 08:13:17 +00:00
phk
d3b2ec6954 Print name of device instead of useless major/minor numbers. 2005-03-29 08:13:01 +00:00
jeff
a5324b331d - Remove an unused variable from relookup().
- Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT
   or LOCKPARENT set.  You can't complete any of these operations without
   at least a reference to the parent.  Many filesystems check for this case
   even though it isn't possible in the current system.
2005-03-28 13:56:56 +00:00
jeff
c4f7e75896 - Remove an unused variable.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:29:48 +00:00
jeff
acfff6d0d1 - Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 12:52:46 +00:00
jeff
2ed686ac4d - Don't bump the count twice in the LK_DRAIN case.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 12:52:10 +00:00
jeff
8c749eb801 - Move code that should probably be an assert above the main body of
vrele so that we can decrease the indentation of the real work and
   make things slightly more clear.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 11:18:47 +00:00
jeff
eb142209f0 - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:26:17 +00:00
jeff
b25a472993 - Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK
vfs.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:25:25 +00:00
jeff
9043d7b0a7 - Get rid of PDIRUNLOCK, instead, we fixup the lock state immediately after
calling VOP_LOOKUP().  Rather than having each filesystem check the
   LOCKPARENT flag, we simply check it once here and unlock as required.
   The only unusual case is ISDOTDOT, where we require an unlocked vnode
   on return.  Relocking this vnode with the child locked is allowed since
   the child is actually its parent.
 - Add a few asserts for some unusual conditions that I do not believe can
   happen.  These will later go away and turn into implementations for these
   conditions.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:24:50 +00:00
phk
eac95420b8 Remove another ';' after if().
Also spotted by:	bz
2005-03-27 07:53:13 +00:00
phk
4b5dfbb1ae Remove extra ; at end of if().
Found by:	bz
2005-03-27 07:52:12 +00:00
phk
d55eb47f57 Make (some) serial ports implement the PPS-API again. This change
apparently fell out during the tty code cleanup.
2005-03-26 20:12:39 +00:00
phk
e104a9e0b0 s/ENOTTY/ENOIOCTL/ 2005-03-26 20:04:28 +00:00
jeff
759d7ddf04 - The td_locks check is currently broken with snapshots and possibly
some case in unmount.  Disable the KASSERT until these problems can
   be diagnosed.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 09:56:56 +00:00
jeff
6d72a7bd60 - Don't recycle vnodes anymore. Free them once they are dead. getnewvnode
now always allocates a new vnode.
 - Define a new function, vnlru_free, which frees vnodes from the free list.
   It takes as a parameter the number of vnodes to free, which is
   wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
   called from getnewvnode().  For now, getnewvnode() still tries to reclaim
   a free vnode before creating a new one when we are near the limit.
 - Define a function, vdestroy, which handles the actual release of memory
   and teardown of locks, etc.  This could become a uma_dtor() routine.
 - Get rid of minvnodes.  Now wantfreevnodes is 1/4th the max vnodes.  This
   keeps more unreferenced vnodes around so that files which have only
   been stat'd are less likely to be kicked out of the system before we
   have a chance to read them, etc.  These vnodes may still be freed via
   the normal vnlru_proc() routines which may some day become a real lru.
2005-03-25 05:34:39 +00:00
marcel
f5cbb5f36a Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
2005-03-25 01:56:12 +00:00
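The rule in f5cbb5f36a (walk every mount, skip entries whose timestamp is zero) can be sketched as below. This is a user-space model under assumptions: `struct mount_model` and `model_init_clock()` are hypothetical, and the call to inittodr() is replaced by recording the value that would be passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount list entry. */
struct mount_model {
	long	mnt_time;	/* 0 when the filesystem has no timestamp */
};

/*
 * Walk the list and "call inittodr()" for each entry with a non-zero
 * timestamp.  Returns the number of calls; *last_base receives the last
 * timestamp that would have been passed.
 */
int
model_init_clock(const struct mount_model *list, size_t n, long *last_base)
{
	size_t i;
	int calls = 0;

	assert(last_base != NULL);
	for (i = 0; i < n; i++) {
		if (list[i].mnt_time == 0)
			continue;	/* e.g. devfs, mounted before root */
		*last_base = list[i].mnt_time;	/* stands in for inittodr() */
		calls++;
	}
	return (calls);
}
```

With devfs first on the list and a real root filesystem second, only the second entry contributes a time base.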
jeff
8e533783f3 - Add information about the buf lock to db_show_buffer.
- Add a 'show lockedbufs' command that is similar to show lockedvnods.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 00:20:37 +00:00
jeff
26329d2e37 - Restore COUNT() in all of its original glory. Don't make it dependent
on DEBUG as ufs will soon grow a dependency on this count.

Discussed with:	bde
Sponsored by:	Isilon Systems, Inc.
2005-03-25 00:00:44 +00:00
jhb
e04fe1a3ea Don't set ret_namelen and ret_resnamelen in res_find() unless the
corresponding pointer to the buffer (ret_name or ret_resname) is non-NULL,
to avoid possible NULL pointer derefs.

Reported by:	Coverity via sam
2005-03-24 21:20:25 +00:00
phk
72133122d4 Move implementation of hw.bus.rman sysctl to subr_rman.c so that
subr_bus.c doesn't need to peek inside struct resource.

OK from:	imp
2005-03-24 18:13:11 +00:00
jeff
e8e02448a8 - Fail an assert if we attempt to return with any lockmgr locks held in
userret().

Sponsored by:	Isilon Systems, Inc.
2005-03-24 09:35:38 +00:00
jeff
5fffa95033 - Complete the implementation of td_locks. Track the number of outstanding
lockmgr locks that this thread owns.  This is complicated due to
   LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 09:35:06 +00:00
jeff
0210925e42 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
jeff
5528d705e8 - Fixup the default vfs_root function to match the new prototype.
Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:30:00 +00:00
jeff
6d5bd3a0a2 - Grab the lock type that the caller requests in vfs_hash_insert().
Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:16:27 +00:00
jeff
bf2e6f43e8 - If vput() is called with a shared lock it must upgrade to an exclusive
before it can call VOP_INACTIVE().  This must use the EXCLUPGRADE path
   because we may violate some lock order with another locked vnode if
   we drop and reacquire the lock.  If EXCLUPGRADE fails, we mark the
   vnode with VI_OWEINACT.  This case should be very rare.
 - Clear VI_OWEINACT in vinactive() and vbusy().
 - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:08:58 +00:00
jeff
c6119ea3e7 - Remove some long dead LOOKUP_SHARED code that tracked the lock state.
- Always pass LOCKSHARED and rely on namei() to ignore it when
   LOOKUP_SHARED is not set.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:04:35 +00:00
jeff
b84afc39aa - Remove the #ifdef LOOKUP_SHARED from some calls to NDINIT. The
LOCKSHARED flag is simply ignored in namei() if LOOKUP_SHARED is not
   enabled.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:03:31 +00:00
jeff
9b04831cbb - Clear LOCKSHARED if LOOKUP_SHARED is not enabled. This is not strictly
necessary since we disable the shared locks in vfs_cache, but it is
   preferred that the option not leak out into filesystems when it is
   disabled.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:02:37 +00:00
jeff
6ef6841ac5 - All of the bugs which led to the complication of the LOOKUP_SHARED
config option have now been fixed.  All filesystems are properly locked
   and checked via DEBUG_VFS_LOCKS.  Remove the workaround code.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:00:45 +00:00
julian
1a64e1bde4 Fix code freeing wrong cred pointer.
Submitted by:	das
Noticed by: Coverity tool
MFC after:	3 days

Note: usually the two pointers point to the same
thing but it was still a bug.
2005-03-21 22:55:38 +00:00
rwatson
560261414f Add a read-only kern.sched.preemption sysctl so that user space can tell
if "options PREEMPTION" is compiled into the kernel.
2005-03-20 17:05:12 +00:00
pjd
2275c8c92e Add ki_jid field to the kinfo_proc structure and store jail ID there.
Reviewed by:	gad
MFC after:	3 days
2005-03-20 10:35:23 +00:00
phk
22cd1201ca Sleeping is not allowed in uma->fini 2005-03-19 08:22:13 +00:00
sam
efb9d71d39 check copyin return value
Noticed by:	Coverity Prevent analysis tool
2005-03-19 04:34:23 +00:00
das
c2663b2f6d Add missing cases for PT_SYSCALL.
Found by:	Coverity Prevent analysis tool
2005-03-18 21:22:28 +00:00
sobomax
9f5986e422 Impose the upper limit on signals that are allowed between kernel threads
in set[ug]id program for compatibility with Linux. Linuxthreads uses
4 signals from SIGRTMIN to SIGRTMIN+3.

Pointed out by:		rwatson
2005-03-18 13:33:18 +00:00
phk
2f7506bdfd Use subr_unit to allocate thread ID's with.
Tested by:	davidxu
2005-03-18 12:34:14 +00:00
sobomax
8d935d1090 Linuxthreads uses not only signal 32 but several signals >= 32.
PR:		kern/72922
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
2005-03-18 11:08:55 +00:00
phk
ae412c6cc0 Fix a bad copy&paste mistake I made.
Spotted by:	truckman
2005-03-18 06:01:21 +00:00
imp
7e9a7073ef Use STAILQ in preference to SLIST for the resources. Insert new resources
last in the list rather than first.

This makes the resources print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x).  This
also means that the pci code will once again print the resources in BAR
ascending order.
2005-03-18 05:19:50 +00:00
jmg
d241e1d02f fix aio+kq... I've been running ambrisko's test program for much longer
w/o problems than I was before...  This simply brings back the knote_delete
as knlist_delete which will also drop the knote's, instead of just clearing
the list and seeing _ONESHOT...

Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoopse..

MFC after:	1 week
Prodded by:	ambrisko and dwhite
2005-03-18 01:11:39 +00:00
jmg
19da85af4a add m_copyup function.. This can be used to help make our ip stack less
alignment restrictive, and help performance on some ethernet cards which
currently copy the entire packet a couple bytes to get the packet aligned
properly...

Wordsmithing by:	dwhite
Obtained from:	NetBSD (code only)
I'll clean it up later:	rwatson
2005-03-17 19:34:57 +00:00
rwatson
04058cab9f A further step on the journey of making panics and debugging more reliable:
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts.  If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure.  The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.

It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.

Glanced at by:	jhb
2005-03-17 15:18:01 +00:00
phk
d9bafd2447 Kill MAJOR_AUTO 2005-03-17 13:37:28 +00:00
phk
cfa6bb09ea Prepare for the final onslaught on devices:
Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version
2005-03-17 12:07:00 +00:00
phk
d6c8765efc In strange circumstances we may end up being the last reference to a
session in tprintf().   SESSRELE() needs to properly dispose of the
session's mutex.

Add sessrele() which does the proper cleanup and have SESSRELE() call it.

Use SESSRELE also in pgdelete().

Found by:	Coverity (ID:526)
2005-03-17 08:44:41 +00:00
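The shape of the d6c8765efc change, a macro that forwards to a function doing the full cleanup on the last reference, might look like the sketch below. Every name here (session_sketch, fake_mtx, sess_alloc_sketch, sessrele_sketch, SESSRELE_SKETCH) is hypothetical, and the fake mutex only stands in for a real kernel mutex.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a kernel mutex; only tracks whether it is initialized. */
struct fake_mtx {
	bool initialized;
};

/* Hypothetical, reduced session: a reference count and its mutex. */
struct session_sketch {
	int s_count;
	struct fake_mtx s_mtx;
};

/* Counts completed cleanups so the behavior can be observed. */
static int sessions_freed;

static struct session_sketch *
sess_alloc_sketch(void)
{
	struct session_sketch *s = malloc(sizeof(*s));

	if (s == NULL)
		return (NULL);
	s->s_count = 1;
	s->s_mtx.initialized = true;
	return (s);
}

/* Drop one reference; the last one also disposes of the mutex. */
static void
sessrele_sketch(struct session_sketch *s)
{
	if (--s->s_count > 0)
		return;
	s->s_mtx.initialized = false;	/* stands in for destroying the mutex */
	free(s);
	sessions_freed++;
}

/* The macro simply forwards, so every caller gets the same cleanup. */
#define	SESSRELE_SKETCH(s)	sessrele_sketch(s)
```

Routing the macro through one function means a caller that happens to hold the last reference cannot skip the mutex cleanup.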
phk
98f1c9b062 Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
phk
08503770f6 Fix a memory leak in case of failed root filesystem mount.
Spotted by:     Coverity via sam
2005-03-16 11:06:49 +00:00
jmg
15cfd58a9c MFp4: print a more useful error when we don't have a /dev to mount devfs
on..
2005-03-16 08:04:39 +00:00
phk
e61634909a Add mnt_hashseed to struct mount and initialize it with PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the losing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.
2005-03-16 07:35:06 +00:00
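One way a per-mount seed can improve bucket distribution, as e61634909a describes, is sketched below. The mixing function and the table size are assumptions for illustration only; the commit message does not say how vfs_hash combines the seed with the hash.

```c
#include <stdint.h>

/* Hypothetical table size; a power of two so masking works. */
#define	HASH_BUCKETS	1024u

/*
 * Pick a bucket from a filesystem-supplied hash and a per-mount seed.
 * Adding the seed shifts each mount's entries to a different part of
 * the table, so equal inode numbers on different mounts tend not to
 * share a bucket.
 */
static uint32_t
bucket_index(uint32_t inode_hash, uint32_t mnt_hashseed)
{
	return ((inode_hash + mnt_hashseed) & (HASH_BUCKETS - 1));
}
```

Without a seed, inode 2 of every mounted filesystem would land in the same bucket; with distinct seeds the entries are spread out.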
imp
2417261e25 Sometimes, when asked to return region A..C, we'd return A+N..C+N
instead of failing.

When looking for a region to allocate, we used to check to see if the
start address was < end.  In the case where A..B is allocated already,
and one wants to allocate A..C (B < C), then this test would
improperly fail (which means we'd examine that region as a possible
one), and we'd return the region B+1..C+(B-A+1) rather than NULL.
Since C+(B-A+1) is necessarily larger than C (end argument), this is
incorrect behavior for rman_reserve_resource_bound().

The fix is to exclude those regions where r->r_start + count - 1 > end
rather than r->r_start > end.  This bug has been in this code for a
very long time.  I believe that all other tests against end are
correctly done.

This is why sio0 generated a message about interrupts not being
enabled properly for the device.  When fdc had a bug that allocated
from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the
0x3f8-0x3ff that it wanted.  Now when fdc has the same bug, sio0 fails
to allocate its ports, which is the proper behavior.  Since the probe
failed, we never saw the messed up resources reported.

I suspect that there are other places in the tree that have weird
looping or other odd workarounds to try to cope with the observed
weirdness this bug can introduce.  These workarounds should be located
and eliminated.

Minor debug write fix to match the above test done as well.

'nice' by: mdodd
Sponsored by: timing solutions (http://www.timing.com/)
2005-03-15 20:28:51 +00:00
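The before and after range tests from 2417261e25 can be compared in isolation. This is a simplified sketch: struct region and the two helper functions are hypothetical, and only the two comparisons (r->r_start > end versus r->r_start + count - 1 > end) are taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, reduced region; not the kernel's resource structure. */
struct region {
	unsigned long r_start;
	unsigned long r_end;
};

/* Old test: reject a region only if it begins past the requested end. */
static bool
old_region_too_high(const struct region *r, unsigned long count,
    unsigned long end)
{
	(void)count;	/* the old test ignored the allocation size */
	return (r->r_start > end);
}

/*
 * New test: reject a region if count units starting at r_start would
 * extend past the requested end.
 */
static bool
new_region_too_high(const struct region *r, unsigned long count,
    unsigned long end)
{
	return (r->r_start + count - 1 > end);
}
```

For the sio0 case in the message, take a free region starting at 0x3fc, a request for 8 ports, and end = 0x3ff. The old test sees 0x3fc <= 0x3ff and keeps the region as a candidate; the new test computes 0x3fc + 8 - 1 = 0x403 > 0x3ff and rejects it, so the allocation fails as it should.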
imp
a5a3d20d00 Fix a debugging printf. The order of start/end was inconsistent with
all the other start/end debugs, causing momentary confusion when the
output was examined.
2005-03-15 20:15:15 +00:00
phk
d043926750 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
jeff
d289cc6b5d - Now that there are no external users of vfree() make it static.
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
   no one else attempts to grow a dependency on them.
 - Now that objects with pages hold the vnode we don't have to do unlocked
   checks for the page count in the vm object in VSHOULDFREE.  These three
   macros could simply check for holdcnt state transitions to determine
   whether the vnode is on the free list already, but the extra safety
   the flag affords us is probably worth the minimal cost.
 - The leafonly sysctl and code have been dead for several years now,
   remove the sysctl and the code that employed it from vtryrecycle().
 - vtryrecycle() also no longer has to check the object's page count as
   the object holds the vnode until it reaches 0.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:38:16 +00:00
phk
9e33e49ed5 Fix a debug message to print a usable device name rather than a useless
major+minor tuple.
2005-03-15 14:08:10 +00:00
jeff
2115694bbc - Expose vholdl() so it may be used outside of vfs_subr.c 2005-03-15 13:43:10 +00:00
phk
7e1186083a Remove findcdev(). 2005-03-15 12:58:08 +00:00
phk
422db29b31 Rename cdev->si_udev to cdev->si_drv0 to reflect the new nature of
the field.
2005-03-15 11:33:28 +00:00
jeff
833fbdb710 - transferlockers() requires the interlock to be SMP safe.
Sponsored by:	Isilon Systems, Inc.
2005-03-15 09:27:45 +00:00
phk
124bf5e823 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
phk
81b59515b3 Clean up accidentally included #if 0 section. 2005-03-14 10:25:09 +00:00
phk
c15976e56b Currently (almost) all filesystems maintain a local inode hash table
to get from (mount + inode) to vnode.  These tables are mostly
copy&pasted from UFS, sized based on desiredvnodes and therefore
quite large (128K-512K).  Several filesystems are buggy enough that
they allocate the hash table even before they know if they will
ever be used or not.

Add "vfs_hash", a system wide hash table, which will replace all
the per-filesystem hash-tables.

The fields we add to struct vnode will more or less be saved in
the respective filesystems' inodes.

Having one central implementation will save code and will allow us
to justify the complexity of code to dynamically (re)size the hash
at a later point.
2005-03-14 10:01:29 +00:00
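The idea in c15976e56b, one shared table mapping (mount + inode) to vnode instead of a table per filesystem, can be sketched with a small chained hash. All names below (mount_sketch, hashed_vnode, hash_get_sketch, hash_insert_sketch) and the fixed table size are hypothetical; this is not the vfs_hash implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define	NBUCKETS	64u	/* hypothetical fixed size */

struct mount_sketch {
	int id;
};

/* The fields a vnode would need to carry to live in a shared table. */
struct hashed_vnode {
	struct mount_sketch *v_mount;
	uint32_t v_hash;
	struct hashed_vnode *v_next;
};

/* One table for every filesystem. */
static struct hashed_vnode *buckets[NBUCKETS];

/* Look up by (mount, hash); both must match, since hashes repeat. */
static struct hashed_vnode *
hash_get_sketch(struct mount_sketch *mp, uint32_t hash)
{
	struct hashed_vnode *vp;

	for (vp = buckets[hash % NBUCKETS]; vp != NULL; vp = vp->v_next)
		if (vp->v_mount == mp && vp->v_hash == hash)
			return (vp);
	return (NULL);
}

/* Insert at the head of the chain for this hash. */
static void
hash_insert_sketch(struct hashed_vnode *vp, struct mount_sketch *mp,
    uint32_t hash)
{
	vp->v_mount = mp;
	vp->v_hash = hash;
	vp->v_next = buckets[hash % NBUCKETS];
	buckets[hash % NBUCKETS] = vp;
}
```

Because the mount pointer is part of the key, two filesystems can both use inode number 7 without colliding logically, even though they share a bucket chain.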
jeff
3fcb9112fb - Increment the holdcnt once for each usecount reference. This allows us
to use only the holdcnt to determine whether a vnode may be recycled,
   simplifying the V* macros as well as vtryrecycle(), etc.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:25:19 +00:00
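The invariant from 3fcb9112fb, that every use reference also takes a hold reference, can be illustrated with a reduced structure. The names vn_refs, use_ref, use_unref, hold_ref, hold_unref, and may_recycle are hypothetical; only the usecount and holdcnt concepts come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, reduced vnode: just the two counters. */
struct vn_refs {
	int v_usecount;
	int v_holdcnt;
};

/* Each use reference also takes a hold reference. */
static void
use_ref(struct vn_refs *vp)
{
	vp->v_usecount++;
	vp->v_holdcnt++;
}

static void
use_unref(struct vn_refs *vp)
{
	vp->v_usecount--;
	vp->v_holdcnt--;
}

/* A plain hold touches only the hold count. */
static void
hold_ref(struct vn_refs *vp)
{
	vp->v_holdcnt++;
}

static void
hold_unref(struct vn_refs *vp)
{
	vp->v_holdcnt--;
}

/* With the invariant in place, one counter answers the question. */
static bool
may_recycle(const struct vn_refs *vp)
{
	return (vp->v_holdcnt == 0);
}
```

Since v_holdcnt is always at least v_usecount, a zero hold count implies a zero use count, which is why the recycle check no longer needs to look at both.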
jeff
2a81e8df21 - We do not have to check the object's ref_count in VSHOULDFREE or
vtryrecycle().  All obj refs also ref the vnode.
 - Consistently use v_incr_usecount() to increment the usecount.  This will
   be more important later.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 08:30:31 +00:00
jeff
bb63517e7e - Slightly rearrange vrele() to move the common case in one indentation
level.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:16:55 +00:00
jeff
a307ec6ef8 - Rework vget() so we drop the usecount in two failure cases that were
missed by my last commit.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:11:19 +00:00
phk
466fb85e71 Remove debugging printfs. 2005-03-14 06:51:29 +00:00
jeff
a71cdca338 - Do a vn_start_write in vn_close, we may write if this is the last ref
on an unlinked file.  We can't know if this is the case until after we
   have the lock.
 - Lock the vnode in vn_close, many filesystems had code which was unsafe
   without the lock held, and holding it greatly simplifies vgone().
 - Adjust vn_lock() to check for the VI_DOOMED flag where appropriate.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:56:28 +00:00
jeff
d29b61a365 - Remove vx_lock, vx_unlock, vx_wait, etc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
   writing ops without suspending.  This could suspend the vlruproc which
   should not be a problem under normal circumstances.
 - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
   where it was used.
 - Acquire a lock before calling vgone() as it now requires it.
 - Move the acquisition of the vnode interlock from vtryrecycle() to
   getnewvnode() so that if it fails we don't drop and reacquire the
   vnode_free_list_mtx.
 - Check for a usecount or holdcount at the end of vtryrecycle() in case
   someone grabbed a ref while we were recycling.  Abort the recycle, and
   on the final ref drop this vnode will be placed on the head of the free
   list.
 - Move the redundant VOP_INACTIVE protection code into the local
   vinactive() routine to avoid code bloat.
 - Keep the vnode lock held across calls to vgone() in several places.
 - vgonel() no longer uses XLOCK, instead callers must hold an exclusive
   vnode lock.  The VI_DOOMED flag is set to allow other threads to detect
   a vnode which is no longer valid.  This flag is set until the last
   reference is gone, and there are no chances for a new ref.  vgonel()
   holds this lock across the entire function, which greatly simplifies
   logic.
 - Only vfree() in one place in vgone(), not three.
 - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
   in the LK_NOWAIT case.  In other cases, check after we have slept and
   acquired an exclusive lock.  This will simulate the old vx_wait()
   behavior.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:54:28 +00:00
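The vget() adjustment listed in d29b61a365 (check VI_DOOMED before waiting in the LK_NOWAIT case, otherwise check after the lock is held) can be sketched as below. The structure, the flag value, the boolean lock, and the ENOENT return are all assumptions made for illustration; the commit message does not specify the error code or the locking primitives.

```c
#include <errno.h>
#include <stdbool.h>

#define	DOOMED_SKETCH	0x1	/* stands in for VI_DOOMED */

/* Hypothetical, reduced vnode: flags plus a trivially modeled lock. */
struct vn_lockable {
	int v_flags;
	bool v_locked;
};

/*
 * Returns 0 with the vnode locked, or an error with it unlocked.
 * The error value is an assumption of this sketch.
 */
static int
vget_sketch(struct vn_lockable *vp, bool nowait)
{
	/* Non-blocking callers learn about a doomed vnode up front. */
	if (nowait && (vp->v_flags & DOOMED_SKETCH) != 0)
		return (ENOENT);

	vp->v_locked = true;	/* stands in for sleeping on the lock */

	/* The vnode may have been doomed while we slept; recheck. */
	if ((vp->v_flags & DOOMED_SKETCH) != 0) {
		vp->v_locked = false;
		return (ENOENT);
	}
	return (0);
}
```

The recheck after acquiring the lock is what replaces the old wait-for-teardown behavior: a caller never returns holding a vnode that is no longer valid.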
jeff
7b0b6888cc - A lock is required before calling VOP_REVOKE. Our reference protects us
from accessing another vnode so a naked VOP_LOCK is sufficient.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:47:04 +00:00
jeff
938e1f1a5c - Don't VOP_UNLOCK prior to VOP_REVOKE. The lock is required now.
Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:45:51 +00:00
jeff
d448a9ec0e - Don't drop the lock in the default inactive handler anymore, VOP_NULL
will do for vop_stdinactive now.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:45:01 +00:00