Commit Graph

8710 Commits

Author SHA1 Message Date
jeff
22b5cc07b0 - Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may
not be called in all cases where we free the cnp.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 05:59:59 +00:00
jeff
df170ebc61 - It has long been my suspicion that we don't actually need a loop in
vn_lock().  Add an assert that will help me gain more confidence that this
   is correct.

Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:47:29 +00:00
jeff
2ef7df2a1a - Add KTR_VFS events to vdestroy, vtruncbuf, vinvalbuf, vfreehead.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:37 +00:00
jeff
bf15cf7167 - Add KTR_VFS messages for various name cache related events.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:46:03 +00:00
jeff
c92b8a6f78 - Split one KASSERT in bremfree() into two to aid in debugging.
Sponsored by:	Isilon Systems, Inc.
2005-06-13 00:45:05 +00:00
jeff
2599199edf - Dramatically simplify bioqdisksort(). We no longer do ordered bios so
most of the code to deal with them has been dead for sometime.  Simplify
   the code by doing an insert sort hinted by the current head position.

Met with apathy by:	arch@
2005-06-12 22:32:29 +00:00
pjd
d2fe610c90 Do not allocate memory while holding a mutex.
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-12 07:03:23 +00:00
pjd
be79126844 Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by:	FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by:	Peter Holm <peter@holm.cc>
2005-06-11 14:58:20 +00:00
maxim
e5e29d142d o setsockopt(2) cannot remove accept filter. [1]
o getsockopt(SO_ACCEPTFILTER) always returns success on listen socket
  even we didn't install accept filter on the socket.
o Fix these bugs and add regression tests for them.

Submitted by:	Igor Sysoev [1]
Reviewed by:	alfred
MFC after:	2 weeks
2005-06-11 11:59:48 +00:00
jeff
306b180d66 - Assert that we're not in the name cache anymore in vdestroy().
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:48:09 +00:00
jeff
8a4fe36603 - Assert that we're not adding a doomed vnode to the name cache.
Sponsored by:	Isilon Systems, Inc.
2005-06-11 08:47:30 +00:00
jeff
3625e8746b - Add KTR_VFS tracing to track the life of vnodes. Eventually KTR_VFS
events could be added to cover other interesting details.
 - Add some VNASSERTs to discover places where we access vnodes after
   they have been uma_zfree'd before we try to free them again.
 - Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
   really unused.

Sponsored by:	Isilon Systems, Inc.
2005-06-11 01:16:46 +00:00
green
ff904ffb64 Fix a serious deadlock with the NFS client. Given a large enough
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries.  However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.

To fix this, hibufspace is exported and scaled to a reasonable
fraction.  This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously.  If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system.  It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.

The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.

General nod by:	gad
MFC after:	2 weeks
More testing:	wes
PR:		kern/79208
2005-06-10 23:50:41 +00:00
jeff
d372186b52 - Add curthread to the state that ktr is saving. The extra information is
well worth the bloat.
 - Change the formatting of 'show ktr' slightly to accommodate the
   additional field.  Remove a tab from the verbose output and place the
   actual trace data after a : so it is more easy to understand which
   part is the event and which is part of the record.
2005-06-10 23:21:29 +00:00
jkoshy
b195d18520 Fix typo.
Reviewed by:	rwatson, sam
2005-06-10 18:06:59 +00:00
brooks
567ba9b00a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
ups
5273b0bf9f Restore preemption of idle threads.
Submitted by:	jhb
2005-06-10 03:00:29 +00:00
ssouhlal
0835f7b4a9 Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by:	jeff, bde
Approved by:	rwatson, jmg (kqueue changes), grehan (mentor)
2005-06-09 20:20:31 +00:00
scottl
7a9b003ce5 Drat! Committed from the wrong branch. Restore HEAD to its previous goodness. 2005-06-09 19:59:09 +00:00
scottl
6be4cb00a4 Back out 1.68.2.26. It was a mis-guided change that was already backed out
of HEAD and should not have been MFC'd.  This will restore UDP socket
functionality, which will correct the recent NFS problems.

Submitted by: rwatson
2005-06-09 19:56:38 +00:00
jkoshy
1d3209ab83 MFP4:
- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
  PMC implementations across different architectures.
  Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
  every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
  in the future.  Add more 'alias' names for commonly used events.

- bug fixes & documentation.
2005-06-09 19:45:09 +00:00
ups
4421a08742 Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by:	nate@
2005-06-09 19:43:08 +00:00
pjd
47f442bcb9 Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value	policy
0	show all mount-points without any restrictions
1	show only mount-points below jail's chroot and show only part of the
	mount-point's path (if jail's chroot directory is /jails/foo and
	mount-point is /jails/foo/usr/home only /usr/home will be shown)
2	show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with:	rwatson
2005-06-09 18:49:19 +00:00
pjd
5269cbb9cd Remove process information leak from inside a jail, when
security.bsd.see_other_uids is set to 0, etc.
One can check if invisible process is active, by doing:

	# ktrace -p <pid>

If ktrace returns 'Operation not permitted' the process is alive and
if returns 'No such process' there is no such process.

MFC after:	1 week
2005-06-09 18:33:21 +00:00
ups
d9753fcc91 Fix some race conditions for pinned threads that may cause them to run
on the wrong CPU.

Add IPI support for preempting a thread on another CPU.

MFC after:3 weeks
2005-06-09 18:26:31 +00:00
pjd
3af857a21a Avoid code duplication in serval places by introducing universal
kern_getfsstat() function.

Obtained from:	jhb
2005-06-09 17:44:46 +00:00
imp
6bc1b07ae1 Simplify the code a bit after the bzero(). 2005-06-09 05:50:01 +00:00
jeff
f637381b78 - My sub-par public school education has been exposed. s/sentinal/sentinel/
Noticed by:	Emil Mikulic
2005-06-09 04:40:20 +00:00
gad
d916eb91e6 Remove the previous parsing-logic for arguments on the '#!'-line of shell
scripts.  As far as I know, no one has needed the '#!#<' kludge to get at
the behavior implemented by the historical parsing.
2005-06-09 00:27:02 +00:00
jeff
b53b83993c - Under heavy IO load the buf daemon can run for many hundereds of
milliseconds due to what is essentially n^2 algorithmic complexity.  This
   change makes the algorithm N*2 instead.  This heavy processing manifested
   itself as skipping in audio and video playback due to the long scheduling
   latencies and contention on giant by pcm.
 - flushbufqueues() is now responsible for flushing multiple buffers
   rather than one at a time.  This allows us to save our progress in the
   list by using a sentinal.  We must do the numdirtywakeup() and
   waitrunningbufspace() here now rather than in buf_daemon().
 - Also add a uio_yield() after we have processed the list once for bufs
   without deps and again for bufs with deps.  This is to release Giant
   and allow any other giant locked code to proceed.

Tested by:	Many users on current@
Revealed by:	schedgraph traces sent by Emil Mikulic & Anthony Ginepro
2005-06-08 20:26:05 +00:00
rodrigc
b2d9df7a8b Initialize uio_iovcnt to 1 in extattr_list_vp() and extattr_get_vp()
PR:		kern/79357
Approved by:	rwatson
2005-06-08 13:22:10 +00:00
rwatson
1a25bf9ccd In sem_forkhook(), don't attempt to generate a copy of the process semaphore
list on fork() if the process doesn't actually have references to any
semaphores.  This avoids extra work, as well as potentially asking to
allocate storage for 0 references.

Found by:	avatar
MFC after:	1 week
2005-06-08 07:29:22 +00:00
jeff
4a9af33a3f - Clear OWEINACT prior to calling VOP_INACTIVE to remove the possibility
of a vget causing another call to INACTIVE before we're finished.
2005-06-07 22:05:32 +00:00
alc
43bc57303e In lio_listio(2) change jobref from an int to a long so that
lio_listio(LIO_WAIT, ...) works correctly on 64-bit architectures.

Reviewed by: tegge
2005-06-07 05:28:21 +00:00
rwatson
ee01c1bf47 Gratuitous renaming of four System V Semaphore MAC Framework entry
points to convert _sema() to _sem() for consistency purposes with
respect to the other semaphore-related entry points:

mac_init_sysv_sema() -> mac_init_sysv_sem()
mac_destroy_sysv_sem() -> mac_destroy_sysv_sem()
mac_create_sysv_sema() -> mac_create_sysv_sem()
mac_cleanup_sysv_sema() -> mac_cleanup_sysv_sem()

Congruent changes are made to the policy interface to support this.

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-06-07 05:03:28 +00:00
jeff
6eaf0bed4f - Fix the case where we're not preempting but there is already a newtd
as this happens via thread_switchout().  I don't particularly like the
   structure of the code here.  We twice call out to thread code when
   a thread is voluntarily switching.  Once to thread_switchout() and once
   to slot_fill(), while sched_4BSD does even more work which is redundant
   to select another thread to use our remaining slice.  This should be
   simplified in the future, but for now I'm only going to fix the bug not
   the bad design.
2005-06-07 02:59:16 +00:00
dwhite
1d894721d3 Make "show msgbuf" use the pager instead of blasting the whole thing out.
MFC after:	3 days
2005-06-06 22:18:32 +00:00
davidxu
c2895a92bd Fix a bug relavant to debugging, a masked signal unexpectedly interrupts
a sleeping thread when process is being debugged.

PR: GNU/77818
Tested by: Sean C. Farley <sean-freebsd at farley org>
2005-06-06 05:13:10 +00:00
gallatin
c6980c2b7a Allow sends sent from non page-aligned userspace addresses to be
considered for zero-copy sends.

Reviewed by: alc
Submitted by: Romer Gil at Rice University
2005-06-05 17:13:23 +00:00
alc
981752ea4e Eliminate an unused field from struct aio_liojob. 2005-06-05 05:41:48 +00:00
marius
c74fc16e2d After some input from bde@ and rereading the datasheet use a MTX_SPIN
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
2005-06-04 23:24:50 +00:00
alc
369cab6800 Eliminate the original method of requesting notification of aio_read(2) and
aio_write(2) completion through kevent(2).  This method does not work on
64-bit architectures.  It was deprecated in FreeBSD 4.4.  See revisions
1.87 and 1.70.2.7.

Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9).  Discussed with: bde

Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations.  Observed by: tegge

Eliminate a field from struct kaioinfo that is now unused.

Reviewed by: tegge
2005-06-04 19:16:33 +00:00
jeff
d33221f20a - It's 2005 already, I've been working on this for three years. 2005-06-04 09:24:15 +00:00
jeff
c720bf6f50 - Don't SLOT_USE() in the preempt case, sched_add() has already taken the
slot for us.  Previously, we would take two slots on every preempt, and
   setrunqueue() would fix it up for us in the non threaded case.  The
   threaded case was simply broken.
 - Clean up flags, prototypes, comments.
2005-06-04 09:23:28 +00:00
ps
bac0ce72d5 Wrap copyin/copyout for kevent so the 32bit wrapper does not have
to malloc nchanges * sizeof(struct kevent) AND/OR nevents *
sizeof(struct kevent) on every syscall.

Glanced at by:	peter, jmg
Obtained from:	Yahoo!
MFC after:	2 weeks
2005-06-03 23:15:01 +00:00
alc
652afa11cf Synchronize access to the per process aiocb lists in many of the functions. 2005-06-03 05:27:20 +00:00
alc
685bbca37b In aio_waitcomplete() correct two cases of using an aiocb after freeing it. 2005-06-02 23:14:38 +00:00
alc
47a9b57f58 Giant is no longer required in kern_setrlimit(); remove its acquisition and
release.

Reviewed by: jhb
2005-06-01 17:52:51 +00:00
kensmith
3a7e275ce6 This patch addresses a standards violation issue. The standards say a
file's access time should be updated when it gets executed.  A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.

A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called.  The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all.  Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc.  In particular vn_start_write() has not been called.  The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point.  The actual time update will happen later
during a sync (which handles all the necessary locking).

Got me into this:	cperciva
Discussed with:		a lot with bde, a little with kan
Showed patches to:	phk, jeffr, standards@, arch@
Minor discussion on:	arch@
2005-05-31 19:39:52 +00:00
alc
09a2a99469 Synchronize access to aio_freeproc with a mutex. Eliminate related spl
calls.

Reduce the scope of Giant in aio_daemon().
2005-05-30 22:26:34 +00:00
alc
0a10c2b5cd Use the proc mtx to prevent simultaneous changes to p_aioinfo. 2005-05-30 19:33:33 +00:00
alc
3ecc8d1129 Eliminate unnecessary calls to wakeup(); no one sleeps on &aio_freeproc.
Eliminate an unused flag, AIOP_SCHED; it's cleared but never set.
2005-05-30 18:02:00 +00:00
rwatson
5010364761 Rebuild generated system call definition files following the addition of
the audit event field to the syscalls.master file format.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:20:21 +00:00
rwatson
370e72b242 Introduce a new field in the syscalls.master file format to hold the
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used.  The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2005-05-30 15:09:18 +00:00
jeff
33b78c31e9 - Add bufobj_wrefl() to add a write ref to a bufobj that is already locked. 2005-05-30 07:01:18 +00:00
jkoshy
ad86ac4ba4 Kernel hooks to support PMC sampling modes.
Reviewed by:	alc
2005-05-30 06:29:29 +00:00
alc
f570134192 Eliminate aio_activeproc; it's unused. 2005-05-30 05:25:10 +00:00
alc
404d37a14d Eliminate aio_bufjobs; it's unused. 2005-05-29 21:29:15 +00:00
rwatson
aaf5c1d3e8 Normalize white space in syscalls.master: try to use tabs before system
call types.
2005-05-29 20:20:16 +00:00
rwatson
7035f9f56a Kernel malloc layers malloc_type allocation over one of two underlying
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations.  In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m.  Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.

This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections.  It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics.  To do this, several changes are
implemented:

- In the previous world order, the statistics memory was allocated by
  the owner of the malloc type structure, allocated statically using
  MALLOC_DEFINE().  This embedded the definition of the malloc_type
  structure into all kernel modules.  Move to a model in which a pointer
  within struct malloc_type points at a UMA-allocated
  malloc_type_internal data structure owned and maintained by
  kern_malloc.c, and not part of the exported ABI/API to the rest of
  the kernel.  For the purposes of easing a possible MFC, re-use an
  existing pointer in 'struct malloc_type', and maintain the current
  malloc_type structure size, as well as layout with respect to the
  fields reused outside of the malloc subsystem (such as ks_shortdesc).
  There are several unused fields as a result of no longer requiring
  the mutex in malloc_type.

- Struct malloc_type_internal contains an array of malloc_type_stats,
  of size MAXCPU.  The structure defined above avoids hard-coding a
  kernel compile-time value of MAXCPU into kernel modules that interact
  with malloc.

- When accessing per-cpu statistics for a malloc type, surround read -
  modify - update requests with critical_enter()/critical_exit() in
  order to avoid races during write.  The per-CPU fields are written
  only from the CPU that owns them.

- Per-CPU stats now maintained "allocated" and "freed" counters for
  number of allocations/frees and bytes allocated/freed, since there is
  no longer a coherent global notion of the totals.  When coalescing
  malloc stats, accept a slight race between reading stats across CPUs,
  and avoid showing the user a negative allocation count for the type
  in the event of a race.  The global high watermark is no longer
  maintained for a malloc type, as there is no global notion of the
  number of allocations.

- While tearing up the sysctl() path, also switch to using sbufs.  The
  current "export as text" sysctl format is retained with the same
  syntax.  We may want to change this in the future to export more
  per-CPU information, such as how allocations and frees are balanced
  across CPUs.

This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations.  There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there.  The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.

Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change.  However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs.  The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.

Several improvements from:		bde
Statistics approach discussed with:	ups
Tested by:				scottl, others
2005-05-29 13:38:07 +00:00
pjd
58d2b4c193 Fix panic when module is compiled in and it is loaded from loader.conf.
Only panic is fixed, module will be still listed in kldstat(8) output.
Not sure what is correct fix, because adding unloading code in case of
failure to linker_init_kernel_modules() doesn't work.
2005-05-28 23:20:05 +00:00
gad
2add4b872d Change the way options are parsed on the `#!'-line of a shell-script. Instead
of having the kernel parse that line and add an entry to the argument list for
each 'separate word' it finds, have it add only one entry which holds all
the words found on that line.  The old behavior is useful in some situations,
but it does not match the way any other operating system will parse that line.

This has been discussed in the thread "Bug in #! processing - One More Time"
on the freebsd-arch mailing list (starting back on Feb 24, 2005).  The first
few messages in that thread provide the background in much detail.

PR:		16393
Reviewed by:	freebsd-arch
2005-05-28 22:42:41 +00:00
pjd
7543a23525 Prevent loading modules with are compiled into the kernel.
PR:		kern/48759
Submitted by:	Pawe³ Ma³achowski <pawmal@unia.3lo.lublin.pl>
Patch from:	demon
MFC after:	2 weeks
2005-05-28 22:29:44 +00:00
rwatson
067b94d2d9 Regenerate from syscalls.master. 2005-05-28 14:35:43 +00:00
rwatson
c0001c0613 Mark ntp_gettime() as MSTD, since its system call path will acquire
Giant if required.
2005-05-28 14:35:05 +00:00
rwatson
f1dfea9d61 Explicitly acquire Giant around the ntp_gettime() and assert it in the
sysctl path.  While this code is close to MPSAFE, it may require some
additional locking.  Mark ntp_gettime1() as GIANT_REQUIRED for now.

Suggested by:	phk
2005-05-28 14:34:41 +00:00
rwatson
fb931ae00a Regenerate for updated syscalls.master. 2005-05-28 13:24:05 +00:00
rwatson
ceb26b4c48 Mark the following compatability system calls as MCOMPAT or MCOMPAT4 based
on the their simply wrapping MPSAFE implementations of existing MPSAFE
system calls:

  getfsstat()
  lseek()
  stat()
  lstat()
  truncate()
  ftruncate()
  statfs()
  fstatfs()

Note that ogetdirentries() is not marked MPSAFE because it does not share
the MPSAFE implementation used for getdirentries(), and requires separate
locking to be implemented.
2005-05-28 13:23:42 +00:00
rwatson
c060f4b949 Regenerate from syscalls.master. 2005-05-28 13:13:01 +00:00
rwatson
0439e13c01 Mark quotactl() as MSTD. 2005-05-28 13:12:04 +00:00
rwatson
527c640ad3 Acquire Giant explicitly in quotactl() so that the syscalls.master
entry can become MSTD.
2005-05-28 13:11:35 +00:00
rwatson
ff36d1a493 Regenerate from updated syscalls.master. 2005-05-28 13:09:56 +00:00
rwatson
35ffa17830 Mark kenv(2) as MPSAFE, since it appears to be properly locked down. 2005-05-28 13:09:41 +00:00
rwatson
ea08d61a73 Regenerate system call tables from syscalls.master. 2005-05-28 13:08:26 +00:00
rwatson
acb673063c Also mark the COMPAT4 version of fhstatfs() as MPSAFE. 2005-05-28 13:07:43 +00:00
rwatson
fa7cf37c72 Mark fhopen(), fhstat(), and fhstatfs() as MSTD, since they now
acquire Giant themselves.
2005-05-28 12:59:33 +00:00
rwatson
66d882141f Acquire Giant explicitly in fhopen(), fhstat(), and kern_fhstatfs(),
so that we can start to eliminate the presence of non-MPSAFE system
call entries in syscalls.master.
2005-05-28 12:58:54 +00:00
pjd
ac435fbb13 Remove (now) unused argument 'td' from cvtstatfs(). 2005-05-27 19:23:48 +00:00
pjd
788f75ddb2 Sync locking in freebsd4_getfsstat() with getfsstat().
Giant is probably also needed in kern_fhstatfs().
2005-05-27 19:21:08 +00:00
pjd
2fc56b12a9 Use consistent style in functions I want to modify in the near future. 2005-05-27 19:15:46 +00:00
rwatson
ac1a365e2d In the current world order, each socket has two mutexes: a mutex
that protects socket and receive socket buffer state, and a second
mutex to protect send socket buffer state.  In some places, the
mutex shared between the socket and receive socket buffer will be
acquired twice, once by each layer, resulting in some
inconsistency, but providing the abstraction benefit of being able
to more easily separate the two mutexes in the future if desired.

When transitioning a socket to the SS_ISDISCONNECTING or
SS_ISDISCONNECTED states, grab the socket/receive socket buffer lock
once rather than grabbing it as the socket lock, modifying socket
state, then grabbing a second time as the receive lock in order to
modify the socket buffer state to indicate no further data can be
read.  This change is believed to close a race between the change in
socket state and the change in socket buffer state, which for a
remotely initiated close on a UNIX domain socket, resulted in
soreceive() returning ENOTCONN rather than an EOF condition.

A similar race still exists in the case of send, however, and is
harder to fix as the socket and send socket buffer mutexes are not
the same, and we would like to avoid holding combinations of socket
mutexes over sb_upcall until we've finished clarifying the locking
protocol for upcalls.

This change has the side affect of reducing the number of mutex
operations to initiate disconnect or perform disconnect on a
socket by two.

PR:		78824
Rerported by:	Marc Olzheim <marcolz@stack.nl>
MFC after:	2 weeks
2005-05-27 17:16:43 +00:00
davidxu
5a8d3af0d6 Remove thread_upcall_check, it was used to avoid race bug in earlier
day's sleep queue code, today the bug no longer exists.
please see 04/25/2004 freebsd-threads@ mailing list archive.
2005-05-27 15:57:27 +00:00
davidxu
3fbc6983fa Remove sleep queue hack, it is no longer needed with current sleep queue.
Actually, it causes process to hang when it is being debugged.

PR: gnu/77818
2005-05-27 04:27:22 +00:00
jmg
07e93041c6 make stat return an zero'd struct, and be a FIFO again... This is only
to fix libc_r since it requires stat to close fd's, and so commented in
the code...

PR:		threads/75795
Reviewed by:	ps
MFC after:	1 week
2005-05-24 23:42:50 +00:00
cognet
9bcd47137c Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, as
binutils now do the job for us
2005-05-24 22:21:44 +00:00
ups
acfce18a2a Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
pjd
0b89469bda Protect fsid in freebsd4_getfsstat() in simlar way as it is done in
getfsstat().
2005-05-22 23:05:27 +00:00
pjd
a6e0e217b2 If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
2005-05-22 21:52:30 +00:00
njl
9ab8d98ce5 Document that the returned pointer should be freed even if the number
of items returned is 0.
2005-05-20 05:04:22 +00:00
ups
c8d93020ce Fix a bug that caused preemption to happen for a thread in the same
ksegrp with the same priority as the currently running thread.
This can cause propagate_priority() to panic.

Pointy hat to: ups
2005-05-19 01:08:30 +00:00
pjd
4c810f35cd devfs_first() return value isn't used, remove it. 2005-05-18 22:05:12 +00:00
alc
bcfd7ad6a6 Revert revision 1.164: pmap_qremove() does not require protection by
VM_LOCK_GIANT.

Discussed with: jeff
2005-05-14 05:09:11 +00:00
jhb
6772446cb8 Actually use the iterating variable in the for loop when trying to avoid
overflow.

Reported by:	Vladislav Shabanov vs at rambler-co dot ru
MFC after:	1 week
Glanced at:	alfred
2005-05-12 20:04:48 +00:00
pjd
c6e5e8f446 We don't use 'mp' variable, but we do want to mount devfs, ehh. 2005-05-12 01:49:51 +00:00
pjd
91b47597be Remove unised variable introduced by accident in rev 1.168.
Found by:	Coverity Prevent analysis tool
2005-05-11 19:50:34 +00:00
pjd
f66a55ffcd Plug memory leaks.
Found by:		Coverity Prevent analysis tool
2005-05-11 19:27:38 +00:00
kan
4085840a33 Handle theoretical case of vfs_export being called with both MNT_DELEXPORT and
MNT_EXPORT flags set. Do not reuse the memory that has just been freed.
2005-05-11 18:25:42 +00:00
cperciva
a199a4f74b Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
cperciva
e513415af9 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
davidxu
af64c19b3b Only check signal event, single threading event shouldn't be reported. 2005-05-05 06:42:02 +00:00
emax
a52b6c9ce3 Change m_uiotombuf so it will accept offset at which data should be copied
to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to
fix Ethernet header alignment problem on alpha and sparc64. Also change all
users of m_uiotombuf to pass proper offset.

Reviewed by:	jmg, sam
Tested by:	Sten Spans "sten AT blinkenlights DOT nl"
MFC after:	1 week
2005-05-04 18:55:03 +00:00
rwatson
2197ab2d93 Introduce MAC Framework and MAC Policy entry points to label and control
access to POSIX Semaphores:

mac_init_posix_sem()            Initialize label for POSIX semaphore
mac_create_posix_sem()          Create POSIX semaphore
mac_destroy_posix_sem()         Destroy POSIX semaphore
mac_check_posix_sem_destroy()   Check whether semaphore may be destroyed
mac_check_posix_sem_getvalue()  Check whether semaphore may be queried
mac_check_possix_sem_open()     Check whether semaphore may be opened
mac_check_posix_sem_post()      Check whether semaphore may be posted to
mac_check_posix_sem_unlink()    Check whether semaphore may be unlinked
mac_check_posix_sem_wait()      Check whether may wait on semaphore

Update Biba, MLS, Stub, and Test policies to implement these entry points.
For information flow policies, most semaphore operations are effectively
read/write.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, McAfee, SPARTA
Obtained from:	TrustedBSD Project
2005-05-04 10:39:15 +00:00
rwatson
182429e8d0 Move definitions of 'struct kuser' and 'struct ksem' from uipc_sem.c
to ksem.h so that they are accessible from the MAC Framework for the
purposes of labeling and enforcing additional protections.  #error
if these are included without _KERNEL, since they are not intended
(nor installed) for user application use.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by:	DARPA, SPARTA
Obtained from:	TrustedBSD Project
2005-05-03 20:21:24 +00:00
jeff
33ac8108e2 - Initialize vfslocked correctly early enough for MAC to compile.
- Fix one place where we explicitly drop Giant!

Pointy hat to:	me
Submitted by:	Max Laier
Warned by:	Tinderbox
2005-05-03 16:24:59 +00:00
jeff
79452537e3 - Remove two mtx_asserts that can incorrectly trigger if
devstat_end_transaction is called from a fast interrupt.  Presently
   there is no way for mtx_assert to determine that we're not executing
   in a real thread context.

Submitted by:	jhusted@isilon.com
2005-05-03 10:58:05 +00:00
jeff
92f17d1e6a - A vnode may have made its way onto the free list while it was being
vgone'd.  We must remove it from the freelist before returning in
   vtryrecycle() or we may get a duplicate free.

Reported by:	kkenn
2005-05-03 10:56:00 +00:00
jeff
ab437d7b1d - Use namei to acquire Giant for VFS if it is necessary. Drop the explicit
Giant acquisition.
 - Remove GIANT_REQUIRED in the few remaining cases; the vm and vfs have
   both been locked.
2005-05-03 10:55:05 +00:00
jeff
451e14446f - Use NAMEI to pickup Giant if we need it in fpcheckstd(). 2005-05-03 10:52:22 +00:00
jeff
617ce99006 - Neither of our image formats require Giant now that the vm and vfs have
been locked.
2005-05-03 10:51:38 +00:00
csjp
431f1afe8c Since it is not possible for curthread to be NULL in this context,
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.

Discussed with:	rwatson, peter
MFC after:	1 week
2005-05-02 02:07:55 +00:00
jeff
dd41538cd8 - All buffers should either be clean or dirty. If neither of these flags
are set when we attempt to remove a buffer from a queue we should panic.
   Hopefully this will catch the source of the wrong bufobj panics.

Sponsored by:	Isilon Systems, Inc.
2005-05-01 12:00:36 +00:00
jeff
ff4a7a72e9 - Remove spls and comments relating to them. 2005-05-01 01:01:17 +00:00
jeff
22004a9723 - Remove an old splcam hack. 2005-05-01 00:59:55 +00:00
jeff
1bc61f8f0f - Remove unnecessary spls. 2005-05-01 00:59:34 +00:00
jeff
80bb41c921 - Return EACCES if we're trying to exec on a vp with no object.
Errno supplied by:	cperciva
2005-05-01 00:58:19 +00:00
sam
17d6060ac9 o enable shutdown of taskqueue threads; the thread servicing the queue checks
a new entry in the taskqueue struct each time it wakes up to see if it
  should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
  the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
  queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL

Reviewed by:	dfr, njl
2005-05-01 00:38:11 +00:00
dwhite
c8fa809967 Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by:	ups
2005-04-30 20:01:00 +00:00
jeff
cb9dfadd87 - Remove long dead splbio() calls and comments relating to the old
synchronization mechanism.
2005-04-30 12:18:50 +00:00
jeff
116d72569a - Don't acquire Giant before calling b_biodone, individual consumers are
now required to do so themselves.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:44:22 +00:00
jeff
32c015f463 - Acquire Giant in AIO's iodone routine. VFS will no longer do it for us
soon.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:27:31 +00:00
jeff
f9172cb275 - Call VM_LOCK_GIANT in cluster_callback() to protect some pmap calls. VFS
will not be acquiring Giant before calling this function anymore.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:26:58 +00:00
jeff
7354fc5e28 - In vnlru_free() remove the vnode from the free list before we call
vtryrecycle().  We could sometimes get into situations where two threads
   could try to recycle the same vnode before this.
 - vtryrecycle() is now responsible for returning the vnode to the free list
   if it fails and someone else hasn't done it.
 - Make a new function vfreehead() which moves a vnode to the head of the
   free list and use it in vgone() to clean up that code a bit.

Sponsored by:	Isilon Systems, Inc.
Reported by:	pho, kkenn
2005-04-30 11:22:40 +00:00
jeff
0e56b01ed6 - Don't vgonel() via vgone() or vrecycle() if the vnode is already doomed.
This fixes forced unmounts via nullfs.

Reported by:	kkenn
Sponsored by:	Isilon Systems, Inc.
2005-04-27 10:03:21 +00:00
jeff
a80bbe799e - Stop setting vxthread, we've asserted that it was useless for several
weeks now.
2005-04-27 09:17:33 +00:00
jeff
18cd3a36d3 - Stop checking vxthread, we've asserted that it was useless for several
weeks.
2005-04-27 09:17:11 +00:00
jeff
f869be5c72 - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
mdodd
56c42039a5 Add missing break.
Found by:	marcus
2005-04-25 00:48:04 +00:00
sam
0f63abff2a o eliminate modification of task structures after their run to avoid
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
  avoid fix) and combining ta_pending and ta_priority

Reviewed by:	dwhite, dfr
MFC after:	4 days
2005-04-24 16:52:45 +00:00
davidxu
50a5bbcbfd Wake up swapper process if needed.
PR: kern/78474
Submitted by: Sam Lawrance <boris at brooknet dot com dot au>
2005-04-23 05:06:44 +00:00
davidxu
a247de6aeb Regen. 2005-04-23 02:38:17 +00:00
davidxu
1b8f9e10e1 Add new syscall thr_new to create thread in atomic, it will
inherit signal mask from parent thread, setup TLS and stack, and
user entry address.
Also support POSIX thread's PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM,
sysctl is also provided to control the scheduler scope.
2005-04-23 02:36:07 +00:00
davidxu
2155a04472 Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
2005-04-23 02:32:32 +00:00
jeff
4eaa5ebe1b - Define the real lock order with cdev and a few vm/vfs related locks. This
can be removed once cdev no longer calls free() with the cdev lock held.
2005-04-22 22:43:31 +00:00
jeff
b29bfc6efa - Check LO_DUPOK as well as LOP_DUPOK when determining whether we should
warn about duplicate acquires.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 22:39:46 +00:00
trhodes
f02068c038 Get the directory structure correct in a comment.
Submitted by:	Samy Al Bahra
2005-04-22 19:09:12 +00:00
jeff
31cfb7f242 - Disable code which allows getnewvnode() to fail. Many ffs_vget() callers
do not correctly deal with failures.  This presently risks deadlock
   problems if dependency processing is held up by failures to allocate
   a vnode, however, this is better than the situation with the failures.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:57:05 +00:00
jeff
d8b31a35ea - Add two KASSERTs to prevent us from recycling a buf that is still on a
bufobj list.

Sponsored by:	Isilon Systems, Inc.
2005-04-22 00:53:20 +00:00
marcel
dd5b3be596 Do not conditionally compile the contents of this file upon whether
HWPMC_HOOKS is defined. The pmc_cpu_is_*() functions in this file
are referenced unconditionally by hwpmc(4).

This is mostly a stop-gap. The pmc_cpu_is*() function should
probably be declared inline in <sys/pmc.h> or <sys/pmckern.h> and
the function pointers with corresponding SX lock should probably
be moved to another file and compiled conditionally upon HWPMC_HOOKS.

Ok'd by: jkoshy@
2005-04-20 20:30:59 +00:00
davidxu
0719b14efb Inherit signal mask for child process in fork1(), RELENG_4 and other
*BSD have this behaviour, also it is required by POSIX.

PR: kern/80130
Submitted by: Kostik Belousov konstantin.belousov at zoral dot com dot ua
2005-04-20 13:14:52 +00:00
mdodd
7826c585d5 Check sopt_level in uipc_ctloutput() and return early if it is non-zero.
This prevents unintended consequnces when an application calls things like
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, ...) on a Unix domain socket.
2005-04-20 02:57:56 +00:00
pjd
db9ce4609f Call g_waitidle() before every check the list of holds is empty.
Suggested by:	phk
2005-04-19 21:44:44 +00:00
davidxu
9452a25d2d Clear P_STATCHILD earlier to avoid unnecessary retrying. 2005-04-19 12:31:15 +00:00
davidxu
913d50be4f Oops, forgot to update this file.
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:11:28 +00:00
davidxu
02615ff23a Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.

Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after    : 4 days
2005-04-19 08:07:28 +00:00
phk
ed5a7da798 Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.
2005-04-19 06:23:59 +00:00
jkoshy
dc3444cd91 Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by:	alc, jhb (kernel changes)
2005-04-19 04:01:25 +00:00
phk
b7f29c0fc0 Add a named reference-count KPI to hold off mounting of the root filesystem.
While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
2005-04-18 21:21:26 +00:00
phk
4bd811c8dd Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.
2005-04-18 21:11:47 +00:00
rwatson
75030e30f6 Introduce p_canwait() and MAC Framework and MAC Policy entry points
mac_check_proc_wait(), which control the ability to wait4() specific
processes.  This permits MAC policies to limit information flow from
children that have changed label, although has to be handled carefully
due to common programming expectations regarding the behavior of
wait4().  The cr_seeotheruids() check in p_canwait() is #if 0'd for
this reason.

The mac_stub and mac_test policies are updated to reflect these new
entry points.

Sponsored by:	SPAWAR, SPARTA
Obtained from:	TrustedBSD Project
2005-04-18 13:36:57 +00:00
rwatson
997d8772c4 Remove end-of-line tabs.
MFC after:	3 days
2005-04-18 11:51:10 +00:00
das
5aec008257 Add a sysctl that returns the full path of a process' text file.
This information is needed by things like `gdb -p' and Sun's javac,
and previously it could only be obtained via procfs
2005-04-18 02:10:37 +00:00
rwatson
155bfd8789 Introduce three additional MAC Framework and MAC Policy entry points to
control socket poll() (select()), fstat(), and accept() operations,
required for some policies:

        poll()          mac_check_socket_poll()
        fstat()         mac_check_socket_stat()
        accept()        mac_check_socket_accept()

Update mac_stub and mac_test policies to be aware of these entry points.
While here, add missing entry point implementations for:

        mac_stub.c      stub_check_socket_receive()
        mac_stub.c      stub_check_socket_send()
        mac_test.c      mac_test_check_socket_send()
        mac_test.c      mac_test_check_socket_visible()

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-04-16 18:46:29 +00:00
rwatson
76ab38319d In mac_get_fd(), remove unconditional acquisition of Giant around copying
of the socket label to thread-local storage, and replace it with
conditional acquisition based on debug.mpsafenet.  Acquire the socket
lock around the copy operation.

In mac_set_fd(), replace the unconditional acquisition of Giant with
the conditional acquisition of Giant based on debug.mpsafenet.  The socket
lock is acquired in mac_socket_label_set() so doesn't have to be
acquired here.

Obtained from:	TrustedBSD Project
Sponsored by:	SPAWAR, SPARTA
2005-04-16 18:33:13 +00:00
marius
dfe8329b58 Increase default HZ for sparc64 to 1000. 2005-04-16 15:07:41 +00:00
rwatson
51183f0f84 Introduce new MAC Framework and MAC Policy entry points to control the use
of system calls to manipulate elements of the process credential,
including:

        setuid()                mac_check_proc_setuid()
        seteuid()               mac_check_proc_seteuid()
        setgid()                mac_check_proc_setgid()
        setegid()               mac_check_proc_setegid()
        setgroups()             mac_check_proc_setgroups()
        setreuid()              mac_check_proc_setreuid()
        setregid()              mac_check_proc_setregid()
        setresuid()             mac_check_proc_setresuid()
        setresgid()             mac_check_rpoc_setresgid()

MAC checks are performed before other existing security checks; both
current credential and intended modifications are passed as arguments
to the entry points.  The mac_test and mac_stub policies are updated.

Submitted by:	Samy Al Bahra <samy@kerneled.org>
Obtained from:	TrustedBSD Project
2005-04-16 13:29:15 +00:00
rwatson
04a7b2d379 Modify the alq(9) alq_open() API to accept a file creation mode, rather
than defaulting the cmode argument to vn_open() to 0.  Supply a default
argument of ALQ_DEFAULT_CMODE (0600) in current callers.

Discussed with/pointed out by:	hmp
Reveiwed by:	jeff, hmp
MFC after:	3 days
2005-04-16 12:12:27 +00:00
maxim
2e89e331eb Fix a typo in the comment.
Noticed by:	Samy Al Bahra
2005-04-15 14:01:43 +00:00
jhb
249414d2cc Close a race between sleepq_broadcast() and sleepq_catch_signals().
Specifically, sleepq_broadcast() uses td_slpq for its private pending
queue of threads that it is going to wake up after it takes them off the
sleep queue.  The problem is that if one of the threads is actually not
asleep yet, then we can end up with td_slpq being corrupted and/or the
thread being made runnable at the wrong time resulting in the td_sleepqueue
== NULL assertion failures occasionally reported under heavy load.

The fix is to stop being so fancy and ditch the whole pending queue bit.
Instead, sleepq_remove_thread() and sleepq_resume_thread() were merged
into one function that requires the caller to hold sched_lock.  This
fixes several places that unlocked sched_lock only to call a function
that then locked sched_lock, so even though sched_lock is now held
slightly longer, removing the extra lock acquires (1 pair instead of 3
in some cases) probably makes it an overall win if you don't include the
fact that it closes a race.  This is definitely a 5.4 candidate.

PR:		kern/79693
Submitted by:	Steven Sears stevenjsears at yahoo dot com
MFC after:	4 days
2005-04-14 06:30:32 +00:00
jeff
b38ffce1e7 - Remove a debugging printf that slipped in.
Spotted by:	Peter Wemm
2005-04-13 23:36:28 +00:00
avatar
e496403c65 According to the comment in struct tty, t_modem is optional; hence we should
guard against NULL t_modem entry. Otherwise, driver doesn't have t_modem
callback implemented(such like sys/dev/usb/ucycom.c) would panic when
someone opens the driver's associated tty device.

Reviewed by:	phk, sam (mentor)
2005-04-13 13:56:17 +00:00
jeff
afab3762a0 - Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case.  Se vfs_lookup.c r1.79 for details.

Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:59:09 +00:00
jeff
5642885b84 - Change vop_lookup_post assertions to reflect recent vfs_lookup changes.
Sponsored by:	Isilon Systems, Inc.
2005-04-13 10:57:53 +00:00
jeff
ee55350db0 - Further simplify lookup; Force all filesystems to relock in the DOTDOT
case.  There are bugs in some which didn't unlock in the ISDOTDOT case
   to begin with that need to be addressed seperately.  This simplifies
   things anyway.
 - Fix relookup() to prevent it from vrele()'ing the dvp while the vp
   is locked.  Catch up to other lookup changes.

Sponsored by:	Isilon Systems, Inc.
Reported by:	Peter Wemm
2005-04-13 10:57:13 +00:00
mdodd
bdcac6ad82 Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.
- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
  PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)

Obtained from:	 NetBSD (with some changes)
2005-04-13 00:01:46 +00:00
rwatson
005d2e3810 Consistently style function declarations in kern_malloc.c.
MFC after:	3 days
2005-04-12 23:54:34 +00:00
jhb
f9da7305b5 Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.
2005-04-12 23:18:54 +00:00
vkashyap
363d709ec6 The latest release of the FreeBSD driver (twa) for
3ware's 9xxx series controllers.  This corresponds to
the 9.2 release (for FreeBSD 5.2.1) on the 3ware website.

Highlights of this release are:

1. The driver has been re-architected to use a "Common Layer"
    (all tw_cl* files), which is a consolidation of all OS-independent
    parts of the driver.  The FreeBSD OS specific portions of the
    driver go into an "OS Layer" (all tw_osl* files).
    This re-architecture is to achieve better maintainability, consistency
    of behavior across OS's, and better portability to new OS's (drivers
    for new OS's can be written by just adding an OS Layer that's specific
    to the OS, by complying to a "Common Layer Programming Interface" API.

2. The driver takes advantage of multiple processors.

3. The driver has a new firmware image bundled, the new features of which
   include Online Capacity Expansion and multi-lun support, among others.
   More details about 3ware's 9.2 release can be found here:
   http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf

Since the Common Layer is used across OS's, the FreeBSD specific include
path for header files (/sys/dev/twa) is not part of the #include pre-processor
directive in any of the source files.  For being able to integrate twa into
the kernel despite this, Makefile.<arch> has been changed to add the include
path to CFLAGS.

Reviewed by: scottl
2005-04-12 22:07:11 +00:00
imp
a404b5c683 resource_list_purge: release the resources in this list, and purge the
elements of this list (eg, reset it).

Man page to follow
2005-04-12 15:20:36 +00:00
imp
8d4424715d rman_set_device() seems to have been omitted by mistake. Implement it. 2005-04-12 06:21:59 +00:00
jeff
802f97f119 - Remove unused include. 2005-04-12 05:45:58 +00:00
jeff
b53ca6ad0a - Differentiate two UPGRADE panics so I have a better idea of what's going
on here.
2005-04-12 05:43:03 +00:00
imp
02a4d4e044 Return the resource created/found in resource_list_add to avoid an extra
resouce_list_find in some places.

Suggested by: sam
Found by: Coventry Analysis tool.
2005-04-12 04:22:17 +00:00
jeff
a6b9b8b072 - Mark the VOPs that require exclusive locks. Those that aren't marked
with E may be called with a shared lock held.  This list really could
   be made per filesystem if we had any filesystems which differed from
   ffs in locking guarantees.  VFS itself is not sensitive to this except
   where vgone() etc. are concerned.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:19:29 +00:00
jeff
b391d2675b - Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk
uses it.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 15:17:06 +00:00
jeff
17be4cbfa0 - Change the VOP_LOCK UPGRADE in vput() to do a LK_NOWAIT to avoid a
potential lock order reversal.  Also, don't unlock the vnode if this
   fails, lockmgr has already unlocked it for us.
 - Restructure vget() now that vn_lock() does all of VI_DOOMED checking
   for us and also handles the case where there is no real lock type.
 - If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE
   so that we can call inactive.  It's not legal to vget a vnode that hasn't
   had INACTIVE called yet.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:28:32 +00:00
jeff
1b29b37e26 - Assert that we're no longer doing recursive vn_locks in inactive/reclaim
as I'd like to get rid of the vxthread.
 - Handle lock requests which don't actually want a lock as this is a
   much more convenient place to handle this condition than in vget().
   These requests simply want to know that VI_DOOMED isn't set.
 - Correct a test at the end of vn_lock, if error !=0 should be
   if error == 0, this has been broken since I comitted the VI_DOOMED
   changes, but no one ran into it because vget() duplicated this
   functionality.

Sponsored by:	Isilon Systems, Inc.
2005-04-11 09:23:56 +00:00
jeff
fbc2ade92c - vput(tvp) before vrele(tdvp) in kern_rename() to avoid lock order issues. 2005-04-11 09:19:08 +00:00
njl
46ef074ef7 Add debugging prints to all the methods in case there are problems with
managing levels.  This can be enabled with the debug.cpufreq.verbose
tunable and sysctl.
2005-04-10 19:11:23 +00:00
das
7a8e8974c6 Suspend all other threads in the process while generating a core dump.
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.

Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.
2005-04-10 02:31:24 +00:00
pjd
853bddab29 CDEV lock should be before 'system map' lock.
Hardcode this order to help track down reported LOR.

LOR reported by:	Thierry Herbelot <thierry@herbelot.com>
LOR info:		http://sources.zabbadoz.net/freebsd/lor.html#080
2005-04-09 13:32:01 +00:00
jeff
4048451472 - Remove the namei NOOBJ flag. It is meaningless now.
Sponsored by:	Isilon Systems, Inc.
2005-04-09 12:04:36 +00:00
jeff
46c812596a - If we vrele() a dvp while the child is locked we can potentially deadlock
when vrele() acquires the directory lock in the wrong order.  Fix this
   via the following changes:
 - Keep the directory locked after VOP_LOOKUP() until we've determined
   what we're going to do with the child.  This allows us to remove the
   complicated post LOOKUP code which determins whether we should lock or
   unlock the parent.  This means we may have to vput() in the appropriate
   cases later, rather than doing an unsafe vrele.
 - in NDFREE() keep two flags to indicate whether we need to unlock vp or
   dvp.  This allows us to vput rather than vrele in the appropriate
   cases without rechecking the flags.  Move the code to handle dvp after
   we handle vp.
 - Remove some dead code from namei() that was the result of changes to
   VFS_LOCK_GIANT().

Sponsored by:	Isilon Systems, Inc.
2005-04-09 11:53:16 +00:00
pjd
1a889a3a93 Add a missing terminator.
Confirmed by:	rwatson
2005-04-09 11:31:31 +00:00
glebius
f81c6c3eb6 Add additional newline to debug.mutex.prof.stats header, so that
column names are printed exactly above the columns.
2005-04-08 14:14:09 +00:00
ups
7bac02c146 Sprinkle some volatile magic and rearrange things a bit to avoid race
conditions in critical_exit now that it no longer blocks interrupts.

Reviewed by:	jhb
2005-04-08 03:37:53 +00:00
phk
b7a4869221 Fix bug in vfs_hash_rehash(): use correct bucket. This only affected
msdosfs which is broken in other ways too.
2005-04-07 07:54:08 +00:00
phk
46c303a78b Constify hexdump() harder. 2005-04-06 10:14:13 +00:00
jeff
a3d1d94c5a - Remove dead code. 2005-04-06 10:11:14 +00:00
jeff
60d07eec30 - Assert that the bufobj matches in flushbuflists. I still haven't gotten
to root cause on exactly how this happens.
 - If the assert is disabled, we presently try to handle this case, but the
   BUF_UNLOCK was missing.  Thus, if this condition ever hit we would leak
   a buf lock.

Many thanks to Peter Holm for all his help in finding this bug.  He really
put more effort into it than I did.
2005-04-06 06:49:46 +00:00
jeff
d42252c158 - Move NDFREE() from vfs_subr to vfs_lookup where namei() is. 2005-04-05 08:58:49 +00:00
jeff
01f6ce3a70 - Use taskqueue_thread rather than taskqueue_swi since our task is going
to vrele, which may vop lock.  This is not safe in a software interrupt
   context.
2005-04-05 08:51:45 +00:00
csjp
8a050fbf3b Assert that the vnode is locked. This is meant to catch bugs or
mis-use of the vnode API in conditions where IO_NODELOCKED has been
used without the vnode actually being locked.
2005-04-05 01:11:43 +00:00
jhb
41cadaa11e Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions.  They no longer have any affect on
interrupts.  This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit().  This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock.  For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections.  Note that I've also taken this
opportunity to push a few things into MD code rather than MI.  For example,
critical_fork_exit() no longer exists.  Instead, MD code ensures that new
threads have the correct state when they are created.  Also, we no longer
try to fixup the idlethreads for APs in MI code.  Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by:	grehan, cognet, arch@, others
Tested on:	i386, alpha, sparc64, powerpc, arm, possibly more
2005-04-04 21:53:56 +00:00
njl
f9c3ff58ce Document that devclass_get_maxunit(9) returns one greater than the current
highest unit.

Reviewed by:	dfr
MFC after:	2 weeks
2005-04-04 15:37:59 +00:00
njl
8a0ce01e92 Add devclass_get_drivers(9) which provides an array of pointers to driver
instances in a given devclass.  This is useful for systems that want to
call code in driver static methods, similar to device_identify().

Reviewed by:	dfr
MFC after:	2 weeks
2005-04-04 15:26:51 +00:00
jeff
d8b17b2eac - Add a missing unlock of the vnode_free_list_mtx.
Spotted by:	Antoine Brodin
2005-04-04 12:07:16 +00:00
jeff
b6f8b968c2 - Instead of waiting forever to get a vnode in getnewvnode() wait for
one to become available for one second and then return ENFILE.  We
   can run out of vnodes, and there must be a hard limit because without
   one we can quickly run out of KVA on x86.  Presently the system can
   deadlock if there are maxvnodes directories in the namecache.  The
   original 4.x BSD behavior was to return ENFILE if we reached the max,
   but 4.x BSD did not have the vnlru proc so it was less profitable to
   wait.
2005-04-04 11:43:44 +00:00
jeff
d9338897d1 - Include opt_vfs.h for LOOKUP_SHARED.
- Control the behavior of shared lookups with the lookup_shared sysctl
   which has its default behavior set via the LOOKUP_SHARED option.
2005-04-03 23:50:20 +00:00
njl
d126ccab5a maxunit is actually one higher than the greatest currently-allocated unit
in a devclass.  All the other uses of maxunit are correct and this one was
safe since it checks the return value of devclass_get_device(), which would
always say that the highest unit device doesn't exist.

Reviewed by:	dfr
MFC after:	3 days
2005-04-03 22:23:18 +00:00
jeff
62c728e499 - Slightly restructure acquire() so I can add more ktr information and
an assert to help find two strange bugs.
 - Remove some nearby spls.
2005-04-03 11:49:02 +00:00