Commit Graph

9485 Commits

Author SHA1 Message Date
John Baldwin
0df2972736 Various whitespace fixes. 2006-06-21 17:47:45 +00:00
John Baldwin
62d615d508 Conditionally acquire Giant around VFS operations. 2006-06-20 21:31:38 +00:00
John Baldwin
aeeb017bd6 - Push Giant down into linker_reference_module().
- Add a new function linker_release_module() as a more intuitive complement
  to linker_reference_module() that wraps linker_file_unload().
  linker_release_module() can either take the module name and version info
  passed to linker_reference_module() or it can accept the linker file
  object returned by linker_reference_module().
2006-06-20 20:54:13 +00:00
John Baldwin
f462ce3edd Make linker_find_file_by_name() and linker_find_file_by_id() static to
simplify linker locking.  The only external consumers now use
linker_file_foreach().
2006-06-20 20:41:15 +00:00
John Baldwin
932151064a - Add a new linker_file_foreach() function that walks the list of linker
file objects calling a user-specified predicate function on each object.
  The iteration terminates either when the entire list has been iterated
  over or the predicate function returns a non-zero value.
  linker_file_foreach() returns the value returned by the last invocation
  of the predicate function.  It also accepts a void * context pointer that
  is passed to the predicate function as well.  Using an iterator function
  avoids exposing linker internals to the rest of the kernel making locking
  simpler.
- Use linker_file_foreach() instead of walking the list of linker files
  manually to lookup ndis files in ndis(4).
- Use linker_file_foreach() to implement linker_hwpmc_list_objects().
2006-06-20 20:37:17 +00:00
John Baldwin
aaf3170501 Make linker_file_add_dependency() and linker_load_module() static since
only the linker uses them.
2006-06-20 20:18:42 +00:00
John Baldwin
e767366f99 Don't check if malloc(M_WAITOK) returns NULL. 2006-06-20 20:11:00 +00:00
John Baldwin
e5bb3a01d7 Use 'else' to remove another goto. 2006-06-20 19:49:28 +00:00
John Baldwin
73a2437a83 - Remove some useless variable initializations.
- Make some conditional free()'s where the condition was always true
  unconditional.
2006-06-20 19:32:10 +00:00
George V. Neville-Neil
fb11be62a2 Properly cast the values of valsize (the size of the value passed in)
in setsockopt so that they can be compared correctly against negative
values.  Passing in a negative value had a rather negative effect
on our socket code, making it impossible to open new sockets.

PR:		98858
Submitted by:	James.Juran@baesystems.com
MFC after:	1 week
2006-06-20 12:36:40 +00:00
Robert Watson
721150ad8f When retrieving SO_ERROR via getsockopt(), hold the socket lock around
the retrieval and replacement with 0.

MFC after:	1 week
2006-06-18 19:02:49 +00:00
Yaroslav Tykhiy
42ccd54fec Add a funny sysctl: debug.kdb.trap_code .
It is similar to debug.kdb.trap, except for it tries to cause a page fault
via a call to an invalid pointer.  This can highlight differences between
a fault on data access vs. a fault on code call some CPUs might have.

This appeared as a test for a work \
Sponsored by: RiNet (Cronyx Plus LLC)
2006-06-18 12:27:59 +00:00
Robert Watson
cd3a3a269f Remove sbinsertoob(), sbinsertoob_locked(). They violate (and have
basically always violated) invariannts of soreceive(), which assume
that the first mbuf pointer in a receive socket buffer can't change
while the SB_LOCK sleepable lock is held on the socket buffer,
which is precisely what these functions do.  No current protocols
invoke these functions, and removing them will help discourage them
from ever being used.  I should have removed them years ago, but
lost track of it.

MFC after:	1 week
Prodded almost by accident by:	peter
2006-06-17 22:48:34 +00:00
Ed Maste
374875fa56 Add a description for sysctl -d. 2006-06-17 02:58:18 +00:00
Robert Watson
9a44cbf19c Remove unused (and ifdef'd) unp_abort() and unp_drain().
MFC after:	1 month
2006-06-16 22:11:49 +00:00
David Malone
93ef14a74b Add a kern.timecounter.tc sysctl tree that contains the mask,
frequency, quality and current value of each available time counter.

At the moment all of these are read-only, but it might make sense to
make some of these read-write in the future.

MFC after:	3 months
2006-06-16 20:29:05 +00:00
Yaroslav Tykhiy
be70abccba Kill an XXX remark that has been untrue since rev. 1.150 of this file. 2006-06-16 07:36:18 +00:00
Christian S.J. Peron
4f0840f348 Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be
filesystem agnostic. We are not touching any file system specific functions
in this code path. Since we have a cache lock, there is really no need to
keep Giant around here.

This eliminates Giant acquisitions for any syscall which is auditing pathnames.

Discussed with:	jeff
2006-06-16 05:09:28 +00:00
Maxim Konovalov
059d68dea6 o Expand an exclusive lock scope to prevent a race between two
simultaneous module_register().

Original work done by:	Alex Lyashkov
Reviewed by:		jhb
MFC after:		2 weeks
2006-06-15 08:53:09 +00:00
David Xu
7bb561fbb9 Use scheduler API sched_relinquish() to implement yield() syscall. 2006-06-15 06:41:57 +00:00
David Xu
36ec198bd5 Add scheduler API sched_relinquish(), the API is used to implement
yield() and sched_yield() syscalls. Every scheduler has its own way
to relinquish cpu, the ULE and CORE schedulers have two internal run-
queues, a timesharing thread which calls yield() syscall should be
moved to inactive queue.
2006-06-15 06:37:39 +00:00
David Xu
c2c1ab1858 Clear ke_runq before calling maybe_preempt, this avoids a
KASSERT(ke->ke_runq == NULL) panic when the sched_add is recursively
called by maybe_preempt.

Reported by: Wojciech A. Koszek < dunstan at freebsd dot czest dot pl >
2006-06-14 03:46:03 +00:00
Xin LI
6ad26d8376 Unexpand an instance of TAILQ_EMPTY() 2006-06-14 03:14:26 +00:00
Marcel Moolenaar
e1684acf38 Unbreak 64-bit architectures. The 3rd argument to kern_kldload() is
a pointer to an integer and td->td_retval[0] is of type register_t.
On 64-bit architectures register_t is wider than an integer.
2006-06-14 03:01:06 +00:00
David Xu
2c7cae8042 Fox a typo in sched_is_timeshare. 2006-06-13 23:45:59 +00:00
David Xu
e15abbf251 Pass boolean value to __predict_false. Try to keep KSE slot count
correct for migrating thread, the count is a bit mess.
2006-06-13 23:01:50 +00:00
John Baldwin
edd32c2da2 Use kern_kldload() and kern_kldunload() to load and unload modules when
we intend for the user to be able to unload them later via kldunload(2)
instead of calling linker_load_module() and then directly adjusting the
ref count on the linker file structure.  This makes the resulting
consumer code simpler and cleaner and better hides the linker internals
making it possible to sanely lock the linker.
2006-06-13 21:36:23 +00:00
John Baldwin
b21c9288ce A couple of minor style tweaks. 2006-06-13 21:34:12 +00:00
John Baldwin
d53885879d - Add a kern_kldload() that is most of the previous kldload() and push
Giant down in it.
- Push Giant down in kern_kldunload() and reorganize it slightly to avoid
  using gotos.  Also, expose this function to the rest of the kernel.
2006-06-13 21:28:18 +00:00
John Baldwin
6b3d277ad4 - Push down Giant some in kldstat().
- Use a 'struct kld_file_stat' on the stack to read data under the lock
  and then do one copyout() w/o holding the lock at the end to push the
  data out to userland.
2006-06-13 21:11:12 +00:00
John Baldwin
b904477c68 Unexpand TAILQ_FOREACH() and TAILQ_FOREACH_SAFE(). 2006-06-13 20:49:07 +00:00
John Baldwin
3a600aeabc Remove some more pointless goto's and don't check to see if
malloc(M_WAITOK) returns NULL.
2006-06-13 20:27:23 +00:00
John Baldwin
2fa6cc80d7 Handle the simple case of just dropping a reference near the start of
linker_file_unload() instead of in the middle of a bunch of code for
the case of dropping the last reference to improve readability and sanity.
While I'm here, remove pointless goto's that were just jumping to a
return statement.
2006-06-13 19:45:08 +00:00
Maxim Konovalov
70df31f4de o There are two methods to get a process credentials over the unix
sockets:

1) A sender sends SCM_CREDS message to a reciever, struct cmsgcred;
2) A reciever sets LOCAL_CREDS socket option and gets sender
credentials in control message, struct sockcred.

Both methods use the same control message type SCM_CREDS with the
same control message level SOL_SOCKET, so they are indistinguishable
for the receiver.  A difference in struct cmsgcred and struct sockcred
layouts may lead to unwanted effects.

Now for sockets with LOCAL_CREDS option remove all previous linked
SCM_CREDS control messages and then add a control message with
struct sockcred so the process specifically asked for the peer
credentials by LOCAL_CREDS option always gets struct sockcred.

PR:		kern/90800
Submitted by:	Andrey Simonenko
Regres. tests:	tools/regression/sockets/unix_cmsg/
MFC after:	1 month
2006-06-13 14:33:35 +00:00
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
John Baldwin
5c69ad8374 Use fget() in kqueue_register() instead of doing all the work by hand. 2006-06-12 21:46:23 +00:00
Warner Losh
ccdc8d9bff Add a convenience function rman_init_from_resource for initializing
a rman from a resource.

Also, include _bus.h since the implementation of bus_space isn't
needed here, just the definitions of the types.
2006-06-12 04:06:21 +00:00
Ian Dowse
eb1030c4fd Keep firmware images on the list until they have been unregistered
with firmware_unregister(). Previously when the last driver reference
had been dropped we would clear the list entry under the assumption
that the firmware module was about to be unloaded, but this was not
true if the firmware image had been loaded manually with kldload.

This makes it possible to manually kldload firmware images as a
workaround for drivers such as ipw that attempt to load firmware
while resuming after a suspend.

Reviewed by:	mlaier (an earlier version of the patch)
2006-06-10 17:04:07 +00:00
Robert Watson
b37ffd3189 Move some functions and definitions from uipc_socket2.c to uipc_socket.c:
- Move sonewconn(), which creates new sockets for incoming connections on
  listen sockets, so that all socket allocate code is together in
  uipc_socket.c.

- Move 'maxsockets' and associated sysctls to uipc_socket.c with the
  socket allocation code.

- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it
  to sysctl.h and remove lots of scattered implementations in various
  IPC modules.

- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order
  reasons.  Statisticize soalloc() and sodealloc() as they are now
  required only in uipc_socket.c, and are internal to the socket
  implementation.

After this change, socket allocation and deallocation is entirely
centralized in one file, and uipc_socket2.c consists entirely of socket
buffer manipulation and default protocol switch functions.

MFC after:	1 month
2006-06-10 14:34:07 +00:00
Robert Watson
e02421f3fb Rearrange code in soalloc() so that it's less indented by returning
early if uma_zalloc() from the socket zone fails.  No functional
change.

MFC after:	1 week
2006-06-08 22:33:18 +00:00
Konstantin Belousov
55aef2632f Fix the LOR that occurs when the MAC compiled into the kernel
and vnode is destroyed.

Reviewed by:	rwatson
LOR:		189
MFC after:	2 weeks
Approved by:	kan (mentor)
2006-06-08 07:55:10 +00:00
David Xu
0ae716e5ee Make ke_rqindex unsigned. 2006-06-06 12:26:17 +00:00
Robert Watson
7ebfc8df78 Audit some arguments to nmount(), mount(), umount().
Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 15:32:07 +00:00
Robert Watson
6e79e6f805 Audit command, uid arguments for quotactl().
Audit the mode argument to mkfifo().
Audit the target path passed to symlink().

Submitted by:	wsalamon
Obtained from:	TrustedBSD Project
2006-06-05 13:34:23 +00:00
Robert Watson
d3778141bf Audit path passed to the acct() system call.
Obtained from:	TrustedBSD Project
2006-06-05 13:02:34 +00:00
John Baldwin
49b94bfc54 Bah, fix fat finger in last. Invert the ~ on MTX_FLAGMASK as it's
non-intuitive for the ~ to be built into the mask.  All the users now
explicitly ~ the mask.  In addition, add MTX_UNOWNED to the mask even
though it technically isn't a flag.  This should unbreak mtx_owner().

Quickly spotted by:	kris
2006-06-03 21:11:33 +00:00
John Baldwin
3ce3f44293 In the case of reentering the debugger due to an attempt to perform a
context switch while in the debugger, reenter the debugger sooner before
performing any statistics updates.
2006-06-03 20:49:44 +00:00
John Baldwin
315ce35f7b Simplify mtx_owner() so it only reads m->mtx_lock once. 2006-06-03 20:45:00 +00:00
John Baldwin
f781b5a4bb Style fix to be more like _mtx_lock_sleep(): use 'while (!foo) { ... }'
instead of 'for (;;) { if (foo) break; ... }'.
2006-06-03 20:44:01 +00:00
Pawel Jakub Dawidek
1f58dd4956 Fix a problem introduced in revision 1.220. On mount(2) failure, don't
forget to unbusy file system before its destruction.

This fixes the following warning on mount failure:

	Mount point <X> had 1 dangling refs

Tested by:	wkoszek
2006-06-02 20:29:02 +00:00
Doug Ambrisko
51e37c7f37 Make lio ident more consistant with aio ident. 2006-06-02 17:45:48 +00:00
Pawel Jakub Dawidek
f420242b2b Don't forget to unlock kq lock in low memory situations.
OK'ed by:	jmg
2006-06-02 13:23:39 +00:00
Pawel Jakub Dawidek
8ebab14c70 Remove confusing done_noglobal label. The KQ_GLOBAL_UNLOCK() macro know
how to handle both situations - when kq_global lock is and is not held.

OK'ed by:	jmg
2006-06-02 13:21:21 +00:00
Pawel Jakub Dawidek
241321abc0 Use SLIST_FOREACH_SAFE() macro, because knote_drop() can free an element
which can be then used to find next element in the list.

OK'ed by:	jmg
2006-06-02 13:18:59 +00:00
Olivier Houchard
4bb0f51d1d sched_rem() already sets ke->ke_state to KES_THREAD, so there's no need
to redo it.
2006-06-01 22:45:56 +00:00
Diomidis Spinellis
23efd78d03 Remove two locking assertion entries that:
a) were incorrectly written and therefore never compiled into
assertions, and
b) were incorrectly specified and when compiled resulted in a
failed assertion.
2006-05-31 14:06:06 +00:00
Diomidis Spinellis
f69ec7af12 Assertion code specifications are introduced using special character
sequences that are distinct from comments. %% is used for argument
locks; %! for pre- and post-conditions.
2006-05-30 20:49:54 +00:00
Diomidis Spinellis
b1b4282160 Remove incorrect lock validation specifications that caused
failed assertions with DEBUG_VFS_LOCKS.
We should reinstate them with correct specifications, possibly
after extendng vnode_if.awk

Noted by: truckman@
2006-05-30 20:21:51 +00:00
Tor Egge
57051fdc4b Close race between vmspace_exitfree() and exit1() and races between
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.

Factor out part of exit1() into new function vmspace_exit().  Attach
to vmspace0 to allow old vmspace to be freed earlier.

Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process.  Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.

Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

Reviewed by:	alc
2006-05-29 21:28:56 +00:00
Xin LI
56e26c3e7e Unexpand TAILQ_FIRST(foo) == NULL to TAILQ_EMPTY(foo). 2006-05-29 05:43:26 +00:00
Kris Kennaway
80a8e5da94 Correct typos
MFC after:	2 weeks
2006-05-28 22:15:28 +00:00
Robert Watson
4bb260ad78 In execve(), audit the path name being executed. In the future, it
would also be good to audit the interpreter pathname, if any.

Obtained from:	TrustedBSD Project
2006-05-28 08:28:47 +00:00
Diomidis Spinellis
0e1c7fb8ea Add missing % signs in the lock annotations of the functions:
lookup, rename, strategy, islocked
The missing % sign meant that the lines were processed as plain
comments and the corresponding assertions were never generated.
2006-05-28 07:24:12 +00:00
Xin LI
e38c7f3ef3 extlen and cpp is not used here in linker_search_kld(), so nuke them.
Reported by:	Mingyan Guo <guomingyan at gmail dot com>
MFC After:	2 weeks
2006-05-27 09:21:41 +00:00
Poul-Henning Kamp
9dd2370db6 If the console has no cncheckc method, use cngetc instead. 2006-05-26 11:00:20 +00:00
Poul-Henning Kamp
8aed7613bd Don't use CONS_DRIVER() macro to insert dummy element in cons_set 2006-05-26 10:46:38 +00:00
Poul-Henning Kamp
16b1613a31 GC the cn_dbctl_t hook for consoles, it is unused.
This used to make syscons switch to vty0 when we entered DDB but this
was lost in the KDB shuffle.  We may want to bring it back down the road
but it should be done by calling cn_init_t/cn_term_t instead, possibly
with a flag argument saying "Debugger!"
2006-05-26 10:24:00 +00:00
Craig Rodrigues
0c89bb0a02 Add "update" mount option to global_opts array,
for use with vfs_filteropt().
2006-05-26 02:38:48 +00:00
Craig Rodrigues
5eb304a91a Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems.  Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.
2006-05-26 00:32:21 +00:00
Robert Watson
20bdac8a4f Use getsock() and fput() instead of fgetsock() and fputsock() in
sendfile().  This causes sendfile() to use the file descriptor
reference to the socket instead of bumping the socket reference
count, which avoids an additional refcount operation, as well as a
potential expensive socket refcount drop, which can lead to
contention on the accept mutex.  This change also has the side
effect of further reducing the number of cases where an in-progress
I/O operation can occur on a socket after close, as using the file
descriptor refcount prevents the socket from closing while in use.

MFC after:	3 months
2006-05-25 15:10:13 +00:00
Stephan Uphoff
dcf67e65d2 Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by:	tegge@
MFC after:	7 days
2006-05-25 01:00:35 +00:00
Sam Leffler
75b773ae3d When starting up threads in taskqueue_start_threads create them
stopped before adjusting their priority and setting them on the run
q so they cannot race for resources (pointed out by njl).

While here add a console printf on thread create fails; otherwise
noone may notice (e.g. return value is always 0 and caller has no
way to verify).

Reviewed by:	jhb, scottl
MFC after:	2 weeks
2006-05-24 22:11:07 +00:00
David Xu
f705bbe8b1 Don't allow non-root user to set a scheduler policy, otherwise this could
be a local DOS.

Submitted by: Diane Bruce at db at db.net
2006-05-21 00:40:38 +00:00
David Xu
f6c040a2c5 Style fixes.
Submitted by: Diane Bruce < db at db dot net >
2006-05-19 06:37:24 +00:00
David Xu
7b8d821268 Move flag TDF_UMTXQ into structure umtxq, this eliminates the requirement
of scheduler lock in some umtx code.
2006-05-18 08:43:46 +00:00
Poul-Henning Kamp
d595182f0b Make the printfs relating to purging threads from a device less intrusive. 2006-05-17 06:37:14 +00:00
Poul-Henning Kamp
c40da00ca3 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
Paul Saab
6befa6ae1b Allow concurrent read(2)/readv(2) access to a file.
Lock file offset against multiple read calls.

Submitted by:	ups
Obtained from:	Yahoo!
MFC after:	2 weeks
2006-05-16 07:50:54 +00:00
Kelly Yancey
c9ad8a67af Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

  * Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
    linprocfs).  This (mostly) restores the ability to mount these
    filesystems using the old mount(2) system call (see below for the
    rest of the fix).

  * Remove not-NULL check for the data argument from the mount(2) entry
    point.  Per the mount(2) man page, it is up to the individual
    filesystem being mounted to verify data.  Or, in the case of procfs,
    etc. the filesystem is free to ignore the data parameter if it does
    not use it.  Enforcing data to be not-NULL in the mount(2) system call
    entry point prevented passing NULL to filesystems which ignored the
    data pointer value.  Apparently, passing NULL was common practice
    in such cases, as even our own mount_std(8) used to do it in the
    pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change.  One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
2006-05-15 19:42:10 +00:00
Benno Rice
77fe443878 The VERBOSE_SYSINIT stuff sees the DDB define a lot better if we include
opt_ddb.h.

Spotted by:	benno
Pointy hat to:	benno
2006-05-14 07:11:28 +00:00
Craig Rodrigues
5250012a1d For nmount(), if "rw" is specified as a mount option,
add "noro" to the list of mount options.  This allows
a read-only mount to be converted to read-write via:
mount -u -o rw

Requested by:	kris
2006-05-14 01:51:38 +00:00
John Baldwin
73dbd3da73 Remove various bits of conditional Alpha code and fixup a few comments. 2006-05-12 05:04:46 +00:00
Benno Rice
26ab616fdc Add a new kernel config option, VERBOSE_SYSINIT.
When porting FreeBSD to a new platform, one of the more useful things to do is
get mi_startup() to let you know which SYSINIT it's up to.  Most people tend to
whack a printf in the SYSINIT loop to print the address of the function it's
about to call.  Going one better, jhb made a version that uses DDB to look up
the name of the function and print that instead.  This version is essentially
his with the addition of some ifdeffery to make it optional and to allow it to
work (although using only the function address, not the symbol) if you forgot
to enable DDB.

All the cool bits by:	jhb
Approved by:		scottl, rink, cognet, imp
2006-05-12 02:01:38 +00:00
Poul-Henning Kamp
99ab8292c7 Remove more straggling CPU_ macro references 2006-05-11 17:53:26 +00:00
David Xu
005efcdb0e Use wakeup_one to avoid thundering herd.
Tested by: kris
2006-05-09 13:00:46 +00:00
David Xu
759ccccadb Use a dedicated mutex to protect aio queues, the movation is to reduce
lock contention with other parts.
2006-05-09 00:10:11 +00:00
Tor Egge
11991ab418 Call vn_finished_write() before calling the coredump handler which will
indirectly call vn_start_write() as necessary for each write.
2006-05-07 22:50:22 +00:00
Tor Egge
d302786c87 Temporarily unlock vnode for new image being executed to avoid lock order
reversals that can lead to deadlocks.  Normally vn_close(), namei() or vrele()
should not be called while holding vnode locks.
2006-05-05 20:25:05 +00:00
Pawel Jakub Dawidek
643df192de vn_start_write()/vn_finished_write() is not needed here, because
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.

Confirmed by:	tegge
MFC after:	2 weeks
2006-04-29 21:57:38 +00:00
Kris Kennaway
cef31ff7d9 Lock giant when assigning ni_vp and keep vfslocked state valid.
Committed for:	jeff
2006-04-29 07:13:49 +00:00
Pawel Jakub Dawidek
122410eea2 vn_start_write() is called only when v_type != VCHR, so corresponding
vn_finished_write() should also be called only then.

BTW. I fixed two functions here: vn_rdwr() and vn_write(). The latter seems
to be unused.

MFC after:	3 weeks
2006-04-28 21:54:05 +00:00
Robert Watson
3bf14fd5e9 Also check use_pty in the ptmx clone lookup; this means that when ptmx
support is turned off using the sysctl, we no longer even allow the
ptmx device to be looked up.

Foot provided by:	peter
2006-04-28 21:39:57 +00:00
Marcel Moolenaar
8f405ed335 Remove the puc-specific hacks. The puc(4) driver now properly uses
the rman(9) interface.
2006-04-28 21:23:09 +00:00
Jeff Roberson
6ca9fcc586 - Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child
buffers to go on the buf daemon's DIRTYGIANT queue.
 - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
   runs in the context of the buf daemon and may require Giant.
2006-04-28 01:05:31 +00:00
Jeff Roberson
4b5b86816c - Consistently track ni_dvp and ni_vp with dvfslocked and vfslocked rather
than trying to optimize it into a single lock.  This adds more calls to
   lock giant with non smpsafe filesystems but is the only way to reliably
   hold the correct lock.
 - Remove an invalid assert in the mountedhere case in lookup and fix the
   code to properly deal with the scenario.  We can actually have a lookup
   that returns dp == dvp with mountedhere set with certain unmount races.

Tested by:	kris
Reported by:	kris/mohans
2006-04-28 00:59:48 +00:00
John-Mark Gurney
5c06d111b8 back out for now... revert ccpu to being kern.ccpu... 2006-04-27 17:57:59 +00:00
John-Mark Gurney
c71ce6a445 move remaining sysctl into the kern.sched tree... 2006-04-26 19:42:38 +00:00
John Baldwin
ae110b53d1 Add some new commands to hopefully make it easier to diagnose lock-related
problems in ddb:
- "show threadchain [thread]" will start with the specified thread (or the
  current kdb thread by default) and show it's state.  If it is blocked on
  a lock, it will find the owner of the lock and show its state, etc.
- "show allchains" will find all of the threads that are blocked on a
  lock (but do not have any threads blocked on a lock they hold) and show
  the resulting thread chain.
- "show lockchain <lock>" takes a pointer to a lock_object (such as a
  mutex or rwlock).  If there is a turnstile for that lock, then it will
  display all the threads blocked on the lock.  In addition, for each
  thread blocked on the lock, it will display any contested locks they
  hold, and recurse on those locks to show any threads blocked on those
  locks, etc.
2006-04-25 20:28:17 +00:00
John Baldwin
de833b7c0c Use db_lookup_thread() to lookup the thread for the passed in address
and change 'show locks' to only list the locks for a given thread
rather than for all the threads in the process containing a specified
thread.
2006-04-25 20:24:23 +00:00
Marius Strobl
fa63296aba Remove last vestiges of sab(4). 2006-04-25 19:43:53 +00:00
Robert Watson
102ea03373 Extend getsock() to return the struct file flags read while holding the
file lock, in the style of fgetsock().

Modify accept1() to use getsock() instead of fgetsock(), relying on the
file descriptor reference rather than an acquired socket reference to
prevent the listen socket from being destroyed during accept().  This
avoids additional reference count operations, which should improve
performance, and also avoids accept1() operating on a socket whose file
descriptor has been torn down, which may have resulted in protocol
shutdown starting.

MFC after:	3 months
2006-04-25 11:48:16 +00:00
Maxim Konovalov
481f8fe85f Inherit LOCAL_CREDS option from listen socket for sockets returned
by accept(2).

PR:		kern/90644
Submitted by:	Andrey Simonenko
OK'ed by:	mdodd
Tested by:	NetBSD regress/sys/kern/unfdpass/unfdpass.c
MFC after:	1 month
2006-04-24 19:09:33 +00:00
Marcel Moolenaar
845652dd28 MFp4: Add the ipend() method to the serdev I/F to allow umbrella
drivers to obtain pending interrupt status from subordinate
	drivers.
2006-04-23 22:12:39 +00:00
Robert Watson
0cec9959e8 Assert that sockets passed into soabort() not be SQ_COMP or SQ_INCOMP,
since that removal should have been done a layer up.

MFC after:	3 months
2006-04-23 18:15:54 +00:00
Robert Watson
28ea180136 Add missing 'not' to SQ_COMP comment.
MFC after:	3 months
2006-04-23 15:37:23 +00:00
Robert Watson
6ca35d4b81 Move handling of SQ_COMP exception case in sofree() to the top of the
function along with the remainder of the reference checking code.  Move
comment from body to header with remainder of comments.  Inclusion of a
socket in a completed connection queue counts as a true reference, and
should not be handled as an under-documented edge case.

MFC after:	3 months
2006-04-23 15:33:38 +00:00
John Baldwin
f9ab2f134f Print td_name instead of p_comm if td_name is non-empty for
'show turnstile' and 'show sleepq'.
2006-04-21 20:40:43 +00:00
Paul Saab
95f16c1e2c Don't try to kill embryonic processes in killpg1(). This prevents
a race condition between fork() and kill(pid,sig) with pid < 0 that
can cause a kernel panic.

Submitted by:	up
MFC after:	3 weeks
2006-04-21 19:26:21 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
John-Mark Gurney
be4db476a6 const'ify resource_spec to note that we won't be changing anything while
releasing resources... also, NULL out the resources as we free them...
2006-04-20 01:44:16 +00:00
Warner Losh
0385d64761 r_spare1 and r_spare2 aren't needed. They aren't used. They can't be
accessed from outside of subr_rman.c.  Remove them.

Reviewed by: jmg (in theory)
2006-04-19 21:25:55 +00:00
John Baldwin
fea3efe5bf Implement rw_try_upgrade() and rw_downgrade(). rw_try_upgrade() makes a
single attempt at upgrading a read lock to a write lock, and rw_downgrade()
converts curthread's write lock into a read lock.
2006-04-19 21:06:52 +00:00
Wojciech A. Koszek
5884c1a098 'owner' is not used without SMP. Fix kernel build for such kernel
configurations.

Approved by:	jhb
2006-04-18 20:32:42 +00:00
John Baldwin
efa86db61d Adaptively spin before blocking on the turnstile if an rwlock is write
locked.  In general the adaptive spinning is similar to the same code
for mutexes with some extra trickiness in rw_wunlock_hard().  Specifically,
even though both wait bits might be set and we might have a turnstile with
at least one waiting thread, there might not be any threads blocked on the
queue we are not waking up (they might all be spinning), and we should
only preserve the waiting flag for the queue we aren't waking up if there
are in fact threads blocked on that queue.  Secondly, there might not be
any threads blocked on the queue we have chosen to waken threads from
(there might only be threads blocked on the other queue and the threads
for this queue are all spinning) in which case we disown the turnstile
instead of doing a braodcast and unpend.
2006-04-18 18:27:54 +00:00
John Baldwin
f1a4b852dc - Bring back turnstile_empty() which can check to see if an individual
queue on a turnstile is empty.
- Add a turnstile_disown() function that allows a thread to give up
  ownership of a turnstile w/o waking up any waiters.
2006-04-18 18:16:54 +00:00
Xin LI
4207c279d4 In vfs_hash_get(): mount point should never be changed
so explicitly constify the mp parameter.

Reviewed by:	phk
2006-04-18 08:05:08 +00:00
John Baldwin
38bf165fa1 - Add a rw_wowner() macro that just returns the owner of a write lock and
use it in places that only care about the write owner instead of
  rw_owner() as a baby step towards limited read-lock owner.
- Tidy the code that sets the WAITER flag bits to not duplicate a test
  around the atomic operation and the KTR trace in both of the lock
  functions.
2006-04-17 21:11:01 +00:00
John Baldwin
32553b153e Add a 'show sleepqueue' alias for 'show sleepq' in DDB. 2006-04-17 20:16:32 +00:00
John Baldwin
964b557211 Trim trailing whitespace. 2006-04-17 20:14:51 +00:00
John Baldwin
2971c36136 Add a new module_file() function that returns the linker_file_t associated
with a given module_t.  I use this in some the MOD_LOAD event handler for
some test kernel modules to ask the kernel linker to look up the linker
sets in my test modules. (I use linker sets to generate the list of
possible events that I then signal to execute via a sysctl.  On non-amd64,
ld(8) would resolve the entire linker set, but on amd64 I have to ask the
kernel linker to do it for me, and having the kernel linker do it works on
all archs.)
2006-04-17 19:44:44 +00:00
John Baldwin
0f180a7cce Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero.  This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread.  I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.
2006-04-17 18:20:38 +00:00
John-Mark Gurney
e98b5a89de remove duplicate sizeof vnode entry (debug.sizeof.vnode already existed)...
move ncsize into debug.sizeof and rename to namecache...
2006-04-16 18:38:30 +00:00
Scott Long
bb141be10a Take a better stab at making this compile. 2006-04-15 18:54:56 +00:00
Scott Long
83bc5d54c8 Take a stab at making this compile. 2006-04-15 18:04:04 +00:00
John Baldwin
76447e5618 Mark the thread pointer used during an adaptive spin volatile so that the
compiler doesn't decide to cache td_state.  Cachine the state would cause
the spinning thread to not notice when the owning thread stopped executing
(if it was preempted for example) which could result in livelock.
2006-04-14 19:51:50 +00:00
John Baldwin
a29b4f6eec Drop the kqueue global mutex as soon as we are finished with it rather
than keeping it locked until we exit the function to optimize the case
where the lock would be dropped and later reacquired.  The optimization
was broken when kevent's were moved from UFS to VFS and the knote list
lock for a vnode kevent became the lockmgr vnode lock.  If one tried
to use a kqueue that contained events for a kqueue fd followed by a vnode,
then the kq global lock would end up being held when the vnode lock was
acquired which could result in sleeping with a mutex held (and subsequent
panics) if the vnode lock was contested.

Reviewed by:	jmg
Tested by:	ps (on 6.x)
MFC after:	3 days
2006-04-14 14:27:28 +00:00
David Xu
cfd6f8cd6c Clear TDF_SINTR in sleepq_resume_thread, also sleepq_catch_signal does
not need to clear it now, this should fix panic when msleep is recursivly
called. Patch is slightly adjusted after review.

Reviewed by: jhb
Tested by: Csaba Henk, csaba-ml at creo.hu
MFC after: 3 days
2006-04-13 23:29:25 +00:00
John Baldwin
9477358d00 Turn on ithread_destroy() and call it from intr_event_destroy() to tear
down an interrupt event's associated thread (if it has one).
2006-04-13 17:29:04 +00:00
Christian S.J. Peron
d5e5634075 Kill the last Giant acquisition in the exit(2) code. This Giant acquisition
doesn't appear to be protecting anything. Most of consumers funsetownlst(9)
do not appear to be picking up Giant anywhere. This was originally a part
of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea
to leave it out until we were able to analyze it better.

Tested by:	kris
MFC after:	3 weeks
2006-04-10 14:07:28 +00:00
Pawel Jakub Dawidek
0909f38a3c On shutdown try to turn off all swap devices. This way GEOM providers are
properly closed on shutdown.

Requested by:	ru
Reviewed by:	alc
MFC after:	2 weeks
2006-04-10 10:03:41 +00:00
David Xu
e631cff309 Use proc lock to prevent a thread from exiting, Giant was no longer used to
protect thread list.
2006-04-10 04:55:59 +00:00
Robert Watson
d37b79a00f Remove UNIX domain socket raw socket support. This feature is documented
as being undocumented in Stevens, and was broken in 1997 during network
stack infrastructure work.  It is the one remaining (and incorrect)
direct protocol reference to raw_usrreq.pru_attach; this is incorrect
because the raw socket code assumes that raw_uattach is called only after
the protocol has allocated a PCB.

MFC after:	3 months
2006-04-09 16:29:47 +00:00
Marcel Moolenaar
07c8931358 Add the scc_hwmtx spin mutex, defined by scc(4). 2006-04-07 22:15:54 +00:00
John-Mark Gurney
1c4ca5e5fe spell unlock correctly, this is relatively minor as it's rare someone would
provide a lock method, and want the default unlock, but it is a bug...

PR:		95356
Submitted by:	Stephen Corteselli
MFC after:	3 days
2006-04-07 17:21:27 +00:00
Jeff Roberson
b53bf1269c - VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be
recycling for an unrelated filesystem.  I really don't like potentially
   acquiring giant in the context of a giantless filesystem but there
   are reasonable objections to removing the recycling from this path.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:46:10 +00:00
Jeff Roberson
4b24e4210e - Properly check against B_DELWRI and B_NEEDSGIANT. This check was
incorrectly written and caused some !NEEDSGIANT buffers to be put in
   the NEEDSGIANT queue.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:44:21 +00:00
Marcel Moolenaar
39eb1d1263 Increment kdb_active after we stopped the other CPUs and decrement
kdb_active before we restart them. This avoids false positives on
restarted CPUs when they test for kdb_active while kdb_trap() is
still finishing up.
2006-04-04 00:40:20 +00:00
Marcel Moolenaar
bfcdefd8aa Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
2006-04-03 22:51:47 +00:00
Peter Wemm
b9eee07e36 Remove the unused sva and eva arguments from pmap_remove_pages(). 2006-04-03 21:16:10 +00:00
Marcel Moolenaar
5991a4f811 In kdb_trap(), change the type of the local variable 'intr' from int
to register_t, as intr_disable() returns the latter and register_t
may be wider than int.

Pointed out by: marius@
2006-04-03 20:55:52 +00:00
Marcel Moolenaar
2fae8f5aed Replace critical_enter() and critical_exit() in kdb_trap() with
intr_disable() and intr_restore() resp. Previously, critical
regions would have interrupts disabled, but that was changed.
Consequently, the debugger could run with interrupts enabled.
This could cause problems for the low-level console code where
received characters would trigger an interrupt that causes
the interrupt handler to read the character instead of the
cngetc() function.
2006-04-03 17:48:09 +00:00
John-Mark Gurney
5e6125891f mask out any action when copying the flags from the event to the knote..
Pointed out by:	Václav Haisman
Submitted by:	Dan Nelson (slightly modifed patch)
MFC after:	3 days
2006-04-01 20:15:39 +00:00
Robert Watson
bc725eafc7 Chance protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
Robert Watson
ac45e92ff2 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00
Robert Watson
fa4c5373ce Add comment to accept1() that it should use getsock() instead of fgetsock()
to avoid additional mutex operations, and also to avoid use of soref/sorele
which are now not preferred.

MFC after:	3 months
2006-04-01 11:14:56 +00:00
Robert Watson
197b35d717 Mark fgetsock() and fputsock() as depcrecated: callers should rely on
the file descriptor reference, rather than paying additional lock
operations to acquire a socket reference from the file descriptor.
This will also help to ensure that file descriptor based socket
requests are not delivered to a socket after close.  Most consumers
have already been converted to this model.

MFC after:	3 months
2006-04-01 11:09:54 +00:00
Robert Watson
7f689de232 Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not
be present at this point.  We will eventually remove this assert because
the socket layer should never look at so_pcb, but for now it's a useful
debugging tool.

MFC after:	3 months
2006-04-01 10:45:52 +00:00
Robert Watson
220c1357ed Add a somewhat sizable comment documenting the semantics of various kernel
socket calls relating to the creation and destruction of sockets.  This
will eventually form the foundation of socket(9), but is currently in too
much flux to do so.

MFC after:	3 months
2006-04-01 10:43:02 +00:00
Jeff Roberson
0af2472199 - Add an assert to vgone. It is illegal to call vgone without a reference
to the vnode.  Without a reference the vnode will never be vdestroy'd
   and the memory will never be reclaimed.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:39:26 +00:00
Jeff Roberson
ba5eb429e3 - When there are dangling vnodes at unmount print them before we panic.
Sponsored by:	Isilon Systems, Inc.
2006-03-31 23:38:15 +00:00
Jeff Roberson
3bbd6d8ae6 - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:54:20 +00:00
Jeff Roberson
94bc95db3c - Hold a reference from the time vfs_busy starts until vfs_unbusy is
called.
 - vfs_getvfs has to return a reference to prevent the returned mountpoint
   from changing identities.
 - Release references acquired via vfs_getvfs.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:53:25 +00:00
Jeff Roberson
c5fcce21c5 - GETWRITEMOUNT now returns a referenced mountpoint to prevent its
identity from changing.  This is possible now that mounts are not freed.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:52:24 +00:00
Jeff Roberson
a218edceb2 - Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent
mount memory from being reclaimed.  This resolves a number of race
   conditions described in vfs_default.c and introduced with the
   VFS_LOCK_GIANT macros.
 - Let the mtx and lock remain valid after the mount structure has been
   freed by using init and fini calls.  Technically fini will never be
   called but is included for completeness.
 - Consistently use lockmgr directly rather than lockmgr to lock and
   vfs_unbusy to unlock.

Discussed with:	tegge
Tested by:	kris
Sponsored by:	Isilon Systems, Inc.
2006-03-31 03:49:51 +00:00
Jeff Roberson
fdf86b2dcd - LK_RETRY means nothing when passed to VOP_LOCK. Call vn_lock instead.
- Move the vn_lock of the dvp until after we've unbusied the filesystem
   to avoid a LOR with the mount point lock.
 - In the v_mountedhere while loop we acquire a new instance of giant each
   time through without releasing the first.  This would cause us to leak
   Giant.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:59:23 +00:00
Jeff Roberson
084d64ac21 - Add the B_NEEDSGIANT flag which is only set if the vnode that owns a buf
requires Giant.  It is set in bgetvp and cleared in brelvp.
 - Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
 - In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
   only if we think there are buffers in that queue.

Sponsored by:	Isilon Systems, Inc.
2006-03-31 02:56:30 +00:00
Sam Leffler
00537061dd fixup error handling in taskqueue_start_threads: check for kthread_create
failing, print a message when we fail for some reason as most callers do
not check the return value (e.g. 'cuz they're called from SYSINIT)

Reviewed by:	scottl
MFC after:	1 week
2006-03-30 23:06:59 +00:00
Pawel Jakub Dawidek
177a987379 Fix a panic on sparc64 related to inproper aligment - we cannot assume,
that 'unsigned char *' argument is 4 byte aligned.

MFC after:	3 days
2006-03-30 18:45:50 +00:00
Marcel Moolenaar
6174e6ed12 Add scc(4), a driver for serial communications controllers. These
controllers typically have multiple channels and support a number
of serial communications protocols. The scc(4) driver is itself
an umbrella driver that delegates the control over each channel
and mode to a subordinate driver (like uart(4)).
The scc(4) driver supports the Siemens SAB 82532 and the Zilog
Z8530 and replaces puc(4) for these devices.
2006-03-30 18:33:22 +00:00
Paul Saab
fbb273bc05 Properly support for FreeBSD 4 32bit System V shared memory.
Submitted by:	peter
Obtained from:	Yahoo!
MFC after:	3 weeks
2006-03-30 07:42:32 +00:00
John Baldwin
4b3b0413d2 Always explicitly panic in propogate_priority() if we try to propogate
a lock's priority to a sleeping thread.  When we panic, dump a stack
trace of the thread that is asleep if DDB is compiled into the kernel
just before calling panic().  This is much more informative and useful
for debugging than the current behavior of getting a page fault and not
having an easy way of determining which thread caused the original problem.

MFC after:	1 week
2006-03-29 23:24:55 +00:00
John-Mark Gurney
4e095bc045 hold the list lock over the f_event and KNOTE_ACTIVATE calls... This closes
a race where data could come in before we clear the INFLUX flag, and get
skipped over by knote (and hence never be activated, though it should of
been)...

Found by:	glebius & co.
Reviewed by:	glebius
MFC after:	3 days
2006-03-29 18:15:30 +00:00
John Baldwin
33f19bee6f - Conditionalize Giant around VFS operations for ALQ, ktrace, and
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
  in coredump().

Reported by:	jmg (2)
2006-03-28 21:30:22 +00:00
John Baldwin
11178ee4c1 Conditionalize locking of Giant for VFS in acct(2). We already
conditionally acquired Giant in the other parts of the accounting code.
2006-03-28 21:26:59 +00:00
John Baldwin
861dab08e7 Change vn_open() to honor the MPSAFE flag in the passed in nameidata object
and use that instead of testing fdidx against -1 to determine if it should
release Giant if Giant was locked due to the requested file residing on a
non-MPSAFE VFS.

Discussed with:	jeff
2006-03-28 21:22:08 +00:00
Dag-Erling Smørgrav
867c089bc7 Revert previous commit at davidxu's insistance. Instead, use __DECONST
(argh!) and rearrange the prototypes to make it clear that _umtx_op()
is not deprecated.
2006-03-28 14:32:38 +00:00
Dag-Erling Smørgrav
b3efbabe87 The undocumented and deprecated system call _umtx_op() takes two pointer
arguments.  The first one is never used (all callers pass in 0); the
second is sometimes used to pass in a struct timespec * which is used as
a timeout and never modified.  Constify that argument so callers can pass
a const struct timespec * without jumping through hoops.
2006-03-28 09:18:34 +00:00
Alan Cox
7c8dcf2def Use NET_LOCK_GIANT() and VFS_LOCK_GIANT() instead of unconditionally
acquiring Giant in kern_sendfile().

Guard against the forced reclamation of a vnode in kern_sendfile().

Discussed with: jeff
Reviewed by: tegge
MFC after: 3 weeks
2006-03-27 04:23:16 +00:00
Robert Watson
63b01ffd34 Add a sysctl, regression.sonewconn_earlytest, which when options
REGRESSION is enabled, allows user space to dictate that sonewconn()
should skip it's "skip the hard work" check to see if the listen
queue is full, and instead proceed with allocation of a socket and
trimming of the overflowed queue.  This makes it easier to test the
queue overflow logic.

MFC after:	1 month
2006-03-26 22:44:37 +00:00
Joseph Koshy
49874f6ea3 MFP4: Support for profiling dynamically loaded objects.
Kernel changes:

  Inform hwpmc of executable objects brought into the system by
  kldload() and mmap(), and of their removal by kldunload() and
  munmap().  A helper function linker_hwpmc_list_objects() has been
  added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
  the list of currently loaded kernel modules.

  The unused `MAPPINGCHANGE' event has been deprecated in favour
  of separate `MAP_IN' and `MAP_OUT' events; this change reduces
  space wastage in the log.

  Bump the hwpmc's ABI version to "2.0.00".  Teach hwpmc(4) to
  handle the map change callbacks.

  Change the default per-cpu sample buffer size to hold
  32 samples (up from 16).

  Increment __FreeBSD_version.

libpmc(3) changes:

  Update libpmc(3) to deal with the new events in the log file; bring
  the pmclog(3) manual page in sync with the code.

pmcstat(8) changes:

  Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
  (mapfile name), "-q"/"-v" (verbosity control).  Option "-k" now
  takes a kernel directory as its argument but will also work with
  the older invocation syntax.

  Rework string handling in pmcstat(8) to use an opaque type for
  interned strings.  Clean up ELF parsing code and add support for
  tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).

  Report statistics at the end of a log conversion run depending
  on the requested verbosity level.

Reviewed by:	jhb, dds (kernel parts of an earlier patch)
Tested by:	gallatin (earlier patch)
2006-03-26 12:20:54 +00:00
David Xu
dbbccfe923 1. Move code for scanning pending I/O from aio_fsync to aio_aqueue,
it has less overhead.
2. Avoid scheduling task if maximum number of I/O threads is reached.
2006-03-24 00:50:06 +00:00
David Xu
177e987e63 Regenerate. 2006-03-23 08:48:37 +00:00
David Xu
99eee864ad Implement aio_fsync() syscall. 2006-03-23 08:46:42 +00:00
Pawel Jakub Dawidek
96c0381f5c Destroy "bip" bio in error case.
Found by:	Coverity Prevent analysis tool
Coverity ID:	795
MFC after:	3 days
2006-03-22 00:42:41 +00:00
Jeff Roberson
bacb51fb67 - Remove explicit giant acquires and replace it with VFS_LOCK_GIANT.
Sponsored by:	Isilon Systems, Inc.
2006-03-22 00:00:05 +00:00
Jeff Roberson
77c79550af - Remove explicit calls to lock and unlock Giant and replace them with
VFS_LOCK_GIANT/VFS_UNLOCK_GIANT calls.  This completely removes Giant
   acquisition in the syscall path for ffs.

Bug fix to kern_fhstatfs from:	Todd Miller <Todd.Miller@sparta.com>
Sponsored by:	Isilon Systems, Inc.
2006-03-21 23:58:37 +00:00
David Xu
bf1a322061 Rethink it a bit, if there is a STOP flag, don't bother to resume other
threads.
2006-03-21 10:05:15 +00:00
David Xu
568b4ebbcc Because JOB control has higher priority than single threading in
thread_suspend_check(), call thread_stopped() to report SIGCHLD
if there is JOB control in progress.
2006-03-21 08:41:15 +00:00
Tor Egge
41d7199b9d Remove unused leaked debug function prototype. 2006-03-21 01:04:24 +00:00
Christian S.J. Peron
2ed4894a26 Restore fd optimization with a few minor tweaks, to quote tegge:
"fdinit() fails to initialize newfdp->fd_fd.fd_lastfile to -1.  This breaks
fdcopy() which will incorrectly set newfdp->fd_freefile to 1 if no files are
open and the last file descriptor marked as unused for fdp was 0.  This later
causes descriptor 0 to be unavailable in newfdp when the optimization is
enabled.

When the last file descriptor previously marked as used is nonzero and marked
as unused, fdunused() incorrectly sets fdp->fd_lastfile to fd - 1 due to
fd_last_used() returning (size - 1).  This hides the problem that breaks the
optimization."

This allows us to keep the optimization, while un-breaking it.

This is a RELENG_6 candidate.

PR:		kern/87208
MFC after:	1 week
Submitted by:	tegge
2006-03-20 00:13:47 +00:00
Tor Egge
7de3839d0d Let snapshots make a copy of old contents for all buffers taking part in a
cluster instead of just the first buffer.

Delay buf_start() calls until snapshots have a copy of old content.

PR:		kern/93942
2006-03-19 21:43:36 +00:00
Tor Egge
d50ef66d03 Don't call vn_finished_write() if vn_start_write() failed. 2006-03-19 20:43:07 +00:00
Jeff Roberson
e44270a781 - Correct an assert in vop_rename_pre. fdvp may be locked if it is either
the target directory or file.  This case should fail in the filesystem
   anyway and perhaps kern_rename() should catch it.

Sponsored by:	Isilon Systems, Inc.
2006-03-19 20:14:46 +00:00
Christian S.J. Peron
30bacc08e0 Back out fd optimization introduced in revision 1.280 as it appears to be
really breaking things. Simple "close(0); dup(fd)" does not return descriptor
"0" in some cases. Further, this change also breaks some MAC interactions with
mac_execve_will_transition().  Under certain circumstances, fdcheckstd() can
be called in execve(2) causing an assertion that checks to make sure that
stdin, stdout and stderr reside at indexes 0, 1 and 2 in the process fd table
to fail, resulting in a kernel panic when INVARIANTS is on.

This should also kill the "dup(2) regression on 6.x" show stopper item on the
6.1-RELEASE TODO list.

This is a RELENG_6 candidate.

PR:		kern/87208
Silence from:	des
MFC after:	1 week
2006-03-18 23:27:21 +00:00
Robert Watson
4d4b555efa Modify UNIX domain sockets to guarantee, and assume, that so_pcb is always
defined for an in-use socket.  This allows us to eliminate countless tests
of whether so_pcb is non-NULL, eliminating dozens of error cases.  For
now, retain the call to sotryfree() in the uipc_abort() path, but this
will eventually move to soabort().

These new assumptions should be largely correct, and will become more so
as the socket/pcb reference model is fixed.  Removing the notion that
so_pcb can be non-NULL is a critical step towards further fine-graining
of the UNIX domain socket locking, as the so_pcb reference no longer
needs to be protected using locks, instead it is a property of the socket
life cycle.
2006-03-17 13:52:57 +00:00
Alan Cox
41634e2e8d Correct two vm object reference leaks in error cases.
Submitted by: davidxu
2006-03-16 08:51:59 +00:00
Robert Watson
92c07a345e Change soabort() from returning int to returning void, since all
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
2006-03-16 07:03:14 +00:00
David Xu
795a11d049 Fix a race between file operations and rfork(RFCFDG) by parking
all other threads at user boundary, the race can crash kernel
under stress testing.

Reviewed by: jhb
MFC after: 3 days
2006-03-15 23:24:14 +00:00
Sam Leffler
47e2996e8b promote fast ipsec's m_clone routine for public use; it is renamed
m_unshare and the caller can now control how mbufs are allocated

Reviewed by:	andre, luigi, mlaier
MFC after:	1 week
2006-03-15 21:11:11 +00:00
Poul-Henning Kamp
590487078f Disable the "cputick increased..." message now that the dust has settled. 2006-03-15 20:22:32 +00:00
Alexander Leidinger
a8f47039c7 Fix memory leak introduced in previous revision.
Discussed with:	phk
2006-03-15 19:23:08 +00:00
Robert Watson
93709ad0be As with socket consumer references (so_count), make sofree() return
without GC'ing the socket if a strong protocol reference to the socket
is present (SS_PROTOREF).
2006-03-15 12:45:35 +00:00
David Xu
e170bfda56 1. Count last time slice, this intends to fix
"calcru: runtime went backwards" bug for threaded process.
2. Add comment about possible logical problem with scheduler.

MFC after: 3 days
2006-03-14 04:00:21 +00:00
John-Mark Gurney
45e0d0aa30 spell pdata correctly, we now will only dump maxlen of each mbuf in the
chain, instead of the entire mbuf...  This should probably be reworked
so that it prints at max maxlen bytes for the entire chain...
2006-03-14 00:22:10 +00:00
Ruslan Ermilov
936ddefcd6 The mount(8) manpage says: "In case of conflicting options being
specified, the rightmost option takes effect."  Fix code to obey
this.  This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
2006-03-13 14:58:37 +00:00
David Xu
28e989e9ca Remove unused code. 2006-03-13 10:37:25 +00:00
Christian S.J. Peron
a19fd0e766 Make sure that we are adding a path token to the audit record in open(2).
Do this by making sure we are using the AUDITVNODE1 mask in the namei flags.

Obtained from:	TrustedBSD Project
2006-03-11 17:14:05 +00:00
Poul-Henning Kamp
272601f8f0 Go over calcru and friends once more.
Reintroduce the monotonicity for the normal case and make the two
special cases behave in what is belived to be the most sensible fasion.
2006-03-11 10:48:19 +00:00
Tor Egge
ca2fa80767 Block secondary writes while expunging active unlinked files.
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.

Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).

Perform deferred inactive handling after the file system is resumed.
2006-03-11 01:08:37 +00:00
Jung-uk Kim
0d84d9ebb5 Implement printf 'X' conversion for both libstand and kernel. 2006-03-09 22:37:34 +00:00