Commit Graph

8302 Commits

Author SHA1 Message Date
csjp
17aca298fa Add much needed descriptions for a number of the IPC related sysctl OIDs.
This information will be very useful for people who are tuning applications
which have a dependence on IPC mechanisms.

The following OIDs were documented:

Message queues:
 kern.ipc.msgmax
 kern.ipc.msgmni
 kern.ipc.msgmnb
 kern.ipc.msgtlq
 kern.ipc.msgssz
 kern.ipc.msgseg

Semaphores:
 kern.ipc.semmap
 kern.ipc.semmni
 kern.ipc.semmns
 kern.ipc.semmnu
 kern.ipc.semmsl
 kern.ipc.semopm
 kern.ipc.semume
 kern.ipc.semusz
 kern.ipc.semvmx
 kern.ipc.semaem

Shared memory:
 kern.ipc.shmmax
 kern.ipc.shmmin
 kern.ipc.shmmni
 kern.ipc.shmseg
 kern.ipc.shmall
 kern.ipc.shm_use_phys
 kern.ipc.shm_allow_removed
 kern.ipc.shmsegs

These new descriptions can be viewed using sysctl -d

PR:		kern/65219
Submitted by:	Dan Nelson <dnelson at allantgroup dot com> (modified)
No objections:	developers@
Descriptions reviewed by: gnn
MFC after:	1 week
2005-02-12 01:22:39 +00:00
sobomax
eeb5ed79cb Add SIGTHR (32) into list of signals permitted to be delivered to the
suid application. The problem is that Linux applications using old Linux
threads (pre-NPTL) use signal 32 (linux SIGRTMIN) for communication between
thread-processes. If such an linux application is installed suid or sgid
and security.bsd.conservative_signals=1 (default), then permission will be
denied to send such a signal and the application will freeze.

I believe the same will be true for native applications that use libthr,
since libthr uses SIGTHR for implementing conditional variables.

PR:		72922
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2005-02-11 14:02:42 +00:00
iedowse
6df119b425 When processing a timeout() callout and returning it to the free
list, set `curr_callout' to NULL. This ensures that we won't attempt
to cancel the current callout if the original callout structure
gets recycled while we wait to acquire Giant.

This is reported to fix an intermittent syscons problem that was
introduced by revision 1.96.
2005-02-11 00:14:00 +00:00
bmilekic
885ba93847 Optimize the way reference counting is performed with Mbufs. We
do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster)
constructor to initialize the reference counter anymore.  The reference
counts are located in a separate memory region (in the slab header,
because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very
often in a cache miss.  Additionally, and perhaps more significantly,
optimize the free mbuf+cluster (packet) case, which is very common, to
no longer require an atomic operation on free (to verify the reference
counter) if the reference on the cluster has never been increased (also
very common).  Reduces an atomic on mbuf free on average.

Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>
2005-02-10 22:23:02 +00:00
cperciva
e1f5bc1828 Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not
as a "long" in dofileread() and dofilewrite().

Discussed with:	jhb
2005-02-10 20:19:17 +00:00
phk
5dd8d30575 Make various vnode related functions static 2005-02-10 12:28:58 +00:00
phk
dc9f809dd5 Make some file/filedesc related functions static 2005-02-10 12:27:58 +00:00
phk
40bcad426b Make various mountpoint related functions static. 2005-02-10 12:25:38 +00:00
phk
9fbd4a503d Make a SYSCTL_NODE static 2005-02-10 12:23:29 +00:00
phk
bbe97a9d2e MD5Pad() should never have been exposed. 2005-02-10 12:20:42 +00:00
phk
3435220961 make cluster_callback() static 2005-02-10 12:17:48 +00:00
phk
6d9a6aacc4 Make a SYSCTL_NODE and a mutex static 2005-02-10 12:16:42 +00:00
phk
82e926dbf2 Make another bunch of SYSCTL_NODEs static 2005-02-10 12:16:08 +00:00
phk
1de366179d Make a bunch of SYSCTL_NODEs static. 2005-02-10 12:15:49 +00:00
phk
13100c3699 Make a bunch of malloc types static.
Found by:	src/tools/tools/kernxref
2005-02-10 12:02:37 +00:00
phk
5d1652b89d Don't pass NULL to vprint() 2005-02-10 08:55:08 +00:00
jeff
480b60be3c - Add more information to the getnewbuf() recycling KTR.
Sponsored by:	Isilon Systems, Inc.
2005-02-10 02:22:56 +00:00
jeff
06f7a532e9 - Add a new assert in the getnewvnode(). Assert that the usecount is still
0 to detect getnewvnode() races.
 - Add the vnode address to a few panics near by to help in debugging.

Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:27:10 +00:00
jeff
ede81ae242 - Remove an invalid KASSERT added in recent background write reshuffling.
Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:25:08 +00:00
cperciva
30beb7d8e4 Add a new sysctl, "security.jail.chflags_allowed", which controls the
behaviour of chflags within a jail.  If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.

This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.

Discussed with:	csjp, rwatson
MFC after:	2 weeks
2005-02-08 21:31:11 +00:00
phk
af5ef3f262 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
phk
a75e6a7110 Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.
2005-02-08 18:09:11 +00:00
njl
cc21fc94e9 Maxunit is inclusive so fix off-by-one in previous commit. 2005-02-08 18:03:17 +00:00
njl
21180427d3 Update device_find_child(9) to return the first matching child if unit
is set to -1.

Reviewed by:	dfr, imp
2005-02-08 18:00:29 +00:00
jhb
60bd53b164 Implement a kern_pathconf() wrapper for pathconf() which can take the
filename from either a user space or a kernel space pointer.
2005-02-07 21:46:43 +00:00
jhb
221a30b414 If the pointer to the new itimerval is NULL in kern_setitimer(), just
read the old value via kern_getitimer().
2005-02-07 21:45:48 +00:00
jhb
71c05d27c0 - Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
  get rid of the 4th argument.  The old way returned a pointer into the
  kernel array that the calling function would then access afterwards
  without holding the appropriate locks and doing non-lock-safe things like
  copyout() with the data anyways.  This change removes that unsafeness and
  resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
  fstatfs(), and fhstatfs().  Use these wrappers to cut out a lot of
  code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
  under an alternate prefix and determines which filename should be used.
  This is basically a more general version of linux_emul_convpath() that
  can be shared by all the ABIs thus allowing for further reduction of
  code duplication.
2005-02-07 18:44:55 +00:00
jhb
2cfc33f9b1 Various and sundry style fixes. 2005-02-07 18:38:29 +00:00
phk
628952636c Access vmobject via the bufobj instead of the vnode 2005-02-07 10:04:06 +00:00
phk
e0b8a475a8 VOP_DESTROYVOBJECT() is no more. 2005-02-07 09:26:58 +00:00
phk
cf44cd72d6 Remove vop_stddestroyvobject() 2005-02-07 09:26:39 +00:00
phk
d2bbb620e9 Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what
was necessary.
2005-02-07 07:48:03 +00:00
phk
720f0b5181 Add a missing prefix to a struct field for consistency. 2005-02-07 07:40:39 +00:00
iedowse
885a9694bc Add a mechanism for associating a mutex with a callout when the
callout is first initialised, using a new function callout_init_mtx().
The callout system will acquire this mutex before calling the callout
function and release it on return.

In addition, the callout system uses the mutex to avoid most of the
complications and race conditions inherent in asynchronous timer
facilities, so mutex-protected callouts have much simpler semantics.
As long as the mutex is held when invoking callout_stop() or
callout_reset(), then these functions will guarantee that the callout
will be stopped, even if softclock() had already begun to process
the callout.

Existing Giant-locked callouts will automatically pick up the new
race-free semantics. This should close a number of race conditions
in the USB code and probably other areas of the kernel too.

There should be no change in behaviour for "MP-safe" callouts; these
still need to use the techniques mentioned in timeout(9) to avoid
race conditions.
2005-02-07 02:47:33 +00:00
njl
2163789671 Add support for relative cpufreq drivers. Such drivers modulate clock
frequency as a percentage of the base rate and do not change the base
rate directly.  The cpufreq framework combines these with absolute drivers
to produce synthesized levels made of one or more settings.
2005-02-06 21:08:35 +00:00
jeff
0a084a15e2 - Don't release BKGRDINPROG until after we've bufdone'd the copy.
Sponsored by:	Isilon Systems, Inc.
2005-02-05 01:26:14 +00:00
jeff
ef8ea3a09d - Add ke_runq == NULL to the conditions which will cause us to abort
adjusting timeshare loads in sched_class().  This is only important if
   the thread has never run, otherwise the state checks should work as
   expected.
2005-02-04 17:22:46 +00:00
ssouhlal
3dcdb56fbe Set the scheduling class of the idle threads to PRI_IDLE.
While there, set their priority with sched_prio() instead of changing it
'by hand'.

Reviewed by:	jhb
Approved by:	grehan (mentor)
2005-02-04 06:16:05 +00:00
njl
ed695e1533 Add the cpufreq framework. This code manages multiple drivers and presents
a unified kernel and user interface for controlling cpu frequencies.
2005-02-04 05:39:19 +00:00
njl
09a005a215 Add an interface for cpufreq. The kernel interface lets other drivers
select the CPU frequency level (say for cooling).  The driver interface
allows hardware drivers to announce themselves as capable of adjusting
an individual frequency setting.
2005-02-04 05:38:30 +00:00
pjd
0609f60831 - Move gets() function to libkern (I want to use it outside vfs_mount.c).
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
  future.
- Remove special treatment of '@' and '#', those two are only confusing.

Discussed with:	rwatson
MFC after:	2 weeks
2005-02-03 15:10:58 +00:00
jeff
d2a1c9973a - Correct a typo in kern_rename. tvfslocked should be initialized from
tond and not fromnd.  This could lead us to leak Giant, or unlock it
   twice, depending on the filesystems involved.  renames within a single
   filesystem would not have caused any problems.

Sponsored by:	Isilon Systems, Inc.
2005-02-02 17:17:15 +00:00
jeff
4ab36f5f96 - Or MPSAFE with the correct set of flags in stat(). This affected only
the LOOKUP_SHARED case.

Spotted by:	jhb
2005-02-01 23:43:46 +00:00
bmilekic
5cdce7d092 Update copyright, remove "all rights reserved" (since they are not
all reserved, as the lisence makes clear), and strike the third clause
(now this is a 2-clause liberal BSDL as are the rest of files I hold
copyright over).
2005-02-01 03:17:52 +00:00
sobomax
68d0bd2186 Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after:	2 weeks
2005-01-30 07:20:36 +00:00
sobomax
c1d75210e2 Fix build on AMD64 (and probably other arches where size_t != int).
Submitted by:	Tinderbox
MFC after:	2 weeks
2005-01-30 06:43:17 +00:00
rwatson
464d7f1e2a Fix spelling of integer in a comment.
Beady eyes:	ceri
2005-01-30 00:31:19 +00:00
sobomax
5fd43d6c79 Grrr, this committer needs to have a sleep. Remove lines from the previous
delta not intended for public consumption.

MFC after:	2 weeks
2005-01-29 23:51:05 +00:00
sobomax
bc473990f5 Fix small non-conformance introduced in the previous commit: execve() is
expected to return ENAMETOOLONG, not E2BIG if first argument doesn't
fit into {PATH_MAX} bytes.

MFC after:	2 weeks
2005-01-29 23:47:36 +00:00
sobomax
f489acaf0f o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
  completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
  linuxlator/i386.

Obtained from:  DragonFlyBSD (partially)
MFC after:      2 weeks
2005-01-29 23:12:00 +00:00
rwatson
1c7b501265 Correct a minr whitespace inconsistency introduced in revision 1.159:
add a tab between #define and DF_REBID instead of a space.
2005-01-29 22:04:30 +00:00
phk
237e3ac2e9 Use MAXMINOR 2005-01-29 16:50:04 +00:00
phk
ba0e01d2d8 Typo. 2005-01-29 15:10:30 +00:00
phk
9072f89ba1 Add MAXMINOR #define, we should have had this long time ago.
Add minor2unit() in addition to dev2unit() and unit2minor().

If it wasn't such a hazzle we should redefine minor numbers in
the kernel without the gap for the major number, but it's not worth
the bother (yet).
2005-01-29 15:07:13 +00:00
phk
cd117518d7 In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying
a process return to userspace if it had pending GEOM events.

We need to have the same check in the exit pass to catch the case
where a GEOM related filedescriptor is not explicitly closed by
the process.

Bumped into by:	people using dd(1) to build releases, nanobsd etc.
2005-01-29 14:03:41 +00:00
jeff
da8e6b049d - Don't drop the wref on the bufobj until after bufdone() has completed.
Without this, threads waiting in bufobj_wwait() may wakeup prior to
   bufdone() completing.

Sponsored by:	Isilon Systems, Inc.
2005-01-28 17:48:58 +00:00
phk
4f73d0b6fc Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
phk
f8b1ba904f Integrate vclean() into vgonel().
Various associated polishing.
2005-01-28 13:00:03 +00:00
phk
eaf84397bb Remove register keyword 2005-01-28 12:39:10 +00:00
phk
9b1a8ec7bf Move the contents of vop_stddestroyvobject() to the new vnode_pager
function vnode_destroy_vobject().

Make the new function zero the vp->v_object pointer so we can tell
if a call is missing.
2005-01-28 08:56:48 +00:00
jeff
15397d00ae - Regen 2005-01-26 02:29:18 +00:00
jeff
1111e806e3 - Struct mount is not yet locked well enough to allow
mount/nmount/unmount to run without Giant.  Mark them as STD here.
2005-01-26 02:28:43 +00:00
sobomax
896df27c1a Split out kernel side of msgctl(2) into two parts: the first that pops data
from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrapper
of that syscall.

MFC after:      2 weeks
2005-01-26 00:46:36 +00:00
sobomax
35611d3699 Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after:	2 weeks
2005-01-25 21:28:28 +00:00
jeff
b28fe2715d - Include LK_INTERLOCK in LK_EXTFLG_MASK so that it makes its way into
acquire.
 - Correct the condition that causes us to skip apause() to only require
   the presence of LK_INTERLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-01-25 16:06:05 +00:00
jeff
c9f0aca772 - Make lf_print static and move its prototype into kern_lockf.c
- Protect all of the advlock code with Giant as some filesystems
   may not be entering with Giant held now.

Sponsored by:	Isilon Systems, Inc.
2005-01-25 10:15:26 +00:00
phk
a4f3b3f609 Previously a read of zero bytes got handled in devfs:vop_read() but I
missed that when the vnode bypass was introduced.

Deal with zero length transfers before we even get to fo_ops->fo_read().

Found by:	Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru>
PR:	75758
2005-01-25 09:15:32 +00:00
phk
32b3eaa1c2 Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now. 2005-01-25 00:42:16 +00:00
phk
796d435574 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
phk
d0bbbd0881 Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem
for a given vnode to create a vnode_pager object if one is needed.
2005-01-25 00:12:24 +00:00
phk
716e67e429 Don't call VOP_CREATEVOBJECT(), it's the responsibility of the
filesystem which owns the vnode.
2005-01-24 23:53:54 +00:00
phk
1d63b12e22 Eliminate the constant flags argument to vclean() 2005-01-24 22:22:02 +00:00
phk
ba85bee696 Move the body of vop_stdcreatevobject() over to the vnode_pager under
the name Sande^H^H^H^H^Hvnode_create_vobject().

Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.

Call that new function from vop_stdcreatevobject().

Make vnode_pager_alloc() private now that its only user came home.
2005-01-24 21:21:59 +00:00
phk
730f6f1d85 Save a line by unlocking before we test. 2005-01-24 14:13:24 +00:00
phk
dc1cfea3cd Change vprint() to vn_printf() which takes varargs.
Add #define for vprint() to call vn_printf().
2005-01-24 13:58:08 +00:00
phk
d5c135375c Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
phk
e5b74a2850 Fix a list corruption issue in cloning device management using the
western strategy ("allocate first, ask questions later") so we can
extend the devmtx coverage to the clone list.
2005-01-24 12:44:56 +00:00
glebius
d084122f36 - Convert so_qlen, so_incqlen, so_qlimit fields of struct socket from
short to unsigned short.
- Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > U_SHRTMAX.

Before this change setting somaxconn to smth above 32767 and calling
listen(fd, -1) lead to a socket, which doesn't accept connections at all.

Reviewed by:	rwatson
Reported by:	Igor Sysoev
2005-01-24 12:20:21 +00:00
jeff
62682e3e74 - Regen for recent vfs syscall changes.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:50:42 +00:00
jeff
00ad79e381 - Change all VFS syscalls to MSTD as they all manually deal with giant
or the appropriate filesystem locks.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:49:26 +00:00
jeff
39bf4e6e67 - Add CTR calls to trace the lifecycle of a buffer.
- Remove some KASSERTs which are invalid if the appropriate lock is
   not held.
 - Slightly restructure bremfree() so that it is more sane.
 - Change the flush code in bdwrite() to avoid acquiring a mutex
   whenever possible.
 - Change the flush code in bdwrite() to avoid holding the bufobj mutex
   while calling buf_countdeps().  This introduces a lock-order
   relationship with the softdep lock that can not otherwise be resolved.
 - Don't set B_DONE until bufdone() is complete, otherwise another
   processor may believe the buf is done before it is.
 - Only acquire Giant if the caller has set b_iodone.  Don't grab giant
   around normal bufdone() calls.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:47:04 +00:00
jeff
8cb5395678 - Add the tunable and sysctl for the mpsafevfs. It currently defaults
to off.
 - Protect access to mnt_kern_flag with the mointpoint mutex.
 - Remove some KASSERTs which are not legal checks without the appropriate
   locks held.
 - Use VCANRECYCLE() rather than rolling several slightly different
   checks together.
 - Return from vtryrecycle() with a recycled vnode rather than a locked
   vnode.  This simplifies some locking.
 - Remove several GIANT_REQUIRED lines.
 - Add a few KASSERTs to help with INACT debugging.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:41:01 +00:00
jeff
e5940f4bef - Remove GIANT_REQUIRED where giant is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:33:46 +00:00
jeff
1e3b49c0c4 - Remove GIANT_REQUIRED where it is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:32:14 +00:00
jeff
dcefe7b06b - Remove GIANT_REQUIRED where giant is no longer required.
- Protect access to mnt_kern_flag with the mountpoint mutex.
 - Use the appropriate nd flags to deal with giant in vn_open_cred().
   We currently determine whether the caller is mpsafe by checking
   for a valid fdidx.  Any caller coming from user-space is now
   mpsafe and supplies a valid fd.  No kenrel callers have been
   converted to mpsafe, so this check is sufficient for now.
 - Use VFS_LOCK_GIANT instead of manual giant acquisition where
   appropriate.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:31:42 +00:00
jeff
ea989897b7 - Protect mnt_kern_flag with the mountpoint's mutex. This is required
to make the suspend related functions mpsafe.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:28:41 +00:00
jeff
e794a01e4e - Acquire and release Giant as we enter and leave filesystems which
require it.
 - Track the status of Giant with the nd flag HASGIANT.
 - Release giant on return of namei() callers are not marked MPSAFE as
   they already own giant.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:27:05 +00:00
jeff
416ef3d9d0 - Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds.
- Move Giant acquisition into the few vfs syscalls that weren't already
   directly acquiring it.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:25:44 +00:00
jeff
dc0d73570e - Simplify the cache locking. The lock order relationship with the
vnode lock is much simpler than I originally thought it would be.
   Now, the cache lock is  always acquired before the vnode lock.
 - Provide some gotos in __getcwd() to simplify the unlocking a bit.
 - Move Giant acquisition down into __getcwd().

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:24:12 +00:00
jeff
13b2f39f55 - Do not use APAUSE if LK_INTERLOCK is set. We lose synchronization
if the lockmgr interlock is dropped after the caller's interlock
   is dropped.
 - Change some lockmgr KTRs to be slightly more helpful.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:20:59 +00:00
jeff
8be2f1a91e - Use VFS_LOCK_GIANT() in place of mtx_lock(&giant), etc.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:19:31 +00:00
rwatson
8a8bdb0810 Style cleanup: with removal of mutex operations, we can also remove
{}'s from securelevel_gt() and securelevel_ge().

MFC after:	1 week
2005-01-23 21:11:39 +00:00
rwatson
1d8015ceb7 When reading pr_securelevel from a prison, perform a lockless read,
as it's an integer read operation and the resulting slight race is
acceptable.

MFC after:	1 week
2005-01-23 21:01:00 +00:00
rwatson
293a12c083 When retrieving the current per-jails securelevel for a sysctl read,
don't acquire the prison mutex, as it's an integer read and races
here don't make a difference.

MFC after:	1 week
2005-01-23 20:59:19 +00:00
rwatson
57c91a09d8 When DDB is not defined, don't implement witness_thread_has_locks() and
witness_proc_has_locks(), as they are unused, which results in a compiler
error.  This problem was introduced with the implementation of "show
alllocks".

Spotted by:	Artem Kuchin <matrix at itlegion dot ru>
2005-01-22 21:14:21 +00:00
rwatson
59f1cc6e6e Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC shared memory.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 19:10:25 +00:00
rwatson
1215571a87 Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC semaphores.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 19:04:17 +00:00
rwatson
a9307575e8 Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC message queues.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 18:51:43 +00:00
bmilekic
da7116f3ac Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example.  If you are planning to use it, for now you must:

	1) Put "options DEBUG_MEMGUARD" in your kernel config.
	2) Edit src/sys/kern/kern_malloc.c manually, look for
	   "XXX CHANGEME" and replace the M_SUBPROC comparison with
	   the appropriate malloc type (this might require additional
	   but small/simple code modification if, say, the malloc type
	   is declared out of scope).
	3) Build and install your kernel.  Tune vm.memguard_divisor
	   boot-time tunable which is used to scale how much of kmem_map
	   you want to allott for MemGuard's use.  The default is 10,
	   so kmem_size/10.

ToDo:
	1) Bring in a memguard(9) man page.
	2) Better instrumentation (e.g., boot-time) of MemGuard taking
	   over malloc types.
	3) Teach UMA about MemGuard to allow MemGuard to override zone
	   allocations too.
	4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.
2005-01-21 18:09:17 +00:00
cperciva
933b3f52b0 Make "c->c_func = NULL" conditional on CALLOUT_LOCAL_ALLOC in both
places where it occurs, not just one. :-)

Pointed out by:	glebius
Pointy had to:	cperciva
2005-01-19 21:15:58 +00:00
cperciva
958e0cd9a0 Make "c->c_func = NULL" conditional on the CALLOUT_LOCAL_ALLOC flag,
i.e., only clear c->c_func if the callout c is being used via the old
timeout(9) interface.

Requested by:	glebius
2005-01-19 20:34:46 +00:00
cperciva
8526221cef Clarify the description of the callout_active() macro: It is cleared by
callout_stop, callout_drain, and callout_deactivate, but is not
automatically cleared when a callout returns.
2005-01-19 19:46:35 +00:00
ps
155c196d05 move kern_nanosleep to sys/syscallsubr.h
Requested by:	jhb
2005-01-19 18:09:50 +00:00
ps
87f1a6a1a6 Add a 32bit syscall wrapper for modstat
Obtained from:	Yahoo!
2005-01-19 17:53:06 +00:00
ps
db53196a48 - rename nanosleep1 to kern_nanosleep
- Add a 32bit syscall entry for nanosleep

Reviewed by:	peter
Obtained from:	Yahoo!
2005-01-19 17:44:59 +00:00
imp
4a464d3b43 Introduce bus_free_resource. It is a convenience function which wraps
bus_release_resource by grabbing the rid from the resource.
2005-01-19 06:52:19 +00:00
davidxu
b3a53fc0e6 Revert my previous errno hack, that is certainly an issue,
and always has been, but the system call itself returns
errno in a register so the problem is really a function of
libc, not the system call.

Discussed with : Matthew Dillion <dillon@apollo.backplane.com>
2005-01-18 13:53:10 +00:00
phk
220b6a2414 Detect sign-extension bugs in the ioctl(2) command argument: Truncate
to 32 bits and print warning.
2005-01-18 07:37:05 +00:00
silby
ce62b5450e Rearrange the kninit calls for both directions of a pipe so that
they both happen before pipe backing allocation occurs.  Previously,
a pipe memory shortage would cause a panic due to a KNOTE call
on an uninitialized si_note.

Reported by:	Peter Holm
MFC after:	1 week
2005-01-17 07:56:28 +00:00
phk
d3b1b2cc99 Fix a bug I introduced in 1.561 which has caused considerable filesystem
unhappiness lately.

As far as I can tell, no files that have made it safely to disk
have been endangered, but stuff in transit has been in peril.

Pointy hat:	phk
2005-01-16 21:09:39 +00:00
davidxu
8a30514f66 make umtx timeout relative so userland can select different clock type,
e.g, CLOCK_REALTIME or CLOCK_MONOTONIC.
merge umtx_wait and umtx_timedwait into single function.
2005-01-14 13:38:15 +00:00
phk
cc0cbc6b34 Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
phk
3760addae2 Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.
2005-01-13 12:25:19 +00:00
phk
4d9781b3db Change the generated VOP_ macro implementations to improve type checking
and KASSERT coverage.

After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.

Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.

We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.

    Add new (non-inline) VOP_FOO_AP() functions which take a "struct
    foo_args" argument and does everything the VOP_FOO() macros
    used to do with checks and debugging code.

    Add KASSERT to VOP_FOO_AP() check for argument type being
    correct.

    Slim down VOP_FOO() inline functions to just stuff arguments
    into the struct foo_args and call VOP_FOO_AP().

    Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
    and make VCALL() use it instead of the current offsetoff() hack.

    Retire vcall() which implemented the offsetoff()

    Make deadfs and unionfs use VOP_FOO_AP() calls instead of
    VCALL(), we know which specific call we want already.

    Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
    functions.

    Remove unused vdesc_offset and VOFFSET().

    Generally improve style/readability of the generated code.
2005-01-13 07:53:01 +00:00
sobomax
34354021da When re-connecting already connected datagram socket ensure to clean
up its pending error state, which may be set in some rare conditions resulting
in connect() syscall returning that bogus error and making application believe
that attempt to change association has failed, while it has not in fact.

There is sockets/reconnect regression test which excersises this bug.

MFC after:	2 weeks
2005-01-12 10:15:23 +00:00
phk
03778ef0a0 Comment out debugging printf which doesn't compile on amd64. 2005-01-12 10:11:31 +00:00
davidxu
7c0e04f42e Let _umtx_op directly return error code rather than from errno because
errno can be tampered potentially by nested signal handle.
Now all error codes are returned in negative value, positive value are
reserved for future expansion.
2005-01-12 05:55:52 +00:00
phk
5a497775d6 Add BO_SYNC() and add a default which uses the secret vnode pointer
and VOP_FSYNC() for now.
2005-01-11 10:43:08 +00:00
phk
437e41e061 More vnode -> bufobj migration. 2005-01-11 10:16:39 +00:00
phk
649a01e1a5 Give flushbuflist() a struct bufv as first argument and avoid home-rolling
TAILQ_FOREACH_SAFE().

Loose the error pointer argument and return any errors the normal way.

Return EAGAIN for the case where more work needs to be done.
2005-01-11 10:01:54 +00:00
phk
da2718f1af Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
davidxu
6f1e481e4d Break out of loop earlier if it is not timeout. 2005-01-08 06:57:46 +00:00
rwatson
afca6a6239 In acct_process(), do a lockless read of acctvp to see if it's NULL
before deciding to do more expensive locking to account for process
exit.  This acceptable minor race avoids two mutex operations in
that highly common case of accounting not being enabled.

MFC after:	2 weeks
2005-01-08 04:45:57 +00:00
rwatson
79b283cecc In kern_wait(), let the compiler copy the rusage structure rather than
an explicit bcopy() -- it probably does a better job.
2005-01-08 04:17:48 +00:00
cperciva
30e2899111 Adjust two of my comments to the new world order: Indent protection in
the first column is performed using /**, not /*-.
2005-01-07 03:25:45 +00:00
imp
f0bf889d0d /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
imp
10be4244ac Expand COPYRIGHT inline, per Matthew Dillon's earlier approval. 2005-01-06 23:34:38 +00:00
davidxu
a25b832d9a Return ETIMEDOUT when thread is timeouted since POSIX thread
APIs expect ETIMEDOUT not EAGAIN, this simplifies userland code a
bit.
2005-01-06 02:08:34 +00:00
jhb
3ec0dff7ad - Move the function prototypes for kern_setrlimit() and kern_wait() to
sys/syscallsubr.h where all the other kern_foo() prototypes live.
- Resort kern_execve() while I'm there.
2005-01-05 22:19:44 +00:00
jhb
65d4d80800 Rework the optimization for spinlocks on UP to be slightly less drastic and
turn it back on.  Specifically, the actual changes are now less intrusive
in that the _get_spin_lock() and _rel_spin_lock() macros now have their
contents changed for UP vs SMP kernels which centralizes the changes.
Also, UP kernels do not use _mtx_lock_spin() and no longer include it.  The
UP versions of the spin lock functions do not use any atomic operations,
but simple compares and stores which allow mtx_owned() to still work for
spin locks while removing the overhead of atomic operations.

Tested on:	i386, alpha
2005-01-05 21:13:27 +00:00
phk
f2581a224f Since we do not support forceful unmount of DEVFS we can do away with
the partially implemented vnode-readoption code in vgonechrl().
2005-01-04 08:49:14 +00:00
marcel
1a8a332194 Regen. 2005-01-03 00:47:23 +00:00
marcel
413099920e uuidgen(2) is MP safe. 2005-01-03 00:45:57 +00:00
imp
58871563b4 Implement device_quiesce. This method means 'you are about to be
unloaded, cleanup, or return ebusy of that's inconvenient.'  The
default module hanlder for newbus will now call this when we get a
MOD_QUIESCE event, but in the future may call this at other times.

This shouldn't change any actual behavior until drivers start to use it.
2004-12-31 20:47:51 +00:00
pjd
09d675e003 Be consistent and always use form 'return (value);' instead of 'return value;'.
We had (before this change) 84 lines where it was style(9)-clean and 15 lines
where it was not.
2004-12-31 14:52:53 +00:00
jhb
ad4d00347b Fix a typo and two whitespace nits. 2004-12-30 22:17:00 +00:00
jhb
3f307e93e3 Rework the interface between priority propagation (lending) and the
schedulers a bit to ensure more correct handling of priorities and fewer
priority inversions:
- Add two functions to the sched(9) API to handle priority lending:
  sched_lend_prio() and sched_unlend_prio().  The turnstile code uses these
  functions to ask the scheduler to lend a thread a set priority and to
  tell the scheduler when it thinks it is ok for a thread to stop borrowing
  priority.  The unlend case is slightly complex in that the turnstile code
  tells the scheduler what the minimum priority of the thread needs to be
  to satisfy the requirements of any other threads blocked on locks owned
  by the thread in question.  The scheduler then decides where the thread
  can go back to normal mode (if it's normal priority is high enough to
  satisfy the pending lock requests) or it it should continue to use the
  priority specified to the sched_unlend_prio() call.  This involves adding
  a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag
  for priority elevation.
- Schedulers now refuse to lower the priority of a thread that is currently
  borrowing another therad's priority.
- If a scheduler changes the priority of a thread that is currently sitting
  on a turnstile, it will call a new function turnstile_adjust() to inform
  the turnstile code of the change.  This function resorts the thread on
  the priority list of the turnstile if needed, and if the thread ends up
  at the head of the list (due to having the highest priority) and its
  priority was raised, then it will propagate that new priority to the
  owner of the lock it is blocked on.

Some additional fixes specific to the 4BSD scheduler include:
- Common code for updating the priority of a thread when the user priority
  of its associated kse group has been consolidated in a new static
  function resetpriority_thread().  One change to this function is that
  it will now only adjust the priority of a thread if it already has a
  time sharing priority, thus preserving any boosts from a tsleep() until
  the thread returns to userland.  Also, resetpriority() no longer calls
  maybe_resched() on each thread in the group. Instead, the code calling
  resetpriority() is responsible for calling resetpriority_thread() on
  any threads that need to be updated.
- schedcpu() now uses resetpriority_thread() instead of just calling
  sched_prio() directly after it updates a kse group's user priority.
- sched_clock() now uses resetpriority_thread() rather than writing
  directly to td_priority.
- sched_nice() now updates all the priorities of the threads after the
  group priority has been adjusted.

Discussed with:	bde
Reviewed by:	ups, jeffr
Tested on:	4bsd, ule
Tested on:	i386, alpha, sparc64
2004-12-30 20:52:44 +00:00
jhb
e3adf38617 Whitespace fix. 2004-12-30 20:30:58 +00:00
jhb
7b611b0cb2 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
jhb
a610dc93b7 Call tty_close() at the very end of ttyclose() since otherwise NULL
deferences can occur since tty_close() may end up freeing the tty structure
if it drops the last reference to it.

Glanced at by:	phk
2004-12-30 19:24:49 +00:00
rwatson
44386c7719 Make the sysctls kern.ipc.msgmnb and kern.ipc.msgtql into tunables as
is the case for most other sysctls in the System V IPC message queue
implementation.

PR:		75541
Submitted by:	Sergiy Vyshnevetskiy <serg at vostok dot net>
MFC after:	2 weeks
2004-12-30 13:56:34 +00:00
davidxu
6724b46563 Make umtx_wait and umtx_wake more like linux futex does, it is
more general than previous. It also lets me implement cancelable point
in thread library. Also in theory, umtx_lock and umtx_unlock can
be implemented by using umtx_wait and umtx_wake, all atomic operations
can be done in userland without kernel's casuptr() function.
2004-12-30 02:56:17 +00:00
alc
f5297a787e Eliminate (now) unnecessary acquisition and release of the global page
queues lock.
2004-12-29 04:49:10 +00:00
jhb
38cd373d81 - Up the WITNESS_COUNT macro from 200 to 1024 to support the growing number
of lock types in the kernel.  This results in an increase of witness
  data usage from ~145k to ~280k on i386 for kernels with
  'options WITNESS'.
- Remove the unused witness malloc bucket.

Submitted by:	Michal Mertl mime at traveller dot cz (1)
2004-12-28 21:21:27 +00:00
rwatson
c2459d7b3f Attempt to slightly refine the print out from "show alllocks" -- list
the process and thread numbers/names on the same line rather than on
separate lines, and print the thread pointer not just the tid.
2004-12-27 10:47:08 +00:00
kan
afd7d6f06b Do not vput(9) unlocked vnode and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by:	DEBUG_VFS_LOCKS
2004-12-27 05:17:11 +00:00
jeff
94a75d08c3 - Unintentionally checked in a debugging panic. Remove that. 2004-12-26 23:21:48 +00:00
jeff
862fb71e5e - Remove a 4BSD specific hack since this will work on ULE too. 2004-12-26 22:56:51 +00:00
jeff
ceca9b8f9e - Fix a long standing problem where an ithread would not honor sched_pin().
- Remove the sched_add wrapper that used sched_add_internal() as a backend.
   Its only purpose was to interpret one flag and turn it into an int.  Do
   the right thing and interpret the flag in sched_add() instead.
 - Pass the flag argument to sched_add() to kseq_runq_add() so that we can
   get the SRQ_PREEMPT optimization too.
 - Add a KEF_INTERNAL flag.  If KEF_INTERNAL is set we don't adjust the SLOT
   counts, otherwise the slot counts are adjusted as soon as we enter
   sched_add() or sched_rem() rather than when the thread is actually placed
   on the run queue.  This greatly simplifies the handling of slots.
 - Remove the explicit prevention of migration for ithreads on non-x86
   platforms.  This was never shown to have any real benefit.
 - Remove the unused class argument to KSE_CAN_MIGRATE().
 - Add ktr points for thread migration events.
 - Fix a long standing bug on platforms which don't initialize the cpu
   topology.  The ksg_maxid variable was never correctly set on these
   platforms which caused the long term load balancer to never inspect
   more than the first group or processor.
 - Fix another bug which prevented the long term load balancer from working
   properly.  If stathz != hz we can't expect sched_clock() to be called
   on the exact tick count that we're anticipating.
 - Rearrange sched_switch() a bit to reduce indentation levels.
2004-12-26 22:56:08 +00:00
rwatson
b864ac486c Add "show alllocks" command to DDB, which dumps a list of processes
and threads currently holding sleep mutexes (and spin mutexes for
curthread).  This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.

NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.

MFC after:	2 weeks
2004-12-26 22:52:24 +00:00
jeff
4739ea6908 - Run sched_userret() after thread_userret(). Before, sched_userret() would
lower the priority of the returning thread to a user priority before
   calling into thread_userret() which would call wakeup() which in turn would
   cause the returning thread to eventually context switch rather than
   completing its slice.  Allowing this thread to complete its slice first
   yields a 15% performance improvement in super-smack on my dual opteron with
   4BSD.
2004-12-26 07:30:35 +00:00
jeff
a5c50b5ce9 - Wrap the thread count adjustment in sched_load_add() and sched_load_rem()
so that we may place some ktr entries nearby.
 - Define other KTR_SCHED tracepoints so that we may graph the operation
   of the scheduler.
2004-12-26 00:16:24 +00:00
jeff
d378b46f4e - Remove earlier KTR_ULE tracepoints.
- Define new KTR_SCHED points so that we can graph the operation of the
   scheduler.
2004-12-26 00:15:33 +00:00
jeff
c2b9649e7a - Define KTR points for KTR_SCHED. 2004-12-26 00:14:21 +00:00
davidxu
9476ffbed8 Make _umtx_op() as more general interface, the final parameter needn't be
timespec pointer, every parameter will be interpreted by its opcode.
2004-12-25 13:02:50 +00:00
davidxu
7b03c7ecc4 1. introduce umtx_owner to get an owner of a umtx.
2. add const qualifier to umtx_timedlock and umtx_timedwait.
3. add missing blackets in umtx do_unlock_and_wait.
2004-12-25 12:49:35 +00:00
davidxu
da86559777 Add umtxq_lock/unlock around umtx_signal, fix debug kernel compiling,
let umtx_lock returns EINTR when it returns ERESTART, this lets
userland have chance to back off mtx lock code when needed.
2004-12-24 11:59:20 +00:00
davidxu
85fe4cfe89 1. Fix race condition between umtx lock and unlock, heavy testing
on SMP can explore the bug.
2. Let umtx_wake returns number of threads have been woken.
2004-12-24 11:30:55 +00:00
rwatson
649bb26a69 Assert the sem lock in sem_ref() and sem_rel(), as it is required to
safely manipulate the reference count.
2004-12-23 02:22:47 +00:00
rwatson
e1ce7eb9ce Remove temporary debugging printf that was used to detect the presence
of a race that had previously caused a panic in order to determine if
the fix was for the right problem.  It was.

MFC after:	2 weeks
2004-12-23 01:19:27 +00:00
rwatson
f1732152a7 In sonewconn(), the s/if/while/ change to wait for room at the tail of
the accept queue is a feature, not a bug/issue, so remove the XXXRW
from the comment.
2004-12-23 01:16:21 +00:00
rwatson
5728709bda Remove an XXXRW indicating atomic operations might be used as a
substitute for a global mutex protecting the socket count and
generation number.

The observation that soreceive_rcvoob() can't return an mbuf
chain is a property, not a bug, so remove the XXXRW.

In sorflush, s/existing/previous/ for code when describing prior
behavior.

For SO_LINGER socket option retrieval, remove an XXXRW about why
we hold the mutex: this is correct and not dubious.

MFC after:	2 weeks
2004-12-23 01:07:12 +00:00
rwatson
fd297e3939 In soalloc(), simplify the mac_init_socket() handling to remove
unnecessary use of a global variable and simplify the return case.
While here, use ()'s around return values.

In sodealloc(), remove a comment about why we bump the gencnt and
decrement the socket count separately.  It doesn't add
substantially to the reading, and clutters the function.

MFC after:	2 weeks
2004-12-23 00:59:43 +00:00
alc
04b2362b0f Add send buffer locking to uipc_send(). Without this locking a race can
occur between a reader and a writer that results in a panic upon close,
e.g.,
	"panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"

Reviewed by: rwatson@
MFC after: 2 weeks
2004-12-22 20:28:46 +00:00
phk
fbe7293f5a Include uio.h
Check O_NONBLOCK instead if IO_NDELAY
Don't include vnode.h
2004-12-22 17:37:14 +00:00
phk
93beb5568e Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.
2004-12-20 21:59:25 +00:00
phk
1a55c4023a fix a misleading sleep identifier. 2004-12-20 21:38:13 +00:00
phk
0faeb292ed We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.
2004-12-20 21:34:29 +00:00
davidxu
acc63fde9c 1. msleep returns EWOULDBLOCK not ETIMEDOUT, use EWOULDBLOCK instead.
2. Eliminate a possible lock leak in timed wait loop.
2004-12-18 13:43:16 +00:00
davidxu
395ea4c2e2 1. make umtx sharable between processes, the way is two or more processes
call mmap() to create a shared space, and then initialize umtx on it,
   after that, each thread in different processes can use the umtx same
   as threads in same process.
2. introduce a new syscall _umtx_op to support timed lock and condition
   variable semantics. also, orignal umtx_lock and umtx_unlock inline
   functions now are reimplemented by using _umtx_op, the _umtx_op can
   use arbitrary id not just a thread id.
2004-12-18 12:52:44 +00:00
sam
8c71ba93c4 fix m_append for case where additional mbufs are required 2004-12-15 19:04:07 +00:00
phk
eab55e1589 Fix a deadlock I introduced this morning.
Mostly from:	tegge
2004-12-14 20:48:40 +00:00
jeff
c94fadce10 - Garbage collect several unused members of struct kse and struce ksegrp.
As best as I can tell, some of these were never used.
2004-12-14 10:53:55 +00:00
jeff
422a07b8e1 - In kseq_choose(), don't recalculate slice values for processes with a
nice of 0.  Doing so can cause an infinite loop because they should be
   running, but a nice -20 process could prevent them from doing so.
 - Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
   elevated due to priority propagation.  If a thread has had its priority
   elevated, we assume that it must go on the current queue and it must
   get a slice.
 - In sched_userret() if our priority was elevated and we shouldn't have
   a timeslice, yield here until we should.

Found/Tested by:	glebius
2004-12-14 10:34:27 +00:00
phk
8a94ce5ea1 Add a new kind of reference count (fd_holdcnt) to struct filedesc
which holds on to just the data structure and the mutex.  (The
existing refcount (fd_refcnt) holds onto the open files in the
descriptor.)

The fd_holdcnt is protected by fdesc_mtx, fd_refcnt by FILEDESC_LOCK.

Add fdhold(struct proc *) which gets a hold on the filedescriptors of
the specified proc..

Add fddrop(struct filedesc *) which drops the fd_holdcnt and if zero
destroys the mutex and frees the memory.

Initialize the fd_holdcnt to one in fdinit().  Normal operations on
the filedesc structure will not change it.

In fdfree() use fddrop() to dispose of the mutex and structure.  Hold
the FILEDESC_LOCK() until we have cleaned out the contents and carefully
set the fields to null values during cleanup.

Use fdhold()/fddrop() in mountcheckdirs() and sysctl_kern_file().
2004-12-14 09:09:51 +00:00
phk
8986ab0ecf Make fdesc_mtx private to kern_descrip.c now that the flock has come home. 2004-12-14 08:44:51 +00:00
phk
b124f6c510 Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().
2004-12-14 08:23:18 +00:00
phk
e622b9bcba Add new function fdunshare() which encapsulates the necessary light magic
for ensuring that a process' filedesc is not shared with anybody.

Use it in the two places which previously had private implmentations.

This collects all fd_refcnt handling in kern_descrip.c
2004-12-14 07:20:03 +00:00
jeff
6ec1292f83 - If delivering a signal will result in killing a process that has a
nice value above 0, set it to 0 so that it may proceed with haste.
   This is especially important on ULE, where adjusting the priority
   does not guarantee that a thread will be granted a greater time slice.
2004-12-13 16:45:57 +00:00
jeff
00f132eec6 - Take up a 'slot' while we're on the assigned queue, waiting to be
posted to another processor.  Otherwise, kern_switch() gets confused
   and tries to sched_add(NULL).
2004-12-13 13:09:33 +00:00
pjd
77ebddb33c Add bioq_insert_head() function.
OK'd by:	phk
2004-12-13 12:57:21 +00:00
alc
04949181f9 Correct the handling of two unusual cases by the zero-copy receive path,
specifically, vm_pgmoveco():
1. If vm_pgmoveco() sleeps on a busy page, it must redo the look up
because the page may have been freed.
2. If the receive buffer is copy-on-write due to, for example, a fork,
then although the first vm object in the shadow chain may not contain
a page there may still be one from a backing object that is mapped.
Thus, a pmap_remove() is required for the new page rather than the
backing object's page to been seen by the application.

Also, add some comments to vm_pgmoveco() and update some assertions.

Tested by: ken@
2004-12-13 06:24:14 +00:00
phk
dcaf810e82 Copy the entire stats structure. Let compiler decide how. 2004-12-11 22:13:02 +00:00
phk
88c8b6a503 Fix whitespace.
Spotted by:	njl
2004-12-11 20:41:32 +00:00
phk
b733b4afe1 Remove the /dev/dev -> / symlink after we are done with it. 2004-12-11 12:48:37 +00:00
alc
72abac0bd2 Remove unneeded code from the zero-copy receive path.
Discussed with: gallatin@
Tested by: ken@
2004-12-10 04:49:13 +00:00
mlaier
488882b3e9 Start the protocol timeouts only after all domains have been initialized
completely. For some reason (that I am still curious about) we started to no
longer manage to finish the initialization before the timeouts run the first
time leading to panics when using uninitialized mutex etc.

The root of this problem is that we currently first link a domain to the
domains list and only later initialize the domain's protocols. This should
be reworked in the future, but with the current API it is not possible in
all situations. We settle with this lazy fix for now.

Tested by:	gnn, ru, myself
2004-12-09 11:47:30 +00:00
sam
bc6df700aa add m_append utility function to be used in forthcoming changes 2004-12-08 05:42:02 +00:00
alc
cdb92b3917 Tidy up the zero-copy receive path: Remove an unneeded argument to
uiomoveco() and userspaceco().
2004-12-08 05:25:08 +00:00
njl
6224f36cad Add the devclass_get_count(9) function and man page. It gets a count of
the number of devices in a devclass and is a subset of
devclass_get_devices(9).

Reviewed by:	imp, dfr
2004-12-08 02:39:56 +00:00
ups
e2f469b728 Propagate TDF_NEEDRESCHED to replacement thread in sched_switch().
Reviewed by:    julian, jhb (in October)
Approved by:    sam (mentor)
MFC after:      4 weeks
2004-12-07 18:17:24 +00:00
phk
4a639d6164 The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
	Convert to nmount (the simple way, mount_nfs(8) is still necessary).
	Add omount compat shims.
	Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
	Convert to nmount.
	Add omount compat shims.
	Remove dedicated rootfs mounting code.
	Use vfs_mountedfrom()
	Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
	Mount devfs on /.  Devfs needs no 'from' so this is clean.
	symlink /dev to /.  This makes it possible to lookup /dev/foo.
	Mount "real" root filesystem on /.
	Surgically move the devfs mountpoint from under the real root
	filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
	Don't do devfs mounting and rootvnode assignment here, it was
	already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias().  Put the
few necessary lines in devfs where they belong.  This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway.  Correct information is provided by
statfs(/).
2004-12-07 08:15:41 +00:00
phk
e8e7853c60 Instead of complaining about it, just silently filter out MNT_ROOTFS.
This fixes the "fsck /" problem various people have reported overnight.
2004-12-07 06:58:42 +00:00
phk
b0b381b949 make "ffs" and alias for "ufs" when it comes to filesystem names. 2004-12-06 22:22:57 +00:00
phk
40bf35ba61 Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem,
this way individual filesystems don't have to do it.
2004-12-06 19:53:32 +00:00
phk
dbe532a28b Add more functions for handling mount arguments in VFS_MOUNT():
vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.

Also add function for setting the stat.f_mntfromname field.
2004-12-06 18:18:35 +00:00
phk
e196d80083 Change the first argument of vfs_cmount() to a handy struct mntarg* and
call it accordingly.

(No filesystems implement vfs_cmount() yet, so this is a no-op commit)
2004-12-06 16:39:05 +00:00
phk
51c6653f10 Add a few convenient functions in the mount_arg() family and collect the
entire family at the end of the source file.
2004-12-06 13:01:41 +00:00
phk
e1b20748c2 Collapse two almost identical license copies, preserving the rights of
all listed authors, rightholders and contributors.
2004-12-06 12:44:30 +00:00