Commit Graph

8295 Commits

Author SHA1 Message Date
Jeff Roberson
7a9507b60e - A test in sched_switch() is no longer necessary and it is incorrect
when td0 is preempted before it voluntarily switches.

Discovered by:	Arjan Van Leeuwen <avleeuwen@gmail.com>
2005-02-23 00:50:26 +00:00
Sam Leffler
3e55226c46 kill dead code
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:43:00 +00:00
Jeff Roberson
d8a7c99a1c - Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK
has been set.  Assert that this is the case so that we catch filesystems
   who are using naked VOP_LOCKs in illegal cases.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 00:11:14 +00:00
Jeff Roberson
4c11620bb9 - Add a check for xlock in vop_lock_assert. Presently the xlock is
considered to be as good as an exclusive lock, although there is still a
   possibility of someone acquiring a VOP LOCK while xlock is held.

Sponsored by:	Isilon Systems, Inc.
2005-02-22 23:59:11 +00:00
Poul-Henning Kamp
767056c0e8 Zero the v_un container field to make sure everything is gone. 2005-02-22 18:56:18 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
1a1457d427 Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.
2005-02-22 14:41:04 +00:00
Poul-Henning Kamp
7fc940b266 Remove vfinddev(), it is generally bogus when faced with jails and
chroot and has no legitimate use(r)s in the tree.
2005-02-22 14:11:47 +00:00
Robert Watson
4f7fd28ee1 When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE".
MFC after:	3 days
2005-02-22 13:11:33 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Robert Watson
c364c823d0 When aborting a UNIX domain socket bind() because VOP_CREATE() failed,
make sure to call vn_finished_write(mp) before returning.

MFC after:	3 days
2005-02-21 14:21:50 +00:00
Robert Watson
892af6b930 style(9)-ize function headers, remove use of 'register'.
MFC after:	3 days
2005-02-20 23:22:13 +00:00
David Schultz
e8ed933099 Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with:	mckusick
Reviewed by:	phk
2005-02-20 23:02:20 +00:00
Robert Watson
d664e4fa50 In unp_attach(), allow uma_zalloc to zero the new unpcb rather than
explicitly using bzero().

Update copyright.

MFC after:	3 days
2005-02-20 20:05:11 +00:00
Robert Watson
2b85a170d1 Prefer NULL to returning 0 cast to a pointer type.
MFC after:	3 days
2005-02-20 15:56:13 +00:00
Robert Watson
a00428ef92 In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING,
only call the protocol's pru_rcvd() if the protocol has the flag
PR_WANTRCVD set.  This brings that instance of pru_rcvd() into line with
the rest, which do check the flag.

MFC after:	3 days
2005-02-20 15:54:44 +00:00
Robert Watson
7301cf23ef Move assignment of UNIX domain socket pcb during unp_attach() outside
of the global UNIX domain socket mutex: no protection is needed that
early in the setup of the UNIX domain socket and socket structures.

MFC after:	3 days
2005-02-20 04:18:22 +00:00
Nate Lawson
e959a70bad Add the "freq_settings" sysctl to each device that registers with cpufreq
so their individual settings can be seen separately for debugging.
2005-02-20 00:59:15 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
David Xu
1089f0319b Don't restart a timeout wait in kern_sigtimedwait, also allow it
to wait longer than a single integer can represent.
2005-02-19 06:05:49 +00:00
Paul Saab
b7820945ac Swap the arguments for CP so we copy the correct source and
destination.
2005-02-18 22:14:40 +00:00
Robert Watson
29bdd01910 Remove now unused 'int s' from spl().
MFC after:	3 days
2005-02-18 21:39:55 +00:00
Robert Watson
d8d716bef5 De-spl kern_connect().
MFC after:	3 days
2005-02-18 19:37:36 +00:00
Robert Watson
a7ae36bc45 Correct a typo in the comment describing soreceive_rcvoob().
MFC after:	3 days
2005-02-18 19:15:22 +00:00
Robert Watson
1b5c4b15b4 In soconnect(), when resetting so->so_error, the socket lock is not
required due to a straight integer write in which minor races are not
a problem.
2005-02-18 19:13:51 +00:00
Robert Watson
11d06c4b78 Re-style do_setopt_accept_filter() to match uipc_accf.c style, and fix
one other style nit in the file.

MFC after:	3 days
2005-02-18 19:01:22 +00:00
Robert Watson
78e436448f Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where
the rest of the accept filter code currently lives.

MFC after:	3 days
2005-02-18 18:54:42 +00:00
Robert Watson
1ed716a149 Minor style tweaks: line wrap comments and lines more consistently.
MFC after:	3 days
2005-02-18 18:49:44 +00:00
Robert Watson
627de7fa2c Re-order checks in socheckuid() so that we check all deny cases before
returning accept.

MFC after:	3 days
2005-02-18 18:43:33 +00:00
Poul-Henning Kamp
900b7e2648 Make sure to drop the VI_LOCK in vgonel();
Spotted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
2005-02-18 11:13:56 +00:00
Robert Watson
0d89301c51 In solisten(), unconditionally set the SO_ACCEPTCONN option in
so->so_options when solisten() will succeed, rather than setting it
conditionally based on there not being queued sockets in the completed
socket queue.  Otherwise, if the protocol exposes new sockets via the
completed queue before solisten() completes, the listen() system call
will succeed, but the socket and protocol state will be out of sync.
For TCP, this didn't happen in practice, as the TCP code will panic if
a new connection comes in after the tcpcb has been transitioned to a
listening state but the socket doesn't have SO_ACCEPTCONN set.

This is historical behavior resulting from bitrot since 4.3BSD, in which
that line of code was associated with the conditional NULL'ing of the
connection queue pointers (one-time initialization to be performed
during the transition to a listening socket), which are now initialized
separately.

Discussed with:	fenner, gnn
MFC after:	3 days
2005-02-18 00:52:17 +00:00
Nate Lawson
e94a0c1a18 Introduce a new method, cpufreq_drv_type(), that returns the type of the
driver.  This used to be handled by cpufreq_drv_settings() but it's
useful to get the type/flags separately from getting the settings.
(For example, you don't have to pass an array of cf_setting just to find
the driver type.)

Use this new method in our in-tree drivers to detect reliably if acpi_perf
is present and owns the hardware.  This simplifies logic in drivers as well
as fixing a bug introduced in my last commit where too many drivers attached.
2005-02-18 00:23:36 +00:00
Robert Watson
1e8f89541e In accept1(), extend coverage of the socket lock from just covering
soref() to also covering the update of so_state.  While no other user
threads can update the socket state here as it's not yet hooked up to
the file descriptor array yet, the protocol could also frob the
socket state here, leading to a lost update to the so_state field.
No reported instances of this bug (as yet).

MFC after:      3 days
2005-02-17 13:00:23 +00:00
Robert Watson
280249a66a In sonewconn(), set the new socket's state to show the protocol-provided
connection status before inserting the new socket into the listen
socket's accept queue, or there might be a race in which another thread
wakes up when the accept lock is released, and sees the socket before its
state is set correctly.  The wakeup still occurs after the accept lock is
released.  There have been no diagnoses of this bug in real-world systems
(as yet).

MFC after:	3 days
2005-02-17 12:53:45 +00:00
Poul-Henning Kamp
4d8ac58b05 Introduce vx_wait{l}() and use it instead of home-rolled versions. 2005-02-17 10:49:51 +00:00
Poul-Henning Kamp
58aac12894 Convert KASSERTS to VNASSERTS 2005-02-17 10:28:58 +00:00
Dag-Erling Smørgrav
f3f4baf099 Add /rescue/init to the default init_path, before /stand/sysinstall.
MFC after:	2 weeks
2005-02-17 10:00:10 +00:00
Bosko Milekic
8076cb5289 Well, it seems that I pre-maturely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
Nate Lawson
67c8649f7f When dealing with systems with no absolute drivers attached, only calibrate
the rate for the 100% state once.  Afterwards, use that value for deriving
states.  This should fix the problem where the calibrated frequency was
different once a switch was done, giving a different set of levels each
time.  Also, properly search for the right cpufreqX device when detaching.
2005-02-15 07:43:48 +00:00
Nate Lawson
1196826af5 Bind to the driver's parent cpu before switching, for both absolute and
relative drivers.  Remove some extraneous KASSERTs since NULL pointers
will be found when they're used right afterwards.
2005-02-15 07:22:42 +00:00
Nate Lawson
5f0afa0415 Implement priorities. This allows a driver (say, for cooling purposes) to
override the current freq level temporarily and restore it when the
higher priority condition is past.  Note that only the first overridden
value is saved.  Callers pass NULL to CPUFREQ_SET to restore the saved
level.  Priorities are not yet used so this commit should have no effect.
2005-02-14 18:16:35 +00:00
Nate Lawson
e22cd41c01 Add support for the CPUFREQ_FLAG_INFO_ONLY flag. Devices that report this
are not added to the list(s) of available settings.  However, other drivers
can call the CPUFREQ_DRV_SETTINGS() method on those devices directly to
get info about available settings.

Update the acpi_perf(4) driver to use this flag in the presence of
"functional fixed hardware."  Thus, future drivers like Powernow can
query acpi_perf for platform info but perform frequency transitions
themselves.
2005-02-13 18:49:48 +00:00
Maxim Sobolev
f460d05699 Backout addition of SIGTHR into the list of signals allowed to be delivered
to the suid/sugid process, since apparently it has security implications.

Suggested by:   rwatson
2005-02-13 17:51:47 +00:00
Maxim Sobolev
1a88a252fd Backout previous change (disabling of security checks for signals delivered
in emulation layers), since it appears to be too broad.

Requested by:   rwatson
2005-02-13 17:37:20 +00:00
Nate Lawson
0325089dad Set levels on all CPUs and attach a cpufreq device to each one. Sysctl
on dev.cpu.0 will affect all of the CPUs together.  In the future,
independent control will be supported but this is good enough for now.
Check that the timecounter isn't TSC before switching (from Colin Percival.)
2005-02-13 17:31:56 +00:00
Maxim Sobolev
d8ff44b79f Split out kill(2) syscall service routine into user-level and kernel part, the
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.

Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.

As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.

Requested (sorta) by:	rwatson
MFC after:	2 weeks
2005-02-13 16:42:08 +00:00
Christian S.J. Peron
84f85aedef Add much needed descriptions for a number of the IPC related sysctl OIDs.
This information will be very useful for people who are tuning applications
which have a dependence on IPC mechanisms.

The following OIDs were documented:

Message queues:
 kern.ipc.msgmax
 kern.ipc.msgmni
 kern.ipc.msgmnb
 kern.ipc.msgtlq
 kern.ipc.msgssz
 kern.ipc.msgseg

Semaphores:
 kern.ipc.semmap
 kern.ipc.semmni
 kern.ipc.semmns
 kern.ipc.semmnu
 kern.ipc.semmsl
 kern.ipc.semopm
 kern.ipc.semume
 kern.ipc.semusz
 kern.ipc.semvmx
 kern.ipc.semaem

Shared memory:
 kern.ipc.shmmax
 kern.ipc.shmmin
 kern.ipc.shmmni
 kern.ipc.shmseg
 kern.ipc.shmall
 kern.ipc.shm_use_phys
 kern.ipc.shm_allow_removed
 kern.ipc.shmsegs

These new descriptions can be viewed using sysctl -d

PR:		kern/65219
Submitted by:	Dan Nelson <dnelson at allantgroup dot com> (modified)
No objections:	developers@
Descriptions reviewed by: gnn
MFC after:	1 week
2005-02-12 01:22:39 +00:00
Maxim Sobolev
ac16ff40c5 Add SIGTHR (32) into list of signals permitted to be delivered to the
suid application. The problem is that Linux applications using old Linux
threads (pre-NPTL) use signal 32 (linux SIGRTMIN) for communication between
thread-processes. If such an linux application is installed suid or sgid
and security.bsd.conservative_signals=1 (default), then permission will be
denied to send such a signal and the application will freeze.

I believe the same will be true for native applications that use libthr,
since libthr uses SIGTHR for implementing conditional variables.

PR:		72922
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
MFC after:	2 weeks
2005-02-11 14:02:42 +00:00
Ian Dowse
57c037be1c When processing a timeout() callout and returning it to the free
list, set `curr_callout' to NULL. This ensures that we won't attempt
to cancel the current callout if the original callout structure
gets recycled while we wait to acquire Giant.

This is reported to fix an intermittent syscons problem that was
introduced by revision 1.96.
2005-02-11 00:14:00 +00:00
Bosko Milekic
3d2a3ff25e Optimize the way reference counting is performed with Mbufs. We
do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster)
constructor to initialize the reference counter anymore.  The reference
counts are located in a separate memory region (in the slab header,
because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very
often in a cache miss.  Additionally, and perhaps more significantly,
optimize the free mbuf+cluster (packet) case, which is very common, to
no longer require an atomic operation on free (to verify the reference
counter) if the reference on the cluster has never been increased (also
very common).  Reduces an atomic on mbuf free on average.

Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>
2005-02-10 22:23:02 +00:00
Colin Percival
e5e6a46460 Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not
as a "long" in dofileread() and dofilewrite().

Discussed with:	jhb
2005-02-10 20:19:17 +00:00
Poul-Henning Kamp
1ba212823f Make various vnode related functions static 2005-02-10 12:28:58 +00:00
Poul-Henning Kamp
44dc16a986 Make some file/filedesc related functions static 2005-02-10 12:27:58 +00:00
Poul-Henning Kamp
ebbfc2f82d Make various mountpoint related functions static. 2005-02-10 12:25:38 +00:00
Poul-Henning Kamp
5ece08f57a Make a SYSCTL_NODE static 2005-02-10 12:23:29 +00:00
Poul-Henning Kamp
502a35d60f MD5Pad() should never have been exposed. 2005-02-10 12:20:42 +00:00
Poul-Henning Kamp
502a590bf1 make cluster_callback() static 2005-02-10 12:17:48 +00:00
Poul-Henning Kamp
2adc2b87c7 Make a SYSCTL_NODE and a mutex static 2005-02-10 12:16:42 +00:00
Poul-Henning Kamp
85eb15a2ce Make another bunch of SYSCTL_NODEs static 2005-02-10 12:16:08 +00:00
Poul-Henning Kamp
0c898376fa Make a bunch of SYSCTL_NODEs static. 2005-02-10 12:15:49 +00:00
Poul-Henning Kamp
c711aea6ca Make a bunch of malloc types static.
Found by:	src/tools/tools/kernxref
2005-02-10 12:02:37 +00:00
Poul-Henning Kamp
fe0198779c Don't pass NULL to vprint() 2005-02-10 08:55:08 +00:00
Jeff Roberson
5c18d18b1d - Add more information to the getnewbuf() recycling KTR.
Sponsored by:	Isilon Systems, Inc.
2005-02-10 02:22:56 +00:00
Jeff Roberson
68f2274d97 - Add a new assert in the getnewvnode(). Assert that the usecount is still
0 to detect getnewvnode() races.
 - Add the vnode address to a few panics near by to help in debugging.

Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:27:10 +00:00
Jeff Roberson
b56dc9a785 - Remove an invalid KASSERT added in recent background write reshuffling.
Sponsored by:	Isilon Systems, Inc.
2005-02-08 23:25:08 +00:00
Colin Percival
79653046d8 Add a new sysctl, "security.jail.chflags_allowed", which controls the
behaviour of chflags within a jail.  If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.

This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.

Discussed with:	csjp, rwatson
MFC after:	2 weeks
2005-02-08 21:31:11 +00:00
Poul-Henning Kamp
dd19a799b8 Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.

Remove all the background write related stuff from the normal bufwrite.

This drags the softdep_move_dependencies() back into FFS.

Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance).  I don't think we really gain anything
but complexity from doing this with a buf.
2005-02-08 20:29:10 +00:00
Poul-Henning Kamp
88e5b12a20 Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.
2005-02-08 18:09:11 +00:00
Nate Lawson
7990a18c7b Maxunit is inclusive so fix off-by-one in previous commit. 2005-02-08 18:03:17 +00:00
Nate Lawson
5b68bf38ab Update device_find_child(9) to return the first matching child if unit
is set to -1.

Reviewed by:	dfr, imp
2005-02-08 18:00:29 +00:00
John Baldwin
fee4a6af3a Implement a kern_pathconf() wrapper for pathconf() which can take the
filename from either a user space or a kernel space pointer.
2005-02-07 21:46:43 +00:00
John Baldwin
5e85ac176f If the pointer to the new itimerval is NULL in kern_setitimer(), just
read the old value via kern_getitimer().
2005-02-07 21:45:48 +00:00
John Baldwin
76951d21d1 - Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
  get rid of the 4th argument.  The old way returned a pointer into the
  kernel array that the calling function would then access afterwards
  without holding the appropriate locks and doing non-lock-safe things like
  copyout() with the data anyways.  This change removes that unsafeness and
  resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
  fstatfs(), and fhstatfs().  Use these wrappers to cut out a lot of
  code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
  under an alternate prefix and determines which filename should be used.
  This is basically a more general version of linux_emul_convpath() that
  can be shared by all the ABIs thus allowing for further reduction of
  code duplication.
2005-02-07 18:44:55 +00:00
John Baldwin
c90110d639 Various and sundry style fixes. 2005-02-07 18:38:29 +00:00
Poul-Henning Kamp
b9489a449c Access vmobject via the bufobj instead of the vnode 2005-02-07 10:04:06 +00:00
Poul-Henning Kamp
7ee4eb6192 VOP_DESTROYVOBJECT() is no more. 2005-02-07 09:26:58 +00:00
Poul-Henning Kamp
7c5d36fb80 Remove vop_stddestroyvobject() 2005-02-07 09:26:39 +00:00
Poul-Henning Kamp
b348abd6cd Don't call VOP_DESTROYVOBJECT(), trust that VOP_RECLAIM() did what
was necessary.
2005-02-07 07:48:03 +00:00
Poul-Henning Kamp
5937226d51 Add a missing prefix to a struct field for consistency. 2005-02-07 07:40:39 +00:00
Ian Dowse
98c926b20f Add a mechanism for associating a mutex with a callout when the
callout is first initialised, using a new function callout_init_mtx().
The callout system will acquire this mutex before calling the callout
function and release it on return.

In addition, the callout system uses the mutex to avoid most of the
complications and race conditions inherent in asynchronous timer
facilities, so mutex-protected callouts have much simpler semantics.
As long as the mutex is held when invoking callout_stop() or
callout_reset(), then these functions will guarantee that the callout
will be stopped, even if softclock() had already begun to process
the callout.

Existing Giant-locked callouts will automatically pick up the new
race-free semantics. This should close a number of race conditions
in the USB code and probably other areas of the kernel too.

There should be no change in behaviour for "MP-safe" callouts; these
still need to use the techniques mentioned in timeout(9) to avoid
race conditions.
2005-02-07 02:47:33 +00:00
Nate Lawson
88c9b54c47 Add support for relative cpufreq drivers. Such drivers modulate clock
frequency as a percentage of the base rate and do not change the base
rate directly.  The cpufreq framework combines these with absolute drivers
to produce synthesized levels made of one or more settings.
2005-02-06 21:08:35 +00:00
Jeff Roberson
8364446643 - Don't release BKGRDINPROG until after we've bufdone'd the copy.
Sponsored by:	Isilon Systems, Inc.
2005-02-05 01:26:14 +00:00
Jeff Roberson
42a29039de - Add ke_runq == NULL to the conditions which will cause us to abort
adjusting timeshare loads in sched_class().  This is only important if
   the thread has never run, otherwise the state checks should work as
   expected.
2005-02-04 17:22:46 +00:00
Suleiman Souhlal
339a7e7fbb Set the scheduling class of the idle threads to PRI_IDLE.
While there, set their priority with sched_prio() instead of changing it
'by hand'.

Reviewed by:	jhb
Approved by:	grehan (mentor)
2005-02-04 06:16:05 +00:00
Nate Lawson
73347b071d Add the cpufreq framework. This code manages multiple drivers and presents
a unified kernel and user interface for controlling cpu frequencies.
2005-02-04 05:39:19 +00:00
Nate Lawson
bfdbeca163 Add an interface for cpufreq. The kernel interface lets other drivers
select the CPU frequency level (say for cooling).  The driver interface
allows hardware drivers to announce themselves as capable of adjusting
an individual frequency setting.
2005-02-04 05:38:30 +00:00
Pawel Jakub Dawidek
f627315f1e - Move gets() function to libkern (I want to use it outside vfs_mount.c).
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
  future.
- Remove special treatment of '@' and '#', those two are only confusing.

Discussed with:	rwatson
MFC after:	2 weeks
2005-02-03 15:10:58 +00:00
Jeff Roberson
ff05fd5d77 - Correct a typo in kern_rename. tvfslocked should be initialized from
tond and not fromnd.  This could lead us to leak Giant, or unlock it
   twice, depending on the filesystems involved.  renames within a single
   filesystem would not have caused any problems.

Sponsored by:	Isilon Systems, Inc.
2005-02-02 17:17:15 +00:00
Jeff Roberson
37c15216fc - Or MPSAFE with the correct set of flags in stat(). This affected only
the LOOKUP_SHARED case.

Spotted by:	jhb
2005-02-01 23:43:46 +00:00
Bosko Milekic
737cd9525b Update copyright, remove "all rights reserved" (since they are not
all reserved, as the lisence makes clear), and strike the third clause
(now this is a 2-clause liberal BSDL as are the rest of files I hold
copyright over).
2005-02-01 03:17:52 +00:00
Maxim Sobolev
a6886ef173 Extend kern_sendit() to take another enum uio_seg argument, which specifies
where the buffer to send lies and use it to eliminate yet another stackgap
in linuxlator.

MFC after:	2 weeks
2005-01-30 07:20:36 +00:00
Maxim Sobolev
ec217396c4 Fix build on AMD64 (and probably other arches where size_t != int).
Submitted by:	Tinderbox
MFC after:	2 weeks
2005-01-30 06:43:17 +00:00
Robert Watson
78bb1895ab Fix spelling of integer in a comment.
Beady eyes:	ceri
2005-01-30 00:31:19 +00:00
Maxim Sobolev
56c2262c0e Grrr, this committer needs to have a sleep. Remove lines from the previous
delta not intended for public consumption.

MFC after:	2 weeks
2005-01-29 23:51:05 +00:00
Maxim Sobolev
c30af53213 Fix small non-conformance introduced in the previous commit: execve() is
expected to return ENAMETOOLONG, not E2BIG if first argument doesn't
fit into {PATH_MAX} bytes.

MFC after:	2 weeks
2005-01-29 23:47:36 +00:00
Maxim Sobolev
610ecfe035 o Split out kernel part of execve(2) syscall into two parts: one that
copies arguments into the kernel space and one that operates
  completely in the kernel space;

o use kernel-only version of execve(2) to kill another stackgap in
  linuxlator/i386.

Obtained from:  DragonFlyBSD (partially)
MFC after:      2 weeks
2005-01-29 23:12:00 +00:00
Robert Watson
3fcd9325ec Correct a minr whitespace inconsistency introduced in revision 1.159:
add a tab between #define and DF_REBID instead of a space.
2005-01-29 22:04:30 +00:00
Poul-Henning Kamp
d9aaa28f63 Use MAXMINOR 2005-01-29 16:50:04 +00:00
Poul-Henning Kamp
37085a3931 Typo. 2005-01-29 15:10:30 +00:00
Poul-Henning Kamp
3a85fd262c Add MAXMINOR #define, we should have had this long time ago.
Add minor2unit() in addition to dev2unit() and unit2minor().

If it wasn't such a hazzle we should redefine minor numbers in
the kernel without the gap for the major number, but it's not worth
the bother (yet).
2005-01-29 15:07:13 +00:00
Poul-Henning Kamp
a258707313 In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying
a process return to userspace if it had pending GEOM events.

We need to have the same check in the exit pass to catch the case
where a GEOM related filedescriptor is not explicitly closed by
the process.

Bumped into by:	people using dd(1) to build releases, nanobsd etc.
2005-01-29 14:03:41 +00:00
Jeff Roberson
bd8d684fd7 - Don't drop the wref on the bufobj until after bufdone() has completed.
Without this, threads waiting in bufobj_wwait() may wakeup prior to
   bufdone() completing.

Sponsored by:	Isilon Systems, Inc.
2005-01-28 17:48:58 +00:00
Poul-Henning Kamp
d4eb29ba71 Remove unused argument to vrecycle() 2005-01-28 13:08:21 +00:00
Poul-Henning Kamp
1fdfaafb08 Integrate vclean() into vgonel().
Various associated polishing.
2005-01-28 13:00:03 +00:00
Poul-Henning Kamp
3fc8dd0653 Remove register keyword 2005-01-28 12:39:10 +00:00
Poul-Henning Kamp
7146d6cb3e Move the contents of vop_stddestroyvobject() to the new vnode_pager
function vnode_destroy_vobject().

Make the new function zero the vp->v_object pointer so we can tell
if a call is missing.
2005-01-28 08:56:48 +00:00
Jeff Roberson
37f32177bd - Regen 2005-01-26 02:29:18 +00:00
Jeff Roberson
810ad5ec4c - Struct mount is not yet locked well enough to allow
mount/nmount/unmount to run without Giant.  Mark them as STD here.
2005-01-26 02:28:43 +00:00
Maxim Sobolev
f4b6eb045f Split out kernel side of msgctl(2) into two parts: the first that pops data
from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrapper
of that syscall.

MFC after:      2 weeks
2005-01-26 00:46:36 +00:00
Maxim Sobolev
cfa0efe7ab Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after:	2 weeks
2005-01-25 21:28:28 +00:00
Jeff Roberson
04186764a4 - Include LK_INTERLOCK in LK_EXTFLG_MASK so that it makes its way into
acquire.
 - Correct the condition that causes us to skip apause() to only require
   the presence of LK_INTERLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-01-25 16:06:05 +00:00
Jeff Roberson
013e6650ca - Make lf_print static and move its prototype into kern_lockf.c
- Protect all of the advlock code with Giant as some filesystems
   may not be entering with Giant held now.

Sponsored by:	Isilon Systems, Inc.
2005-01-25 10:15:26 +00:00
Poul-Henning Kamp
4f8d23d662 Previously a read of zero bytes got handled in devfs:vop_read() but I
missed that when the vnode bypass was introduced.

Deal with zero length transfers before we even get to fo_ops->fo_read().

Found by:	Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru>
PR:	75758
2005-01-25 09:15:32 +00:00
Poul-Henning Kamp
729fcf7efb Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now. 2005-01-25 00:42:16 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
69816ea35e Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem
for a given vnode to create a vnode_pager object if one is needed.
2005-01-25 00:12:24 +00:00
Poul-Henning Kamp
dcff5b1440 Don't call VOP_CREATEVOBJECT(), it's the responsibility of the
filesystem which owns the vnode.
2005-01-24 23:53:54 +00:00
Poul-Henning Kamp
b5b6ec5faa Eliminate the constant flags argument to vclean() 2005-01-24 22:22:02 +00:00
Poul-Henning Kamp
d07a6d3f61 Move the body of vop_stdcreatevobject() over to the vnode_pager under
the name Sande^H^H^H^H^Hvnode_create_vobject().

Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.

Call that new function from vop_stdcreatevobject().

Make vnode_pager_alloc() private now that its only user came home.
2005-01-24 21:21:59 +00:00
Poul-Henning Kamp
f6dc414a5c Save a line by unlocking before we test. 2005-01-24 14:13:24 +00:00
Poul-Henning Kamp
7c93282e42 Change vprint() to vn_printf() which takes varargs.
Add #define for vprint() to call vn_printf().
2005-01-24 13:58:08 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Poul-Henning Kamp
027b1f716c Fix a list corruption issue in cloning device management using the
western strategy ("allocate first, ask questions later") so we can
extend the devmtx coverage to the clone list.
2005-01-24 12:44:56 +00:00
Gleb Smirnoff
90d52f2f21 - Convert so_qlen, so_incqlen, so_qlimit fields of struct socket from
short to unsigned short.
- Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > U_SHRTMAX.

Before this change setting somaxconn to smth above 32767 and calling
listen(fd, -1) lead to a socket, which doesn't accept connections at all.

Reviewed by:	rwatson
Reported by:	Igor Sysoev
2005-01-24 12:20:21 +00:00
Jeff Roberson
e1279468ec - Regen for recent vfs syscall changes.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:50:42 +00:00
Jeff Roberson
29ed48fc6a - Change all VFS syscalls to MSTD as they all manually deal with giant
or the appropriate filesystem locks.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:49:26 +00:00
Jeff Roberson
71ddd673b1 - Add CTR calls to trace the lifecycle of a buffer.
- Remove some KASSERTs which are invalid if the appropriate lock is
   not held.
 - Slightly restructure bremfree() so that it is more sane.
 - Change the flush code in bdwrite() to avoid acquiring a mutex
   whenever possible.
 - Change the flush code in bdwrite() to avoid holding the bufobj mutex
   while calling buf_countdeps().  This introduces a lock-order
   relationship with the softdep lock that can not otherwise be resolved.
 - Don't set B_DONE until bufdone() is complete, otherwise another
   processor may believe the buf is done before it is.
 - Only acquire Giant if the caller has set b_iodone.  Don't grab giant
   around normal bufdone() calls.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:47:04 +00:00
Jeff Roberson
d1fcf3bb31 - Add the tunable and sysctl for the mpsafevfs. It currently defaults
to off.
 - Protect access to mnt_kern_flag with the mointpoint mutex.
 - Remove some KASSERTs which are not legal checks without the appropriate
   locks held.
 - Use VCANRECYCLE() rather than rolling several slightly different
   checks together.
 - Return from vtryrecycle() with a recycled vnode rather than a locked
   vnode.  This simplifies some locking.
 - Remove several GIANT_REQUIRED lines.
 - Add a few KASSERTs to help with INACT debugging.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:41:01 +00:00
Jeff Roberson
791625d853 - Remove GIANT_REQUIRED where giant is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:33:46 +00:00
Jeff Roberson
82d1b24c70 - Remove GIANT_REQUIRED where it is no longer required.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:32:14 +00:00
Jeff Roberson
f50a2d5e2d - Remove GIANT_REQUIRED where giant is no longer required.
- Protect access to mnt_kern_flag with the mountpoint mutex.
 - Use the appropriate nd flags to deal with giant in vn_open_cred().
   We currently determine whether the caller is mpsafe by checking
   for a valid fdidx.  Any caller coming from user-space is now
   mpsafe and supplies a valid fd.  No kenrel callers have been
   converted to mpsafe, so this check is sufficient for now.
 - Use VFS_LOCK_GIANT instead of manual giant acquisition where
   appropriate.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:31:42 +00:00
Jeff Roberson
fc48b760ac - Protect mnt_kern_flag with the mountpoint's mutex. This is required
to make the suspend related functions mpsafe.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:28:41 +00:00
Jeff Roberson
22a960a69c - Acquire and release Giant as we enter and leave filesystems which
require it.
 - Track the status of Giant with the nd flag HASGIANT.
 - Release giant on return of namei() callers are not marked MPSAFE as
   they already own giant.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:27:05 +00:00
Jeff Roberson
94a9458501 - Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds.
- Move Giant acquisition into the few vfs syscalls that weren't already
   directly acquiring it.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:25:44 +00:00
Jeff Roberson
799cc2dcee - Simplify the cache locking. The lock order relationship with the
vnode lock is much simpler than I originally thought it would be.
   Now, the cache lock is  always acquired before the vnode lock.
 - Provide some gotos in __getcwd() to simplify the unlocking a bit.
 - Move Giant acquisition down into __getcwd().

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:24:12 +00:00
Jeff Roberson
41bd6c15f2 - Do not use APAUSE if LK_INTERLOCK is set. We lose synchronization
if the lockmgr interlock is dropped after the caller's interlock
   is dropped.
 - Change some lockmgr KTRs to be slightly more helpful.

Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:20:59 +00:00
Jeff Roberson
66ca1b4878 - Use VFS_LOCK_GIANT() in place of mtx_lock(&giant), etc.
Sponsored By:	Isilon Systems, Inc.
2005-01-24 10:19:31 +00:00
Robert Watson
471135a3af Style cleanup: with removal of mutex operations, we can also remove
{}'s from securelevel_gt() and securelevel_ge().

MFC after:	1 week
2005-01-23 21:11:39 +00:00
Robert Watson
0b880542e6 When reading pr_securelevel from a prison, perform a lockless read,
as it's an integer read operation and the resulting slight race is
acceptable.

MFC after:	1 week
2005-01-23 21:01:00 +00:00
Robert Watson
4261ed50fd When retrieving the current per-jails securelevel for a sysctl read,
don't acquire the prison mutex, as it's an integer read and races
here don't make a difference.

MFC after:	1 week
2005-01-23 20:59:19 +00:00
Robert Watson
5324bda309 When DDB is not defined, don't implement witness_thread_has_locks() and
witness_proc_has_locks(), as they are unused, which results in a compiler
error.  This problem was introduced with the implementation of "show
alllocks".

Spotted by:	Artem Kuchin <matrix at itlegion dot ru>
2005-01-22 21:14:21 +00:00
Robert Watson
14cedfc842 Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC shared memory.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 19:10:25 +00:00
Robert Watson
a6009aa7c1 Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC semaphores.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 19:04:17 +00:00
Robert Watson
e6a543f8db Invoke label initialization, creation, cleanup, and tear-down MAC
Framework entry points for System V IPC message queues.

Submitted by:	Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, SPAWAR, McAfee Research
2005-01-22 18:51:43 +00:00
Bosko Milekic
e4eb384b47 Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example.  If you are planning to use it, for now you must:

	1) Put "options DEBUG_MEMGUARD" in your kernel config.
	2) Edit src/sys/kern/kern_malloc.c manually, look for
	   "XXX CHANGEME" and replace the M_SUBPROC comparison with
	   the appropriate malloc type (this might require additional
	   but small/simple code modification if, say, the malloc type
	   is declared out of scope).
	3) Build and install your kernel.  Tune vm.memguard_divisor
	   boot-time tunable which is used to scale how much of kmem_map
	   you want to allott for MemGuard's use.  The default is 10,
	   so kmem_size/10.

ToDo:
	1) Bring in a memguard(9) man page.
	2) Better instrumentation (e.g., boot-time) of MemGuard taking
	   over malloc types.
	3) Teach UMA about MemGuard to allow MemGuard to override zone
	   allocations too.
	4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.
2005-01-21 18:09:17 +00:00
Colin Percival
7834081c88 Make "c->c_func = NULL" conditional on CALLOUT_LOCAL_ALLOC in both
places where it occurs, not just one. :-)

Pointed out by:	glebius
Pointy had to:	cperciva
2005-01-19 21:15:58 +00:00
Colin Percival
0ceba3d69c Make "c->c_func = NULL" conditional on the CALLOUT_LOCAL_ALLOC flag,
i.e., only clear c->c_func if the callout c is being used via the old
timeout(9) interface.

Requested by:	glebius
2005-01-19 20:34:46 +00:00
Colin Percival
86fd19de7b Clarify the description of the callout_active() macro: It is cleared by
callout_stop, callout_drain, and callout_deactivate, but is not
automatically cleared when a callout returns.
2005-01-19 19:46:35 +00:00
Paul Saab
efa42cbc93 move kern_nanosleep to sys/syscallsubr.h
Requested by:	jhb
2005-01-19 18:09:50 +00:00
Paul Saab
0e214fad37 Add a 32bit syscall wrapper for modstat
Obtained from:	Yahoo!
2005-01-19 17:53:06 +00:00