Commit Graph

5396 Commits

Author SHA1 Message Date
Don Lewis
b74f2c1878 Don't automagically call vslock() from SYSCTL_OUT(). Instead, complain
about calls to SYSCTL_OUT() made with locks held if the buffer has not
been pre-wired.  SYSCTL_OUT() should not be called while holding locks,
but if this is not possible, the buffer should be wired by calling
sysctl_wire_old_buffer() before grabbing any locks.
2002-08-06 11:28:09 +00:00
Alan Cox
20fb589d13 o The introduction of kevent() broke lio_listio(): _aio_aqueue() thought
that LIO_READ and LIO_WRITE were requests for kevent()-based
   notification of completion.  Modify _aio_aqueue() to recognize LIO_READ
   and LIO_WRITE.

Notes: (1) The patch provided by the PR perpetuates a second bug in this
code, a direct access to user-space memory.  This change fixes that bug
as well. (2) This change is to code that implements a deprecated interface.
It should probably be removed after an MFC.

PR:		kern/39556
2002-08-05 19:14:27 +00:00
Dag-Erling Smørgrav
ea4c8f8ca1 Check the far end before registering an EVFILT_WRITE filter on a pipe. 2002-08-05 15:03:03 +00:00
Jeff Roberson
8947be9ba0 - Move some logic from getnewvnode() to a new function vcanrecycle()
- Unlock the free list mutex around vcanrecycle to prevent a lock order
   reversal.
2002-08-05 10:15:56 +00:00
Jeff Roberson
18c6acee26 - Move a VOP assert to the right place.
Spotted by:	i386 tinderbox
2002-08-05 08:55:53 +00:00
Alfred Perlstein
4442e4a436 Cleanup:
Fix line wrapping.
Remove 'register'.
malloc(9) with M_WAITOK can't fail, so remove checks for that.
2002-08-05 05:16:09 +00:00
Luigi Rizzo
fc1c73c21a Temporarily disable polling when no processes are active, while I
investigate the problem described below.

I am seeing some strange livelock on recent -current sources with
a slow box under heavy load, which disappears with this change.
This might suggest some kind of problem (either insufficient locking,
or mishandling of priorities) in the poll_idle thread.
2002-08-04 21:00:49 +00:00
Jeff Roberson
e6e370a7fe - Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
   with VOP calls is needed.
 - v_iflag is protected by interlock and is used for dealing with vnode
   management issues.  These flags include X/O LOCK, FREE, DOOMED, etc.
 - All accesses to v_iflag and v_vflag have either been locked or marked with
   mp_fixme's.
 - Many ASSERT_VOP_LOCKED calls have been added where the locking was not
   clear.
 - Many functions in vfs_subr.c were restructured to provide for stronger
   locking.

Idea stolen from:	BSD/OS
2002-08-04 10:29:36 +00:00
Alan Cox
4453ada654 o Convert a vm_page_sleep_busy() into a vm_page_sleep_if_busy()
with appropriate page queue locking.
2002-08-04 06:27:37 +00:00
Matthew N. Dodd
9ccba881d9 Kernel modifications necessary to allow to follow fork()ed children.
PR:		 bin/25587 (in part)
MFC after:	 3 weeks
2002-08-04 01:07:02 +00:00
Alan Cox
3327872297 o Convert two instances of vm_page_sleep_busy() to vm_page_sleep_if_busy()
with appropriate page queue locking.
2002-08-03 18:59:19 +00:00
Maxime Henrion
f2b17113cf Make the consumers of the linker_load_file() function use
linker_load_module() instead.

This fixes a bug where the kernel was unable to properly locate and
load a kernel module in vfs_mount() (and probably in the netgraph
code as well since it was using the same function).  This is because
the linker_load_file() does not properly search the module path.

Problem found by:	peter
Reviewed by:		peter
Thanks to:		peter
2002-08-02 20:56:07 +00:00
Robert Watson
18b770b2fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 20:44:52 +00:00
Julian Elischer
67759b33f6 Fix a comment. 2002-08-01 19:10:40 +00:00
Julian Elischer
04774f2357 Slight cleanup of some comments/whitespace.
Make idle process state more consistant.
Add an assert on thread state.
Clean up idleproc/mi_switch() interaction.
Use a local instead of referencing curthread 7 times in a row
(I've been told curthread can be expensive on some architectures)
Remove some commented out code.
Add a little commented out code (completion coming soon)

Reviewed by:	jhb@freebsd.org
2002-08-01 18:45:10 +00:00
Robert Watson
ee0812f320 Since we have the struct file data pointer cached in vp, use that
instead when invoking VOP_POLL().
2002-08-01 18:29:30 +00:00
Alan Cox
46086ddf91 o Acquire the page queues lock before calling vm_page_io_finish().
o Assert that the page queues lock is held in vm_page_io_finish().
2002-08-01 17:57:42 +00:00
Robert Watson
f9d0d52459 Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde
2002-08-01 17:47:56 +00:00
Robert Watson
4a58340e98 Introduce support for Mandatory Access Control and extensible
kernel access control

Invoke appropriate MAC framework entry points to authorize a number
of vnode operations, including read, write, stat, poll.  This permits
MAC policies to revoke access to files following label changes,
and to limit information spread about the file to user processes.

Note: currently the file cached credential is used for some of
these authorization check.  We will need to expand some of the
MAC entry point APIs to permit multiple creds to be passed to
the access control check to allow diverse policy behavior.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 17:23:22 +00:00
Robert Watson
37bde6c0a3 Introduce support for Mandatory Access Control and extensible
kernel access control.

Restructure the vn_open_cred() access control checks to invoke
the MAC entry point for open authorization.  Note that MAC can
reject open requests where existing DAC code skips the open
authorization check due to O_CREAT.  However, the failure mode
here is the same as other failure modes following creation,
wherein an empty file may be left behind.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 17:14:28 +00:00
Robert Watson
f4d2cfdda6 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

        truncate on open()                      (write)
        access()                                (access)
        readlink()                              (readlink)
        chflags(), lchflags(), fchflags()       (setflag)
        chmod(), fchmod(), lchmod()             (setmode)
        chown(), fchown(), lchown()             (setowner)
        utimes(), lutimes(), futimes()          (setutimes)
        truncate(), ftrunfcate()                (write)
        revoke()                                (revoke)
        fhopen()                                (open)
        truncate on fhopen()                    (write)
        extattr_set_fd, extattr_set_file()      (setextattr)
        extattr_get_fd, extattr_get_file()      (getextattr)
        extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes.  More vnode checks to come, especially in
non-native ABIs.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 15:37:12 +00:00
Robert Watson
339b79b939 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke an appropriate MAC entry point to authorize execution of
a file by a process.  The check is placed slightly differently
than it appears in the trustedbsd_mac tree so that it prevents
a little more information leakage about the target of the execve()
operation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 14:31:58 +00:00
Bosko Milekic
abc1263a51 Move the MAC label init/destroy stuff to more appropriate places so that
the inits/destroys are done without the cache locks held even in the
persistent-lock calls.  I may be cheating a little by using the MAC
"already initialized" flag for now.
2002-08-01 14:24:41 +00:00
John Baldwin
12240b1159 Revert previous revision which accidentally snuck in with another commit.
It just removed a comment that doesn't make sense to me personally.
2002-08-01 13:44:33 +00:00
John Baldwin
0711ca46c5 Revert previous revision which was accidentally committed and has not been
tested yet.
2002-08-01 13:39:33 +00:00
John Baldwin
fbd140c786 If we fail to write to a vnode during a ktrace write, then we drop all
other references to that vnode as a trace vnode in other processes as well
as in any pending requests on the todo list.  Thus, it is possible for a
ktrace request structure to have a NULL ktr_vp when it is destroyed in
ktr_freerequest().  We shouldn't call vrele() on the vnode in that case.

Reported by:	bde
2002-08-01 13:35:38 +00:00
Robert Watson
b3e13e1c3f Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:50:08 +00:00
Robert Watson
b827919594 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement two IOCTLs at the socket level to retrieve the primary
and peer labels from a socket.  Note that this user process interface
will be changing to improve multi-policy support.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 03:45:40 +00:00
Robert Watson
b285e7f9a8 Improve formatting and variable use consistency in extattr system
calls.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:29:03 +00:00
Robert Watson
956fc3f8a5 Simplify the logic to enter VFS_EXTATTRCTL().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:26:07 +00:00
Robert Watson
d03db4290d Introduce support for Mandatory Access Control and extensible
kernel access control.

Authorize vop_readlink() and vop_lookup() activities during recursive
path lookup via namei() via calls to appropriate MAC entry points.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:21:40 +00:00
Robert Watson
6ea48a903c Introduce support for Mandatory Access Control and extensible
kernel access control.

Authorize the creation of UNIX domain sockets in the file system
namespace via an appropriate invocation a MAC framework entry
point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:18:42 +00:00
Robert Watson
b65f6f6b69 When invoking NDINIT() in preparation for CREATE, set SAVENAME since
we'll use nd.ni_cnp later.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:16:22 +00:00
Robert Watson
62b24bcc26 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument ctty driver invocations of various vnode operations on the
terminal controlling tty to perform appropriate MAC framework
authorization checks.

Note: VOP_IOCTL() on the ctty appears to be authorized using NOCRED in
the existing code rather than td->td_ucred.  Why?

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:09:54 +00:00
Robert Watson
467a273ca0 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the ktrace write operation so that it invokes the MAC
framework's vnode write authorization check.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:07:03 +00:00
Robert Watson
c86ca022eb Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the kernel ACL retrieval and modification system calls
to invoke MAC framework entry points to authorize these operations.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 01:04:16 +00:00
Robert Watson
62f5f684fb Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument connect(), listen(), and bind() system calls to invoke
MAC framework entry points to permit policies to authorize these
requests.  This can be useful for policies that want to limit
the activity of processes involving particular types of IPC and
network activity.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:39:49 +00:00
Dag-Erling Smørgrav
aefe27a25c Have the kern.file sysctl export xfiles rather than files. The truth is
out there!

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:26:52 +00:00
Dag-Erling Smørgrav
3072197229 Nit in previous commit: the correct sysctl type is "S,xvnode" 2002-07-31 12:25:28 +00:00
Dag-Erling Smørgrav
217b2a0b61 Initialize v_cachedid to -1 in getnewvnode().
Reintroduce the kern.vnode sysctl and make it export xvnodes rather than
vnodes.

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:24:35 +00:00
Dag-Erling Smørgrav
4eee8de77c Introduce struct xvnode, which will be used instead of struct vnode for
sysctl purposes.  Also add two fields to struct vnode, v_cachedfs and
v_cachedid, which hold the vnode's device and file id and are filled in
by vn_open_cred() and vn_stat().

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:19:49 +00:00
Alan Cox
67c1fae92e o Lock page accesses by vm_page_io_start() with the page queues lock.
o Assert that the page queues lock is held in vm_page_io_start().
2002-07-31 07:27:08 +00:00
Robert Watson
335654d73e Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on sockets.
In particular, invoke entry points during socket allocation and
destruction, as well as creation by a process or during an
accept-scenario (sonewconn).  For UNIX domain sockets, also assign
a peer label.  As the socket code isn't locked down yet, locking
interactions are not yet clear.  Various protocol stack socket
operations (such as peer label assignment for IPv4) will follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 03:03:22 +00:00
Robert Watson
07bdba7e2d Note that the privilege indicating flag to vaccess() originally used
by the process accounting system is now deprecated.
2002-07-31 02:05:12 +00:00
Robert Watson
a0ee6ed1c0 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on vnodes.
In particular, initialize the label when the vnode is allocated or
reused, and destroy the label when the vnode is going to be released,
or reused.  Wow, an object where there really is exactly one place
where it's allocated, and one other where it's freed.  Amazing.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 02:03:46 +00:00
Robert Watson
e32a5b94d8 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke additional MAC entry points when an mbuf packet header is
copied to another mbuf: release the old label if any, reinitialize
the new header, and ask the MAC framework to copy the header label
data.  Note that this requires a potential allocation operation,
but m_copy_pkthdr() is not permitted to fail, so we must block.
Since we now use interrupt threads, this is possible, but not
desirable.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:51:34 +00:00
Robert Watson
a3abeda755 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on header
mbufs.  In particular, invoke entry points during the two mbuf
header allocation cases, and the mbuf freeing case.  Pass the "how"
argument at allocation time to the MAC framework so that it can
determine if it is permitted to block (as with policy modules),
and permit the initialization entry point to fail if it needs to
allocate memory but is not permitted to, failing the mbuf
allocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:42:19 +00:00
Robert Watson
2712d0ee89 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints.  Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations.  In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:27:33 +00:00
Robert Watson
a87cdf8335 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
mount structures.  In particular, invoke entry points for
intialization and destruction in various scenarios (root,
non-root).  Also introduce an entry point in the boot procedure
following the mount of the root file system, but prior to the
start of the userland init process to permit policies to
perform further initialization.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 01:11:29 +00:00
Robert Watson
8a1d977d66 Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement inter-process access control entry points for the MAC
framework.  This permits policy modules to augment the decision
making process for process and socket visibility, process debugging,
re-scheduling, and signaling.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 00:48:24 +00:00
Robert Watson
4024496496 Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
process credentials.  In particular, invoke entry points for
the initialization and destruction of struct ucred, the copying
of struct ucred, and permit the initial labels to be set for
both process 0 (parent of all kernel processes) and process 1
(parent of all user processes).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 00:39:19 +00:00
Robert Watson
47ac133d33 Regen. 2002-07-31 00:16:58 +00:00
Robert Watson
55fb783052 Introduce support for Mandatory Access Control and extensible
kernel access control.

Replace 'void *' with 'struct mac *' now that mac.h is in the base
tree.  The current POSIX.1e-derived userland MAC interface is
schedule for replacement, but will act as a functional placeholder
until the replacement is done.  These system calls allow userland
processes to get and set labels on both the current process, as well
as file system objects and file descriptor backed objects.
2002-07-30 22:43:20 +00:00
Robert Watson
f8ef020e2e Begin committing support for Mandatory Access Control and extensible
kernel access control.  The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy.  This commit includes the
initial kernel implementation, although the interface with the userland
components of the operating system is still under work, and not all
kernel subsystems are supported.  Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

Introduce two node vnode operations required to support MAC.  First,
VOP_REFRESHLABEL(), which will be invoked by callers requiring that
vp->v_label be sufficiently "fresh" for access control purposes.
Second, VOP_SETLABEL(), which be invoked by callers requiring that
the passed label contents be updated.  The file system is responsible
for updating v_label if appropriate in coordination with the MAC
framework, as well as committing to disk.  File systems that are
not MAC-aware need not implement these VOPs, as the MAC framework
will default to maintaining a single label for all vnodes based
on the label on the file system mount point.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 22:15:09 +00:00
Robert Watson
95fab37ea8 Begin committing support for Mandatory Access Control and extensible
kernel access control.  The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy.  This commit includes the
initial kernel implementation, although the interface with the userland
components of the oeprating system is still under work, and not all
kernel subsystems are supported.  Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

kern_mac.c contains the body of the MAC framework.  Kernel and
user APIs defined in mac.h are implemented here, providing a front end
to loaded security modules.  This code implements a module registration
service, state (label) management, security configuration and policy
composition.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 21:36:05 +00:00
Julian Elischer
4d492b4369 Don't need to hold schedlock specifically for stop() ans it calls wakeup()
that locks it anyhow.

Reviewed by: jhb@freebsd.org
2002-07-30 21:13:48 +00:00
Bosko Milekic
c89137ff90 Make reference counting for mbuf clusters [only] work like in RELENG_4.
While I don't think this is the best solution, it certainly is the
fastest and in trying to find bottlenecks in network related code
I want this out of the way, so that I don't have to think about it.
What this means, for mbuf clusters anyway is:
- one less malloc() to do for every cluster allocation (replaced with
  a relatively quick calculation + assignment)
- no more free() in the cluster free case (replaced with empty space) :-)

This can offer a substantial throughput improvement, but it may not for
all cases.  Particularly noticable for larger buffer sends/recvs.
See http://people.freebsd.org/~bmilekic/code/measure2.txt for a rough
idea.
2002-07-30 21:06:27 +00:00
Alan Cox
1812190d09 o Replace vm_page_sleep_busy() with vm_page_sleep_if_busy()
in vfs_busy_pages().
2002-07-30 20:41:10 +00:00
Julian Elischer
b8e45df779 Remove code that removes thread from sleep queue before
adding it to a condvar wait.
We do not have asleep() any more so this can not happen.
2002-07-30 20:34:30 +00:00
Alan Cox
1161b86a15 o In do_sendfile(), replace vm_page_sleep_busy() by vm_page_sleep_if_busy()
and extend the scope of the page queues lock to cover all accesses
   to the page's flags and busy fields.
2002-07-30 18:51:07 +00:00
Robert Watson
e66c87b70e When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by:	green
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:48:25 +00:00
Robert Watson
e37b1fcdee Make M_COPY_PKTHDR() macro into a wrapper for a m_copy_pkthdr()
function.  This permits conditionally compiled extensions to the
packet header copying semantic, such as extensions to copy MAC
labels.

Reviewed by:	bmilekic
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 18:28:58 +00:00
Robert Watson
4266d0d0ce Regen. 2002-07-30 16:52:22 +00:00
Robert Watson
aedbd622fe Introduce a mac_policy() system call that will provide MAC policies
with a general purpose front end entry point for user applications
to invoke.  The MAC framework will route the system call to the
appropriate policy by name.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 16:50:25 +00:00
Jacques Vidrine
89ab930718 For processes which are set-user-ID or set-group-ID, the kernel performs a few
special actions for safety.  One of these is to make sure that file descriptors
0..2 are in use, by opening /dev/null for those that are not already open.
Another is to close any file descriptors 0..2 that reference procfs.  However,
these checks were made out of order, so that it was still possible for a
set-user-ID or set-group-ID process to be started with some of the file
descriptors 0..2 unused.

Submitted by:	Georgi Guninski <guninski@guninski.com>
2002-07-30 15:38:29 +00:00
Seigo Tanimura
133267776c In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may
be swapped out.  Do not put such the thread directly back to the run
queue.

Spotted by:	David Xu <davidx@viasoft.com.cn>

While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.
2002-07-30 10:12:11 +00:00
Jeff Roberson
1e4c7a1368 - Acknowledge recursive vnode locks in the vop_unlock specification. The
vnode may not be unlocked even if the operation succeeded.
2002-07-30 08:50:52 +00:00
Seigo Tanimura
9eb881f804 - Optimize wakeup() and its friends; if a thread waken up is being
swapped in, we do not have to ask for the scheduler thread to do
  that.

- Assert that a process is not swapped out in runq functions and
  swapout().

- Introduce thread_safetoswapout() for readability.

- In swapout_procs(), perform a test that may block (check of a
  thread working on its vm map) first.  This lets us call swapout()
  with the sched_lock held, providing a better atomicity.
2002-07-30 06:54:05 +00:00
Mike Silbersack
c4441bc769 Update docs to reflect change in count of procs reserved for root
from 1 to 10.

PR:             kern/40515
Submitted by:   David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:      1 day
2002-07-30 05:37:00 +00:00
Robert Watson
03a719dcd1 Rebuild of files generated from syscalls.master.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:09:24 +00:00
Robert Watson
5d37d00afc Prototype function arguments, only with MAC-specific structures
replaced with void until we bring in the actual structure definitions.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:06:34 +00:00
Robert Watson
7bc8250003 Stubs for the TrustedBSD MAC system calls to permit TrustedBSD MAC
userland code to operate on kernel's from the main tree.  Not much
in this file yet.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-30 02:04:05 +00:00
Julian Elischer
1d7b9ed2e6 Create a new thread state to describe threads that would be ready to run
except for the fact tha they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be  needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process fropm being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by:	Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
2002-07-29 18:33:32 +00:00
Jeff Roberson
a562685f65 - Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here
were hiding the real problem of the missing unlock in sync_inactive.
 - Add the missing unlock in sync_inactive.

Submitted by:	iedowse
2002-07-29 06:26:55 +00:00
Don Lewis
9e74cba35a Make a temporary copy of the output data in the generic sysctl handlers
so that the data is less likely to be inconsistent if SYSCTL_OUT() blocks.
If the data is large, wire the output buffer instead.

This is somewhat less than optimal, since the handler could skip the copy
if it knew that the data was static.

If the data is dynamic, we are still not guaranteed to get a consistent
copy since another processor could change the data while the copy is in
progress because the data is not locked.  This problem could be solved if
the generic handlers had the ability to grab the proper lock before the
copy and release it afterwards.

This may duplicate work done in other sysctl handlers in the kernel which
also copy the data, possibly while a lock is held, before calling they call
a generic handler to output the data.  These handlers should probably call
SYSCTL_OUT() directly.
2002-07-28 21:06:14 +00:00
Don Lewis
5c38b6dbce Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
David Malone
25dec7474c If a socket is disconnected for some reason (like a TCP connection
not responding) then drop any data on the outgoing queue in
soisdisconnected because there is no way to get it to its destination
any longer.

The only objection to this patch I got on -net was from Terry, who
wasn't sure that the condition in question could arise, so I provided
some example code.
2002-07-27 23:06:52 +00:00
Robert Watson
d06c0d4d40 Slight restructuring of the logic for credential change case identification
during execve() to use a 'credential_changing' variable.  This makes it
easier to have outstanding patchsets against this code, as well as to
add conditionally defined clauses.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-27 18:06:49 +00:00
John Baldwin
ce39e722ec Disable optimization of spinlocks on UP kernels w/o debugging for now
since it breaks mtx_owned() on spin mutexes when used outside of
mtx_assert().  Unfortunately we currently use it in the i386 MD code
and in the sio(4) driver.

Reported by:	bde
2002-07-27 16:54:23 +00:00
Jeff Roberson
0ea5e55265 - The default for lock, unlock, and islocked is now std* instead of no*. 2002-07-27 05:16:20 +00:00
Robert Drehmel
e25dadb05d Fix -Werror build for sparc64: Use the appropriate conversion
specifier for an 'unsigned int' argument.
2002-07-26 12:57:57 +00:00
Julian Elischer
8625e3721f get suspension counting right.
fix an error message

Submitted by:	David Xu <bsddiy@yahoo.com>
2002-07-25 03:21:35 +00:00
Julian Elischer
e3b9bf7198 fix some style problems and remove a mis-merged assert. 2002-07-25 00:27:39 +00:00
Julian Elischer
294e6308bf slight stylisations to take into account recent code changes. 2002-07-24 23:59:15 +00:00
Julian Elischer
b6d5995e5f Add some locking asserts and some comments 2002-07-24 23:21:05 +00:00
Julian Elischer
cf19bf911d When single threading a multithreaded program, awaken the
'single threading thread' when the last other thread suspends.
I had this code in there before but it seems to have been
accidentally deleted somewhere along the way.  This would only affect
multithreaded processes.

Reviewed by:	David Xu <bsddiy@yahoo.com>
2002-07-24 19:50:08 +00:00
Maxime Henrion
dae0abedbd Fix a stupid bug where I wasn't initializing the names
of 0-length mount options.
2002-07-24 19:50:00 +00:00
Robert Watson
eeb9251884 Under #ifdef DIAGNOSTIC, NULL out componentname pointers if we free the
pnbuf to increase the chances of detecting use of a free'd name buffer
if SAVENAME or SAVESTART wasn't passed in.  Curiously, running with these
changes doesn't panic the kernel, and should.
2002-07-24 15:42:22 +00:00
Bosko Milekic
4151d2e620 Move m_freem() from uipc_mbuf.c to subr_mbuf.c so it can take advantage
of the inlines, like its cousin, m_free().  Also, make a small (first
step?) optimisation of m_free() to use the MBP_PERSIST{,ENT} interface
to hold the lock across frees when possible.  The thing is that right
now, we can only do this easily for at most across one mbuf + one
cluster free, as the comment mentions (it also explains why).  Anyway,
some basic tests revealed a 5-10% overall improvement.  Some of the
results can be found here:
http://people.freebsd.org/~bmilekic/code/measure.txt
2002-07-24 15:11:23 +00:00
Mike Barcroft
5f0de71223 Catch up to rev 1.87 of sys/sys/socketvar.h (sb_cc changed from u_long
to u_int).

Noticed by:	sparc64 tinderbox
2002-07-24 14:21:41 +00:00
Julian Elischer
205683663f When suspending a thread, update the appropriate (sic) statistic. 2002-07-24 07:29:16 +00:00
Julian Elischer
38038891e9 revert some of the handling of STOP signals in
issignal(). Let thread_suspend_check() actually do the suspension
at the user boundary.

Submitted by:	David Xu <bsddiy@yahoo.com>
2002-07-24 07:23:41 +00:00
John Polstra
f824b5187e Widen struct sockbuf's sb_timeo member to int from short. With
non-default but reasonable values of hz this member overflowed,
breaking NFS over UDP.

Also, as long as I'm plowing up struct sockbuf ... Change certain
members from u_long/long to u_int/int in order to reduce wasted
space on 64-bit machines.  This change was requested by Andrew
Gallatin.

Netstat and systat need to be rebuilt.  I am incrementing
__FreeBSD_version in case any ports need to change.
2002-07-24 03:02:43 +00:00
Alfred Perlstein
b605b54ce3 Attempt to clarify comment in selrecord. 2002-07-24 00:29:22 +00:00
Bosko Milekic
dd4ac026f7 Introduce mb_free() to the MBP_PERSIST{,ENT} interface. What this means
is that grouped frees will be done as most often as possible without
dropping the cache lock in between.  So, for the most part, they'll be
done without the lock being dropped.  This is particularly true if you
have something that does a grouped m_getm() or m_getcl() (a cluster and
mbuf at the same time) - most likely getting the buffers from the
same per-CPU cache - and then frees them with m_free{,m}().  Unless
the buffers' underlying buckets were moved, the free will be done without
the lock getting dropped in between.  So far, only m_free() has been
shown how to do this, and m_freem() will shortly follow.

Since I'm here, I also fixed a small (but mostly harmless) type-mismatch
introduced in the last commit.
2002-07-23 14:55:33 +00:00
Alexander Kabaev
1c4229a6a7 Fix DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls to work for all
disk devices. This fixes the problem with these ioctls returning
EINVAL for plain slice devices with no disklabel on them.

The patch incorporates improvements and style fixes from BDE.

Reviewed by:	bde
Approved by:	obrien (mentor)
2002-07-23 14:30:27 +00:00
Andrew R. Reiter
5d3232048e - Make use of the VM_ALLOC_WIRED flag in the call to vm_page_alloc() in
do_sendfile().  This allows us to rearrange an if statement in order to
  avoid doing an unnecesary call to vm_page_lock_queues(), and an attempt
  at re-wiring the pages (which were wired in the vm_page_alloc() call).

Reviewed by:	alc, jhb
2002-07-23 01:09:34 +00:00
Alfred Perlstein
1a5a641600 Remove unneeded caddr_t casts. 2002-07-22 19:05:44 +00:00
Alfred Perlstein
fd6d9be4f5 Cleanup:
Define a debug printf macro rather than wrapping all calls to printf
with #ifdefs.
2002-07-22 18:27:54 +00:00
Alfred Perlstein
8209f090f1 Change struct vmspace->vm_shm from void * to struct shmmap_state *, this
removes the need for casts in several cases.
2002-07-22 16:22:27 +00:00
Alfred Perlstein
2cc593fd8e Remove caddr_t. 2002-07-22 16:12:55 +00:00
Alfred Perlstein
d452ec95a9 remove caddr_t from fo_ioctl calls 2002-07-22 15:46:51 +00:00
Alfred Perlstein
0a3e28cf1c remove caddr_t 2002-07-22 15:44:27 +00:00
Robert Watson
0b1040cb88 Set VAPPEND in open mode when O_APPEND is specified as an argument to
open() of fhopen().  Currently this has no actual affect due to the
treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of
VWRITE, but when MAC comes in, MAC will distinguish the two.  Note:
if any file systems are cutting their own permission models, they
may wish to now take this into account.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-22 12:51:06 +00:00
Don Lewis
dcbe050b29 Pre-wire the output buffer so that sysctl_kern_function_list() doesn't
block in SYSCTL_OUT() while holding a lock.
2002-07-22 08:28:09 +00:00
Don Lewis
0600730d73 Provide a way for sysctl handlers to pre-wire their output buffer before
they grab a lock so that they don't block in SYSCTL_OUT() with the lock
being held.
2002-07-22 08:25:37 +00:00
Robert Watson
b02aac465d Teach discretionary access control methods for files about VAPPEND
and VALLPERM.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-22 03:57:07 +00:00
Alan Cox
c1d5e2741e o Lock page queue accesses by vm_page_free(). 2002-07-21 19:06:46 +00:00
Johan Karlsson
5b60674451 Save flags returned by vn_open and use them when calling vn_close.
Reviewed by:    bde
Approved by:    sheldonh (mentor)
2002-07-21 15:22:56 +00:00
Warner Losh
5878eb3fca Add bus_child_present and the child_present method to bus_if.m 2002-07-21 03:28:43 +00:00
Robert Watson
4f18efe220 Do preserve the error result from calling p_cansee() and use that when
failing because of the error.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-20 22:44:39 +00:00
Peter Wemm
3ebc124838 Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time.  Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment.  This is a big help for execing i386 binaries
on ia64.   The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64.  At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from:  dfr (mostly, many tweaks from me).
2002-07-20 02:56:12 +00:00
Alan Cox
4aca0b1510 o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire(). 2002-07-19 19:35:06 +00:00
Maxime Henrion
0f3b0aa87c Wrap a line longer than 80 characters. 2002-07-19 17:44:44 +00:00
Maxime Henrion
72fda5bc50 - Merge the mount options at MNT_UPDATE time with vfs_mergeopts().
- Sanity check the mount options list (remove duplicates) with
  vfs_sanitizeopts().
- Fix some malloc(0)/free(NULL) bugs.

Reviewed by:	rwatson (some time ago)
2002-07-19 16:05:31 +00:00
Kirk McKusick
7aca6291e3 Add support to UFS2 to provide storage for extended attributes.
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).

The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.

Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:

        if (ioflags & IO_SYNC)
                flags |= BA_SYNC;

For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).

I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.

Sponsored by:	DARPA & NAI Labs.
2002-07-19 07:29:39 +00:00
Julian Elischer
9f189ade99 Clear up confusion in ugly code. ^T gave wrong results for RSS.
I misinterpretted this code when changing it to handle threads.
(there are still issues here)
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
2002-07-18 21:19:56 +00:00
Peter Wemm
02fb42b0a8 ia64 does not have the same degree of stealth include file nesting,
so it needs an explicit #include <machine/frame.h> to get 'struct
trapframe'.  The fact that it needs this at this level is rather bogus
but it will not compile without it.
2002-07-17 23:43:55 +00:00
Peter Wemm
8a2bd34560 Pacify gcc on ia64 2002-07-17 23:32:13 +00:00
Julian Elischer
2d014fd7f8 Fix a reversed test.
Fix some style nits.
Fix a KASSERT message.
Add/fix some comments.

Submitted by:	bde@freebsd.org
2002-07-17 19:20:48 +00:00
Julian Elischer
cad4143a58 Make sure the process state for the idle proc is set correctly
from the beginning.
2002-07-17 19:18:45 +00:00
John Baldwin
3d3f20cbe6 Preallocate a struct file as the first thing in falloc() before we lock
the filelist_lock and check nfiles.  This closes a race where we had to
unlock the filedesc to re-lock the filelist_lock.

Reported by:	David Xu
Reviewed by:	bde (mostly)
2002-07-17 02:48:43 +00:00
John Baldwin
627ed43ba7 Add a KASSERT() to assert that td_critnest is == 1 when mi_switch() is
called.
2002-07-17 02:46:13 +00:00
Andrew Gallatin
fe79953325 Allow alphas to do crashdumps: Refuse to run anything in choosethread()
after a panic which is not an interrupt thread, or the thread which
caused the panic.  Also, remove panicstr checks from msleep() and from
cv_wait() in order to allow threads to go to sleep and yeild the cpu
to the panicing thread, or to an interrupt thread which might
be doing the crashdump.

Reviewed by: jhb  (and it was mostly his idea too)
2002-07-17 02:23:44 +00:00
Kirk McKusick
fb36a3d847 Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by:	DARPA & NAI Labs.
2002-07-17 02:03:19 +00:00
Kirk McKusick
faab4e2722 Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by:	Eric Allman <eric@sendmail.org>
Sponsored by:	DARPA & NAI Labs.
2002-07-16 22:36:00 +00:00
Mark Murray
f0d2d03884 Fix a bazillion lint and WARNS warnings. One major fix is the removal of
semicolons from the end of macros:

#define FOO() bar(a,b,c);

becomes

#define FOO() bar(a,b,c)

Thus requiring the semicolon in the invocation of FOO. This is much
cleaner syntax and more consistent with expectations when writing
function-like things in source.

With both peril-sensitive sunglasses and flame-proof undies on, tighten
up some types, and work around some warnings generated by this. There
are some _horrible_ const/non-const issues in this code.
2002-07-15 17:28:34 +00:00
Mark Murray
b90cce95e0 Use ISO 9X variadic macro format; arguments are not optional, just
variable.
2002-07-15 17:17:56 +00:00
Bosko Milekic
185c2244ce o Introduce new m_getcl() interface routine that allocates an mbuf
and a cluster in one shot.
o Introduce MBP_PERSIST and MBP_PERSISTENT control bits to mb_alloc();
  MBP_PERSIST means "if you can allocate, then keep the cache lock
  held on exit," and MBP_PERSISTENT means "a cache lock is alredy held
  on entry, so allocate from the specified (already locked) cache."
  They may be used in combination.
o m_getcl() uses the MBP_PERSIST/MBP_PERSISTENT interface so that it
  doesn't drop the cache lock in between the mbuf and cluster allocations.
o m_getm(), which takes a size and allocates an mbuf + cluster "best fit"
  chain, has been moved from uipc_mbuf.c to subr_mbuf.c and shown how to
  use MBP_PERSIST/MBP_PERSISTENT to attempt to do a grouped allocation
  without dropping the cache lock in between.

Why this is good: much less bus-locked lock acquires/drops when they're
not needed.  Also, prototype for m_getcl():
struct mbuf * m_getcl(int how, short type, int flags);
"how" and "type" are self-explanatory.  "flags" may be M_PKTHDR, in
which case m_getcl() will make the mbuf a pkthdr-mbuf.

While I'm in subr_mbuf.c:
o Every exported routine now has a nice comment with a description of
  the expected arguments.  Eventually, mbuf(9) needs to be re-vamped
  but there's still more code to write/finalize before I get to that.
o internal macros have been changed a bit.
o consistently use 'short' for "type."  This somehow slipped through
  before (that 'type' was sometimes declared as int).

Alfred has been pushing for the MBP_PERSIST{,ENT} thing for almost a
year now.  Luigi asked for m_getcl(), and will probably MFC that
part of this commit.

TODO [Related]: teach mb_free() about MBP_PERSIST{, ENT}.
2002-07-15 15:32:59 +00:00
Mark Murray
1cc6a5356c Consistently use semicolons to terminate macro invocations. Cleaner
style and fixes later warnings.
2002-07-15 13:17:23 +00:00
Mark Murray
8deedb62c1 Convert GNU-styled variadic macros to ISO(9x) style. 2002-07-15 13:15:31 +00:00
Mark Murray
4f8cb019ea Use a semicolon at the end of a function-like macro invocation. Kills
warnings and makes the visual style easier.
2002-07-15 13:13:04 +00:00
Mark Peek
11a78c514f Silence compiler warnings when DDB is not defined.
PR:		36002
Submitted by:	Yoshikazu GOTO <goto@snowy.to>
2002-07-15 02:03:17 +00:00
Alan Cox
9175709532 o Lock page queue accesses by vm_page_wire(). 2002-07-14 19:45:46 +00:00
Alan Cox
b3afd20d9a In execve(), delay the acquisition of Giant until after kmem_alloc_wait().
(Operations on the exec_map don't require Giant.)
2002-07-14 17:58:35 +00:00
Julian Elischer
66d593142d part of a greater patch set..
1/ don't need to set td_state to TDS_RUNNING in fork_return.
it's already set in choosethread().
2/ Set a child process state to "normal" as opposed to "new"
when we allow it to be put on the run queue.
Allows child to receive signals from the parent if the parent
runs first and tries to immediatly signal he child.

Submitted by:  (part 2)	Thomas Moestl <tmoestl@gmx.net>
2002-07-14 08:29:15 +00:00
Julian Elischer
c3b98db091 Thinking about it I came to the conclusion that the KSE states were incorrectly
formulated.  The correct states should be:
IDLE:  On the idle KSE list for that KSEG
RUNQ:  Linked onto the system run queue.
THREAD: Attached to a thread and slaved to whatever state the thread is in.

This means that most places where we were adjusting kse state can go away
as it is just moving around because the thread is..
The only places we need to adjust the KSE state is in transition to and from
the idle and run queues.

Reviewed by:	jhb@freebsd.org
2002-07-14 03:43:33 +00:00
Julian Elischer
ac8bcbb700 oops, state cannot be two different values at once..
use || instead of &&
2002-07-14 01:36:48 +00:00
Alan Cox
5123aaef42 o Lock some page queue accesses, in particular, those by vm_page_unwire(). 2002-07-13 20:13:34 +00:00
Alfred Perlstein
8a32e0c96f Remove incorrect comment about now corrected manpage. 2002-07-13 17:11:17 +00:00
Alan Cox
9d1291be8f Lock accesses to the page queues. 2002-07-13 04:37:22 +00:00
Alan Cox
b416fa1041 o Lock accesses to the page queues.
o Add a comment explaining why hoisting the page queue lock outside
   of a particular loop is not possible.
2002-07-13 04:09:45 +00:00
John Baldwin
03d7a9fffb - Change chroot_refuse_vdir_fds() to require that the passed in struct
filedesc is already locked rather than having chroot() unlock the
  filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
  the process's struct filedesc.  This closes at least one potential race
  and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().
2002-07-13 04:07:12 +00:00
John Baldwin
63c9e754e0 We don't need to clear oldcred here since newcred is not NULL yet. 2002-07-13 03:13:15 +00:00
Alan Cox
ae0ffa73cc Lock accesses to the page queues by sendfile() and friends. 2002-07-13 03:10:55 +00:00
Matthew Dillon
fbcf77c2ea Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
    2282.76 real  2515.17 user  704.22 sys	before peter's IPI commit
    2266.69 real  2467.50 user  633.77 sys	after peter's commit
    2232.80 real  2468.99 user  615.89 sys	after this commit

Reviewed by:	peter, jhb
Approved by:	peter
2002-07-12 20:17:06 +00:00
Julian Elischer
40e550266d also set the KSE state for the idle KSE/thread case. 2002-07-12 20:16:46 +00:00
John Baldwin
33d7ad1abe Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions.  This fixes problems on ia64 and sparc64.

Reviewed by:	julian, peter, benno
Tested on:	i386, alpha, sparc64
2002-07-12 18:34:22 +00:00
Alan Cox
a4e80b6b64 Lock accesses to the page queues. 2002-07-12 17:21:22 +00:00
Thomas Moestl
5c85966098 Fix ptrace(PT_READ_*, ...) for non-little-endian architectures where
sizeof(register_t) != sizeof(int).
2002-07-12 16:48:05 +00:00
Peter Wemm
f1b665c8fe Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386.  IPI's are very expensive,
  so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns.  For example, there is no sense in
  globally purging the TLB cache for where we are stealing a page from
  the local unshared process on the local cpu.  Use pm_active to track
  this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
  enough to cause vm86 bios calls to break.  vm86 depended on our existing
  bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
  pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle].  These can be done without IPI's if we
  have a hook in cpu_switch.
- The IPI handlers need some cleanup.  I have a bogus %ds load that can
  be avoided.
- APTD handling is rather bogus and appears to be a large source of
  global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative.  I'll revisit these
again over the next few days as the dust settles.

New option:  DISABLE_PG_G - In case I missed something.
2002-07-12 07:56:11 +00:00
Alfred Perlstein
d11a56617d regen for freebsd4_sendfile(2) compat. 2002-07-12 06:52:44 +00:00
Alfred Perlstein
9c34129662 Create a bug-for-bug FreeBSD4 compatible version of sendfile and move the
fixed sendfile over.  This is needed to preserve binary compatibility from
4.x to 5.x.
2002-07-12 06:51:57 +00:00
Alfred Perlstein
074453c230 Introduce syscall.master option 'COMPAT4' which allows one to wrap
syscalls for FreeBSD 4 compatibility.
Add kernel option COMPAT_FREEBSD4 to enable these syscalls.
2002-07-12 06:38:34 +00:00
Kenneth D. Merry
a720097d24 Fix compilation with ENABLE_VFS_IOOPT turned on and ZERO_COPY_SOCKETS
turned off.

Clean up #ifdefs, and remove a bunch of unnecessary includes.

Reviewed by:	bde
Tested by:	netchild
2002-07-12 02:23:55 +00:00
Julian Elischer
5e3da64ee9 Remove debugging code that I originally only wanted to be there for a couple of days after merge.
Reminded with pointy stick by: jhb
2002-07-11 22:47:58 +00:00
John Baldwin
eb80408cff Add a missing newline during panic printf's for SMP systems that don't
have APICS.  (Like all the !i386 archs).
2002-07-11 21:56:37 +00:00
Alan Cox
97646f567d o Lock accesses to the page queues. 2002-07-11 18:48:05 +00:00
Jonathan Mini
aaa1c7715b Revert removal of cred_free_thread(): It is used to ensure that a thread's
credentials are not improperly borrowed when the thread is not current in
the kernel.

Requested by:	jhb, alfred
2002-07-11 02:18:33 +00:00
Johan Karlsson
92da2e7671 Open accounting file for appending, not general writing.
This allows accton(1) to be used with an append-only file.

PR:		7169
Reported by:	Joao Carlos Mendes Luis <jonny@jonny.eng.br>
Reviewed by:	bde
Approved by:	sheldonh (mentor)
MFC after:	2 weeks
2002-07-10 17:31:58 +00:00
Matthew Dillon
d331c5d43f Replace the global buffer hash table with per-vnode splay trees using a
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on.  Extensive testing has appeared to have shown no
increase in overhead.

Disadvantages
    Dirties more cache lines during lookups.

    Not as fast as a hash table lookup (but still N log N and optimal
    when there is locality of reference).

Advantages
    vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
    syncer operate more efficiently.

    I get to rip out all the old hacks (some of which were mine) that tried
    to keep the v_dirtyblkhd tailq sorted.

    The per-vnode splay tree should be easier to lock / SMPng pushdown on
    vnodes will be easier.

    This commit along with another that Alan is working on for the VM page
    global hash table will allow me to implement ranged fsync(), optimize
    server-side nfs commit rpcs, and implement partial syncs by the
    filesystem syncer (aka filesystem syncer would detect that someone is
    trying to get the vnode lock, remembers its place, and skip to the
    next vnode).

Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.

Suggested by: alc
2002-07-10 17:02:32 +00:00
Julian Elischer
ad22735e3f Don't slow every syscall and trap by doing locks and stuff if the
'stop' bits are not set. This is a temporary thing.. I think this code probably
needs to be rewritten anyhow.
2002-07-10 06:40:22 +00:00
Don Lewis
832dafad3d Rearrange the code so that it checks whether the file is something
valid to write a core dump to before doing the preparations to actually
write to the file.

Call VOP_GETATTR() before dropping the initial vnode lock.
2002-07-10 06:31:35 +00:00
Maxime Henrion
fbedc80bd1 Remove vfs_stdmount() and vfs_stdunmount(). They are not
really useful and are incompatible with nmount.
2002-07-09 22:50:29 +00:00
Jeff Roberson
50bfcee1cb - Use the new vop_lookup_{pre,post} instead of simpler locking specification. 2002-07-09 19:55:06 +00:00
Jeff Roberson
25b286d6db - Use standard locking functions in syncer's opv
- vput instead of vrele syncer vnodes in vfs_mount
 - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP
2002-07-09 19:54:20 +00:00
John Baldwin
54a033896d Fix a minor whitespace style nit that broke 'grep ^uuidgen'. 2002-07-09 19:36:50 +00:00
Maxime Henrion
43088e9868 Add a VFS_START() call in vfs_mountroot_try() for the sake
of being correct.  None of the root mountable filesystems
do something at VFS_START().

Shorten a comment to fix a style bug while I'm here.

PR:	kern/18505
2002-07-08 19:10:15 +00:00
Bruce Evans
f20c8dde15 Fixed some printf format errors (one new one reported by gcc and 3 nearby
old ones not reported by gcc).  This helps unbreak LINT.
2002-07-08 12:21:11 +00:00
Peter Wemm
a136efe9b6 Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version.  There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does.  This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it.  For now they are mostly just
doing statistics to get a feel of how it is working.
2002-07-07 23:05:27 +00:00
Jeff Roberson
41a5470d03 - Require locks for getattr. At some point this could only require shared
locks.
2002-07-07 22:37:45 +00:00
Jeff Roberson
31965a72c9 - Delay unlocking a vnode in linker_hints_lookup until we're actually done
with it.
 - Remove a now stale comment about improper vnode locking.
2002-07-07 22:35:47 +00:00
Peter Wemm
b799f5a475 Make this compile on 64 bit platforms 2002-07-07 22:27:40 +00:00
Jeff Roberson
18c48f437f - Don't hold the vn lock while calling VOP_CLOSE in vclean(). 2002-07-07 06:38:22 +00:00
Jeff Roberson
bed75d4627 - BUF_REFCNT() seems to be the preferred method for verifying a locked buf.
Tell vop_strategy_pre() to use this instead.
 - Ignore B_CLUSTER bufs.  Their components are locked but they don't really
   exist so they don't have to be.  This isn't ideal but it is safe.
2002-07-07 05:29:45 +00:00
Jeff Roberson
49244e35ff Add two asserts that prove & document getblk and geteblk's behavior of
returning locked bufs.
2002-07-07 05:27:08 +00:00
Jeff Roberson
c031d11bb4 Fix a mistake in my last commit. Don't grab an extra reference to the object
in bp->b_object.
2002-07-06 21:27:20 +00:00
Jeff Roberson
9a236af3ad Fixup uses of GETVOBJECT.
- Cache a pointer to the vnode's object in the buf.
 - Hold a reference to that object in addition to the vnode's reference just
   to be consistent.
 - Cleanup code that got the object indirectly through the vp and VOP calls.

This fixes at least one case where we were calling GETVOBJECT without a lock.
It also avoids an expensive layered call at the cost of another pointer in
struct buf.
2002-07-06 08:59:52 +00:00
Julian Elischer
fe0f1bf4b1 make this repect ps_sigintr if there is a pre-existing signal
or suspension request.

Submitted by:	David Xu
2002-07-06 08:47:24 +00:00
Jeff Roberson
0b2ed1aef7 Clean up execve locking:
- Grab the vnode object early in exec when we still have the vnode lock.
 - Cache the object in the image_params.
 - Make use of the cached object in imgact_*.c
2002-07-06 07:00:01 +00:00
Jeff Roberson
e818064e98 - Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated.  There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
2002-07-06 05:23:17 +00:00
Jeff Roberson
302c7aaab9 - Add vop_strategy_pre to validate VOP_STRATEGY locking.
- Disable original vop_strategy lock specification.
 - Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated.  There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
2002-07-06 05:21:12 +00:00
Jeff Roberson
13e407efee Use the new #! directive for vop_rename. Leave the old lock specification
intact but disabled.
2002-07-06 04:41:27 +00:00
Jeff Roberson
cc8662b0f9 Add "vop_rename_pre" to do pre rename lock verification. This is enabled only
with DEBUG_VFS_LOCKS.
2002-07-06 04:39:48 +00:00
Julian Elischer
55fb7ca894 Fix at least one of the things wrong with signals
^Z should work a lot better now.

Submitted by:	peter@freebsd.org
2002-07-06 02:45:11 +00:00
Andrew Gallatin
a83560d677 Remove the advertising clause from the Duke BSD copyright on the
zero-copy files

Requested by: rwatson
Approved by: Jeff Chase (my old boss at Duke)
2002-07-06 02:44:15 +00:00
Warner Losh
e0b7446484 dd %i as an alias for %d for greater compatibility with our *BSD bretheren
Obtained from: NetBSD
Reviewed by: jake, rwatson, bosko
2002-07-05 18:36:49 +00:00
Jeff Roberson
2efc89d4dc Include systm.h before vnode.h so Debugger() and printf() are available when
full vnode lock debugging is enabled.
2002-07-05 05:15:30 +00:00
Alan Cox
70c1763634 o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
queue lock (revision 1.33 of vm/vm_page.c removed them).
 o Make the free queue lock a spin lock because it's sometimes acquired
   inside of a critical section.
2002-07-04 22:07:37 +00:00
Maxime Henrion
d7f9ecc86b Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot()
which was #if 0'd and is not likely to be used now.
2002-07-03 09:27:24 +00:00
Julian Elischer
aa0fa33464 Try clean up some of the mess that resulted from layers and layers
of p4 merges from -current as things started getting different.

Corroborated by: Similar patches just mailed by BDE.
2002-07-03 09:15:20 +00:00
Maxime Henrion
563af2ec15 Remove an unused argument in vfs_mountroot(). 2002-07-03 08:52:37 +00:00
Julian Elischer
ee9919b024 White space commit.
I'm working on this file but I wanted to make the whitespece commit
separatly.
2002-07-03 06:15:26 +00:00
Andrew Gallatin
0ac3b6364f Hold the sched lock across call to forward_signal() in tdsignal() to
keep SMP systems from panic'ing when ^C'ing an app

suggested by julian
2002-07-03 02:55:48 +00:00
Dag-Erling Smørgrav
b61860ad2d Add mtx_ prefixes to the fields used for mutex profiling, and fix a bug
where the profiling code would report the release point instead of the
acquisition point.

Requested by:	bde
2002-07-03 01:50:27 +00:00
Maxime Henrion
534ab2e108 I didn't pay enough attention when copy/pasting disclaimers.
The disclaimer in vfs_conf.c was slightly different.  Fix this.
2002-07-02 18:33:32 +00:00
Maxime Henrion
2b4edb69f1 Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by:	rwatson
Repo-copy by:	peter
2002-07-02 17:09:22 +00:00
Julian Elischer
8b768fc82b When going back to SLEEP state, make sure our
State is correctly marked so.
2002-07-02 05:40:51 +00:00
Julian Elischer
d5cb7e14f6 Fix failure to correctly transition back to sleep mode. 2002-07-02 05:33:46 +00:00
Peter Wemm
c781aea8ba #include <sys/ktrace.h> would be useful too. (for ktrace_mtx) 2002-07-01 23:18:08 +00:00
Ian Dowse
f2f2285a6a The jail syscall calls chroot, which is not mpsafe, so put back a
mtx_lock(&Giant) around that call.

Reviewed by:	arr
2002-07-01 20:46:01 +00:00
Peter Wemm
1e9b3d9142 Add #include "opt_ktrace.h" 2002-07-01 19:49:04 +00:00
Ian Dowse
6bd521df93 Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by:	mckusick
2002-07-01 17:59:40 +00:00
Andrew R. Reiter
c0854cd341 - In thread_userret(), remove the Giant locking and unlocking around the
call to thread_alloc().

Approved by:	julian
Reviewed by:	jake, jeff
2002-07-01 03:15:16 +00:00
Julian Elischer
7c7a6f22ca If the process is a zombie, then you must not try dereference the thread
because there isn't one. Of course this code only possibly works
for single threaded processes anyhow..
2002-06-30 07:50:22 +00:00
Alfred Perlstein
37a6b453c4 Partial backout of 1.318, remove error handling added because it may be
incorrect.

Requested by: bde
2002-06-30 05:23:58 +00:00
Ian Dowse
37777f4d1f Add a hashdestroy() function to undo the actions of hashinit(). 2002-06-30 02:07:26 +00:00
Alfred Perlstein
97bb78ace2 Fix several style bugs:
close up the continued line after removing the cast made the line.
space before parentheses in indirect function call.

Add an addtional error handler case for the results of callback.

Submitted by: bde
2002-06-29 17:58:44 +00:00
Alfred Perlstein
c5e3ef7e1f Unbreak computation of 'smask' that I broke when removing caddr_t.
Submitted by: bde
2002-06-29 17:56:34 +00:00
Julian Elischer
e602ba25fd Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
Julian Elischer
44990b8cb8 Add files that are new for KSE. 2002-06-29 07:04:59 +00:00
David E. O'Brien
87e1503e2c Rename the db command lockedvnodes to lockedvnods so that it fits on the
help screen and one doens't think we have a lockedvnodesmap command.
2002-06-29 04:45:09 +00:00
Alfred Perlstein
016091145e more caddr_t removal. 2002-06-29 02:00:02 +00:00
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Alfred Perlstein
69a3693f3e catch up with mextadd callback taking a void argument instead of a caddr_t. 2002-06-29 01:49:22 +00:00
Alfred Perlstein
802082390b More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.
2002-06-29 00:29:12 +00:00
Alfred Perlstein
a551e20e27 nuke more instances of caddr_t 2002-06-29 00:02:01 +00:00
Alfred Perlstein
337f75e11c m_extadd takes a void (*freef)(void *, void *) now, not a
void (*freef)(caddr_t, void *).
2002-06-29 00:01:46 +00:00
Alfred Perlstein
64f0b9d749 remove or replace caddr_t with void.
make the mbuf external free function take a void * rather than caddr_t.
2002-06-28 23:48:23 +00:00
Alfred Perlstein
210a5a7169 nuke caddr_t. 2002-06-28 23:17:36 +00:00
Alfred Perlstein
a788442584 Remove unneeded casts to caddr_t. 2002-06-28 23:02:38 +00:00
Alfred Perlstein
52545a237b document that the pipe fo_stat routine doesn't need locks because it's
a read operation.

Requested by: rwatson
2002-06-28 22:35:12 +00:00
Jeff Roberson
90769c9ed0 Improve the VOP locking asserts
- Add vfs_badlock_print to control whether or not we print lock violations
 - Add vfs_badlock_panic to control whether we panic on lock violations

Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
2002-06-28 20:58:14 +00:00
Ian Dowse
84b2995b2f In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
Jeff Roberson
5c71bc6cf2 Clean up vn_rdwr locking.
- Do shared locks on read.
 - Only do vn_{start,finished}_write when writing.
2002-06-28 17:51:11 +00:00
Brian Feldman
aac12bcfbc Fix a case where a vnode got explicitly unlocked after the pointer to it
got set to NULL.

Revision 1.355: in the box
2002-06-28 16:17:47 +00:00
Luigi Rizzo
d26d355f0e Remove a printf and add a comment on an assumption that could be
occasionally violated by device drivers.
2002-06-27 23:23:04 +00:00
Robert Watson
600c1a5a8e Fix a bug that prevented the deletion of non-default ACLs from being
passed down the VFS stack.  While I'm here, replace a '0' with a 'NULL'
to make the code more readable.

Sponsored by:	DARPA, NAI Labs
Obtained from:	TrustedBSD Project
2002-06-27 19:31:15 +00:00
Robert Watson
cbeb840245 A bit of whitespace magic. 2002-06-27 19:30:11 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Andrew R. Reiter
e024f583de - Remove Giant acquisition from modevent(), modfnext(), modstat() and
modfind().  Giant is no longer needed by these functions for safe
  execution.

Reviewed by:	jhb
2002-06-26 00:31:44 +00:00
Andrew R. Reiter
4e77f68011 - Alleviate jail() from having the burden of acquiring Giant by simply
removing.  We can do this since we no longer need Giant to safely
  execute jail().

Reviewed by:	rwatson, jhb
2002-06-26 00:29:01 +00:00
Alan Cox
366838ddfe o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
   on some fields.
2002-06-25 18:14:38 +00:00
Jake Burkholder
8ba3d077ff Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended.  This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly.  This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
2002-06-24 15:48:02 +00:00
Maxime Henrion
0ae037eb45 Bring sys/kern/md5c.c in sync with the userland version.
Add a comment so that people don't forget to keep the
version in src/lib/libmd/md5c.c in sync with this one.
This fixes a warning on sparc64.

Reviewed by:	phk
2002-06-24 14:15:25 +00:00
Kirk McKusick
d374764fd3 Use proper size in bzero of stat structure.
Submitted by:	Jake Burkholder <jake@locore.ca>
Sponsored by:	DARPA & NAI Labs.
2002-06-24 07:14:44 +00:00
Jonathan Mini
01ad8a53db Remove unused diagnostic function cread_free_thread().
Approved by:	alfred
2002-06-24 06:22:00 +00:00
Matthew Dillon
727300861d I Noticed a defect in the way wakeup() scans the tailq. Tor noticed an
even worse defect in wakeup_one().  This patch cleans up both.

Submitted by:	tegge
MFC after:	3 days
2002-06-24 00:14:36 +00:00
Maxime Henrion
0a84e7c8f7 More 64 bits platforms warning fixes.
Reviewed by:	rwatson
2002-06-23 18:32:39 +00:00
Kirk McKusick
6524dddcd5 This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by:	Bruce Evans <bde@zeta.org.au>
Sponsored by:	DARPA & NAI Labs.
2002-06-22 22:01:13 +00:00
Maxime Henrion
2853bfa0df We don't need to check the return value of malloc() against
NULL when the M_WAITOK flag is specified.
2002-06-22 21:44:11 +00:00
Matthew Dillon
ed22d6e948 Fix a bug in vfs_bio_clrbuf(). The single-page-clrbuf optimization was
improperly clearing more then just the invalid portions of the page.  (This
bug is not known to have been triggered by anything).

Submitted by:	tegge
MFC after:	7 days
2002-06-22 19:09:35 +00:00
Maxime Henrion
cacd1c9b49 o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
  nmount(2) since we now malloc() them individually and
  corrupted iov's could make the kernel crash in malloc()
  with "kmem_map too small".

Reviewed by:	phk
2002-06-22 18:07:05 +00:00
Jonathan Mini
9718382d85 Always drop the p_args reference we held for copyout, even if we're about
to change it. This fixes a leak triggered by setproctitle(3).

Approved by:	alfred
Noticed by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2002-06-22 10:05:50 +00:00
Kirk McKusick
1c85e6a35d This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@freebsd.org>
2002-06-21 06:18:05 +00:00
Maxime Henrion
7d2d440991 Change the way we internally store the mount options to
a linked list.  This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by:	phk
2002-06-20 20:03:42 +00:00
Alfred Perlstein
c33c825169 Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin
2002-06-20 18:52:54 +00:00
Alfred Perlstein
69be5db96f Don't leak resources if fdcheckstd() fails during exec.
Submitted by: Mike Makonnen <makonnen@pacbell.net>
2002-06-20 17:27:28 +00:00
Ian Dowse
99568bcaf7 Display the mutex name in the ^T status line if the selected thread
is blocked on a mutex. Prepend a '*' to distinguish this case as
is done in top(1).
2002-06-20 14:03:36 +00:00
Peter Wemm
9d04103b7c Remove UIO_USERISPACE - we do not support any split instruction/data
address space machines (eg: pdp-11) and are not likely to ever do so.
Nothing in our kernel sets this.
2002-06-20 07:08:43 +00:00