Commit Graph

250 Commits

Author SHA1 Message Date
Alan Cox
03be99d20c Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently,
pipe_build_write_buffer() no longer requires Giant on entry.

Reviewed by:	tegge
2003-09-08 04:58:32 +00:00
Alan Cox
603d3d4a44 Giant is no longer required by pipe_destroy_write_buffer(). Reduce
unnecessary white space from pipe_destroy_write_buffer().
2003-09-06 21:02:10 +00:00
John-Mark Gurney
fc8684cd46 if we got this far, we definately don't have an EBADF. Return a more
sane result of EPIPE.

Reported by:	nCircle dev team
MFC after:	3 day
2003-08-15 04:31:01 +00:00
Alan Cox
77685ea594 - The vm_object pointer in pipe_buffer is unused. Remove it.
- Check for successful initialization of pipe_zone in pipeinit()
   rather than every call to pipe(2).
2003-08-13 20:01:38 +00:00
Alan Cox
ad8204e3f5 Pipespace() no longer requires Giant. 2003-08-11 22:23:25 +00:00
Mike Silbersack
cebde06978 More pipe changes:
From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues.  (Bad explanation provided by me.)

From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.

Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)
2003-08-11 05:51:51 +00:00
Alan Cox
f9999c67be Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded
application could cause a wired page to be freed.  In general,
vm_page_hold() should be preferred for ephemeral kernel mappings of pages
borrowed from a user-level address space.  (vm_page_wire() should really be
reserved for indefinite duration pinning by the "owner" of the page.)

Discussed with:	silby
Submitted by:	tegge
2003-08-11 00:17:44 +00:00
Alan Cox
9c62fce085 - Remove GIANT_REQUIRED from pipespace().
- Remove a duplicate initialization from pipe_create().
2003-08-08 22:38:15 +00:00
Alan Cox
f9b1de367e - Remove GIANT_REQUIRED from pipe_free_kmem().
- Remove the acquisition and release of Giant around pipe_kmem_free() and
   uma_zfree() in pipeclose().
2003-08-07 04:32:40 +00:00
Pierre Beyssac
ae9fcf4c66 Remove test in pipe_write() which causes write(2) to return EAGAIN
on a non-blocking pipe in cases where select(2) returns the file
descriptor as ready for write. This in turns causes libc_r, for
one, to busy wait in such cases.

Note: it is a quick performance fix, a more complex fix might be
required in case this turns out to have unexpected side effects.

Reviewed by:	silby
MFC after:	3 days
2003-07-30 22:50:37 +00:00
Alan Cox
93b4c5b707 The introduction of vm object locking has caused witness to reveal
a long-standing mistake in the way a portion of a pipe's KVA is
allocated.  Specifically, kmem_alloc_pageable() is inappropriate
for use in the "direct" case because it allows a preceding vm map entry
and vm object to be extended to support the new KVA allocation.
However, the direct case KVA allocation should not have a backing
vm object.  This is corrected by using kmem_alloc_nofault().

Submitted by:	tegge (with the above explanation by me)
2003-07-30 18:55:04 +00:00
Mike Silbersack
ff56f15e26 A few minor changes:
- Use atomic ops to update the bigpipe count
- Make the bigpipe count sysctl readable
- Remove a duplicate comparison in an if statement
- Comment two SYSCTLs.
2003-07-09 21:59:48 +00:00
Mike Silbersack
289016f2d1 Put some concrete limits on pipe memory consumption:
- Limit the total number of pipes so that we do not
  exhaust all vm objects in the kernel map.  When
  this limit is reached, a ratelimited message will
  be printed to the console.

- Put a soft limit on the amount of memory consumable
  by pipes.  Once the limit has been reached, all new
  pipes will be limited to 4K in size, rather than the
  default of 16K.

- Put a limit on the number of pages that may be used
  for high speed page flipping in order to reduce the
  amount of wired memory.  Pipe writes that occur
  while this limit is exceeded will fall back to
  non-page flipping mode.

The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.

These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes.  (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)

PR:			53627
Ideas / comments from:	hsu, tjr, dillon@apollo.backplane.com
MFC after:		1 week
2003-07-08 04:02:31 +00:00
Poul-Henning Kamp
7c2d2efd58 Initialize struct fileops with C99 sparse initialization. 2003-06-18 18:16:40 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Maxime Henrion
0ca5dc1c3e style(9). 2003-06-09 21:57:48 +00:00
Jeffrey Hsu
c31548c820 Need to hold the same SMP lock for (knote) list traversal as for
list manipulation.  This lock also protects read-modify-write operations
on the pipe_state field.
2003-04-02 15:24:50 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Alfred Perlstein
e7d6662f1b Do not allow kqueues to be passed via unix domain sockets. 2003-02-15 06:04:55 +00:00
Alan Cox
2bd63062b5 Use atomic ops to update amountpipekva. Amountpipekva represents the
total kernel virtual address space used by all pipes.  It is, thus, outside
the scope of any individual pipe lock.
2003-02-13 19:39:54 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Poul-Henning Kamp
a7010ee2f4 White-space changes. 2002-12-24 09:44:51 +00:00
Poul-Henning Kamp
f3a682116c Detediousficate declaration of fileops array members by introducing
typedefs for them.
2002-12-23 21:53:20 +00:00
Alfred Perlstein
8ced1eb281 Remove a KASSERT I added in 1.73 to catch uninitialized pipes.
It must be removed because it is done without the pipe being locked
via pipelock() and therefore is vulnerable to races with pipespace()
erroneously triggering it by temporarily zero'ing out the structure
backing the pipe.

It looks as if this assertion is not needed because all manipulation
of the data changed by pipespace() _is_ protected by pipelock().

Reported by: kris, mckusick
2002-10-14 21:15:04 +00:00
Alfred Perlstein
1e31f88689 whitespace fixes. 2002-10-12 22:26:41 +00:00
Mike Barcroft
2b7f24d210 Change iov_base's type from char *' to the standard void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
2002-10-11 14:58:34 +00:00
Don Lewis
91e97a8266 In an SMP environment post-Giant it is no longer safe to blindly
dereference the struct sigio pointer without any locking.  Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.

Reviewed by:	rwatson
2002-10-03 02:13:00 +00:00
Robert Watson
1aa37f5392 Improve locking of pipe mutexes in the context of MAC:
(1) Where previously the pipe mutex was selectively grabbed during
    pipe_ioctl(), now always grab it and then release if if not
    needed.  This protects the call to mac_check_pipe_ioctl() to
    make sure the label remains consistent.  (Note: it looks
    like sigio locking may be incorrect for fgetown() since we
    call it not-by-reference and sigio locking assumes call by
    reference).

(2) In pipe_stat(), lock the pipe if MAC is compiled in so that
    the call to mac_check_pipe_stat() gets a locked pipe to
    protect label consistency.  We still release the lock before
    returning actual stat() data, risking inconsistency, but
    apparently our pipe locking model accepts that risk.

(3) In various pipe MAC authorization checks, assert that the pipe
    lock is held.

(4) Grab the lock when performing a pipe relabel operation, and
    assert it a little deeper in the stack.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-01 04:30:19 +00:00
Poul-Henning Kamp
37c841831f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
Archie Cobbs
55f7c614fd Don't use "NULL" when "0" is really meant. 2002-08-21 23:39:52 +00:00
Robert Watson
c024c3eeb1 Break out mac_check_pipe_op() into component check entry points:
mac_check_pipe_poll(), mac_check_pipe_read(), mac_check_pipe_stat(),
and mac_check_pipe_write().  This is improves consistency with other
access control entry points and permits security modules to only
control the object methods that they are interested in, avoiding
switch statements.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-19 16:59:37 +00:00
Robert Watson
d49fa1ca6e In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
  fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
  invocations of soo_ioctl() are provided access to the calling f_cred.
  Pass ap->a_td->td_ucred as the active_cred, but note that this is
  required because we don't yet distinguish file_cred and active_cred
  in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
  than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
  td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-17 02:36:16 +00:00
Robert Watson
49cde51dfd Correct white space nits that crept in during my recent merges of
trustedbsd_mac material.
2002-08-16 14:12:40 +00:00
Robert Watson
ea6027a8e1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Robert Watson
925860774d Introduce support for labeling and access control of pipe objects
as part of the TrustedBSD MAC framework.  Instrument the creation
and destruction of pipes, as well as relevant operations, with
necessary calls to the MAC framework.  Note that the locking
here is probably not quite right yet, but fixes will be forthcoming.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-13 02:47:13 +00:00
Dag-Erling Smørgrav
ea4c8f8ca1 Check the far end before registering an EVFILT_WRITE filter on a pipe. 2002-08-05 15:03:03 +00:00
Alfred Perlstein
1a5a641600 Remove unneeded caddr_t casts. 2002-07-22 19:05:44 +00:00
Alan Cox
b416fa1041 o Lock accesses to the page queues.
o Add a comment explaining why hoisting the page queue lock outside
   of a particular loop is not possible.
2002-07-13 04:09:45 +00:00
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Alfred Perlstein
52545a237b document that the pipe fo_stat routine doesn't need locks because it's
a read operation.

Requested by: rwatson
2002-06-28 22:35:12 +00:00
Alfred Perlstein
e649887b1e Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp.  We need
to check the process for P_WEXIT to see if it's exiting.  Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.
2002-05-06 19:31:28 +00:00
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
Thomas Moestl
8db523989f Use pmap_extract() instead of pmap_kextract() to retrieve the physical
address associated with a user virtual address in
pipe_build_write_buffer().

Reviewed by:	alc
2002-04-13 20:09:06 +00:00
Thomas Moestl
de67a4bd91 Back out the last revision - it does not work correctly when one of
the pages in question is not in the top-level vm object, but in
one of the shadow ones.

Pointed out by: alc
Pointy hat to:	tmm
2002-04-13 00:03:07 +00:00
Thomas Moestl
60f2606a7d Do not use pmap_kextract() to find out the physical address of a user
belong to a user virtual address; while this happens to work on some
architectures, it can't on sparc64, since user and kernel virtual
address spaces overlap there (the distinction between them is done via
separate address space identifiers).

Instead, look up the page in the vm_map of the process in question.

Reviewed by:	jake
2002-04-12 19:38:41 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Alan Cox
cd430164f1 Allow resursion on the pipe mutex because filt_piperead() and filt_pipewrite()
can be called both with and without the pipe mutex held.  (For example,
if called by pipeselwakeup(), it is held.  Whereas, if called by kqueue_scan(),
it is not.)

Reviewed by:	alfred
2002-03-27 21:47:50 +00:00
Alfred Perlstein
db51256707 When "cloning" a pipe's buffer bcopy the data after dropping the pipe's
lock as the data may be paged out and cause a fault.
2002-03-22 16:09:22 +00:00
Jeff Roberson
c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Alfred Perlstein
3b018f572d Bug fixes:
Missed a place where the pipe sleep lock was needed in order to safely grab
Giant, fix it and add an assertion to make sure this doesn't happen again.

Fix typos in the PIPE_GET_GIANT/PIPE_DROP_GIANT that could cause the
wrong mutex to get passed to PIPE_LOCK/PIPE_UNLOCK.

Fix a location where the wrong pipe was being passed to
PIPE_GET_GIANT/PIPE_DROP_GIANT.
2002-03-15 07:18:09 +00:00
Alfred Perlstein
be4af4b723 Don't deref NULL mutex pointer when pipeclose()'ing a pipe that is not
fully instaniated.

Revert the logic in pipeclose so that we don't have the entire function
pretty much under a single if() statement, instead invert the test and
just return if it fails.

Submitted (in different form) by: bde

Don't use pool mutexes for pipes.  We can not use pool mutexes
because we will need to grab the select lock while holding a pipe
lock which is not allowed because you may not aquire additional
mutexes when holding a pool mutex.

Instead malloc(9) space for the mutex that is shared between the
pipes.
2002-03-09 22:06:31 +00:00
Seigo Tanimura
996abba928 Track the number of wired pages to avoid unwiring unwired pages.
Reviewed by:	alfred
2002-03-05 00:51:03 +00:00
Alfred Perlstein
9f01374de5 kill __P. 2002-02-27 18:51:53 +00:00
Alfred Perlstein
566c1313a3 add assertions in the places where giant is required to catch when
the pipe is locked and shouldn't be.

initialize pipe->pipe_mtxp to NULL when creating pipes in order not
to trip the above assertions.

swap pipe lock with giant around calls to pipe_destroy_write_buffer()

pipe_destroy_write_buffer issue noticed by: jhb
2002-02-27 18:49:58 +00:00
Alfred Perlstein
21dbcfd500 Fix a NULL deref panic in pipe_write, we can't blindly lock
pipe->pipe_peer->pipe_mtxp because it may be NULL, so lock the
passed in pipe's mutex instead.
2002-02-27 17:23:16 +00:00
Alfred Perlstein
ffddaaeeeb MPsafe fixes:
use SYSINIT to initialize pipe_zone.
use PIPE_LOCK to protect kevent ops.
2002-02-27 11:27:48 +00:00
Alfred Perlstein
f81b04d96c First rev at making pipe(2) pipe's MPsafe.
Both ends of the pipe share a pool_mutex, this makes allocation
and deadlock avoidance easy.

Remove some un-needed FILE_LOCK ops while I'm here.

There are some issues wrt to select and the f{s,g}etown code that
we'll have to deal with, I think we may also need to move the calls
to vfs_timestamp outside of the sections covered by PIPE_LOCK.
2002-02-27 07:35:59 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Maxim Sobolev
783c41d432 Make kevents on pipes work as described in the manpage - when the last
reader/writer disconnects, ensure that anybody who is waiting for the
kevent on the other end of the pipe gets EV_EOF.

MFC after:	2 weeks
2001-11-19 09:25:30 +00:00
John Baldwin
ed01445d8f Use the passed in thread to selrecord() instead of curthread. 2001-09-21 22:46:54 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
7b9673fa28 cleanup: GIANT macros, rename DEPRECIATE to DEPRECATE
Move p_giant_optional to proc zero'd section
Remove (old) XXX zfree comment in pipe code
2001-07-04 17:11:03 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
Jonathan Lemon
7b748f0a21 Correctly hook up the write kqfilter to pipes.
Submitted by:  Niels Provos <provos@citi.umich.edu>
2001-06-15 20:45:01 +00:00
Matthew Dillon
1b3e974a71 The pipe_write() code was locking the pipe without busying it first in
certain cases, and a close() by another process could potentially rip the
pipe out from under the (blocked) locking operation.

Reported-by: Alexander Viro <viro@math.psu.edu>
2001-06-04 04:04:45 +00:00
Alfred Perlstein
0cea693084 whitespace/style 2001-05-24 18:06:22 +00:00
Alfred Perlstein
53240603ee aquire vm_mutex a little bit earlier to protect a pmap call. 2001-05-23 10:26:36 +00:00
John Baldwin
270b041d95 - Assert that the vm mutex is held in pipe_free_kmem().
- Don't release the vm mutex early in pipespace() but instead hold it
  across vm_object_deallocate() if vm_map_find() returns an error and
  across pipe_free_kmem() if vm_map_find() succeeds.
- Add a XXX above a zfree() since zalloc already has its own locking,
  one would hope that zfree() wouldn't need the vm lock.
2001-05-21 18:47:17 +00:00
Alfred Perlstein
2395531439 Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb
2001-05-19 01:28:09 +00:00
Alfred Perlstein
0fd061c0c4 Cleanup
Remove comment about setting error for reads on EOF, read returns 0 on
EOF so the code should be ok.

Remove non-effective priority boost, PRIO+1 doesn't do anything
(according to McKusick), if a real priority boost is needed it should
have been +4.

Style fixes:
.) return foo -> return (foo)
.) FLAG1|FlAG2 -> FLAG1 | FlAG2
.) wrap long lines
.) unwrap short lines
.) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++)
.) remove braces for some conditionals with a single statement
.) fix continuation lines.

md5 couldn't verify the binary because some code had to
be shuffled around to address the style issues.
2001-05-17 19:47:09 +00:00
Alfred Perlstein
2deb4a20c3 initialize pipe pointers 2001-05-17 18:22:58 +00:00
Alfred Perlstein
82a283fcf3 pipe_create has to zero out the select record earlier to avoid
returning a half-initialized pipe and causing pipeclose() to follow
a junk pointer.

Discovered by: "Nick S" <snicko@noid.org>
2001-05-17 17:59:28 +00:00
Alfred Perlstein
97d4578662 Remove an 'optimization' I hope to never see again.
The pipe code could not handle running out of kva, it would panic
if that happened.  Instead return ENFILE to the application which
is an acceptable error return from pipe(2).

There was some slightly tricky things that needed to be worked on,
namely that the pipe code can 'realloc' the size of the buffer if
it detects that the pipe could use a bit more room.  However if it
failed the reallocation it could not cope and would panic.  Fix
this by attempting to grow the pipe while holding onto our old
resources.  If all goes well free the old resources and use the
new ones, otherwise continue to use the smaller buffer already
allocated.

While I'm here add a few blank lines for style(9) and remove
'register'.
2001-05-08 09:09:18 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
Jonathan Lemon
608a3ce62a Extend kqueue down to the device layer.
Backwards compatible approach suggested by: peter
2001-02-15 16:34:11 +00:00
David Malone
3b54736e19 Style improvements for last fix. Should be functionally the same.
Submitted by:	bde
2001-01-11 00:13:54 +00:00
Garrett Wollman
0a2c3d48c6 select() DKI is now in <sys/selinfo.h>. 2001-01-09 04:33:49 +00:00
David Malone
2ebaaccd47 If we failed to allocate the file discriptor for the write end of
the pipe, then we were corrupting the pipe_zone free list by calling
pipeclose on rpipe twice. NULL out rpipe to avoid this.

Reviewed by:	dillon
Reviewed by:	iedowse
2001-01-08 22:14:48 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
Jonathan Lemon
6f451c99b3 Pipes are not writeable while a direct write is in progress. However,
the kqueue filter got the sense of the test reversed, so fix it.

Spotted by:	Michael Elkins <me@sigpipe.org>
2000-09-14 20:10:19 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Chris Costello
12861d58db Include UID and GID information for stat() calls using the values filled
into the file descriptor data by falloc().

Reviewed by:	phk
2000-05-11 22:08:20 +00:00
Jonathan Lemon
cb679c385e Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
Matthew Dillon
f1924a54f8 Fix in-kernel infinite loop in pipe_write() when the reader goes away
at just the wrong time.
2000-03-24 00:47:37 +00:00
Bruce Evans
da211f5bf6 Use vfs_timestamp() instead of getnanotime() to set timestamps. This
fixee incoherency of pipe timestamps relative to file timestamps in
the usual case where getnanotime() is not used for the latter.  (File
and pipe timestamps are still incoherent relative to real time unless
the vfs_timestamp_precision sysctl is set to 2 or 3).
1999-12-26 13:04:52 +00:00
Tor Egge
f52523701c Fix two problems with pipe_write():
1. Data written beyond end of pipe buffer, causing kernel memory corruption.

    - Check that space is still valid after obtaining the pipe lock.

    - Defer the calculation of transfer size until the pipe
      lock has been obtained.

    - Update the pipe buffer pointers while holding the pipe lock.

 2. Writes of size <= PIPE_BUF not always atomic.

    - Allow an internal write to span two contiguous segments,
      so writes of size <= PIPE_BUF can be kept atomic
      when wrapping around from the end to the start of the
      pipe buffer.

PR:		15235
Reviewed by:	Matt Dillon <dillon@FreeBSD.org>
1999-12-13 02:55:47 +00:00
Peter Wemm
29e040e5c1 Update pipe code for fo_stat() entry point - pipe_stat() is now no longer
used outside the pipe code.
1999-11-08 03:28:49 +00:00
Poul-Henning Kamp
923502ff91 useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
Matthew Dillon
4cc712004c Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundry in the page table.  pmap_kextract()
    is not designed for access to the user space portion of the page
    table and cannot handle the null-page-directory-entry case.

    The fix is to have vm_fault_quick() return a success or failure which
    is then used to avoid calling pmap_kextract().
1999-09-20 19:08:48 +00:00
Brian Feldman
13ccadd4b0 This is what was "fdfix2.patch," a fix for fd sharing. It's pretty
far-reaching in fd-land, so you'll want to consult the code for
changes.  The biggest change is that now, you don't use
	fp->f_ops->fo_foo(fp, bar)
but instead
	fo_foo(fp, bar),
which increments and decrements the fp refcount upon entry and exit.
Two new calls, fhold() and fdrop(), are provided.  Each does what it
seems like it should, and if fdrop() brings the refcount to zero, the
fd is freed as well.

Thanks to peter ("to hell with it, it looks ok to me.") for his review.
Thanks to msmith for keeping me from putting locks everywhere :)

Reviewed by:	peter
1999-09-19 17:00:25 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Brian Feldman
e32c66c539 Fix fd race conditions (during shared fd table usage.) Badfileops is
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.

This does not fix other problems mentioned in this PR than the first
one.

PR:		11629
Reviewed by:	peter
1999-08-04 18:53:50 +00:00
Alan Cox
3d41489171 Restructure pipe_read in order to eliminate several race conditions.
Submitted by:	Matthew Dillon <dillon@apollo.backplane.com> and myself
1999-06-05 03:53:57 +00:00
Dmitrij Tejblum
8fe387ab84 Add standard padding argument to pread and pwrite syscall. That should make them
NetBSD compatible.

Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that
the offset is set in the struct uio).

Factor out some common code from read/pread/write/pwrite syscalls.
1999-04-04 21:41:28 +00:00
Matthew Dillon
e198079da2 Fix race in pipe read code whereby a blocked lock can allow another
process to sneak in and write to or close the pipe.  The read code
    enters a 'piperd' state after doing the lock operation without
    checking to see if the state changed, which can cause the process
    to wait forever.

    The code has also been documented more.
1999-02-04 23:50:49 +00:00
Matthew Dillon
8aef171243 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
Matthew Dillon
d254af07a1 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-27 21:50:00 +00:00
Bruce Evans
dc34e67676 Include <sys/select.h> -- don't depend on pollution in <sys/proc.h>. 1999-01-27 10:10:03 +00:00
Archie Cobbs
f1d19042b0 The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.
1998-12-07 21:58:50 +00:00
Don Lewis
831d27a9f5 Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, elvind
1998-11-11 10:04:13 +00:00
David Greenman
730075613a Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.
1998-10-28 13:37:02 +00:00
David Greenman
6cde7a165f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Poul-Henning Kamp
a0502b19d4 Add two new functions, get{micro|nano}time.
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.
1998-03-26 20:54:05 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Poul-Henning Kamp
4a11ca4e29 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
Peter Wemm
115526d03e Ack! Fix excessive cut/paste blunder during poll mods. Who had the
pointy hat last? :-]

When one is selecting (or polling) for write, it helps if we use the
write side of the pipe when requesting wakeups instead of the read side.
This broke ghostview (at least) - I'm suprised it wasn't noticed for
so long.

Reviewed by:  Greg Lehey <grog@lemis.com>
1997-10-06 08:30:08 +00:00
Peter Wemm
d080b0b0be Implement the poll backend for the pipe file type. 1997-09-14 02:43:25 +00:00
Bruce Evans
e4ba6a82b0 Removed unused #includes. 1997-09-02 20:06:59 +00:00
John Dyson
0d65e566b9 Another attempt at cleaning up the new memory allocator. 1997-08-05 22:24:31 +00:00
John Dyson
e258d33a51 Fix up come cruft that I left on a previous commit. 1997-08-05 00:05:00 +00:00
John Dyson
3075778b63 Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator.  This new allocator will also be
used for machine dependent pmap PV entries.
1997-08-05 00:02:08 +00:00
Bruce Evans
9dd8309d56 Removed support for OLD_PIPE. <sys/stat.h> is now missing the hack that
supported nameless pipes being indistinguishable from fifos.  We're not
going back.
1997-04-09 16:53:45 +00:00
Bruce Evans
2098241054 Don't include <sys/ioctl.h> in the kernel. Stage 4: include
<sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h>
in miscellaneous files.  Most of these files have nothing to do
with ttys but need to include <sys/ttycom.h> to get the definitions
of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into
ioctls.
1997-03-24 11:52:29 +00:00
Bruce Evans
3ac4d1ef0c Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.
1997-03-23 03:37:54 +00:00
Bruce Evans
3c81694426 Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'.  Use a new function gettime().  The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
1997-03-22 06:53:45 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
John Dyson
f2c832d788 Mostly some fixes from bde to start support for ASYNC I/O (SIGIO).
Submitted by:	bde
1996-10-11 02:27:30 +00:00
John Dyson
d1a5be1064 A few minor mods (improvements) to support more efficient pipe
operations for large transfers.  There are essentially no differences
for small transfers, but big transfers should perform about 20%
better.
1996-07-13 22:52:50 +00:00
Bruce Evans
3be8cc7800 Staticized some variables.
Fixed initialization of pipe_pgid - don't default to pid 0 (swapper) for
SIGIO.

Added comments about other implicit initializations, mostly for struct
stat.

Fixed initialization of st_mode.  S_IFSOCK was for when pipes were sockets.
It is probably safe to fix the bogus S_ISFIFO() now that pipes can be
distinguished from sockets in all cases.

Don't return ENOSYS for inappropriate ioctls.
1996-07-12 08:14:58 +00:00
John Dyson
c44013cde6 Get rid of PIPE_NBIO, cleaning up the code a bit.
Reviewed by:	bde
1996-07-04 04:36:56 +00:00
John Dyson
23fd45be00 Disable direct writes for non-blocking output. 1996-06-17 05:15:01 +00:00
Gary Palmer
c23670e294 Clean up -Wunused warnings.
Reviewed by:		bde
1996-06-12 05:11:41 +00:00
John Dyson
d73ce5bd92 Various pipe error return fixes, and a significant typeo fix. From
Bruce Evans (of course :-)).
Submitted by:	bde
1996-03-25 01:48:28 +00:00
John Dyson
7fe9c39bea Yet another fix from BDE for the new pipe code. This fixes a potential
deadlock due to mismanagement of busy counters.

Reviewed by: dyson
Submitted by: bde
1996-03-17 04:52:10 +00:00
John Dyson
6e20683c9d Fix a problem that select did not work with direct writes. Make
wakeup channels more consistant also.
1996-02-22 03:33:52 +00:00
Peter Wemm
add3bbdaef Add missing prototype for pipeselwakeup (a recently added function) - gcc
bitches about it..
1996-02-17 14:47:16 +00:00
John Dyson
f3e79aa705 Add ifdefs for non-freebsd system usage. Add missing select wakeups,
and make the select wakup code a little neater.
1996-02-11 22:09:50 +00:00
John Dyson
5af564b4f4 Add some missing requests for the read-side to wakeup the write-side. Also
add some missing wakeups by the write side to the read side.
1996-02-09 04:36:36 +00:00
John Dyson
26d2f00960 Apparent fix for a pipe hang problem. 1996-02-07 06:41:56 +00:00
John Dyson
f29e1bd629 More fixes from bde.
Only modify times on success.
	splhigh() around time variable usage.
	Make atomic writes more posix compliant.
	Spelling errors.
Submitted by:	bde
1996-02-05 05:50:34 +00:00
John Dyson
96cc6b1011 Kva space allocated for direct buffer wasn't quite big enough. The
system can panic easily without this patch.
1996-02-05 05:17:15 +00:00
John Dyson
dca5129987 Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.
1996-02-04 22:09:12 +00:00
John Dyson
2834ceec7c Improve the performance for pipe(2) again. Also include some
fixes for previous version of new pipes from Bruce Evans.  This
new version:

Supports more properly the semantics of select (BDE).
Supports "OLD_PIPE" correctly (kern_descrip.c, BDE).
Eliminates incorrect EPIPE returns (bash 'pipe broken' messages.)
Much faster yet, currently tuned relatively conservatively -- but now
	gives approx 50% more perf than the new pipes code did originally.
	(That was about 50% more perf than the original BSD pipe code.)

Known bugs outstanding:
	No support for async io (SIGIO).  Will be included soon.

Next to do:
	Merge support for FIFOs.

Submitted by: bde
1996-02-04 19:56:35 +00:00
John Dyson
4ab7a1a6c7 Fix another problem with the new pipe code, pointed out by Bruce Evans.
This one fixes a problem with interactions with signals.
1996-01-31 06:00:45 +00:00
John Dyson
56363b79a9 Fix some problems with return codes on the new pipe stuff. Bruce Evans
found the problems, and this commit will fix the "first batch" :-).
1996-01-31 02:05:12 +00:00
John Dyson
4fd00d508b Fixed an uninitialized variable (argument to vm_map_find) -- problem
that DG detected, and promptly found a fix.
Submitted by:	davidg
1996-01-29 02:57:33 +00:00
John Dyson
10c5615c1d Added new files to support the new fast pipes. After the follow-on
commits, pipe performance should increase significantly.  The pipe(2)
system call is currently supported, while fifofs will be added later.
1996-01-28 23:38:26 +00:00