Commit Graph

385 Commits

Author SHA1 Message Date
Christian S.J. Peron
ed6c545cf0 In addition to the real user ID check, do an explicit jail
check to ensure that the caller is not prison root.

The intention is to fix file descriptor creation so that
prison root can not use the last remaining file descriptors.
This privilege should be reserved for non-jailed root users.

Approved by:	bmilekic (mentor)
2004-07-14 19:04:31 +00:00
Poul-Henning Kamp
a769355f9b Explicitly initialize f_data and f_vnode to NULL.
Report f_vnode to userland in struct xfile.
2004-06-19 11:40:08 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Robert Watson
395a08c904 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
Poul-Henning Kamp
1930e303cf Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1.  We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.
2004-06-11 11:16:26 +00:00
Robert Watson
63732dce22 Push the VOP_ADVLOCK() call to release advisory locks on vnode file
descriptors out of fdrop_locked() and into vn_closefile().  This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes.  This also removes the specific
requirement for Giant in fdrop_locked(), it's now only required by
code that it calls into.

Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
2004-06-01 18:03:20 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Robert Watson
a1288c786e Conditionally assert Giant in fputsock() based on the value of
debug.mpsafenet.
2004-03-29 00:33:02 +00:00
Don Lewis
47934cef8f Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
Poul-Henning Kamp
dc08ffec87 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
Dag-Erling Smørgrav
44f4b94b38 Don't bother storing a result when all you need are the side effects. 2004-02-16 18:38:46 +00:00
David Malone
a82294d01c In fdcheckstd the descriptor table should never be shared, so just
KASSERT this rather than trying to deal with what happens when file
descriptors change out from under us.
2004-02-15 21:14:48 +00:00
John Baldwin
91d5354a2c Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
Dag-Erling Smørgrav
a6d4491c71 Restore correct semantics for F_DUPFD fcntl. This should fix the errors
people have been getting with configure scripts.
2004-01-17 00:59:04 +00:00
Dag-Erling Smørgrav
56a9fc0e93 WITNESS won't let us hold two filedesc locks at the same time, so juggle
fdp and newfdp around a bit.
2004-01-16 21:54:56 +00:00
Dag-Erling Smørgrav
ddce426f69 Remove two KASSERTs which were overly paranoid. 2004-01-16 08:45:56 +00:00
Dag-Erling Smørgrav
12d568c2b1 Take care to drop locks when calling malloc() 2004-01-15 18:50:11 +00:00
Dag-Erling Smørgrav
a2fe44e8cf New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos.  The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed.  It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
2004-01-15 10:15:04 +00:00
Dag-Erling Smørgrav
c9de31f55f Mechanical whitespace cleanup. 2004-01-11 19:39:14 +00:00
Alan Cox
0e88a71798 Remove long dead code, specifically, code related to munmapfd().
(See also vm/vm_mmap.c revision 1.173.)
2004-01-11 06:59:21 +00:00
David Malone
70ad6c2190 Plug a leak of open files that happens when you exec a suid program
with one of std{in,out,err} open. This helps with the file descriptor
leaks reported on -current. This should probably be merged into 5.2.

Reviewed by:	ru
Tested by:	Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
2003-12-28 19:27:14 +00:00
David Malone
e1419c08e2 falloc allocates a file structure and adds it to the file descriptor
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.

As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.

To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.

Reviewed by:	iedowse
Idea run past:	jhb
2003-10-19 20:41:07 +00:00
Robert Watson
c142b0fcfe Remove the global variable 'cmask', which was used to initialize the
fd_cmask field in the file descriptor structure for the first process
indirectly from CMASK, and when an fd structure is initialized before
being filled in, and instead just use CMASK.  This appears to be an
artifact left over from the initial integration of quotas into BSD.

Suggested by:	peter
2003-10-02 03:57:59 +00:00
David Malone
d2cce3d6e8 Do some minor Giant pushdown made possible by copyin, fget, fdrop,
malloc and mbuf allocation all not requiring Giant.

1) ostat, fstat and nfstat don't need Giant until they call fo_stat.
2) accept can copyin the address length without grabbing Giant.
3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit.
4) move Giant grabbing from each indivitual recv* syscall to recvit.
2003-08-04 21:28:57 +00:00
Alan Cox
fbe1bdddcc Revision 1.51 of vm/uma_core.c modified uma_large_free() to acquire Giant
when needed.  So, don't do it here.
2003-07-29 05:23:19 +00:00
Robert Watson
2e4a71cdb1 When exporting file descriptor data for threads invoking the
kern.file sysctl, don't return information about processes that
fail p_cansee(td, p).  This prevents sockstat and related
programs from seeing file descriptors owned by processes not
in the same jail as the thread, as well as having implications
for MAC, etc.

This is a partial solution: it permits an information leak about
the number of descriptors in the sizing calculation (but this is
not new information, you can also get it from kern.openfiles),
and doesn't attempt to mask file descriptors based on the
properties of the descriptor, only the process referencing it.
However, it provides most of what you want under most
circumstances, without complicating the locking.

PR:	54211
Based on a patch submitted by:	Pawel Jakub Dawidek <nick@garage.freebsd.pl>
2003-07-28 16:03:53 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Alan Cox
18e8d4e79c revision 1.51 of vm/uma_core.c modified uma_large_malloc() to acquire
Giant when needed.
2003-07-25 22:26:43 +00:00
Don Lewis
857d9c60d0 Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes.  Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by:	jhb
2003-07-13 01:22:21 +00:00
Poul-Henning Kamp
1226914c17 Use the f_vnode field to tell which file descriptors have a vnode. 2003-07-04 12:20:27 +00:00
Poul-Henning Kamp
3b6d965263 Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
2003-06-22 08:41:43 +00:00
Poul-Henning Kamp
eaaca5deee Don't (re)initialize f_gcflag to zero.
Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.
2003-06-20 08:02:30 +00:00
Alfred Perlstein
bab88630ba Unlock the struct file lock before aquiring Giant, otherwise
we can deadlock because of lock order reversals.  This was not
caught because Witness ignores pool mutexes right now.

Diagnosis and help: truckman
Noticed by: pho
2003-06-19 18:13:07 +00:00
Mike Silbersack
4d7dfc31b8 Add a rate limited message reporting when kern.maxfiles is exceeded,
reporting who did it.

Also, fix a style bug introduced in the previous change.

MFC after:	1 week
2003-06-19 04:07:12 +00:00
Mike Silbersack
438f085b2f Reserve the last 5% of file descriptors for root use. This should allow
systems to fail more gracefully when a file descriptor exhaustion situation
occurs.

Original patch by:	David G. Andersen <dga@lcs.mit.edu>
PR:			45353
MFC after:		1 week
2003-06-18 18:57:58 +00:00
Poul-Henning Kamp
7c2d2efd58 Initialize struct fileops with C99 sparse initialization. 2003-06-18 18:16:40 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Tor Egge
ad05d58087 Add tracking of process leaders sharing a file descriptor table and
allow a file descriptor table to be shared between multiple process
leaders.

PR:		50923
2003-06-02 16:05:32 +00:00
Poul-Henning Kamp
90471005e1 Remove needless return
Found by:       FlexeLint
2003-05-31 20:16:44 +00:00
Robert Watson
c1dca9ab07 VOP_PATHCONF() requires a vnode lock; this patch adds locking to
fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in
pathconf() already.

Approved by:	re (jhb)
Pointed out by:	DEBUG_VFS_LOCKS
2003-05-15 21:13:08 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
Alexander Kabaev
104a9b7e3e Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
Poul-Henning Kamp
7ac40f5f59 Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
Tor Egge
c6faf3bf1d Remove unneeded code added in revision 1.188. 2003-03-01 17:18:28 +00:00
Scott Long
3303c14b57 Don't NULL out p_fd until after closefd() has been called. This isn't
totally correct, but it has caused breakage for too long.  I welcome
someone with more fd fu to fix it correctly.
2003-02-24 05:46:55 +00:00
Mike Makonnen
750a91d8b1 Remove a comment which hasn't been true since rev. 1.158
Approved by:	jhb, markm (mentor)(implicit)
2003-02-22 05:59:48 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Tor Egge
218a01e062 Avoid file lock leakage when linuxthreads port or rfork is used:
- Mark the process leader as having an advisory lock
  - Check if process leader is marked as having advisory lock when
    closing file
  - Check that file is still open after lock has been obtained
  - Don't allow file descriptor table sharing between processes
    with different leaders

PR:		10265
Reviewed by:	alfred
2003-02-15 22:43:05 +00:00
Alfred Perlstein
e7d6662f1b Do not allow kqueues to be passed via unix domain sockets. 2003-02-15 06:04:55 +00:00
Alfred Perlstein
edf6699ae6 Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures.  Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
2003-02-15 05:52:56 +00:00
Alfred Perlstein
42e1b74af2 Don't lock FILEDESC under PROC.
The locking here needs to be revisited, but this ought to get rid of the
LOR messages that people are complaining about for now.  I imagine either
I or someone else interested with smp will eventually clear this up.
2003-02-11 07:20:52 +00:00
Poul-Henning Kamp
4af0d0c21f NODEVFS cleanup: remove #ifdefs 2003-01-30 12:35:40 +00:00
Jeffrey Hsu
a448a15bc1 Add missing SMP file locks around read-modify-write operations on
the flag field.

Reviewed by:	rwatson
2003-01-21 20:20:48 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Poul-Henning Kamp
7e760e148a Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.
2003-01-19 11:03:07 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Jacques Vidrine
f0c093284d Correct file descriptor leaks in lseek and do_dup.
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.

Submitted by:	iedowse
2003-01-06 13:19:05 +00:00
Alfred Perlstein
c522c1bf4b fdcopy() only needs a filedesc pointer. 2003-01-01 01:19:31 +00:00
Alfred Perlstein
03282e6e3d purge 'register'. 2003-01-01 01:05:54 +00:00
Alfred Perlstein
c7f1c11b20 Since fdshare() and fdinit() only operate on filedescs, make them
take pointers to filedesc structures instead of threads.  This makes
it more clear that they do not do any voodoo with the thread/proc
or anything other than the filedesc passed in or returned.

Remove some XXX KSE's as this resolves the issue.
2003-01-01 01:01:14 +00:00
Alfred Perlstein
59c97598d3 fdinit() does not need to lock the filedesc it is creating as no one
besideds itself has access until the function returns.
2003-01-01 00:35:46 +00:00
Robert Watson
f0bc12ee8d Improve consistency between devfs and MAKEDEV: use UID_ROOT and
GID_WHEEL instead of UID_BIN and GID_BIN for /dev/fd/* entries.

Submitted by:	kris
2002-12-27 16:54:44 +00:00
Poul-Henning Kamp
a7010ee2f4 White-space changes. 2002-12-24 09:44:51 +00:00
Poul-Henning Kamp
f3a682116c Detediousficate declaration of fileops array members by introducing
typedefs for them.
2002-12-23 21:53:20 +00:00
Tim J. Robbins
9d0fffd3ca Drop filedesc lock and acquire Giant around calls to malloc() and free().
These call uma_large_malloc() and uma_large_free() which require Giant.
Fixes panic when descriptor table is larger than KMEM_ZMAX bytes
noticed by kkenn.

Reviewed by:	jhb
2002-12-13 09:59:40 +00:00
John Baldwin
04f4a16448 If the file descriptors passed into do_dup() are negative, return EBADF
instead of panicing.  Also, perform some of the simpler sanity checks on
the fds before acquiring the filedesc lock.

Approved by:	re
Reported by:	Dan Nelson <dan@emsphone.com> and others
2002-11-26 17:22:15 +00:00
Garrett Wollman
c7047e5204 Change the way support for asynchronous I/O is indicated to applications
to conform to 1003.1-2001.  Make it possible for applications to actually
tell whether or not asynchronous I/O is supported.

Since FreeBSD's aio implementation works on all descriptor types, don't
call down into file or vnode ops when [f]pathconf() is asked about
_PC_ASYNC_IO; this avoids the need for every file and vnode op to know about
it.
2002-10-27 18:07:41 +00:00
John Baldwin
4562d72638 Don't lock the proc lock to clear p_fd. p_fd isn't protected by the proc
lock.
2002-10-18 17:42:28 +00:00
John Baldwin
bf3e55aa2c Many style and whitespace fixes.
Submitted by:	bde (mostly)
2002-10-16 15:45:37 +00:00
John Baldwin
18d9bd8f65 Sort includes a bit.
Submitted by:	bde
2002-10-16 15:14:31 +00:00
John Baldwin
7fd1f2b8bc Argh. Put back setting of P_ADVLOCK for the F_WRLCK case that was
accidentally lost in the previous revision.

Submitted by:	bde
Pointy hat to:	jhb
2002-10-15 18:10:13 +00:00
John Baldwin
60a6965a88 Remove the leaderp variable and just access p_leader directly. The
p_leader field is not protected by the proc lock but is only set during
fork1() by the parent process and never changes.
2002-10-15 00:03:40 +00:00
Don Lewis
91e97a8266 In an SMP environment post-Giant it is no longer safe to blindly
dereference the struct sigio pointer without any locking.  Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.

Reviewed by:	rwatson
2002-10-03 02:13:00 +00:00
Thomas Moestl
dde1c2c0d6 fcntl(..., F_SETLKW, ...) takes a pointer to a struct flock just like
F_SETLK does, so it also needs this structure copied in in fnctl() before
calling kern_fcntl().
2002-09-16 01:05:15 +00:00
Nate Lawson
06be2aaa83 Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by:   phk
Reviewed by:    bde, rwatson (earlier version)
2002-09-14 09:02:28 +00:00
Thomas Moestl
4e115a85ab Fix fcntl(..., F_GETOWN, ...) and fcntl(..., F_SETOWN, ...) on sparc64
by not passing a pointer to a register_t or intptr_t when the code in
the lower layers expects one to an int.
2002-09-13 15:15:16 +00:00
John Baldwin
5fc3031366 - Change falloc() to acquire an fd from the process table last so that
it can do it w/o needing to hold the filelist_lock sx lock.
- fdalloc() doesn't need Giant to call free() anymore.  It also doesn't
  need to drop and reacquire the filedesc lock around free() now as a
  result.
- Try to make the code that copies fd tables when extending the fd table in
  fdalloc() a bit more readable by performing assignments in separate
  statements.  This is still a bit ugly though.
- Use max() instead of an if statement so to figure out the starting point
  in the search-for-a-free-fd loop in fdalloc() so it reads better next to
  the min() in the previous line.
- Don't grow nfiles in steps up to the size needed if we dup2() to some
  really large number.  Go ahead and double 'nfiles' in a loop prior
  to doing the malloc().
- malloc() doesn't need Giant now.
- Use malloc() and free() instead of MALLOC() and FREE() in fdalloc().
- Check to see if the size we are going to grow to is too big, not if the
  current size of the fd table is too big in the loop in fdalloc().  This
  means if we are out of space or if dup2() requests too high of a fd,
  then we will return an error before we go off and try to allocate some
  huge table and copy the existing table into it.
- Move all of the logic for dup'ing a file descriptor into do_dup() instead
  of putting some of it in do_dup() and duplicating other parts in four
  different places.  This makes dup(), dup2(), and fcntl(F_DUPFD) basically
  wrappers of do_dup now.  fcntl() still has an extra check since it uses
  a different error return value in one case then the other functions.
- Add a KASSERT() for an assertion that may not always be true where the
  fdcheckstd() function assumes that falloc() returns the fd requested and
  not some other fd.  I think that the assertion is always true because we
  are always single-threaded when we get to this point, but if one was
  using rfork() and another process sharing the fd table were playing with
  the fd table, there might could be a problem.
- To handle the problem of a file descriptor we are dup()'ing being closed
  out from under us in dup() in general, do_dup() now obtains a reference
  on the file in question before calling fdalloc().  If after the call to
  fdalloc() the file for the fd we are dup'ing is a different file, then
  we drop our reference on the original file and return EBADF.  This
  race was only handled in the dup2() case before and would just retry
  the operation.  The error return allows the user to know they are being
  stupid since they have a locking bug in their app instead of dup'ing
  some other descriptor and returning it to them.

Tested on:	i386, alpha, sparc64
2002-09-03 20:16:31 +00:00
Ian Dowse
49c2ff159f Split fcntl() into a wrapper and a kernel-callable kern_fcntl()
implementation. The wrapper is responsible for copying additional
structure arguments (struct flock) to and from userland.
2002-09-02 22:24:14 +00:00
Philippe Charnier
93b0017f88 Replace various spelling with FALLTHROUGH which is lint()able 2002-08-25 13:23:09 +00:00
Robert Watson
d49fa1ca6e In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
  fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
  invocations of soo_ioctl() are provided access to the calling f_cred.
  Pass ap->a_td->td_ucred as the active_cred, but note that this is
  required because we don't yet distinguish file_cred and active_cred
  in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
  than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
  td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-17 02:36:16 +00:00
Robert Watson
ea6027a8e1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
Robert Watson
9ca435893b In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
Dag-Erling Smørgrav
aefe27a25c Have the kern.file sysctl export xfiles rather than files. The truth is
out there!

Sponsored by:	DARPA, NAI Labs
2002-07-31 12:26:52 +00:00
Don Lewis
5c38b6dbce Wire the sysctl output buffer before grabbing any locks to prevent
SYSCTL_OUT() from blocking while locks are held.  This should
only be done when it would be inconvenient to make a temporary copy of
the data and defer calling SYSCTL_OUT() until after the locks are
released.
2002-07-28 19:59:31 +00:00
John Baldwin
3d3f20cbe6 Preallocate a struct file as the first thing in falloc() before we lock
the filelist_lock and check nfiles.  This closes a race where we had to
unlock the filedesc to re-lock the filelist_lock.

Reported by:	David Xu
Reviewed by:	bde (mostly)
2002-07-17 02:48:43 +00:00
Alfred Perlstein
7f05b0353a More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t. 2002-06-29 01:50:25 +00:00
Seigo Tanimura
4cc20ab1f0 Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by:	hsu
2002-05-31 11:52:35 +00:00
Seigo Tanimura
243917fe3b Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred
2002-05-20 05:41:09 +00:00
Tom Rhodes
d394511de3 More s/file system/filesystem/g 2002-05-16 21:28:32 +00:00
Alfred Perlstein
e649887b1e Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp.  We need
to check the process for P_WEXIT to see if it's exiting.  Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.
2002-05-06 19:31:28 +00:00
Seigo Tanimura
6041fa0a60 As malloc(9) and free(9) are now Giant-free, remove the Giant lock
across malloc(9) and free(9) of a pgrp or a session.
2002-05-03 07:46:59 +00:00
Seigo Tanimura
c8d8a686e4 Fix the lock order reversal between the sigio lock and a process/pgrp lock in
funsetownlst() by locking the sigio lock across funsetownlst().
2002-05-03 05:32:25 +00:00
Alfred Perlstein
f132072368 Redo the sigio locking.
Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.
2002-05-01 20:44:46 +00:00
Jeroen Ruigrok van der Werven
1cf1a725ff Fix indention which I did wrong in a previous commit.
Submitted by:	bde
2002-04-29 08:18:06 +00:00
Seigo Tanimura
d48d4b2501 Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket.  This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked.  Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.
2002-04-27 08:24:29 +00:00
Alfred Perlstein
ea5b39d029 Don't FILEDESC_LOCK around calls to falloc(). 2002-04-22 20:09:11 +00:00
Seigo Tanimura
1c2451c24d Push down Giant for setpgid(), setsid() and aio_daemon(). Giant protects only
malloc(9) and free(9).
2002-04-20 12:02:52 +00:00
Jacques Vidrine
e983a3762b When exec'ing a set[ug]id program, make sure that the stdio file descriptors
(0, 1, 2) are allocated by opening /dev/null for any which are not already
open.

Reviewed by:	alfred, phk
MFC after:	2 days
2002-04-19 00:45:29 +00:00
John Baldwin
ba626c1db2 Lock proctree_lock instead of pgrpsess_lock. 2002-04-16 17:11:34 +00:00
Jeroen Ruigrok van der Werven
bcbf4411d6 Use the correct macros for F_SETFD/F_GETFD instead of magic numbers.
Reflect that fact in the manual page.

PR:		12723
Submitted by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
Approved by:	bde
MFC after:	2 weeks
2002-04-13 10:16:53 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
Seigo Tanimura
5cf4bcebbf The description of fd_mtx is "filedesc structure." 2002-03-29 11:26:05 +00:00
Jeff Roberson
c897b81311 Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.
2002-03-20 04:09:59 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Jeff Roberson
8355f576a9 This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@
2002-03-19 09:11:49 +00:00
Alfred Perlstein
4a950215ef Close a race when vfs_syscalls.c:checkdirs() runs.
To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().
2002-03-19 04:30:04 +00:00
Alfred Perlstein
628abf6c69 Giant pushdown for read/write/pread/pwrite syscalls.
kern/kern_descrip.c:
Aquire Giant in fdrop_locked when file refcount hits zero, this removes
the requirement for the caller to own Giant for the most part.

kern/kern_ktrace.c:
Aquire Giant in ktrgenio, simplifies locking in upper read/write syscalls.

kern/vfs_bio.c:
Aquire Giant in bwillwrite if needed.

kern/sys_generic.c
Giant pushdown, remove Giant for:
   read, pread, write and pwrite.
readv and writev aren't done yet because of the possible malloc calls
for iov to uio processing.

kern/sys_socket.c
Grab giant in the socket fo_read/write functions.

kern/vfs_vnops.c
Grab giant in the vnode fo_read/write functions.
2002-03-15 08:03:46 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Seigo Tanimura
f591779bb5 Lock struct pgrp, session and sigio.
New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
  session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by:	jhb, alfred
Tested on:	cvsup.jp.FreeBSD.org
		(which is a quad-CPU machine running -current)
2002-02-23 11:12:57 +00:00
Peter Wemm
1037bbb195 Fix broken Giant locking protocol introduced in rev 1.114. You cannot
unlock Giant if it is not locked in the first place.  This make the
nfstat(2) syscall (#278) a nice panic(2) implementation.
2002-02-08 09:16:57 +00:00
Alfred Perlstein
3865fa138b Remove bogus assertion in dup2 that can lead to panics when kernel
threads race for a file slot.

dup2(2) incorrectly assumes that if it needs to grow the ofiles
array that it will get what it wants.  This assertion was valid
before we allowed shared filedescriptor tables but is now incorrect.

The assertion can trigger superfolous panics if the thread doing a
dup2 looses a race with another thread while possibly blocked in
the MALLOC call in fdalloc.  Another thread may grab the slot we
are requesting which makes fdalloc return something other than what
we asked for, this will triggering the bogus assertion.

MFC after: 2 weeks
Reviewed by: phk
2002-02-01 19:25:36 +00:00
Alfred Perlstein
2b39743941 Avoid lock order reversal filedesc/Giant when calling FREE() in fdalloc
by unlocking the filedesc before calling FREE().

Submitted by: bde
2002-02-01 19:19:54 +00:00
Alfred Perlstein
eb20931127 Attempt to fixup select(2) and poll(2), this should fix some races with
other threads as well as speed up the interfaces.

To fix the race and accomplish the speedup, remove selholddrop and
pollholddrop.  The entire concept is somewhat bogus because holding
the individual struct file pointers offers us no guarantees that
another thread context won't close it on us thereby removing our
access to our own reference.

Selholddrop and pollholddrop also would do multiple locks and unlocks
of mutexes _per-file_ in the fd arrays to be scanned, this needed to
be sped up.

Instead of using selholddrop and pollholddrop, simply hold the
filedesc lock over the selscan and pollscan functions.  This should
protect us against close(2)'s on the files as reduce the multiple
lock/unlock pairs per fd into a single lock over the filedesc.
2002-01-29 22:54:19 +00:00
Alfred Perlstein
5980a85f08 Backout 1.120, EINVAL isn't a proper error return when the passed fd is
negative, the 'pointer' referred to by the manpage is actually the
struct file's f_offset field.

Pointed out by: bde
2002-01-29 17:12:10 +00:00
Alfred Perlstein
095f670d4e in fget() return EINVAL when the descriptor requested is negative. 2002-01-23 08:40:35 +00:00
Alfred Perlstein
767567d3c2 use mutex pools for "struct file" locking.
fix indentation of FILE_LOCK/UNLOCK macros while I'm here.
2002-01-20 22:58:08 +00:00
Alfred Perlstein
74aac58b52 Push down Giant in dup(2) and dup2(2), Giant is only needed when
calling closef() in the case of dup2(2) duping over a descriptor
and when fdalloc must grow or free a filedesc.
2002-01-15 00:58:40 +00:00
Alfred Perlstein
a4db49537b Replace ffind_* with fget calls.
Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
2002-01-14 00:13:45 +00:00
Alfred Perlstein
ba868b0da2 Comment fdrop and fdrop_locked functions. 2002-01-13 12:58:14 +00:00
Alfred Perlstein
c2824dd49b Implement ffind_hold using ffind_lock.
Recommended by: jhb
2002-01-13 12:57:02 +00:00
Alfred Perlstein
426da3bcfb SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
   protects all the fields.
   protects "struct file" initialization, while a struct file
     is being changed from &badfileops -> &pipeops or something
     the filedesc should be locked.

1 mutex in each struct file
   protects the refcount fields.
   doesn't protect anything else.
   the flags used for garbage collection have been moved to
     f_gcflag which was the FILLER short, this doesn't need
     locking because the garbage collection is a single threaded
     container.
  could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file *	fhold(struct file *fp);
        /* increments reference count on a file */

struct file *	fhold_locked(struct file *fp);
        /* like fhold but expects file to locked */

struct file *	ffind_hold(struct thread *, int fd);
        /* finds the struct file in thread, adds one reference and
                returns it unlocked */

struct file *	ffind_lock(struct thread *, int fd);
        /* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
2002-01-13 11:58:06 +00:00
Jonathan Lemon
2b846bd3a5 When removing kqueue descriptors from the descriptor table during a fork,
update fd_freefile and fd_lastfile as well, to keep things in sync.

Pointed out by: Debbie Chu <dchu@juniper.net>
2001-12-14 19:02:57 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
Matthew Dillon
b064d43d8f remove holdfp()
Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate

introduce fget(), fget_read(), fget_write() - these functions will take
a thread and file descriptor and return a file pointer with its ref
count bumped.

introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will
take a thread and file descriptor and return a vref()'d vnode.

*_read() requires that the file pointer be FREAD, *_write that it be
FWRITE.

This continues the cleanup of struct filedesc and struct file access
routines which, when are all through with it, will allow us to then
make the API calls MP safe and be able to move Giant down into the fo_*
functions.
2001-11-14 06:30:36 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
Jonathan Lemon
1a6fc8ef63 When FREE()ing kqueue related structures, charge them to the correct bucket.
Submitted by: iedowse
Forgotten by: jlemon
2001-09-30 17:00:56 +00:00
Julian Elischer
9dbea9237c If an incoming struct proc could have been NULL before, tehn don't
automatically change the code to add

struct proc *p = td->td_proc;

because now 'td' is probably capable of being NULL too.
I expect to see more of this kind of error during the 'weeding'
process. It's too easy to make. (junior hacker project.. look for these :-)

Submitted by:	mark Peek <mp@freebsd.org>
2001-09-12 20:26:57 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Matthew Dillon
835a82ee2d Giant Pushdown. Saved the worst P4 tree breakage for last.
reboot() getpriority() setpriority() rtprio() osetrlimit() ogetrlimit()
    setrlimit() getrlimit() getrusage() getpid() getppid() getpgrp()
    getpgid() getsid() getgid() getegid() getgroups() setsid() setpgid()
    setuid() seteuid() setgid() setegid() setgroups() setreuid() setregid()
    setresuid() setresgid() getresuid() getresgid () __setugid() getlogin()
    setlogin() modnext() modfnext() modstat() modfind() kldload() kldunload()
    kldfind() kldnext() kldstat() kldfirstmod() kldsym() getdtablesize()
    dup2() dup() fcntl() close() ofstat() fstat() nfsstat() fpathconf()
    flock()
2001-09-01 19:04:37 +00:00
Andrey A. Chernov
c8e7634357 advlock: simplify overflow checks 2001-08-29 18:53:53 +00:00
Andrey A. Chernov
4b207d9868 Move <machine/*> after <sys/*>
Add missing fdrop() before EOVERFLOW

Pointed by:	bde
2001-08-23 13:19:32 +00:00
Andrey A. Chernov
69cc1d0d7f Detect off_t EOVERFLOW of start/end offsets calculations for adv. lock,
as POSIX require.
2001-08-23 07:42:40 +00:00
Chris Costello
c30d4da338 Remove the fildesc_clone() function and its associated unnecessary code.
It didn't implement the proper /dev/fd functionality (which would be to
include in the directory listing /dev/fd/n if the process has fd n open)
anyway.

Anything needing access to /dev/fd/n where n > 2 can use the optional
fdescfs module, which implements this properly and does not cause any
trouble with devfs.

Discussed with:	phk
2001-08-06 05:56:33 +00:00
Robert Watson
b1fc0ec1a7 o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
  pcred->pc_uidinfo, which was associated with the real uid, only rename
  it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
  corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
  original macro that pointed.
  p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
  p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
  cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
  cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
  we figure out locking and optimizations; generally speaking, this
  means moving to a structure like this:
        newcred = crdup(oldcred);
        ...
        p->p_ucred = newcred;
        crfree(oldcred);
  It's not race-free, but better than nothing.  There are also races
  in sys_process.c, all inter-process authorization, fork, exec, and
  exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
  remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
  use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
  pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
  allocation.
o Clean up ktrcanset() to take into account changes, and move to using
  suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
  calls to better document current behavior.  In a couple of places,
  current behavior is a little questionable and we need to check
  POSIX.1 to make sure it's "right".  More commenting work still
  remains to be done.
o Update credential management calls, such as crfree(), to take into
  account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
      change_euid()
      change_egid()
      change_ruid()
      change_rgid()
      change_svuid()
      change_svgid()
  In each case, the call now acts on a credential not a process, and as
  such no longer requires more complicated process locking/etc.  They
  now assume the caller will do any necessary allocation of an
  exclusive credential reference.  Each is commented to document its
  reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
  and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
  questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
  processes and pcreds.  Note that this authorization, as well as
  CANSIGIO(), needs to be updated to use the p_cansignal() and
  p_cansched() centralized authorization routines, as they currently
  do not take into account some desirable restrictions that are handled
  by the centralized routines, as well as being inconsistent with other
  similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from:	TrustedBSD Project
Reviewed by:	green, bde, jhb, freebsd-arch, freebsd-audit
2001-05-25 16:59:11 +00:00
Mark Murray
fb919e4d5a Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)
2001-05-01 08:13:21 +00:00
John Baldwin
33a9ed9d0e Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by:	-smp, dfr, jake
2001-04-24 00:51:53 +00:00
Poul-Henning Kamp
f83880518b Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.
2001-03-26 12:41:29 +00:00
Poul-Henning Kamp
71d033119f Make the pseudo-driver for "/dev/fd/*" handle fd's larger than 255.
PR:	25936
2001-03-20 13:26:13 +00:00
Jonathan Lemon
608a3ce62a Extend kqueue down to the device layer.
Backwards compatible approach suggested by: peter
2001-02-15 16:34:11 +00:00
David Malone
7cc0979fd6 Convert more malloc+bzero to malloc+M_ZERO.
Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>
2000-12-08 21:51:06 +00:00
Matthew Dillon
279d722604 This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
    array.  However, with the advent of rfork() the file descriptor table
    could be shared between processes.  This patch closes over a dozen
    serious race conditions related to one thread manipulating the table
    (e.g. closing or dup()ing a descriptor) while another is blocked in
    an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
2000-11-18 21:01:04 +00:00
Alan Cox
4a71feb71c Add missing call to knote_fdclose() in setugidsafety() and fdcloseexec().
Reviewed by:	jlemon
2000-10-28 20:27:32 +00:00
Poul-Henning Kamp
db90128160 Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf.  Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.
2000-09-02 19:17:34 +00:00
Alfred Perlstein
c58b821e4c new sysctl 'kern.openfiles' (exports nfiles to userland) 2000-08-26 23:49:44 +00:00
Poul-Henning Kamp
d8cd1501f2 Dang, a _clone routine escaped #ifdef DEVFS containment. 2000-08-24 15:59:44 +00:00
Poul-Henning Kamp
a481b90b82 Fix panic when removing open device (found by bp@)
Implement subdirs.
 Build the full "devicename" for cloning functions.
 Fix panic when deleted device goes away.
 Collaps devfs_dir and devfs_dirent structures.
 Add proper cloning to the /dev/fd* "device-"driver.
 Fix a bug in make_dev_alias() handling which made aliases appear
  multiple times.
 Use devfs_clone to implement getdiskbyname()
 Make specfs maintain the stat(2) timestamps per dev_t
2000-08-24 15:36:55 +00:00
Peter Wemm
37b087a645 Clean up some low level bootstrap code:
- stop using the evil 'struct trapframe' argument for mi_startup()
  (formerly main()).  There are much better ways of doing it.
- do not use prepare_usermode() - setregs() in execve() will do it
  all for us as long as the p_md.md_regs pointer is set.  (which is
  now done in machdep.c rather than init_main.c.  The Alpha port did it
  this way all along and is much cleaner).
- collect all the magic %cr0 etc register settings into one place and
  have the AP's call that instead of using magic numbers (!!) that keep
  changing over and over again.
- Make it safe to call kthread_create() earlier, including during the
  device probe sequence.  It doesn't need the callback mechanism that
  NetBSD's version uses.
- kthreads created this way are root-less as they exist before the root
  filesystem is mounted.  init(1) is set up so that it aquires the root
  pointers prior to running.  If other kthreads want filesystem acccess
  we can make this code more generic.
- set all threads start times once we have decided what time it is.
- init uses a trampoline rather than the evil prepare_usermode() hack.
- kern_descrip.c has a couple of tweaks to deal with forking when there
  is no rootdir or cwd etc.
- adjust the early SYSINIT() sequence so that a few prereqisites are in
  place. eg: make sure the run queue is initialized before doing forks.

With this, the USB code can easily create a kthread to do the device
tree discovery.  (I have tested it, it works nicely).

There are still some open issues before this is truely useful.
- tsleep() does not like working before the clock is running.  It
  sort-of tries to spin wait, but it can do more useful things now.
- stopping a kthread in kld code at unload time is "interesting" but
  we have a solution for that.

The Alpha code needs no changes for this.  It already uses pretty much the
same strategies, but a little cleaner.
2000-08-11 09:05:12 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Alfred Perlstein
1a61fa5e0d don't panic the system when fpathconv is called on an unsupported filetype. 2000-06-27 23:08:36 +00:00
Jake Burkholder
e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder
740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Jonathan Lemon
cb679c385e Introduce kqueue() and kevent(), a kernel event notification facility. 2000-04-16 18:53:38 +00:00
Warner Losh
27e2c03a27 Fix the style bugs in the style bugs fix. The style bug fix made the
new function inconsistant with the rest of this file.  The spelling
and grammer fixes were good and remain.
2000-01-21 06:57:52 +00:00
Brian Feldman
bd9079fa6c Fix style bugs in the last commit. 2000-01-21 02:52:54 +00:00
Warner Losh
7001be49f8 bdeize last commit:
o Remove opt_dontuse.h and ifdef PROCFS

Subitted by: bde, peter
2000-01-20 17:03:53 +00:00
Warner Losh
5e2664428c When we are execing a setugid program, and we have a procfs filesystem
file open in one of the special file descriptors (0, 1, or 2), close
it before completing the exec.

Submitted by: nergal@idea.avet.com.pl
Constructive comments: deraadt@openbsd.org, sef, peter, jkh
2000-01-20 07:12:52 +00:00
Bruce Evans
f85bdfcc66 Removed unused includes.
Rumoved unused compatibility cruft for dup().  Using it today would just
break dup() on fd's >= 64.

Fixed some style bugs.
1999-12-26 14:07:43 +00:00
Matthew Dillon
151f7a5d8a Only bother converting the stat structure if we intend to return it,
when no error occurs.

PR:		kern/14966
Reviewed by:	dillon@freebsd.org
Submitted by:	Kelly Yancey kbyanc@posi.net
1999-11-18 08:08:28 +00:00
Peter Wemm
2c77a71d3d Remove cdevsw_add() - the necessary make_dev() calls appear to be there
already.
1999-11-18 06:34:47 +00:00
Poul-Henning Kamp
2e3c8fcbd0 This is a partial commit of the patch from PR 14914:
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compile to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 10:56:05 +00:00
Peter Wemm
cf87559cab Use fo_stat() rather than duplicating knowledge of file type internals
in here for stat(2) and friends.  Update the badops entries accordingly.
1999-11-08 03:27:14 +00:00
Brian Feldman
d91e41c8c9 Fix the advisory file locking by restoring previous ordering in closef()/
fdrop().  This only showed up when a file descriptor was duplicated
and then closed once, where the lock would be released on the first close().
1999-11-07 05:58:38 +00:00
Peter Wemm
d1f088dab5 Trim unused options (or #ifdef for undoc options).
Submitted by:	phk
1999-10-11 15:19:12 +00:00
Poul-Henning Kamp
d6a0e38a1b Remove five now unused fields from struct cdevsw. They should never
have been there in the first place.  A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags
1999-09-25 18:24:47 +00:00
Poul-Henning Kamp
2fe5bd8bb8 Fix a hole in jail(2).
Noticed by:	Alexander Bezroutchko <abb@zenon.net>
1999-09-25 14:14:21 +00:00
Brian Feldman
13ccadd4b0 This is what was "fdfix2.patch," a fix for fd sharing. It's pretty
far-reaching in fd-land, so you'll want to consult the code for
changes.  The biggest change is that now, you don't use
	fp->f_ops->fo_foo(fp, bar)
but instead
	fo_foo(fp, bar),
which increments and decrements the fp refcount upon entry and exit.
Two new calls, fhold() and fdrop(), are provided.  Each does what it
seems like it should, and if fdrop() brings the refcount to zero, the
fd is freed as well.

Thanks to peter ("to hell with it, it looks ok to me.") for his review.
Thanks to msmith for keeping me from putting locks everywhere :)

Reviewed by:	peter
1999-09-19 17:00:25 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Poul-Henning Kamp
9dcbe2404a Convert DEVFS hooks in (most) drivers to make_dev().
Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().
1999-08-23 20:59:21 +00:00
Brian Feldman
e32c66c539 Fix fd race conditions (during shared fd table usage.) Badfileops is
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.

This does not fix other problems mentioned in this PR than the first
one.

PR:		11629
Reviewed by:	peter
1999-08-04 18:53:50 +00:00
Mike Smith
79fc0bf4a0 From the submitter:
- this causes POSIX locking to use the thread group leader
   (p->p_leader) as the locking thread for all advisory locks.
   In non-kernel-threaded code p->p_leader == p, so this will have
   no effect.

   This results in (more) correct POSIX threaded flock-ing semantics.

   It also prevents the leader from exiting before any of the children.
   (so that p->p_leader will never be stale) in exit1().

   We have been running this patch for over a month now in our lab
   under load and at customer sites.

Submitted by:	John Plevyak <jplevyak@inktomi.com>
1999-06-07 20:37:29 +00:00
Poul-Henning Kamp
2447bec829 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
Poul-Henning Kamp
4e2f199e0c This commit should be a extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".

        Initialize the d_maj and d_bmaj fields.

        The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format.  Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.
1999-05-30 16:53:49 +00:00
Poul-Henning Kamp
bfbb9ce670 Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Dmitrij Tejblum
604359cf9b s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)
1999-04-28 10:54:24 +00:00
Eivind Eklund
5526d2d920 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
Don Lewis
62d6ce3af2 I got another batch of suggestions for cosmetic changes from bde. 1998-11-11 10:56:07 +00:00
Don Lewis
831d27a9f5 Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, elvind
1998-11-11 10:04:13 +00:00
Bruce Evans
d974cf4dda Fixed printf format errors. 1998-07-29 17:38:14 +00:00
Bruce Evans
1ede4662be Cast longs to intptr_t before casting them to pointers.
Fixed bitrot in pseudo-declaration of `struct fcntl_args'.  fcntl()
is now broken in some cases when ints are larger than longs.
1998-07-15 06:10:16 +00:00
Doug Rabson
2ef49ddfcb 64bit fixes: p->p_retval is a register_t[] not an int[]. 1998-06-10 10:27:43 +00:00
John Dyson
1f56217280 Fix the futimes/undelete/utrace conflict with other BSD's. Note that
the only common  usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem.  Add
the various NetBSD syscalls that allow full emulation of their
development environment.
1998-05-11 03:55:28 +00:00
John Dyson
9f24f214c3 Make the rootdir handling more consistent. Now, processes always
have a root vnode associated with them, and no special checks for
the null case are needed.
Submitted by:	terry@freebsd.org
1998-02-15 04:17:09 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Eivind Eklund
7b778b5e61 Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.
1998-01-24 02:54:56 +00:00
Eivind Eklund
5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
John Dyson
fd3bf77574 Fix and complete the AIO syscalls. There are some performance enhancements
coming up soon, but the code is functional.  Docs will be forthcoming.
1997-11-29 01:33:10 +00:00
Bruce Evans
a3c78a768e Fixed a missing conversion of retval to p_retval in disabled code.
Fixed overflow of FFLAGS() in fcntl(F_SETFL, ...).  This was not
a security hole, but gave wrong results for silly flags values.
E.g., it make fcntl(F_SETFL, -1) equivalent to fcntl(F_SETFL, 0).
POSIX requires ignoring the open mode bits in fcntl() (even if
they would be invalid for open()).
1997-11-23 12:24:59 +00:00
Bruce Evans
d826c47904 Fixed duplicate definitions of M_FILE (one static). 1997-11-23 10:43:49 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
55166637cd Distribute and statizice a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
Peter Wemm
51338ea83c Various select -> poll changes 1997-09-14 02:52:18 +00:00
Bruce Evans
32545fd108 Removed some stale comments.
Fixed a gratuitous ANSIism.
1997-08-26 00:09:44 +00:00
Bruce Evans
9dd8309d56 Removed support for OLD_PIPE. <sys/stat.h> is now missing the hack that
supported nameless pipes being indistinguishable from fifos.  We're not
going back.
1997-04-09 16:53:45 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00