Commit Graph

145 Commits

Author SHA1 Message Date
jhb
cf15cbb1b6 - Add two new system calls: preadv() and pwritev() which are like readv()
and writev() except that they take an additional offset argument and do
  not change the current file position.  In SAT speak:
  preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
  kern_foov() and dofilefoo() functions into new dofilefoo() functions
  that are called by kern_foov() and kern_pfoov().  The non-v functions
  now all generate a simple uio on the stack from the passed in arguments
  and then call kern_foov().  For example, read() now just builds a uio and
  calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().

PR:		kern/80362
Submitted by:	Marc Olzheim marcolz at stack dot nl (1)
Approved by:	re (scottl)
MFC after:	1 week
2005-07-07 18:17:55 +00:00
peter
2778435f72 Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatability mode.  eg: an IOC_IN ioctl with
a size of zero.  Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs.  Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctl's.  We
found this at work when trying to run 4.x binaries.

Approved by:	re
2005-06-30 00:19:08 +00:00
jhb
72d1a40cb6 Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(),
kern_settimeofday(), and kern_writev() to allow for further stackgap
reduction in the compat ABIs.
2005-03-31 22:51:18 +00:00
cperciva
e1f5bc1828 Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not
as a "long" in dofileread() and dofilewrite().

Discussed with:	jhb
2005-02-10 20:19:17 +00:00
phk
a4f3b3f609 Previously a read of zero bytes got handled in devfs:vop_read() but I
missed that when the vnode bypass was introduced.

Deal with zero length transfers before we even get to fo_ops->fo_read().

Found by:	Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru>
PR:	75758
2005-01-25 09:15:32 +00:00
phk
220b6a2414 Detect sign-extension bugs in the ioctl(2) command argument: Truncate
to 32 bits and print warning.
2005-01-18 07:37:05 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
phk
d6a49a1565 Push Giant down through ioctl.
Don't grab Giant in the upper syscall/wrapper code

NET_LOCK_GIANT in the socket code (sockets/fifos).

mtx_lock(&Giant) in the vnode code.

mtx_lock(&Giant) in the opencrypto code.  (This may actually not be
needed, but better safe than sorry).

Devfs grabs Giant if the driver is marked as needing Giant.
2004-11-17 09:09:55 +00:00
phk
313ccfcf9e Push Giant down through select and poll.
Don't grab Giant in the upper syscall/wrapper code

NET_LOCK_GIANT in the socket code (sockets/fifos).

mtx_lock(&Giant) in the vnode code.

Devfs grabs Giant if the driver is marked as needing Giant.
2004-11-17 08:01:10 +00:00
phk
d37a7160db Polish code to correctly reflect structure. 2004-11-16 14:47:04 +00:00
phk
7339a0b3dc Rearrange memory management for ioctl arguments to use stronger checks
for illegal values and don't store them on the stack any more.
2004-11-14 14:34:12 +00:00
phk
39a0a1611e style polish. 2004-11-14 12:04:34 +00:00
phk
216166ee0d Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
2004-11-13 11:53:02 +00:00
andre
bae83b7595 Poll() uses the array smallbits that is big enough to hold 32 struct
pollfd's to avoid calling malloc() on small numbers of fd's.  Because
smalltype's members have type char, its address might be misaligned
for a struct pollfd.  Change the array of char to an array of struct
pollfd.

PR:		kern/58214
Submitted by:	Stefan Farfeleder <stefan@fafoe.narf.at>
Reviewed by:	bde (a long time ago)
MFC after:	3 days
2004-08-27 21:23:50 +00:00
phk
b9f13e4266 Clean up and wash struct iovec and struct uio handling.
Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec.  Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy.  Caller frees.

Use them throughout.
2004-07-10 15:42:16 +00:00
imp
74cf37bd00 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
rwatson
6a1193a64c Add annotations to mtx_lock(&Giant) in kern_select() and poll() that
we always grab Giant, even if we're actually only polling objects that
don't require giant.  Once socket locking is merged, there will be
strong motivation to fix this.
2004-03-13 05:58:57 +00:00
jhb
d25301c858 Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
  and condition variables.  Having sleep qeueus in a hash table avoids
  having to allocate a queue head for each wait channel.  Thus, struct cv
  has shrunk down to just a single char * pointer now.  However, the
  hash table does not hold threads directly, but queue heads.  This means
  that once you have located a queue in the hash bucket, you no longer have
  to walk the rest of the hash chain looking for threads.  Instead, you have
  a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
  differentiates between cv's and sleep/wakeup.  For example, calls to
  abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
  Thus, the TDF_CVWAITQ flag is removed.  Also, calls to unsleep() and
  cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
  sleep's no longer inherently bump the priority.  Instead, this is soley
  a propery of msleep() which explicitly calls sched_prio() before
  blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used.  The
  associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
  dropped and replaced with a single explicit clearing of td_wchan.
  TD_SET_ONSLEEPQ() would really have only made sense if it had taken
  the wait channel and message as arguments anyway.  Now that that only
  happens in one place, a macro would be overkill.
2004-02-27 18:52:44 +00:00
jhb
279b2b8278 Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count.  The plimit
  structure is treated similarly to struct ucred in that is is always copy
  on write, so having a reference to a structure is sufficient to read from
  it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
  limits from a process to keep the limit structure from changing out from
  under you while reading from it.
- Various global limits that are ints are not protected by a lock since
  int writes are atomic on all the archs we support and thus a lock
  wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
  behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
  either an rlimit, or the current or max individual limit of the specified
  resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
  other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
  (it didn't used the stackgap when it should have) but uses lim_rlimit()
  and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
  but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits.  It
  also no longer uses the stackgap for accessing sysctl's for the
  ibcs2_sysconf() syscall but uses kernel_sysctl() instead.  As a result,
  ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by:	mtm (mostly, I only did a few cleanups and catchups)
Tested on:	i386
Compiled on:	alpha, amd64
2004-02-04 21:52:57 +00:00
ache
be3f79f636 pread/pwrite:
follow lseek spirit - return EINVAL on negative offset for non-VCHR
2004-01-20 01:27:42 +00:00
tanimura
7eade05dfa - Implement selwakeuppri() which allows raising the priority of a
thread being waken up.  The thread waken up can run at a priority as
  high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
  priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
  threads.  Used by selwakeuppri() if collision occurs.

Not objected in:	-arch, -current
2003-11-09 09:17:26 +00:00
phk
74c6dfd454 Introduce no_poll() default method for device drivers. Have it
do exactly the same as vop_nopoll() for consistency and put a
comment in the two pointing at each other.

Retire seltrue() in favour of no_poll().

Create private default functions in kern_conf.c instead of public
ones.

Change default strategy to return the bio with ENODEV instead of
doing nothing which would lead the bio stranded.

Retire public nullopen() and nullclose() as well as the entire band
of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump}
funtions, they are the default actions now.

Move the final two trivial functions from subr_xxx.c to kern_conf.c
and retire the now empty subr_xxx.c
2003-09-27 12:53:33 +00:00
alc
7199d3e24f Remove Giant from writev(2). Eliminate trivial style differences between
writev(2) and readv(2).
2003-08-01 02:21:54 +00:00
phk
a81d7fdac7 Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
2003-06-18 19:53:59 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00
kan
9468fdaf14 Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-04-29 13:36:06 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
alfred
d070c0a52d SCARGS removal take II. 2002-12-14 01:56:26 +00:00
alfred
4f48184fb2 Backout removal SCARGS, the code freeze is only "selectively" over. 2002-12-13 22:41:47 +00:00
alfred
d19b4e039d Remove SCARGS.
Reviewed by: md5
2002-12-13 22:27:25 +00:00
phk
1dfc2c167f Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by:    FlexeLint warning #512
2002-09-28 17:15:38 +00:00
phk
4848209c80 We don't need the <sys/disklabel.h> include for alpha anymore.
Sponsored by:	DARPA & NAI Labs.
2002-09-20 17:45:44 +00:00
julian
5702a380a5 Completely redo thread states.
Reviewed by:	davidxu@freebsd.org
2002-09-11 08:13:56 +00:00
iedowse
be17b12cb6 Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on:		-arch
Review/suggestions:	bde, jhb, peter, marcel
2002-09-01 20:37:28 +00:00
peter
6bc0c8ebd0 Move the TAILQ_INIT(&td->td_selq) before the retry: label. Otherwise in
some circumstances when we get a select collision, we can end up with
cases where we do not clear some sip->si_thread on the way out, leading to
page faults in selwakeup().  This should solve the problem where postfix
can crash the kernel during select collisions.

Reviewed by: alfred
2002-08-23 22:43:28 +00:00
rwatson
3246fbf45f In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
  fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
  invocations of soo_ioctl() are provided access to the calling f_cred.
  Pass ap->a_td->td_ucred as the active_cred, but note that this is
  required because we don't yet distinguish file_cred and active_cred
  in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
  than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
  td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-17 02:36:16 +00:00
rwatson
2b82cd24f1 Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-16 12:52:03 +00:00
rwatson
44404e4547 In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
  "cred", and change the semantics of consumers of fo_read() and
  fo_write() to pass the active credential of the thread requesting
  an operation rather than the cached file cred.  The cached file
  cred is still available in fo_read() and fo_write() consumers
  via fp->f_cred.  These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
  pipe_read/write() now authorize MAC using active_cred rather
  than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
  VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred.  Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not.  If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-15 20:55:08 +00:00
alfred
85762a62d4 Attempt to clarify comment in selrecord. 2002-07-24 00:29:22 +00:00
alfred
73d7a388d6 remove caddr_t from fo_ioctl calls 2002-07-22 15:46:51 +00:00
alfred
bb1a616a3f remove caddr_t 2002-07-22 15:44:27 +00:00
julian
aa2dc0a5d9 Part 1 of KSE-III
The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by:	Almost everyone who counts
	(at various times, peter, jhb, matt, alfred, mini, bernd,
	and a cast of thousands)

	NOTE: this is still Beta code, and contains lots of debugging stuff.
	expect slight instability in signals..
2002-06-29 17:26:22 +00:00
alfred
619c88aeeb Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin
2002-06-20 18:52:54 +00:00
phk
1f4e9c0c72 Remove the compat bits for the mis-aligned struct disklabel on alpha,
people got three times longer than I promised.

Sponsored by: DARPA & NAI Labs.
2002-06-19 08:37:02 +00:00
kbyanc
052b70fe67 Make nselcol, the number of select collisions since boot, unsigned as
negative collisions simply doesn't make sense.

PR:		(one small part of) 19720
Approved by:	alfred
2002-06-12 02:08:18 +00:00
jhb
fbebc83b5b Catch up to changes in ktrace API. 2002-06-07 05:37:18 +00:00
alc
eff7d93533 o Correct an error made in revision 1.65: In readv(), if uap->iovcnt is
out-of-range, drop the file reference before returning.  (This error
   also exists in the RELENG_4 branch.)
 o Eliminate the acquisition and release of Giant in readv()
   now that malloc() and free() are callable without Giant.
2002-05-09 02:30:41 +00:00
phk
8cabbc69f8 As promised make the hack for sizeof(struct disklabel) on alpha annoying.
Run make world (or recompile whatever program whines) to get rid of warning.

Compat bits will be removed entirely in about two weeks.
2002-05-02 21:53:39 +00:00
jhb
db9aa81e23 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00