Commit Graph

403 Commits

Author SHA1 Message Date
Mateusz Guzik
02efb9a8b1 Re-apply reverted parts of r236935 by pjd with some changes.
If fdalloc() decides to grow fdtable it does it once and at most doubles
the size. This still may be not enough for sufficiently large fd. Use fd
in calculations of new size in order to fix this.

When growing the table, fd is already equal to first free descriptor >= minfd,
also fdgrowtable() no longer drops the filedesc lock. As a result of this there
is no need to retry allocation nor lookup.

Fix description of fd_first_free to note all return values.

In co-operation with:	pjd
Approved by:	trasz (mentor)
MFC after:	1 month
2012-06-13 17:12:53 +00:00
Pawel Jakub Dawidek
faf0db351d Revert part of the r236935 for now, until I figure out why it doesn't
work properly.

Reported by:	davidxu
2012-06-12 10:25:11 +00:00
Pawel Jakub Dawidek
039dc89f0d fdgrowtable() no longer drops the filedesc lock so it is enough to
retry finding free file descriptor only once after fdgrowtable().

Spotted by:	pluknet
MFC after:	1 month
2012-06-11 22:05:26 +00:00
Pawel Jakub Dawidek
d3ec30e525 Use consistent way of checking if descriptor number is valid.
MFC after:	1 month
2012-06-11 20:17:20 +00:00
Pawel Jakub Dawidek
fd45a47ba6 Be consistent with white spaces.
MFC after:	1 month
2012-06-11 20:01:50 +00:00
Pawel Jakub Dawidek
19d9c0e11e Remove code duplicated in kern_close() and do_dup() and use closefp() function
introduced a minute ago.

This code duplication was responsible for the bug fixed in r236853.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 20:00:44 +00:00
Pawel Jakub Dawidek
642db963ab Introduce closefp() function that we will be able to use to eliminate
code duplication in kern_close() and do_dup().

This is committed separately from the actual removal of the duplicated
code, as the combined diff was very hard to read.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:57:31 +00:00
Pawel Jakub Dawidek
129c87eb7d Merge two ifs into one to make the code almost identical to the code in
kern_close().

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:53:41 +00:00
Pawel Jakub Dawidek
d327cee241 Move the code around a bit to move two parts of code duplicated from
kern_close() close together.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:51:27 +00:00
Pawel Jakub Dawidek
8b40793150 Now that fdgrowtable() doesn't drop the filedesc lock we don't need to
check if descriptor changed from under us. Replace the check with an
assert.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:48:55 +00:00
Pawel Jakub Dawidek
69d7614850 When we are closing capability during dup2(), we want to call mq_fdclose()
on the underlying object and not on the capability itself.

Discussed with:	rwatson
Sponsored by:	FreeBSD Foundation
MFC after:	1 month
2012-06-10 14:57:18 +00:00
Pawel Jakub Dawidek
1b693d7494 Merge two ifs into one. Other minor style fixes.
MFC after:	1 month
2012-06-10 13:10:21 +00:00
Pawel Jakub Dawidek
8849ae7256 Simplify fdtofp().
MFC after:	1 month
2012-06-10 06:31:54 +00:00
Pawel Jakub Dawidek
e59a97362d There is no need to drop the FILEDESC lock around malloc(M_WAITOK) anymore, as
we now use sx lock for filedesc structure protection.

Reviewed by:	kib
MFC after:	1 month
2012-06-09 18:50:32 +00:00
Pawel Jakub Dawidek
68abac4337 Remove now unused variable.
MFC after:	1 month
MFC with:	r236820
2012-06-09 18:48:06 +00:00
Pawel Jakub Dawidek
380513aaae Make some of the loops more readable.
Reviewed by:	tegge
MFC after:	1 month
2012-06-09 18:03:23 +00:00
Pawel Jakub Dawidek
5d02ed91e9 Correct panic message.
MFC after:	1 month
MFC with:	r236731
2012-06-09 12:27:30 +00:00
Pawel Jakub Dawidek
bf3e37ef15 In fdalloc() f_ofileflags for the newly allocated descriptor has to be 0.
Assert that instead of setting it to 0.

Sponsored by:	FreeBSD Foundation
MFC after:	1 month
2012-06-07 23:33:10 +00:00
Eitan Adler
847d0034e3 Return EBADF instead of EMFILE from dup2 when the second argument is
outside the range of valid file descriptors

PR:		kern/164970
Submitted by:	Peter Jeremy <peterjeremy@acm.org>
Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-04-11 14:08:09 +00:00
John Baldwin
e506e182dd Export some more useful info about shared memory objects to userland
via procstat(1) and fstat(1):
- Change shm file descriptors to track the pathname they are associated
  with and add a shm_path() method to copy the path out to a caller-supplied
  buffer.
- Use the fo_stat() method of shared memory objects and shm_path() to
  export the path, mode, and size of a shared memory object via
  struct kinfo_file.
- Add a struct shmstat to the libprocstat(3) interface along with a
  procstat_get_shm_info() to export the mode and size of a shared memory
  object.
- Change procstat to always print out the path for a given object if it
  is valid.
- Teach fstat about shared memory objects and to display their path,
  mode, and size.

MFC after:	2 weeks
2012-04-01 18:22:48 +00:00
Peter Holm
ffae9d4d7c Free up allocated memory used by posix_fadvise(2). 2012-03-08 20:34:13 +00:00
David E. O'Brien
0e31b3c15f Reformat comment to be more readable in standard Xterm.
(while I'm here, wrap other long lines)
2011-11-15 01:48:53 +00:00
John Baldwin
dccc45e4c0 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
Robert Watson
b160c14194 Correct a bug in export of capability-related information from the sysctls
supporting procstat -f: properly provide capability rights information to
userspace.  The bug resulted from a merge-o during upstreaming (or rather,
a failure to properly merge FreeBSD-side changed downstream).

Spotted by:     des, kibab
MFC after:      3 days
2011-10-12 12:08:03 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Jonathan Anderson
cfb5f76865 Add experimental support for process descriptors
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-08-18 22:51:30 +00:00
Konstantin Belousov
9c00bb9190 Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by:	glebius
Reviewed by:	rwatson
Approved by:	re (bz)
2011-08-16 20:07:47 +00:00
Jonathan Anderson
69d377fe1b Allow Capsicum capabilities to delegate constrained
access to file system subtrees to sandboxed processes.

- Use of absolute paths and '..' are limited in capability mode.
- Use of absolute paths and '..' are limited when looking up relative
  to a capability.
- When a name lookup is performed, identify what operation is to be
  performed (such as CAP_MKDIR) as well as check for CAP_LOOKUP.

With these constraints, openat() and friends are now safe in capability
mode, and can then be used by code such as the capability-mode runtime
linker.

Approved by: re (bz), mentor (rwatson)
Sponsored by: Google Inc
2011-08-13 09:21:16 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Jonathan Anderson
c30b9b5169 Export capability information via sysctls.
When reporting on a capability, flag the fact that it is a capability,
but also unwrap to report all of the usual information about the
underlying file.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
2011-07-20 09:53:35 +00:00
Jonathan Anderson
745bae379d Add implementation for capabilities.
Code to actually implement Capsicum capabilities, including fileops and
kern_capwrap(), which creates a capability to wrap an existing file
descriptor.

We also modify kern_close() and closef() to handle capabilities.

Finally, remove cap_filelist from struct capability, since we don't
actually need it.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-15 09:37:14 +00:00
Jonathan Anderson
5604e481b1 Fix the "passability" test in fdcopy().
Rather than checking to see if a descriptor is a kqueue, check to see if
its fileops flags include DFLAG_PASSABLE.

At the moment, these two tests are equivalent, but this will change with
the addition of capabilities that wrap kqueues but are themselves of type
DTYPE_CAPABILITY. We already have the DFLAG_PASSABLE abstraction, so let's
use it.

This change has been tested with [the newly improved] tools/regression/kqueue.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-08 12:19:25 +00:00
Edward Tomasz Napierala
afcc55f318 All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0.  This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
2011-07-06 20:06:44 +00:00
Jonathan Anderson
9acdfe6549 Rework _fget to accept capability parameters.
This new version of _fget() requires new parameters:
- cap_rights_t needrights
    the rights that we expect the capability's rights mask to include
    (e.g. CAP_READ if we are going to read from the file)

- cap_rights_t *haverights
    used to return the capability's rights mask (ignored if NULL)

- u_char *maxprotp
    the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted
    (only used if we are going to mmap the file; ignored if NULL)

- int fget_flags
    FGET_GETCAP if we want to return the capability itself, rather than
    the underlying object which it wraps

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
2011-07-05 13:45:10 +00:00
Jonathan Anderson
c0467b5e6e When Capsicum starts creating capabilities to wrap existing file
descriptors, we will want to allocate a new descriptor without installing
it in the FD array.

Split falloc() into falloc_noinstall() and finstall(), and rewrite
falloc() to call them with appropriate atomicity.

Approved by: mentor (rwatson), re (bz)
2011-06-30 15:22:49 +00:00
Stanislav Sedov
ff6f41a472 - Do no try to drop a NULL filedesc pointer. 2011-05-12 10:56:33 +00:00
Stanislav Sedov
0daf62d9f5 - Commit work from libprocstat project. These patches add support for runtime
file and processes information retrieval from the running kernel via sysctl
  in the form of new library, libprocstat.  The library also supports KVM backend
  for analyzing memory crash dumps.  Both procstat(1) and fstat(1) utilities have
  been modified to take advantage of the library (as the bonus point the fstat(1)
  utility no longer need superuser privileges to operate), and the procstat(1)
  utility is now able to display information from memory dumps as well.

  The newly introduced fuser(1) utility also uses this library and able to operate
  via sysctl and kvm backends.

  The library is by no means complete (e.g. KVM backend is missing vnode name
  resolution routines, and there're no manpages for the library itself) so I
  plan to improve it further.  I'm commiting it so it will get wider exposure
  and review.

  We won't be able to MFC this work as it relies on changes in HEAD, which
  was introduced some time ago, that break kernel ABI.  OTOH we may be able
  to merge the library with KVM backend if we really need it there.

Discussed with:	rwatson
2011-05-12 10:11:39 +00:00
Edward Tomasz Napierala
722581d9e6 Add RACCT_NOFILE accounting.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-06 19:13:04 +00:00
Konstantin Belousov
1fe80828e7 After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9)
and remove the falloc() version that lacks flag argument. This is done
to reduce the KPI bloat.

Requested by:	jhb
X-MFC-note:	do not
2011-04-01 13:28:34 +00:00
Konstantin Belousov
246d35ec91 Add O_CLOEXEC flag to open(2) and fhopen(2).
The new function fallocf(9), that is renamed falloc(9) with added
flag argument, is provided to facilitate the merge to stable branch.

Reviewed by:	jhb
MFC after:	1 week
2011-03-25 14:00:36 +00:00
John Baldwin
8e6fa660f2 Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
  in fork to honor the locking requirements.  While here, expand the scope
  of the PROC_LOCK() on the new process (p2) to avoid some LORs.  Previously
  the code was locking the new child process (p2) after it had locked the
  parent process (p1).  However, when locking two processes, the safe order
  is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
  having the process locked to use PROC_LOCK().  Every place was already
  locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
  p_state against PRS_NEW.  The PROC_LOCK() alone is sufficient for reading
  the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after:	1 week
2011-03-24 18:40:11 +00:00
Bjoern A. Zeeb
1fb51a12f2 Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
Jilles Tjoelker
90750179ec Do not trip a KASSERT if /dev/null cannot be opened for a setuid program.
The fdcheckstd() function makes sure fds 0, 1 and 2 are open by opening
/dev/null. If this fails (e.g. missing devfs or wrong permissions),
fdcheckstd() will return failure and the process will exit as if it received
SIGABRT. The KASSERT is only to check that kern_open() returns the expected
fd, given that it succeeded.

Tripping the KASSERT is most likely if fd 0 is open but fd 1 or 2 are not.

MFC after:	2 weeks
2011-01-28 15:29:35 +00:00
Konstantin Belousov
23b70c1ae2 Finish r210923, 210926. Mark some devices as eternal.
MFC after:	2 weeks
2011-01-04 10:59:38 +00:00
Bjoern A. Zeeb
2f826fdf53 Remove one zero from the double-0.
This code doesn't have a license to kill.

MFC after:	3 days
2010-04-23 14:32:58 +00:00
Konstantin Belousov
080136212f On the return path from F_RDAHEAD and F_READAHEAD fcntls, do not
unlock Giant twice.

While there, bring conditions in the do/while loops closer to style,
that also makes the lines fit into 80 columns.

Reported and tested by:	dougb
2009-11-20 22:22:53 +00:00
Xin LI
82aebf697c Add two new fcntls to enable/disable read-ahead:
- F_READAHEAD: specify the amount for sequential access.  The amount is
   specified in bytes and is rounded up to nearest block size.
 - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential
   access size.

A third argument of zero disables the read-ahead behavior.

Please note that the read-ahead amount is also constrainted by sysctl
variable, vfs.read_max, which may need to be raised in order to better
utilize this feature.

Thanks Igor Sysoev for proposing the feature and submitting the original
version, and kib@ for his valuable comments.

Submitted by:	Igor Sysoev <is rambler-co ru>
Reviewed by:	kib@
MFC after:	1 month
2009-09-28 16:59:47 +00:00
Robert Watson
14961ba789 Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type.  This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by:	brooks
Approved by:	re (kib)
Obtained from:	TrustedBSD Project
MFC after:	1 week
2009-06-27 13:58:44 +00:00
Ulf Lilleengen
2dd43ecaec - Similar to the previous commit, but for CURRENT: Fix a bug where a FIFO vnode
use count was increased twice, but only decreased once.
2009-06-24 18:44:38 +00:00
Ulf Lilleengen
7083246d7c - Fix a bug where a FIFO vnode use count was increased twice, but only
decreased once.

MFC after:	1 week
2009-06-24 18:38:51 +00:00