Commit Graph

615 Commits

Author SHA1 Message Date
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Mariusz Zaborski
6e70b4f058 fd: add fget_cap and fget_cap_locked primitives
They can be used to obtain capabilities along with a referenced fp.

Reviewed by:	mjg@
2016-09-12 22:46:19 +00:00
Ed Maste
dd38731e09 allow kern.proc.nfds sysctl in capability mode
Reviewed by:	allanjude
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D7733
2016-09-01 02:51:50 +00:00
Mateusz Guzik
4cbafea09c fd: add fdeget_locked and use in kern_descrip 2016-08-30 21:53:22 +00:00
Mateusz Guzik
382172be68 sigio: do a lockless check in funsetownlist
There is no need to grab the lock first to see if sigio is used, and it
typically is not.
2016-08-10 15:24:15 +00:00
Robert Watson
51d1f69069 Audit file-descriptor arguments to I/O system calls such as
read(2), write(2), dup(2), and mmap(2).  This auditing is not
required by the Common Criteria (and hence was not being
performed), but is valuable in both contemporary live analysis
and forensic use cases.

MFC after:	3 days
Sponsored by:	DARPA, AFRL
2016-07-10 08:04:02 +00:00
Mateusz Guzik
2dbdf49cf4 fd: provide a common exit point for unlock in kern_dup
While here assert dropped filedesc lock on return from closefp.
2016-05-27 17:00:15 +00:00
Mateusz Guzik
0cfe1a1fec fd: assert dropped filedesc lock in fdcloseexec 2016-05-08 03:26:12 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Gleb Smirnoff
9c64cfe56c The sendfile(2) allows to send extra data from userspace before the file
data (headers).  Historically the size of the headers was not checked
against the socket buffer space.  Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space.  In case when size of headers is bigger that socket space,
KASSERT fires.  Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
  the sbspace() check.  The uio size is trimmed by socket space there,
  which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
  up the stack to syscall wrappers.  This required a copy and paste of the
  code, but in turn this allowed to remove extra stack carried parameter
  from fo_sendfile_t, and embrace entire compat code into #ifdef.  If in
  future we got more fo_sendfile_t function, the copy and paste level would
  even reduce.

Reviewed by:	emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by:	Vitalij Satanivskij <satan ukr.net>
Sponsored by:	Netflix
2016-03-29 19:57:11 +00:00
John Baldwin
399e8c1773 Simplify AIO initialization now that it is standard.
- Mark AIO system calls as STD and remove the helpers to dynamically
  register them.
- Use COMPAT6 for the old system calls with the older sigevent instead of
  an 'o' prefix.
- Simplify the POSIX configuration to note that AIO is always available.
- Handle AIO in the default VOP_PATHCONF instead of special casing it in
  the pathconf() system call.  fpathconf() is still hackish.
- Remove freebsd32_aio_cancel() as it just called the native one directly.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5589
2016-03-09 19:05:11 +00:00
Mateusz Guzik
b577e693aa fd: implement kern.proc.nfds sysctl
Intended purpose is to provide an equivalent of OpenBSD's getdtablecount
syscall for the compat library..
2015-11-07 00:18:14 +00:00
Mateusz Guzik
9af8c8b72b fd: make rights a mandatory argument to fgetvp_rights
The only caller already always passes rights.
2015-09-07 20:05:56 +00:00
Mateusz Guzik
d7832811a7 fd: make the common case in filecaps_copy work lockless
The filedesc lock is only needed if ioctls caps are present, which is a
rare situation. This is a step towards reducing the scope of the filedesc
lock.
2015-09-07 20:02:56 +00:00
Conrad Meyer
14bdbaf2e4 Detect badly behaved coredump note helpers
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.

When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.

NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath.  As vnodes may move around during dump, this is racy.

So:

 - Detect badly behaved notes in putnote() and pad underfilled notes.

 - Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
   exercise the NT_PROCSTAT_FILES corruption.  It simply picks random
   lengths to expand or truncate paths to in fo_fill_kinfo_vnode().

 - Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
   disable kinfo packing for PROCSTAT_FILES notes.  This should avoid
   both FILES note corruption and truncation, even if filenames change,
   at the cost of about 1 kiB in padding bloat per open fd.  Document
   the new sysctl in core.5.

 - Fix note_procstat_files to self-limit in the 2nd pass.  Since
   sometimes this will result in a short write, pad up to our advertised
   size.  This addresses note corruption, at the risk of sometimes
   truncating the last several fd info entries.

 - Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
   zero padding.

With suggestions from:	bjk, jhb, kib, wblock
Approved by:	markj (mentor)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3548
2015-09-03 20:32:10 +00:00
Mateusz Guzik
7e8f566c0c fd: remove UMA_ZONE_ZINIT argument from Files zone
Originally it was added in order to prevent trashing of objects with
INVARIANTS enabled. The same effect is now provided with mere UMA_ZONE_NOFREE.

This reverts r286921.

Discussed with:		kib
2015-09-02 23:14:39 +00:00
Konstantin Belousov
fe5ec54b50 fget_unlocked() depends on the freed struct file f_count field being
zero.  The file_zone if no-free, but r284861 added trashing of the
freed memory.  Most visible manifestation of the issue were 'memory
modified after free' panics for the file zone, triggered from
falloc_noinstall().

Add UMA_ZONE_ZINIT flag to turn off trashing.  Mjg noted that it makes
sense to not trash freed memory for any non-free zone, which will be
done later.

Reported and tested by:	pho
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2015-08-19 11:53:32 +00:00
Ed Schouten
e555b4309c Introduce falloc_caps() to create descriptors with capabilties in place.
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.

This will be used by CloudABI to create pipes with custom capabilities.

Reviewed by:	mjg
2015-07-29 17:16:53 +00:00
Mateusz Guzik
2919a0c5c1 fd: partially deduplicate fdescfree and fdescfree_remapped
This also moves vrele of cdir/rdir/jdir vnodes earlier, which should not
matter.
2015-07-16 15:26:37 +00:00
Ed Schouten
457f7e23b1 Implement CloudABI's exec() call.
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute an new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.

Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?

CloudABI solves this problem by replacing traditional string command
line arguments by tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).

Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:

    int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
              size_t fdslen);

This change introduces a set of fd*_remapped() functions:

- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
  all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
  by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
  fdinstall_remapped().

We then add a function exec_copyin_data_fds() that builds on top these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().

Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.

Reviewers: kib, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3079
2015-07-16 07:05:42 +00:00
Mateusz Guzik
8a08cec166 Create a dedicated function for ensuring that cdir and rdir are populated.
Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by:	kib
2015-07-11 16:22:48 +00:00
Mateusz Guzik
f0725a8e1e Move chdir/chroot-related fdp manipulation to kern_descrip.c
Prefix exported functions with pwd_.

Deduplicate some code by adding a helper for setting fd_cdir.

Reviewed by:	kib
2015-07-11 16:19:11 +00:00
Mateusz Guzik
9a1ad66fb5 fd: further cleanup of kern_dup
- make mode enum start from 0 so that the assertion covers all cases [1]
- rename prefix _CLOEXEC flag with _FLAG
- postpone fhold on the old file descriptor, which eliminates the need to fdrop
  in error cases.
- fixup FDDUP_FCNTL check missed in the previous commit

This removes 'fp == oldfde->fde_file' assertion which had little value. kern_dup
only calls fd-related functions which cannot drop the lock or a whole lot of
races would be introduced.

Noted by: kib [1]
2015-07-10 13:54:03 +00:00
Mateusz Guzik
5fe97c20dc fd: split kern_dup flags argument into actual flags and a mode
Tidy up the code inside to switch on the mode.
2015-07-10 11:01:30 +00:00
Ed Schouten
2491302a04 Add implementations for some of the CloudABI file descriptor system calls.
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.

The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.

This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.

Differential Revision:	https://reviews.freebsd.org/D3035
Reviewed by:	mjg
2015-07-09 16:07:01 +00:00
Mateusz Guzik
efdc25304c fd: prepare do_dup for being exported
- rename it to kern_dup.
- prefix flags with FD
- assert that correct flags were passed
2015-07-09 15:19:45 +00:00
Konstantin Belousov
69d11def74 Handle copyout for the fcntl(F_OGETLK) using oflock structure.
Otherwise, kernel overwrites a word past the destination.

Submitted by:	walter@pelissero.de
PR:	196718
MFC after:	1 week
2015-07-08 13:19:13 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Mateusz Guzik
dba0bec2bb fd: de-k&r-ify functions + some whitespace fixes
No functional changes.
2015-07-04 15:42:03 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
80f3623f2f fd: don't unnecessary copy capabilities in _fget 2015-06-16 09:08:30 +00:00
Mateusz Guzik
cedab3c72c fd: reduce excessive zeroing on fd close
fde_file as NULL is already an indicator of an unused fd. All other
fields are populated when fp is installed.
2015-06-14 14:10:05 +00:00
Mateusz Guzik
ea31808c3b fd: move out actual fp installation to _finstall
Use it in fd passing functions as the first step towards fd code cleanup.
2015-06-14 14:08:52 +00:00
Mateusz Guzik
21de5aea6c Fixup the build after r284215.
Submitted by:	Ivan Klymenko <fidaj ukr.net> [slighly modified]
2015-06-10 12:39:01 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Mateusz Guzik
3b3eb22ab6 fd: remove fdesc_mtx 2015-06-10 09:40:07 +00:00
Mateusz Guzik
153cc61b54 fd: use atomics to manage fd_refcnt and fd_holcnt
This gets rid of fdesc_mtx.
2015-06-10 09:34:50 +00:00
Mateusz Guzik
747c0dd67c fd: fix imbalanced fdp unlock in F_SETLK and F_GETLK
MFC after:	3 days
2015-05-18 14:27:04 +00:00
Edward Tomasz Napierala
4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
Mateusz Guzik
8d0a4ab212 fd: plug an always overwritten initialization in fdalloc 2015-04-26 17:27:55 +00:00
Mateusz Guzik
90f54cbfeb fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
Mateusz Guzik
ea926658ff filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch
Casting fd to an unsigned type simplifies fd range coparison to mere checking
if the result is bigger than the table.
2015-03-24 00:10:11 +00:00
Ian Lepore
1eafc07856 Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
Mateusz Guzik
8fbda7f00b filedesc: obtain a stable copy of credentials in fget_unlocked
This was broken in r278930.

While here tidy up fget_mmap to use fdp from local var instead of obtaining
the same pointer from td.
2015-02-18 13:37:28 +00:00
Mateusz Guzik
b7a39e9e07 filedesc: simplify fget_unlocked & friends
Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.

Reviewed by:	silence on -hackers
2015-02-17 23:54:06 +00:00
Mateusz Guzik
5e7cd3ec22 filedesc: avoid spurious copying of capabilities in fget_unlocked
We obtain a stable copy and store it in local 'fde' variable. Storing another
copy (based on aforementioned variable) does not serve any purpose.

No functional changes.
2015-01-21 18:32:53 +00:00
Mateusz Guzik
f9051b0e02 filedesc: return 0 from badfo_close
The only potential in-tree consumer (_fdrop) special-cased it and returns 0
0 on its own instead of calling badfo_close.

Remove the special case since it is not needed and very unlikely to encounter
anyway.

No objections from:	kib
2015-01-21 18:05:42 +00:00
Mateusz Guzik
5751146497 filedesc: fix whitespace nits in fget and fget_read
No functional changes.
2015-01-21 18:02:28 +00:00
Mateusz Guzik
c31c057957 filedesc: plug a test for impossible condition in _fget 2015-01-21 01:06:14 +00:00
John Baldwin
20abb66ede Properly initialize the capability rights for vnodes exported to procstat
that aren't for file descriptors (cwd, jdir, tracevp, etc.).

Submitted by:	Mikhail <mp@lenta.ru>
2014-11-24 18:34:11 +00:00
Mateusz Guzik
0c0d16e8ac filedesc: plug a test for impossible condition in fgetvp_rights 2014-11-23 00:12:27 +00:00
Mateusz Guzik
eb48fbd963 filedesc: fixup fdinit to lock fdp and preapare files conditinally
Not all consumers providing fdp to copy from want files.

Perhaps these functions should be reorganized to better express the outcome.

This fixes up panics after r273895 .

Reported by:	markj
2014-11-13 21:15:09 +00:00
Konstantin Belousov
6e646651d3 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
Gleb Smirnoff
0e87b36eaa Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used.  It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained.  It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by:	Netflix
2014-11-11 20:32:46 +00:00
Mateusz Guzik
bfda9935bd Add sysctl kern.proc.cwd
It returns only current working directory of given process which saves a lot of
overhead over kern.proc.filedesc if given proc has a lot of open fds.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified)
X-Additional:	JuniorJobs project
2014-11-06 08:12:34 +00:00
Mateusz Guzik
3ae366de58 filedesc: avoid taking fdesc_mtx when not necessary in fddrop
No functional changes.
2014-11-06 07:44:10 +00:00
Mateusz Guzik
eb6021fb96 filedesc: just free old tables without altering the list which is freed anyway
No functional changes.
2014-11-06 07:37:31 +00:00
Mateusz Guzik
324a7026f1 filedesc: plus sys/kdb.h include which crept in with r274007 2014-11-03 06:24:43 +00:00
Mateusz Guzik
1d29258ac2 filedesc: plug unnecessary fdp NULL checks in fdescfreee and fdcopy
Anything reaching these functions has fd table.
2014-11-03 05:12:17 +00:00
Mateusz Guzik
32417098f0 filedesc: create a dedicated zone for struct filedesc0
Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from
malloc use 2048 bytes.

There is no easy way to shrink the structure <= 1024 an it is likely to grow in
the future.
2014-11-03 04:16:04 +00:00
Mateusz Guzik
3dca54ab98 filedesc: move freeing old tables to fdescfree
They cannot be accessed by anyone and hold count only protects the structure
from being freed.
2014-11-02 14:12:03 +00:00
Mateusz Guzik
3dc85312b2 filedesc: factor out some code out of fdescfree
Previously it had a huge self-contained chunk dedicated to dealing with shared
tables.

No functional changes.
2014-11-02 13:43:04 +00:00
Mateusz Guzik
080fdefc28 filedesc: tidy up fdcheckstd
No functional changes.
2014-11-02 02:32:33 +00:00
Mateusz Guzik
d3f3e12a4f filedesc: lock filedesc lock in fdcloseexec only when needed 2014-11-02 01:13:11 +00:00
Mateusz Guzik
2534d8eeb6 filedesc: drop retval argument from do_dup
It was almost always td_retval anyway.

For the one case where it is not, preserve the old value across the call.
2014-10-31 10:35:01 +00:00
Mateusz Guzik
8a5177cca3 filedesc: fix missed comments about fdsetugidsafety
While here just note that both fdsetugidsafety and fdcheckstd take sleepable
locks.
2014-10-31 09:56:00 +00:00
Mateusz Guzik
f652d856ab filedesc: make fdinit return with source filedesc locked and new one sized
appropriately

Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable.
We don't have to call it with the lock held if we are just creating new
filedesc.

As a side note, strictly speaking processes can have fdtables with
fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file
descriptor they get will be 0 and the only syscall allowing to choose fd number
requires an active file descriptor. Should this ever change, we can add an 'init'
(or similar) parameter to fdgrowtable.
2014-10-31 09:25:28 +00:00
Mateusz Guzik
ffeb890592 filedesc: iterate over fd table only once in fdcopy
While here add 'fdused_init' which does not perform unnecessary work.

Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold
it when appropriate. This function is only used with INVARIANTS.

No functional changes intended.
2014-10-31 09:19:46 +00:00
Mateusz Guzik
1a0c80a3df filedesc: tidy up fdfree
Implement fdefree_last variant and get rid of 'last' parameter.

No functional changes.
2014-10-31 09:15:59 +00:00
Mateusz Guzik
b97a758ffc filedesc: tidy up fdcopy a little bit
Test for file availability by fde_file != NULL instead of fdisused, this is
consistent with similar checks later.

Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never
reached in practice.

fdiused is now only used in some KASSERTS, so ifdef it under INVARIANTS.

No functional changes.
2014-10-31 05:41:27 +00:00
Mateusz Guzik
f55cf4b0d1 filedesc: make sure to force table reload in fget_unlocked when count == 0
This is a fixup to r273843.
2014-10-30 07:21:38 +00:00
Mateusz Guzik
29c85772bb filedesc: microoptimize fget_unlocked by retrying obtaining reference count
without restarting whole lookup

Restart is only needed when fp was closed by current process, which is a much
rarer event than ref/deref by some other thread.
2014-10-30 05:21:12 +00:00
Mateusz Guzik
aa77d52800 filedesc: get rid of atomic_load_acq_int from fget_unlocked
A read barrier was necessary because fd table pointer and table size were
updated separately, opening a window where fget_unlocked could read new size
and old pointer.

This patch puts both these fields into one dedicated structure, pointer to which
is later atomically updated. As such, fget_unlocked only needs data a dependency
barrier which is a noop on all supported architectures.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2014-10-30 05:10:33 +00:00
Mateusz Guzik
58a3dcb229 filedesc assert that table size is at least 3 in fdsetugidsafety
Requested by: kib
2014-10-22 08:56:57 +00:00
Mateusz Guzik
11888da8d9 filedesc: cleanup setugidsafety a little
Rename it to fdsetugidsafety for consistency with other functions.

There is no need to take filedesc lock if not closing any files.

The loop has to verify each file and we are guaranteed fdtable has space
for at least 20 fds. As such there is no need to check fd_lastfile.

While here tidy up is_unsafe.
2014-10-22 00:23:43 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Mateusz Guzik
966ee9f25f filedesc: plug 2 write-only variables
Reported by: Coverity
CID: 1245745, 1245746
2014-10-20 21:57:24 +00:00
Mateusz Guzik
55056be254 filedesc: plug 2 assignments to M_ZERO-ed pointers in falloc_noinstall
No functional changes.
2014-10-15 01:16:11 +00:00
Mateusz Guzik
2b4a2528d7 filedesc: fix up breakage introduced in 272505
Include sequence counter supports incoditionally [1]. This fixes reprted build
problems with e.g. nvidia driver due to missing opt_capsicum.h.

Replace fishy looking sizeof with offsetof. Make fde_seq the last member in
order to simplify calculations.

Suggested by:	kib [1]
X-MFC:		with 272505
2014-10-05 19:40:29 +00:00
Konstantin Belousov
57c2505e65 On error, sbuf_bcat() returns -1. Some callers returned this -1 to
the upper layers, which interpret it as errno value, which happens to
be ERESTART.  The result was spurious restarts of the sysctls in loop,
e.g. kern.proc.proc, instead of returning ENOMEM to caller.

Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers
expecting errno.

In collaboration with:	pho
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2014-10-05 17:35:59 +00:00
Mateusz Guzik
ee3fd7bbb1 Plug capability races.
fp and appropriate capability lookups were not atomic, which could result in
improper capabilities being checked.

This could result either in protection bypass or in a spurious ENOTCAPABLE.

Make fp + capability check atomic with the help of sequence counters.

Reviewed by:	kib
MFC after:	3 weeks
2014-10-04 08:08:56 +00:00
Mateusz Guzik
0c4a09a378 Make do_dup() static and move relevant macros to kern_descrip.c
No functional changes.
2014-09-26 19:48:47 +00:00
Konstantin Belousov
f69261f2f9 Fix fcntl(2) compat32 after r270691. The copyin and copyout of the
struct flock are done in the sys_fcntl(), which mean that compat32 used
direct access to userland pointers.

Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs neccessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.

Reported by:	jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-09-25 21:07:19 +00:00
John Baldwin
9696feebe2 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
John Baldwin
2d69d0dcc2 Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value, invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.
2014-09-12 21:29:10 +00:00
John Baldwin
0ed667f6e5 Simplify vntype_to_kinfo() by returning when the desired value is found
instead of breaking out of the loop and then immediately checking the loop
index so that if it was broken out of the proper value can be returned.

While here, use nitems().
2014-09-12 20:56:09 +00:00
Mateusz Guzik
64196a9996 Plug unnecessary fp assignments in kern_fcntl.
No functional changes.
2014-09-05 23:56:25 +00:00
Gleb Smirnoff
e86447ca44 - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-26 14:44:08 +00:00
Mateusz Guzik
037755fd15 Fix up races with f_seqcount handling.
It was possible that the kernel would overwrite user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after:	1 week
2014-08-26 08:17:22 +00:00
Mateusz Guzik
a1bf811596 Prepare fget_unlocked for reading fd table only once.
Some capsicum functions accept fdp + fd and lookup fde based on that.
Add variants which accept fde.

Reviewed by:	pjd
MFC after:	1 week
2014-07-23 19:33:49 +00:00
Mateusz Guzik
b23c40d7b1 Don't zero fd_nfiles during fdp destruction.
Code trying to take a look has to check fd_refcnt and it is 0 by that time.

This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.

MFC after:	1 week
2014-07-10 21:05:45 +00:00
Mateusz Guzik
e518baf8f9 Avoid relocking filedesc lock when closing fds during fdp destruction.
Don't call bzero nor fdunused from fdfree for such cases. It would do
unnecessary work and complain that the lock is not taken.

MFC after:	1 week
2014-07-10 20:59:54 +00:00
Mateusz Guzik
b9d32c36fa Make fdunshare accept only td parameter.
Proc had to match the thread anyway and 2 parameters were inconsistent
with the rest.

MFC after:	1 week
2014-06-28 05:41:53 +00:00
Mateusz Guzik
35778d7aa9 Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.

Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.

MFC after:	1 week
2014-06-28 05:18:03 +00:00
Mateusz Guzik
450570a55e Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could be only reallocated if new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone

MFC after:	1 week
2014-06-23 01:28:18 +00:00
Mateusz Guzik
158627616c Don't take filedesc lock in fdunshare().
We can read refcnt safely and only care if it is equal to 1.

If it could suddenly change from 1 to something bigger the code would be
buggy even in the previous form and transitions from > 1 to 1 are equally racy
and harmless (we copy even though there is no need).

MFC after:	1 week
2014-06-22 21:37:27 +00:00
Mateusz Guzik
adf87ab01c fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent
is to iterate over active fds or lookup one, there is no point in looking
beyond that limit.

Few places are left unpatched for now.

MFC after:	1 week
2014-06-22 01:31:55 +00:00
Mateusz Guzik
0f0b852c73 do_dup: plug redundant adjustment of fd_lastfile
By that time it was already set by fdalloc, or was there in the first place
if fd is replaced.

MFC after:	1 week
2014-06-22 00:53:33 +00:00
Mateusz Guzik
f2b1eaec33 Request a non-exiting process in sysctl_kern_proc_{o,}filedesc
This fixes a race with exit1 freeing p_textvp.

Suggested by:	kib
MFC after:	1 week
2014-05-02 21:55:09 +00:00
Mateusz Guzik
210a5d1689 Garbage collect fdavail.
It rarely returns an error and fdallocn handles the failure of fdalloc
just fine.
2014-04-04 05:07:36 +00:00