Commit Graph

13593 Commits

Author SHA1 Message Date
Gleb Smirnoff
b5c32cf481 Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
in the sysctl_root().

Note: SYSCTL_VNET_* macros can be removed as well. All is
  needed to virtualize a sysctl oid is set CTLFLAG_VNET on it.
  But for now keep macros in place to avoid large code churn.

Sponsored by:	Nginx, Inc.
2014-02-07 13:47:33 +00:00
John Baldwin
e432d5f6a7 Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
Nathan Whitehorn
a8a9b1c250 ULE works on Book-E since r258002, so remove statements to the contrary. 2014-02-01 20:46:35 +00:00
Jamie Gritton
f15444cc97 Back out r261266 pending security buy-in.
r261266:
  Add a jail parameter, allow.kmem, which lets jailed processes access
  /dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
  This in conjunction with changing the drm driver's permission check from
  PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
2014-01-31 17:39:51 +00:00
Konstantin Belousov
49d39308ba The posix_madvise(3) and posix_fadvise(2) should return error on
failure, same as posix_fallocate(2).

Noted by:	Bob Bishop <rb@gid.co.uk>
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-30 18:04:39 +00:00
Jamie Gritton
109ca2d5f1 Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.

Submitted by:	netchild
MFC after:	1 week
2014-01-29 13:41:13 +00:00
John-Mark Gurney
3a6cdc4e55 fix spelling of lock_initialized.. jhb approved..
MFC after:	1 week
2014-01-28 17:27:54 +00:00
Christian S.J. Peron
c297f0e497 Allow sigwait(2) in capabilities mode.
It's common for multi-threaded processes to create a thread for
the purpose of synchronously processing signals. Allow such processes to
utilize a capabilities sandbox.

Discussed with:	rwatson, pjd
MFC after:	2 weeks
2014-01-28 01:49:49 +00:00
Robert Millan
f55a7d3058 Accept O_CLOEXEC in shm_open().
Reviewed by:	jilles, jhb
MFC after:	1 week
2014-01-24 21:05:07 +00:00
Konstantin Belousov
2852de0489 The posix_fallocate(2) syscall should return error number on error,
without modifying errno.

Reported and tested by:	Gennady Proskurin <gpr@mail.ru>
Reviewed by:	mdf
PR:	standards/186028
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-23 17:24:26 +00:00
Warner Losh
26fbe13c56 Implement generic support for early printf. Thought I can't find the
paper trail now, this patch is similar to one posted for one of the
preliminary versions of a new armv6 port. I took them and made them
more generic. Option not enabled by default since each board/port has
to provide its own eputc, and possibly do other things as well...
2014-01-22 21:20:08 +00:00
John Baldwin
44afcdabf3 Fix a typo. 2014-01-21 03:24:52 +00:00
Neel Natu
84cc772fe5 Bump up WITNESS_COUNT from 1024 to 1536 so there are sufficient entries for
WITNESS to actually work.

Reviewed by:	jhb@
2014-01-20 01:59:35 +00:00
Gleb Smirnoff
761a9a1f8d Fix comment. 2014-01-17 11:09:05 +00:00
Adrian Chadd
0cfea1c8fc Implement a kqueue notification path for sendfile.
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.

Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.

This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use.  Note
it doesn't mark the region as free overall - only free from this
transaction.  The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.

TODO:

* documentation, as always

Sponsored by:	Netflix, Inc.
2014-01-17 05:26:55 +00:00
Adrian Chadd
fda21f4d2a Add in a default initialiser for the EVOPS_SENDFILE kqueue filterops.
Sponsored by:	Netflix, Inc.
2014-01-17 05:15:44 +00:00
Gleb Smirnoff
d978bbea8a Simplify wait/nowait code, eventually killing last remnant of
historical mbuf(9) allocator flag.

Sponsored by:	Nginx, Inc.
2014-01-16 13:45:41 +00:00
Gleb Smirnoff
94985f742b Remove historical macro.
Sponsored by:	Nginx, Inc.
2014-01-16 13:42:50 +00:00
Bryan Venteicher
fb6c25186b Add sglist_append_bio(9) to append a struct bio's data to a sglist
Reviewed by:	jhb, kib (long ago)
2014-01-13 04:41:08 +00:00
Adrian Chadd
a43caef195 Refactor out the common sendfile code from the do_sendfile() and the
compat32 sendfile syscall.

Sponsored by:	Netflix, Inc.
2014-01-09 00:11:14 +00:00
Adrian Chadd
faa9b054a0 Add a compile-time control over the size of KN_HASHSIZE.
This is needed for applications that use a lot of non-filedescriptor
knotes.

MFC after:	1 week
Sponsored by:	Netflix, Inc.
2014-01-07 01:17:27 +00:00
Mateusz Guzik
231a0fe857 Plug a memory leak in dup2 when both old and new fd have ioctl caps.
Reviewed by:	pjd
MFC after:	3 days
2014-01-03 16:36:55 +00:00
Mateusz Guzik
0918d4b21f Don't check for fd limits in fdgrowtable_exp.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.

MFC after:	3 days
2014-01-03 16:34:16 +00:00
Warner Losh
9520f95242 Delete echoed doesn't rub out the previous character, so always
use <backspace> <space> <backspace> instead. This fixes hitting
DELETE instead of BACKSPACE at mountroot> prompt.
2013-12-31 04:40:25 +00:00
Mark Johnston
7dba784986 The arguments to sched:::off-cpu are the thread and associated process of
the thread selected to run, not the currently running thread. This fix has
already been made for ULE in r252070.

PR:		177706
MFC after:	1 week
2013-12-29 17:08:30 +00:00
Konstantin Belousov
fe20047039 Fix accounting for the negative cache entries when reusing v_cache_dd.
Having ncneg diverge with the actual length of the ncneg tailq causes
NULL dereference.

Add assertion that an entry taken from ncneg queue is indeed negative.

Reported by and discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-27 17:09:59 +00:00
Konstantin Belousov
e136eac224 Revert r259200. There are geoms/drivers which do not update
bio_completed, only manage bio_resid, e.g. sa(4).

Reported and tested by:	Manfred Antar <null@pozo.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-27 17:04:51 +00:00
Dimitry Andric
d3fdc73431 In sys/kern/vfs_mountroot.c, remove static function parse_isspace(),
which is unused since r214006.

MFC after:	3 days
2013-12-25 22:14:42 +00:00
Dimitry Andric
951d674203 In sys/kern/subr_witness.c, remove static function
witness_lock_order_key_empty(), which is unused since r181695.

MFC after:	3 days
2013-12-25 16:58:14 +00:00
Dimitry Andric
3371b88c7b In sys/kern/sched_ule.c, remove static function sched_both(), which is
unused since r232207.

MFC after:	3 days
2013-12-25 16:25:54 +00:00
Ed Schouten
a3da01fc76 Fix copy-pasting of CJK fullwidth characters.
They are stored as two separate characters in the vtbuf, so copy-pasting
will cause them to be passed to terminal_input_char() twice. Extend
terminal_input_char() to explicitly discard characters with TF_CJK_RIGHT
set. This causes only the left part to generate input.
2013-12-24 18:42:26 +00:00
Aleksandr Rybalko
7a1a32c4ef o Add virtual terminal mmap request handler.
o Forward termianl framebuffer ioctl to fbd.
o Forward terminal mmap request to fbd.
o Move inclusion of sys/conf.h to vt.h.

Sponsored by:	The FreeBSD Foundation
2013-12-23 18:09:10 +00:00
Ed Schouten
a6c26592f1 Extend libteken to support CJK fullwidth characters.
Introduce a new formatting bit (TF_CJK_RIGHT) that is set when putting a
cell that is the right part of a CJK fullwidth character. This will
allow drivers like vt(9) to support fullwidth characters properly.

emaste@ has a patch to extend vt(9)'s font handling to increase the
number of Unicode -> glyph maps from 2 ({normal,bold)} to 4
({normal,bold} x {left,right}). This will need to use this formatting
bit to determine whether to draw the left or right glyph.

Reviewed by:	emaste
2013-12-20 21:31:50 +00:00
Gleb Smirnoff
7276319825 Move list of ttys handling from the allocating procedures, to the
device creation stage. A device creation can fail, and in that case
an entry already on the list will be freed.

Sponsored by:	Nginx, Inc.
2013-12-20 19:45:51 +00:00
Stefan Eßer
774e8d906f Fix compilation on 32 bit architectures and use INT64_MAX instead of
LONG_MAX for the upper bound check.
2013-12-19 21:35:33 +00:00
Stefan Eßer
53d5cc255d Fix overflow for timeout values of more than 68 years, which is the maximum
covered by sbintime (LONG_MAX seconds).

Some programs use timeout values in excess of 1000 years. The conversion
to sbintime caused wrap-around on overflow, which resulted in short or
negative timeout values. This caused long delays on sockets opened by
affected programs (e.g. OpenSSH).

Kernels compiled without -fno-strict-overflow were not affected, apparently
because the compiler tested the sign of the timeout value before performing
the multiplication that lead to overflow.

When the -fno-strict-overflow option was added to CFLAGS, this optimization
was disabled and the test was performed on the result of the multiplication.
Negative products were caught and resulted in EINVAL being returned, but
wrap-around to positive values just shortened the timeout value to the
residue of the result that could be represented by sbintime.

The fix is to cap the timeout values at the maximum that can be represented
by sbintime, which is 2^31 - 1 seconds or more than 68 years.

After this change, the kernel can be compiled with -fno-strict-overflow
with no ill effects.

MFC after:	3 days
2013-12-19 09:01:46 +00:00
Mark Johnston
8f7254629f Invoke the kld_* event handlers from linker_load_file() and
linker_unload_file() rather than kern_kldload() and kern_kldunload(). This
ensures that the handlers are invoked for files that are loaded/unloaded
automatically as dependencies. Previously, they were only invoked for files
loaded by a user.

As a side effect, the kld_load and kld_unload handlers are now invoked with
the kernel linker lock exclusively held.

Reported by:	avg
Reviewed by:	jhb
MFC after:	2 weeks
2013-12-19 03:48:36 +00:00
Gleb Smirnoff
e1e585a87c - Rename tty_makedev() into tty_makedevf() and make it capable
to fail and return error.
- Use make_dev_p() in tty_makedevf() instead of make_dev_cred().
- Always pass MAKEDEV_CHECKNAME flag.
- Optionally pass MAKEDEV_REF flag.
- Provide macro for compatibility with old API.

This fixes races with simultaneous creation and desctruction of
ttys, and makes it possible to call tty_makedevf() from device
cloners.

A race in tty_watermarks() still exist, since the latter drops
lock for M_WAITOK allocation. This will be addressed in separate
commit.

Reviewed by:	kib
Sponsored by:	Nginx, Inc.
2013-12-18 12:50:43 +00:00
Mark Johnston
7159310fa6 The fasttrap fork handler is responsible for removing tracepoints in the
child process that were inherited from its parent. However, this should
not be done in the case of a vfork, since the fork handler ends up removing
the tracepoints from the shared vm space, and userland DTrace probes in the
parent will no longer fire as a result.

Now the child of a vfork may trigger userland DTrace probes enabled in its
parent, so modify the fasttrap probe handler to handle this case and handle
the child process in the same way that it would handle the traced process.
In particular, if once traces function foo() in a process that vforks, and
the child calls foo(), fasttrap will treat this call as having come from the
parent. This is the behaviour of the upstream code.

While here, add #ifdef guards to some code that isn't present upstream.

MFC after:	1 month
2013-12-18 01:41:52 +00:00
Konstantin Belousov
65f05eeb3d If vn_open_vnode() succeeded in opening the vnode, but subsequent
advisory lock cannot be obtained, prevent double-close of the vnode in
vn_close() called from the fdrop(), by resetting file' f_ops methods.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-17 17:31:16 +00:00
Andrey V. Elsukov
da0770bd57 Fix copy/paste typo.
MFC after:	1 week
2013-12-17 16:45:19 +00:00
Attilio Rao
e7a9eed7a8 - Assert for not leaking readers rw locks counter on userland return.
- Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock.

Sponsored by:	EMC / Isilon storage division
2013-12-17 13:37:02 +00:00
Adrian Chadd
dc3bdd4ad9 Remove the invariants stuff I copy/paste'd from the mbuf code when
setting up the UMA zone.

This should (a) be correct(er) and (b) it should build on non-amd64.

Pointed out by: glebius
2013-12-17 03:06:21 +00:00
Adrian Chadd
73242a5ee1 Migrate the sendfile_sync struct to use a UMA zone rather than M_TEMP.
This allows it to be better tracked as well as being able to leverage
UMA for more interesting/useful behaviour at a later date.

Sponsored by:	Netflix, Inc.
2013-12-16 19:31:23 +00:00
Alexander Motin
e37e08c7bf Fix periodic per-CPU timers startup on boot.
Reported by:	neel
MFC after:	2 weeks
2013-12-16 13:52:18 +00:00
Marcel Moolenaar
15773775f7 Properly drain the TTY when both revoke(2) and close(2) end up closing
the TTY. In such a case, ttydev_close() is called multiple times and
each time, t_revokecnt is incremented and cv_broadcast() is called for
both the t_outwait and t_inwait condition variables.
Let's say revoke(2) comes in first and gets to call tty_drain() from
ttydev_leave(). Let's say that the revoke comes from init(8) as the
result of running "shutdown -r now". Since shutdown prints various
messages to the console before announing that the machine will reboot
immediately, let's also say that the output queue is not empty and
that tty_drain() has something to do. Let's assume this all happens
on a 9600 baud serial console, so it takes a time to drain.
The shutdown command will exit(2) and as such will end up closing
stdout. Let's say this close will come in second, bump t_revokecnt
and call tty_wakeup(). This has tty_wait() return prematurely and
the next thing that will happen is that the thread doing revoke(2)
will flush the TTY. Since the drain wasn't complete, the flush will
effectively drop whatever is left in t_outq.

This change takes into account that tty_drain() will return ERESTART
due to the fact that t_revokecnt was bumped and in that case simply
call tty_drain() again. The thread in question is already performing
the close so it can safely finish draining the TTY before destroying
the TTY structure.

Now all messages from shutdown will be printed on the serial console.

Obtained from:	Juniper Networks, Inc.
2013-12-16 00:50:14 +00:00
Pawel Jakub Dawidek
007e4f41a7 Regenerate after r259438. 2013-12-15 23:20:26 +00:00
Pawel Jakub Dawidek
82845da3fa Fix syscalls that can be loaded as kernel modules - they were not given
the flag allowing to call them from capability mode sandbox.

Noticed by:	David Drysdale <drysdale@google.com>
2013-12-15 23:19:42 +00:00
Pawel Jakub Dawidek
61a9fc8fe2 Regenerate after r259436. 2013-12-15 23:15:12 +00:00
Pawel Jakub Dawidek
e1e16d2419 Allow for pselect(2) in capability mode.
Noticed by:	David Drysdale <drysdale@google.com>
2013-12-15 23:14:27 +00:00