Commit Graph

14191 Commits

Author SHA1 Message Date
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
Neel Natu
1a688aa53e Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
Device probe value of BUS_PROBE_NOWILDCARD should be treated specially only
if the device has a fixed devclass. Otherwise it should be interpreted just
as if the driver doesn't want to claim the device.

Prior to this change a device that was not claimed explicitly by its driver
would remain "attached" to the driver that returned BUS_PROBE_NOWILDCARD.
This would bump up the reference on 'driver->refs' and its 'dev->ops' would
point to the 'driver->ops'. When the driver is subsequently unloaded the
'dev->ops->cls' is left pointing to freed memory.

This fixes an easily reproducible #GP fault caused by loading and unloading
vmm.ko multiple times.

Differential Revision:	https://reviews.freebsd.org/D2294
Reviewed by:	imp, jhb
Discussed with:	rstone
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-15 16:22:05 +00:00
Edward Tomasz Napierala
1c73bcab8e Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This
adds missing jail and MAC checks.

Differential Revision:	https://reviews.freebsd.org/D2193
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-15 09:13:11 +00:00
Konstantin Belousov
316b384343 Implement support for binary to requesting specific stack size for the
initial thread.  It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.

The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-15 08:13:53 +00:00
George V. Neville-Neil
63bf240482 When a kernel has DEVICE_POLLING turned on but no drivers have
the capability do not try to take the mutex at all.

Replaces misbegotten attempt from reverted commit 281276

Pointed out by: glebius
Sponsored by: Rubicon Communications (Netgate)
Differential Revision:	https://reviews.freebsd.org/D2262
2015-04-14 14:22:34 +00:00
Randall Stewart
b132edb56f Fix my stupid restoral of old code.. must be c_iflags now.
Thanks jhb for catching my stupidity...
MFC after:	3 days
2015-04-14 00:02:39 +00:00
Randall Stewart
07a2df5d83 Restore the two lines accidentally deleted that allow CALLOUT_DIRECT to be
specifed in the flags.

Thanks Mark Johnston for noticing this ;-o

MFC after:	3 days
2015-04-13 23:06:13 +00:00
Will Andrews
6311d7aaf0 uiomove_object_page(): Avoid instantiating pages in sparse regions on reads.
Check whether the page being requested is either resident or on swap.  If
not, read from the zero_region instead of instantiating an unnecessary page.

This avoids consuming memory for sparse files on tmpfs, when they are read
by applications that do not use SEEK_HOLE/SEEK_DATA (which is most of them).

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Spectra Logic
2015-04-11 18:51:41 +00:00
Mateusz Guzik
2574218578 Replace struct filedesc argument in getsock_cap with struct thread
This is is a step towards removal of spurious arguments.
2015-04-11 16:00:33 +00:00
Mateusz Guzik
90f54cbfeb fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
Alexander Motin
cdd09fea28 Add vmem locking to r281026.
While races there are not fatal, they cause result underestimation, that
cause unneeded ARC reclaims.

MFC after:	1 month
2015-04-05 14:17:26 +00:00
Konstantin Belousov
4cfc037c30 Restore proper error from oshmctl(2), used by COMPAT_43, when the
segment cannot be found.  Broken by r280323.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-04-04 23:56:38 +00:00
Jilles Tjoelker
78d75aba77 utimensat: Correct Capsicum required capability rights. 2015-04-04 21:47:54 +00:00
Konstantin Belousov
0122d251bf Remove useless initialization.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-04-04 08:44:20 +00:00
Alexander Motin
2e9ccb32a1 Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise).  FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.

Address this situation from two sides:
 - restore KVA usage limitations in a way the most close to Illumos;
 - introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that first limitation done alone is not sufficient.  On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk.  Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again. :)

MFC after:	1 month
2015-04-03 14:45:48 +00:00
Konstantin Belousov
2832cd544f Speed up symbol lookup for the amd64 kernel modules.
Amd64 uses relocatable object files as the modules format.  It is good
WRT not having unneeded overhead for PIC code, in particular, due to
absence of useless GOT and PLT.  But the cost is that the module
linking process cannot use hash to speed up the symbol lookup, and
that each reference to the symbol requiring a relocation, instead of
single-place relocation in GOT.

Cache the successfull symbol lookup results in the module symbol
table, using the newly allocated SHN_FBSD_CACHED value from
SHN_LOOS-HIOS range as an indicator.  The SHN_FBSD_CACHED together
with the non-existent definition of the found symbol are reverted
after successfull relocations, which is done under kld_sx lock, so it
should not be visible to other consumers of the symbol table.

Submitted by:	Conrad Meyer
Differential Revision:  https://reviews.freebsd.org/D1718
MFC after:	3 weeks
2015-04-02 20:14:51 +00:00
Ryan Stone
f2c2231e0c Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
Randall Stewart
403df7a672 Adopt jhb's suggested changes, updated comments and callout_migration() moving
to kern/kern_timeout.c

This does *not* address his -1 -> NOCPU comment.

Sponsored by:	Netflix Inc.
2015-03-31 00:18:00 +00:00
Gleb Smirnoff
f6d6b5e262 Catch up on r271387 and remove unused parameter from
VOP_GETPAGES_ASYNC().
2015-03-30 22:49:26 +00:00
Alexander Motin
43329ffcc8 Periodically wake up threads waiting for vmem(9) resources, so they could
ask for resource reclamation again.

This is kind of dirty hack, but as last resort this is better then stuck
indefinitely because of KVA fragmentation, waiting until some random event
free something sufficient.  OpenSolaris also has this hack in its vmem(9).

MFC after:	2 weeks
2015-03-30 13:30:53 +00:00
Alexander Motin
b308aaed27 Add four new DDB commands to display vmem(9) statistics.
In particular, such DDB commands were added:
        show vmem <addr>
        show all vmem
        show vmemdump <addr>
        show all vmemdump

As possible usage, that allows to see KVA usage and fragmentation.
2015-03-29 10:02:29 +00:00
Konstantin Belousov
eeb697c8e9 Make debug.vmem_check a tunable. It is useful to set it early.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-28 23:30:51 +00:00
Eric van Gyzen
e858b027db Clean up some cosmetic nits in kern_umtx.c, found during recent work
in this area and by the Clang static analyzer.

Remove some dead assignments.

Fix a typo in a panic string.

Use umtx_pi_disown() instead of duplicate code.

Use an existing variable instead of curthread.

Approved by:	kib (mentor)
MFC after:	3 days
Sponsored by:	Dell Inc
2015-03-28 21:21:40 +00:00
Bjoern A. Zeeb
a04d412295 Try to unbreak !SMP kernels broken in r280785 by using the proper macros
to access cc_cpu.
2015-03-28 15:07:19 +00:00
Randall Stewart
15b1eb142c Change the callout to supply -1 to indicate we are not changing
CPU, also add protection against invalid CPU's as well as
split c_flags and c_iflags so that if a user plays with the active
flag (the one expected to be played with by callers in MPSAFE) without
a lock, it won't adversely affect the callout system by causing a corrupt
list. This also means that all callers need to use the macros and *not*
play with the falgs directly (like netgraph used to).

Differential Revision: htts://reviews.freebsd.org/D1894
Reviewed by: .. timed out but looked at by jhb, imp, adrian hselasky
             tested by hiren and netflix.
Sponsored by:	Netflix Inc.
2015-03-28 12:50:24 +00:00
Hans Petter Selasky
38668c6044 Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:

- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.

- Add simple check for duplicate non-automatic OID numbers.

MFC after:  1 week
2015-03-25 08:55:34 +00:00
Hans Petter Selasky
502702c644 Make sure tunable sysctls are only fetched once. The existing code can
re-register sysctls when destroying sysctl contexts or when moving
sysctls from one tree to another.
2015-03-24 17:42:53 +00:00
Gleb Smirnoff
a2d4a7e456 Do not include if_var.h and in6_var.h into kern_jail.c. It is now possible
after r280444.

Sponsored by:	Nginx, Inc.
2015-03-24 16:46:40 +00:00
Hans Petter Selasky
ab91c9a743 Correct string pointer offset for error printout. 2015-03-24 16:37:19 +00:00
Rui Paulo
0da9e11b7e Disable coredump_devctl because it could lead to leaking paths to
jails.
2015-03-24 02:17:17 +00:00
Mateusz Guzik
ea926658ff filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch
Casting fd to an unsigned type simplifies fd range coparison to mere checking
if the result is bigger than the table.
2015-03-24 00:10:11 +00:00
Ian Lepore
296f235de0 The sysctls that return process argv and envv return binary data, so clear
the SBUF_INCLUDENUL flag.

Pointed out by:	    tijl@
2015-03-22 21:18:44 +00:00
Hans Petter Selasky
2793ea13aa Fix for out of order device destruction notifications when using the
delist_dev() function. In addition to this change:
- add a proper description of this function
- add a proper witness assert inside this function
- switch a nearby line to use the "cdp" pointer instead of cdev2priv()

MFC after:	3 days
2015-03-22 13:11:56 +00:00
Mateusz Guzik
f97af9706b proc: use MTX_NEW flag in proc_init
This allows us to get rid of bzero which was added specifically to make
mtx_init on p_mtx reliable.

This also fixes a potential problem where mtx_init on other mutexes
could trip over on unitialized memory and fire an assertion.

Reviewed by:	kib
2015-03-21 20:25:34 +00:00
Mateusz Guzik
ffb34484ee cred: add proc_set_cred_init helper
proc_set_cred_init can be used to set first credentials of a new
process.

Update proc_set_cred assertions so that it only expects already used
processes.

This fixes panics where p_ucred of a new process happens to be non-NULL.

Reviewed by:	kib
2015-03-21 20:24:54 +00:00
Mateusz Guzik
12cec311e6 fork: assign refed credentials earlier
Prior to this change the kernel would take p1's credentials and assign
them tempororarily to p2. But p1 could change credentials at that time
and in effect give us a use-after-free.

No objections from: kib
2015-03-21 20:24:03 +00:00
Alan Cox
3d653db063 Introduce vm_object_color() and use it in mmap(2) to set the color of
named objects to zero before the virtual address is selected.  Previously,
the color setting was delayed until after the virtual address was
selected.  In rtld, this delay effectively prevented the mapping of a
shared library's code section using superpages.  Now, for example, we see
the first 1 MB of libc's code on armv6 mapped by a superpage after we've
gotten through the initial cold misses that bring the first 1 MB of code
into memory.  (With the page clustering that we perform on read faults,
this happens quickly.)

Differential Revision:	https://reviews.freebsd.org/D2013
Reviewed by:	jhb, kib
Tested by:	Svatopluk Kraus (armv6)
MFC after:	6 weeks
2015-03-21 17:56:55 +00:00
Olivier Houchard
d8d2f47629 error is only used if MAC is defined, so make its declaration conditional
as well.
2015-03-21 16:16:17 +00:00
Konstantin Belousov
0555fb3523 Somewhat modernize the SysV shm code:
- Use real locking, replace Giant with global sx protecting the
  subsystem.  Since the subsystem' lock is no longer dropped during
  the sleepsk, remove not needed SHMSEG_WANTED segment flag, and
  revert r278963.
- To do proper code simplification possible after the change of the
  lock, restructure several functions into _locked body and
  originally-named wrapper which calls into _locked variant.  This
  allows to eliminate the 'goto done2' spread over the code.
- Merge shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
- Consistently change all function prototypes to ANSI C.

Reviewed by:	mjg (who has earlier version of the similar patch to
	 introduce real locking)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-03-21 15:01:19 +00:00
Mateusz Guzik
5bc0ff888a coredump: protect corefilename access with a lock
Previously format string traversal could happen while the string itself was
being modified.

Use allproc_lock as coredumping is a rare operation and as such we don't
have to create a dedicated lock.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn>
Reviewed by:	kib
X-Additional:	JuniorJobs project
2015-03-21 04:39:33 +00:00
Ian Lepore
612d9391a4 The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that.
Values smaller than two lead to strange asserts that have nothing to do
with the actual problem (in the case of size=0), or to writing beyond the
end of the allocated buffer in sbuf_finish() (in the case of size=1).
2015-03-17 21:00:31 +00:00
Ian Lepore
2834924513 In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the
passed-in pointer is NULL and the length is zero.
2015-03-17 20:56:24 +00:00
Gleb Smirnoff
8c4df6296b Reduce header pollution. 2015-03-17 14:16:50 +00:00
Benno Rice
43348dc2ad Reset bp->bio_done to unmapped_buf when removing a transient map in biodone.
Submitted by:	Scott Ferris <scott.ferris@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
Reviewed by:	kib
2015-03-16 20:00:09 +00:00
Ian Lepore
f62fbd30cb Trivial change / forced-commit to document prior change that slipped in
without a commit message...

Use sbuf_new() + SYSCTL_OUT() instead of wiring the userland buffer and
using sbuf_new_for_sysctl().  The preallocated 256 byte buffer is always
going to be big enough to hold these results, and this should be more
efficient than wiring the old buffer.
2015-03-16 19:29:19 +00:00
Ian Lepore
ff352d8978 2015-03-16 19:25:03 +00:00
Ian Lepore
ba00885515 Use a regular sbuf + SYSCTL_OUT() rather than sbuf_new_for_sysctl() with
auto-draining, to avoid a potential copyout fault while holding a lock.

Pointed out by:	  jhb
Pointy hat to:	  ian
2015-03-16 19:18:45 +00:00
Ian Lepore
8d5628fdb8 Update an sbuf assertion to allow for the new SBUF_INCLUDENUL flag. If
INCLUDENUL is set and sbuf_finish() has been called, the length has been
incremented to count the nulterm byte, and in that case current length is
allowed to be equal to buffer size, otherwise it must be less than.

Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to
be consistant with the style in the rest of this file.
2015-03-16 17:45:41 +00:00
Mateusz Guzik
fbe503d462 proc: get rid of proc lock + unlock pair in proc_reap
A comment in the code stated we PROC_LOCK and as a side effect guarantee
all writers released process lock. But at that point such lock was already
taken while we were removing the process from all lists, so it should be already
unreachable.
2015-03-16 01:09:49 +00:00
Mateusz Guzik
daf63fd2f9 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00