Commit Graph

14208 Commits

Author SHA1 Message Date
mav
31390ca675 Add vmem locking to r281026.
While races there are not fatal, they cause result underestimation, that
cause unneeded ARC reclaims.

MFC after:	1 month
2015-04-05 14:17:26 +00:00
kib
4f422b68f1 Restore proper error from oshmctl(2), used by COMPAT_43, when the
segment cannot be found.  Broken by r280323.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-04-04 23:56:38 +00:00
jilles
308eebbd52 utimensat: Correct Capsicum required capability rights. 2015-04-04 21:47:54 +00:00
kib
4aef349dc7 Remove useless initialization.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-04-04 08:44:20 +00:00
mav
326429ebdd Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise).  FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.

Address this situation from two sides:
 - restore KVA usage limitations in a way the most close to Illumos;
 - introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that first limitation done alone is not sufficient.  On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk.  Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again. :)

MFC after:	1 month
2015-04-03 14:45:48 +00:00
kib
75d8fc688e Speed up symbol lookup for the amd64 kernel modules.
Amd64 uses relocatable object files as the modules format.  It is good
WRT not having unneeded overhead for PIC code, in particular, due to
absence of useless GOT and PLT.  But the cost is that the module
linking process cannot use hash to speed up the symbol lookup, and
that each reference to the symbol requiring a relocation, instead of
single-place relocation in GOT.

Cache the successfull symbol lookup results in the module symbol
table, using the newly allocated SHN_FBSD_CACHED value from
SHN_LOOS-HIOS range as an indicator.  The SHN_FBSD_CACHED together
with the non-existent definition of the found symbol are reverted
after successfull relocations, which is done under kld_sx lock, so it
should not be visible to other consumers of the symbol table.

Submitted by:	Conrad Meyer
Differential Revision:  https://reviews.freebsd.org/D1718
MFC after:	3 weeks
2015-04-02 20:14:51 +00:00
rstone
57feb6fb43 Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
rrs
59d85f7f18 Adopt jhb's suggested changes, updated comments and callout_migration() moving
to kern/kern_timeout.c

This does *not* address his -1 -> NOCPU comment.

Sponsored by:	Netflix Inc.
2015-03-31 00:18:00 +00:00
glebius
e4390a8823 Catch up on r271387 and remove unused parameter from
VOP_GETPAGES_ASYNC().
2015-03-30 22:49:26 +00:00
mav
eda1855b80 Periodically wake up threads waiting for vmem(9) resources, so they could
ask for resource reclamation again.

This is kind of dirty hack, but as last resort this is better then stuck
indefinitely because of KVA fragmentation, waiting until some random event
free something sufficient.  OpenSolaris also has this hack in its vmem(9).

MFC after:	2 weeks
2015-03-30 13:30:53 +00:00
mav
ae6af27af5 Add four new DDB commands to display vmem(9) statistics.
In particular, such DDB commands were added:
        show vmem <addr>
        show all vmem
        show vmemdump <addr>
        show all vmemdump

As possible usage, that allows to see KVA usage and fragmentation.
2015-03-29 10:02:29 +00:00
kib
f389aa239b Make debug.vmem_check a tunable. It is useful to set it early.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-28 23:30:51 +00:00
vangyzen
3aacc80f6d Clean up some cosmetic nits in kern_umtx.c, found during recent work
in this area and by the Clang static analyzer.

Remove some dead assignments.

Fix a typo in a panic string.

Use umtx_pi_disown() instead of duplicate code.

Use an existing variable instead of curthread.

Approved by:	kib (mentor)
MFC after:	3 days
Sponsored by:	Dell Inc
2015-03-28 21:21:40 +00:00
bz
edcfd40470 Try to unbreak !SMP kernels broken in r280785 by using the proper macros
to access cc_cpu.
2015-03-28 15:07:19 +00:00
rrs
8e2d411633 Change the callout to supply -1 to indicate we are not changing
CPU, also add protection against invalid CPU's as well as
split c_flags and c_iflags so that if a user plays with the active
flag (the one expected to be played with by callers in MPSAFE) without
a lock, it won't adversely affect the callout system by causing a corrupt
list. This also means that all callers need to use the macros and *not*
play with the falgs directly (like netgraph used to).

Differential Revision: htts://reviews.freebsd.org/D1894
Reviewed by: .. timed out but looked at by jhb, imp, adrian hselasky
             tested by hiren and netflix.
Sponsored by:	Netflix Inc.
2015-03-28 12:50:24 +00:00
hselasky
e09860cab6 Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:

- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.

- Add simple check for duplicate non-automatic OID numbers.

MFC after:  1 week
2015-03-25 08:55:34 +00:00
hselasky
d3609225f5 Make sure tunable sysctls are only fetched once. The existing code can
re-register sysctls when destroying sysctl contexts or when moving
sysctls from one tree to another.
2015-03-24 17:42:53 +00:00
glebius
baf4ea8ca8 Do not include if_var.h and in6_var.h into kern_jail.c. It is now possible
after r280444.

Sponsored by:	Nginx, Inc.
2015-03-24 16:46:40 +00:00
hselasky
56eb62091a Correct string pointer offset for error printout. 2015-03-24 16:37:19 +00:00
rpaulo
99d21d4e76 Disable coredump_devctl because it could lead to leaking paths to
jails.
2015-03-24 02:17:17 +00:00
mjg
ce00fb8c10 filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch
Casting fd to an unsigned type simplifies fd range coparison to mere checking
if the result is bigger than the table.
2015-03-24 00:10:11 +00:00
ian
7bc5de3287 The sysctls that return process argv and envv return binary data, so clear
the SBUF_INCLUDENUL flag.

Pointed out by:	    tijl@
2015-03-22 21:18:44 +00:00
hselasky
be80af9c98 Fix for out of order device destruction notifications when using the
delist_dev() function. In addition to this change:
- add a proper description of this function
- add a proper witness assert inside this function
- switch a nearby line to use the "cdp" pointer instead of cdev2priv()

MFC after:	3 days
2015-03-22 13:11:56 +00:00
mjg
818b12be14 proc: use MTX_NEW flag in proc_init
This allows us to get rid of bzero which was added specifically to make
mtx_init on p_mtx reliable.

This also fixes a potential problem where mtx_init on other mutexes
could trip over on unitialized memory and fire an assertion.

Reviewed by:	kib
2015-03-21 20:25:34 +00:00
mjg
b6e838d488 cred: add proc_set_cred_init helper
proc_set_cred_init can be used to set first credentials of a new
process.

Update proc_set_cred assertions so that it only expects already used
processes.

This fixes panics where p_ucred of a new process happens to be non-NULL.

Reviewed by:	kib
2015-03-21 20:24:54 +00:00
mjg
1cd59e2b5d fork: assign refed credentials earlier
Prior to this change the kernel would take p1's credentials and assign
them tempororarily to p2. But p1 could change credentials at that time
and in effect give us a use-after-free.

No objections from: kib
2015-03-21 20:24:03 +00:00
alc
b131a2abc8 Introduce vm_object_color() and use it in mmap(2) to set the color of
named objects to zero before the virtual address is selected.  Previously,
the color setting was delayed until after the virtual address was
selected.  In rtld, this delay effectively prevented the mapping of a
shared library's code section using superpages.  Now, for example, we see
the first 1 MB of libc's code on armv6 mapped by a superpage after we've
gotten through the initial cold misses that bring the first 1 MB of code
into memory.  (With the page clustering that we perform on read faults,
this happens quickly.)

Differential Revision:	https://reviews.freebsd.org/D2013
Reviewed by:	jhb, kib
Tested by:	Svatopluk Kraus (armv6)
MFC after:	6 weeks
2015-03-21 17:56:55 +00:00
cognet
2ff418ff42 error is only used if MAC is defined, so make its declaration conditional
as well.
2015-03-21 16:16:17 +00:00
kib
0f36a66254 Somewhat modernize the SysV shm code:
- Use real locking, replace Giant with global sx protecting the
  subsystem.  Since the subsystem' lock is no longer dropped during
  the sleepsk, remove not needed SHMSEG_WANTED segment flag, and
  revert r278963.
- To do proper code simplification possible after the change of the
  lock, restructure several functions into _locked body and
  originally-named wrapper which calls into _locked variant.  This
  allows to eliminate the 'goto done2' spread over the code.
- Merge shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
- Consistently change all function prototypes to ANSI C.

Reviewed by:	mjg (who has earlier version of the similar patch to
	 introduce real locking)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-03-21 15:01:19 +00:00
mjg
98bf77d34d coredump: protect corefilename access with a lock
Previously format string traversal could happen while the string itself was
being modified.

Use allproc_lock as coredumping is a rare operation and as such we don't
have to create a dedicated lock.

Submitted by:	Tiwei Bie <btw mail.ustc.edu.cn>
Reviewed by:	kib
X-Additional:	JuniorJobs project
2015-03-21 04:39:33 +00:00
ian
16585e4442 The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that.
Values smaller than two lead to strange asserts that have nothing to do
with the actual problem (in the case of size=0), or to writing beyond the
end of the allocated buffer in sbuf_finish() (in the case of size=1).
2015-03-17 21:00:31 +00:00
ian
f8efd87631 In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the
passed-in pointer is NULL and the length is zero.
2015-03-17 20:56:24 +00:00
glebius
256f4dbc63 Reduce header pollution. 2015-03-17 14:16:50 +00:00
benno
05fdbce0f6 Reset bp->bio_done to unmapped_buf when removing a transient map in biodone.
Submitted by:	Scott Ferris <scott.ferris@isilon.com>
Sponsored by:	EMC / Isilon Storage Division
Reviewed by:	kib
2015-03-16 20:00:09 +00:00
ian
7934731b66 Trivial change / forced-commit to document prior change that slipped in
without a commit message...

Use sbuf_new() + SYSCTL_OUT() instead of wiring the userland buffer and
using sbuf_new_for_sysctl().  The preallocated 256 byte buffer is always
going to be big enough to hold these results, and this should be more
efficient than wiring the old buffer.
2015-03-16 19:29:19 +00:00
ian
b1f6b456b1 2015-03-16 19:25:03 +00:00
ian
6158729687 Use a regular sbuf + SYSCTL_OUT() rather than sbuf_new_for_sysctl() with
auto-draining, to avoid a potential copyout fault while holding a lock.

Pointed out by:	  jhb
Pointy hat to:	  ian
2015-03-16 19:18:45 +00:00
ian
ff87dec3ce Update an sbuf assertion to allow for the new SBUF_INCLUDENUL flag. If
INCLUDENUL is set and sbuf_finish() has been called, the length has been
incremented to count the nulterm byte, and in that case current length is
allowed to be equal to buffer size, otherwise it must be less than.

Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to
be consistant with the style in the rest of this file.
2015-03-16 17:45:41 +00:00
mjg
8c033d13b5 proc: get rid of proc lock + unlock pair in proc_reap
A comment in the code stated we PROC_LOCK and as a side effect guarantee
all writers released process lock. But at that point such lock was already
taken while we were removing the process from all lists, so it should be already
unreachable.
2015-03-16 01:09:49 +00:00
mjg
054f9cab59 cred: add proc_set_cred helper
The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.
2015-03-16 00:10:03 +00:00
ian
e497716fbd Add a nulterm byte to the returned sysctl string.
PR:		195668
2015-03-15 00:39:18 +00:00
ian
cb2eadc65f Include the nulterm byte in the sysctl string.
PR:		195668
2015-03-15 00:36:08 +00:00
ian
6ded6cb790 Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf(). 2015-03-14 23:16:12 +00:00
ian
f4042c6120 Use SYSCTL_OUT_STR() to return strings.
PR:		195668
2015-03-14 21:40:01 +00:00
ian
adc83897f5 Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:           195668
2015-03-14 18:46:33 +00:00
ian
56e931511b Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:           195668
2015-03-14 18:42:30 +00:00
ian
0dd684d23f Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
ian
f3eb74c490 Add a new flag, SBUF_INCLUDENUL, and new get/set/clear functions for flags.
The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string
to be counted in the length of the data.  If copying the data using the
sbuf_data() and sbuf_len() functions, or if writing it automatically with
a drain function, the net effect is that the nulterm byte is copied along
with the rest of the data.
2015-03-14 16:02:11 +00:00
hselasky
de871211fe Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00
rstone
a76e348a4b Fix SR-IOV passthrough devices to allow ppt to attach
A late change to the SR-IOV infrastructure broke passthrough of
VFs.  device_set_devclass() was being used to try to force the
ppt driver to attach to the device, but this didn't work because
the DF_FIXEDCLASS flag wasn't being set on the device, so the
ppt driver probe routine would not match when it returned
BUS_NOWILDCARD.  Fix this by adding a new device function that
both sets the devclass and sets the DF_FIXEDCLASS flag, and use
that to force the ppt driver to attach to VFs.

Differential Revision: https://reviews.freebsd.org/D2041
Reviewed by:	jhb
MFC after:	3 weeks
2015-03-10 23:27:13 +00:00