Commit Graph

14141 Commits

Author SHA1 Message Date
Ian Lepore
e5197e3a08 Add a nulterm byte to the returned sysctl string.
PR:		195668
2015-03-15 00:39:18 +00:00
Ian Lepore
657282e062 Include the nulterm byte in the sysctl string.
PR:		195668
2015-03-15 00:36:08 +00:00
Ian Lepore
91d9eda200 Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf(). 2015-03-14 23:16:12 +00:00
Ian Lepore
acfc962f82 Use SYSCTL_OUT_STR() to return strings.
PR:		195668
2015-03-14 21:40:01 +00:00
Ian Lepore
b773372938 Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:           195668
2015-03-14 18:46:33 +00:00
Ian Lepore
b97fa22cd6 Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl
string returned to userland is nulterminated.

PR:           195668
2015-03-14 18:42:30 +00:00
Ian Lepore
1eafc07856 Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl
strings returned to userland include the nulterm byte.

Some uses of sbuf_new_for_sysctl() write binary data rather than strings;
clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in
those cases.  (Note that the sbuf code still automatically adds a nulterm
byte in sbuf_finish(), but since it's not included in the length it won't
get copied to userland along with the binary data.)

Remove explicit adding of a nulterm byte in a couple places now that it
gets done automatically by the sbuf drain code.

PR:		195668
2015-03-14 17:08:28 +00:00
Ian Lepore
f4d281428f Add a new flag, SBUF_INCLUDENUL, and new get/set/clear functions for flags.
The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string
to be counted in the length of the data.  If copying the data using the
sbuf_data() and sbuf_len() functions, or if writing it automatically with
a drain function, the net effect is that the nulterm byte is copied along
with the rest of the data.
2015-03-14 16:02:11 +00:00
Hans Petter Selasky
b7ba031ff7 Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00
Ryan Stone
1c229658b9 Fix SR-IOV passthrough devices to allow ppt to attach
A late change to the SR-IOV infrastructure broke passthrough of
VFs.  device_set_devclass() was being used to try to force the
ppt driver to attach to the device, but this didn't work because
the DF_FIXEDCLASS flag wasn't being set on the device, so the
ppt driver probe routine would not match when it returned
BUS_NOWILDCARD.  Fix this by adding a new device function that
both sets the devclass and sets the DF_FIXEDCLASS flag, and use
that to force the ppt driver to attach to VFs.

Differential Revision: https://reviews.freebsd.org/D2041
Reviewed by:	jhb
MFC after:	3 weeks
2015-03-10 23:27:13 +00:00
Mark Johnston
aa14e9b7c9 Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision:	https://reviews.freebsd.org/D1832
Reviewed by:	rpaulo
Discussed with:	kib
2015-03-09 03:50:53 +00:00
Nathan Whitehorn
5c845fde2e Make 32-bit PowerPC kernels, like 64-bit PowerPC kernels, position-independent
executables. The goal here, not yet accomplished, is to let the e500 kernel
run under QEMU by setting KERNBASE to something that fits in low memory and
then having the kernel relocate itself at runtime.
2015-03-07 20:14:46 +00:00
Hans Petter Selasky
35ee8a4a59 Add mutex support to the pps_ioctl() API in the kernel.
Bump kernel version to reflect structure change.

PR:		196897
MFC after:	1 week
2015-03-07 18:23:32 +00:00
Ryan Stone
4d6a976e37 Move libnv into the kernel and hook it into the kernel build
Differential Revision:		https://reviews.freebsd.org/D1883
Reviewed by:			jfv
MFC after:			1 month
Sponsored by:			Sandvine Inc.
2015-03-01 00:34:27 +00:00
Ryan Stone
3d59729556 Correct the use of an unitialized variable in sendfind_getobj()
When sendfile_getobj() is called on a DTYPE_SHM file, it never
initializes error, which is eventually returned to the caller.

Differential Revision:		https://reviews.freebsd.org/D1989
Reviewed by:			kib
Reported by: 			Brainy Code Scanner, by Maxime Villard.
2015-02-28 21:49:59 +00:00
Ian Lepore
bd96bd15b2 Format the line properly (wrap before column 80). 2015-02-28 17:44:31 +00:00
Ian Lepore
a1a4c1b0d4 Export the new osreldate and osrelease jail parms in jail_get(2). 2015-02-28 17:32:31 +00:00
Konstantin Belousov
13dad10871 The umtx_lock mutex is used by top-half of the kernel, but is
currently a spin lock.  Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which put the
requirement on the umtx_lock.  Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitely before any thread
lock, so it is also before sleepq locks.

Change umtx_lock to be the sleepable mutex.  For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.

Discussed with:	jhb
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-02-28 04:19:02 +00:00
Warner Losh
5837276ce2 Put back Andy's void for gcc happiness.
Submitted by:	jchandra@
2015-02-27 23:14:08 +00:00
Warner Losh
b250ad3499 Make sched_random() return an unsigned number, and use uint32_t
consistently. This also matches the per-cpu pointer declaration
anyway.

This changes the tweak we give to the load from -32..31 to be 0..31
which seems more inline with the rest of the code (- rnd and the -=
64). It should also provide the randomness we need, and may fix a
signedness bug in the old code (it isn't clear that the effect was
intentional as opposed to sloppy, and the right shift of a signed
value is undefined to boot).

This stores sched_balance() behavior when it used random().

Differential Revision: https://reviews.freebsd.org/D1981
2015-02-27 21:15:12 +00:00
Konstantin Belousov
08189ed667 The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems.  Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by:  EMC / Isilon Storage Division
Submitted by:   Conrad Meyer
MFC after:	1 week
2015-02-27 16:43:50 +00:00
Ian Lepore
b96bd95b85 Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
jail's creation parameters.  This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.

The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.

The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).

There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design.  The system
administrator is trusted to set sane values.  Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.

Differential Revision:	https://reviews.freebsd.org/D1948
Relnotes:	yes
2015-02-27 16:28:55 +00:00
Andrew Turner
ccc41f3e66 Fix sched_ule on sparc64, gcc complains sched_random is not a correct
prototype.

Sponsored by:	The FreeBSD Foundation
2015-02-27 15:05:20 +00:00
Andrew Turner
09d0653552 sched_random is only called for SMP, only define it there.
Sponsored by:	The FreeBSD Foundation
2015-02-27 12:38:24 +00:00
Warner Losh
0567b6cc16 Create sched_rand() and move the LCG code into that. Call this when
we need randomness in ULE. This removes random() call from the
rebalance interval code.

Submitted by: Harrison Grundy
Differential Revision: https://reviews.freebsd.org/D1968
2015-02-27 02:56:58 +00:00
Adrian Chadd
75493a82e0 Remove taskqueue_start_threads_pinned(); there's noa generic cpuset version of this.
Sponsored by:	Norse Corp, Inc.
2015-02-25 21:59:03 +00:00
Konstantin Belousov
84b736b268 When failing to claim ownership of a umtx_pi, restore the umutex owner
to its previous, unowned state.  This avoids compounding an existing
problem of inconsistent ownership.

Submitted by:	Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from:	Dell Inc.
PR:	198914
MFC after:	1 week
2015-02-25 16:17:16 +00:00
Konstantin Belousov
cc876d2c5c When unlocking a contested PI pthread mutex, if the queue of waiters
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi.  The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.

Submitted by:	Eric van Gyzen <eric_van_gyzen@dell.com>
	with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from:	Dell Inc.
PR:	198914
2015-02-25 16:12:56 +00:00
Konstantin Belousov
dacbc9dbe7 Keep a reference on the coredump vnode for vn_fullpath() call. Do it
by moving vn_close() after the point where notification is sent.

Reported by:	sbruno
Tested by:	pho, sbruno
Sponsored by:	The FreeBSD Foundation
2015-02-24 13:07:31 +00:00
Andrey V. Elsukov
e9b70483d1 soreceive_generic() still has similar KASSERT(), therefore instead of
remove KASSERT(), change it to check mbuf isn't NULL.

Suggested by:	kib
MFC after:	1 week
2015-02-23 15:24:43 +00:00
Andrey V. Elsukov
f21684bc75 In some cases soreceive_dgram() can return no data, but has control
message. This can happen when application is sending packets too big
for the path MTU and recvmsg() will return zero (indicating no data)
but there will be a cmsghdr with cmsg_type set to IPV6_PATHMTU.
Remove KASSERT() which does NULL pointer dereference in such case.
Also call m_freem() only when m isn't NULL.

PR:		197882
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-02-23 13:41:35 +00:00
Nathan Whitehorn
c6014c739c Make kernel ELF image parsing not crash for kernels running at locations
other than their link address.
2015-02-21 23:20:05 +00:00
Mark Johnston
7abb0b0922 Don't specify a resid parameter if we're just going to ignore it. Instead,
let vn_rdwr() check for short reads.

MFC after:	3 days
Sponsored by:	EMC / Isilon Storage Division
2015-02-20 20:49:00 +00:00
Mark Johnston
ce47682c6c Remove unnecessary checks for a return value of NULL from M_WAITOK
allocations.

MFC after:	3 days
2015-02-19 03:32:48 +00:00
Mark Johnston
250246706f Free the zlib stream after expanding a compressed CTF section.
Note that this memory would only be leaked once, since CTF info for a kld
file is cached after the first access.

MFC after:	3 days
2015-02-19 03:29:46 +00:00
Konstantin Belousov
1395226703 If malloc() sleeps, Giant is dropped. Recheck for another thread
doing our work.

Remove unneeded check for failed M_WAITOK allocation.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-02-18 18:12:06 +00:00
Mateusz Guzik
8fbda7f00b filedesc: obtain a stable copy of credentials in fget_unlocked
This was broken in r278930.

While here tidy up fget_mmap to use fdp from local var instead of obtaining
the same pointer from td.
2015-02-18 13:37:28 +00:00
Mateusz Guzik
b7a39e9e07 filedesc: simplify fget_unlocked & friends
Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.

Reviewed by:	silence on -hackers
2015-02-17 23:54:06 +00:00
Gleb Smirnoff
ee52391ebe Use anonymous unions and structs to organize shared space in mbuf(9),
instead of preprocessor macros.
  This will make debugger output of 'print *m' exactly match the names
we use in code, making life of a kernel hacker way more pleasant. And
this also allows to rename struct_m_ext back to m_ext.
2015-02-17 20:52:51 +00:00
Gleb Smirnoff
ec9d83dd9b Use anonymous unions to add possibility to put mbufs into queue(3)
STAILQs and SLISTs using the same structure field as good old m_next
and m_nextpkt linkage occupy.

New code is encouraged to use queue(3) macros, instead of implementing
the wheel. However, better not to have a mixture of old style and
queue(3) in one file or subsystem.

Reviewed by:		rwatson, rrs, rpaulo
Differential Revision:	D1499
2015-02-17 19:32:11 +00:00
Enji Cooper
c514f051b7 Add the mnt_lockref field to the ddb(4) 'show mount' command
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D1688
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-02-17 09:31:58 +00:00
Adrian Chadd
bfa102cae1 Implement taskqueue_start_threads_cpuset().
This is a more generic version of taskqueue_start_threads_pinned()
which only supports a single cpuid.

This originally came from John Baldwin <jhb@> who implemented it
as part of a push towards NUMA awareness in drivers.  I started implementing
something similar for RSS and NUMA, then found he already did it.

I'd like to axe taskqueue_start_threads_pinned() so it doesn't become
part of a longer-term API.  (Read: hps@ wants to MFC things, and
if I don't do this soon, he'll MFC what's here. :-)

I have a follow-up commit which converts the intel drivers over
to using the cpuset version of this function, so we can eventually
nuke the the pinned version.

Tested:

* igb, ixgbe

Obtained from:	jhbbsd
2015-02-17 02:35:06 +00:00
Konstantin Belousov
45f1ade79b Reparenting done by debugger attach can leave reaper without direct
children.  Handle the situation instead asserting that it is
impossible.

Reported and tested by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-02-15 08:44:30 +00:00
Konstantin Belousov
4b685a2862 Return with the process locked, caller expects p still locked after
the call.

Reported and tested by:	bapt
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-02-15 08:43:19 +00:00
Davide Italiano
a76d4388e1 Don't access sockbuf fields directly, use accessor functions instead.
It is safe to move the call to socantsendmore_locked() after
sbdrop_locked() as long as we hold the sockbuf lock across the two
calls.

CR:	D1805
Reviewed by:	adrian, kmacy, julian, rwatson
2015-02-14 20:00:57 +00:00
John Baldwin
bc411bc2d0 Include OBJT_PHYS VM objects in ELF core dumps. In particular this
includes the shared page allowing debuggers to use the signal trampoline
code to identify signal frames in core dumps.

Differential Revision:	https://reviews.freebsd.org/D1828
Reviewed by:	alc, kib
MFC after:	1 week
2015-02-14 17:12:31 +00:00
John Baldwin
1b76e0b732 Add two new counters for vnode life cycle events:
- vfs.recycles counts the number of vnodes forcefully recycled to avoid
  exceeding kern.maxvnodes.
- vfs.vnodes_created counts the number of vnodes created by successful
  calls to getnewvnode().

Differential Revision:	https://reviews.freebsd.org/D1671
Reviewed by:	kib
MFC after:	1 week
2015-02-14 17:02:51 +00:00
Alan Cox
c2d5d3ee0e Preset the object's color, or alignment, to maximize superpage usage.
MFC after:	5 days
2015-02-13 19:58:53 +00:00
Randall Stewart
66525b2d16 This fixes a bug I in-advertantly inserted when I updated the callout
code in my last commit. The cc_exec_next is used to track the next
when a direct call is being made from callout. It is *never* used
in the in-direct method. When macro-izing I made it so that it
would separate out direct/vs/non-direct. This is incorrect and can
cause panics as Peter Holm has found for me (Thanks so much Peter for
all your help in this). What this change does is restore that behavior
but also get rid of the cc_next from the array and instead make it
be part of the base callout structure. This way no one else will get
confused since we will never use it for non-direct.

Reviewed by:	Peter Holm and more importantly tested by him ;-)
MFC after:	3 days.
Sponsored by:	Netflix Inc.
2015-02-12 13:31:08 +00:00
Rui Paulo
b5263b26db Remove check against NULL after M_WAITOK.
Submitted by:	Oliver Pinter
2015-02-11 19:07:05 +00:00