Commit Graph

15377 Commits

Author SHA1 Message Date
Konstantin Belousov
1e4296c919 Ktracing kevent(2) calls with unusual arguments might leads to an
overly large allocation requests.

When ktrace-ing io, sys_kevent() allocates memory to copy the
requested changes and reported events.  Allocations are sized by the
incoming syscall lengths arguments, which are user-controlled, and
might cause overflow in calculations or too large allocations.

Since io trace chunks are limited by ktr_geniosize, there is no sense
it even trying to satisfy unbounded allocations.  Export ktr_geniosize
and clamp the buffers sizes in advance.

PR:	217435
Reported by:	Tim Newsham <tim.newsham@nccgroup.trust>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-12 13:48:24 +00:00
Alan Cox
e383e820d3 Simplify the control flow and tidy up a comment in map_insert.
In collaboration with:	kib
MFC after:	1 week
2017-03-11 18:57:13 +00:00
Andriy Gapon
28ef18b8c1 trace thread running state when a thread is run for the first time
This applies to both KTR_SCHED and DTrace sched:::on-cpu tracing.

MFC after:	10 days
2017-03-11 15:57:36 +00:00
Andriy Gapon
6c9271a918 actually implement proc:::lwp-exit probe
MFC after:	4 days
2017-03-11 15:47:27 +00:00
Mahdi Mokhtari
32a1fb0d3d Fix NULL pointer dereference and panic with shm file pread/pwrite.
PR:		217429
Reported by:	Tim Newsham <tim.newsham@nccgroup.trust>
Reviewed by:	kib
Approved by:	dchagin
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D9844
2017-03-10 10:09:44 +00:00
Gleb Smirnoff
d0147e10ca In linker_load_file() print name of a file that failed to load.
Discussed with:	kib
2017-03-09 00:56:07 +00:00
Gleb Smirnoff
a344babf78 Reduce stack usage in link_elf_load_file(), allocating struct nameidata.
This function may be called recursively, when a module pulls its dependencies.
Under certain circumstances, e.g. quad chain of dependencies and presence
of dtrace we may run out of stack.
2017-03-09 00:45:15 +00:00
Gleb Smirnoff
14984031b7 m_mbuftouio() doesn't modify the mbuf. 2017-03-07 19:00:50 +00:00
Eric Badger
b38bd91f4f don't stop in issignal() if P_SINGLE_EXIT is set
Suppose a traced process is stopped in ptracestop() due to receipt of a
SIGSTOP signal, and is awaiting orders from the tracing process on how
to handle the signal. Before sending any such orders, the tracing
process exits. This should kill the traced process. But suppose a second
thread handles the SIGKILL and proceeds to exit1(), calling
thread_single(). The first thread will now awaken and will have a chance
to check once more if it should go to sleep due to the SIGSTOP.  It must
not sleep after P_SINGLE_EXIT has been set; this would prevent the
SIGKILL from taking effect, leaving a stopped orphan behind after the
tracing process dies.

Also add new tests for this condition.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9890
2017-03-07 13:41:01 +00:00
Konstantin Belousov
15a9aedfa1 When selecting brand based on old Elf branding, prefer the brand which
interpreter exactly matches the one requested by the activated image.

This change applies r295277, which did the same for note branding, to
the old brand selection, with the same reasoning of fixing compat32
interpreter substitution.

PR:	211837
Reported by:	kenji@kens.fm
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:38:25 +00:00
Konstantin Belousov
3d560b4be2 Require whole brand string matching for old Elf branding.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:37:35 +00:00
Konstantin Belousov
0bbee4cd3f Consistently use vm_ooffset_t type for the vm object offset in
elf_load_section.

The values passed currently as vm_offset_t are phdr.p_offset, which
have the native Elf word size.  Since elf_load_section interprets them
as the file offset, use vm object offset type.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-07 13:36:43 +00:00
Hiren Panchasara
f41b2de716 Fix the KASSERT check from r314813.
len being 0 is valid.

Submitted by:	ngie
Reported by:	ngie (via jenkins test run)
Sponsored by:	Limelight Networks
2017-03-07 06:46:38 +00:00
Hiren Panchasara
b5b023b91e We've found a recurring problem where some userland process would be
stuck spinning at 100% cpu around sbcut_internal(). Inside
sbflush_internal(), sb_ccc reached to about 4GB and before passing it
to sbcut_internal(), we type-cast it from uint to int making it -ve.

The root cause of sockbuf growing this large is unknown. Correct fix
is also not clear but based on mailing list discussions, adding
KASSERTs to panic instead of looping endlessly.

Reviewed by:		glebius
Sponsored by:		Limelight Networks
2017-03-07 00:20:01 +00:00
Gleb Smirnoff
6cf0c1db55 Fix compilation of r314784 on 32 bit. 2017-03-06 22:32:56 +00:00
Gleb Smirnoff
f2498877c9 In panic() print current timestamp, which matches timestamp in the dump
header.  This will help to correlate console server logs with dump files,
no matter how precise is clock on a console server appliance, and how
buggy the appliance is.
2017-03-06 19:14:08 +00:00
Konstantin Belousov
aaadc41f6c Instead of direct use of vm_map_insert(), call vm_map_fixed(MAP_CHECK_EXCL).
This KPI explicitely indicates the intent of creating the mapping at
the fixed address, and incorporates the map locking into the callee.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-06 14:09:54 +00:00
Alan Cox
28e8da6517 Style and punctuation fixes.
Reviewed by:	kib
MFC after:	3 days
2017-03-05 23:59:04 +00:00
Emmanuel Vadot
c1b014c51c Export a sysctl dev.<clkdom>.<unit>.clocks for each clock domain containing
all the clocks that they provide.
Each clocks are exported under the node 'clock.<clkname>' and have the following
children nodes :
- frequency
- parent (The selected parent, if any)
- parents (The list of parents, if any)
- childrens (The list of childrens, if any)
- enable_cnt (The enabled counter)

This give us the possibility to examine clocks at runtime and make graph of
the clock flow.

Reviewed by:	mmel
MFC after:	2 month
Differential Revision:	https://reviews.freebsd.org/D9833
2017-03-05 07:13:29 +00:00
Eric van Gyzen
8a8bea603c Fix grammar in some comments in subr_sleepqueue.c
While I'm here, remove trailing whitespace.

Reviewed by:	kib, mostly, as part of a larger review
MFC after:	3 days
2017-03-03 21:03:28 +00:00
Mark Johnston
7813302434 Fix a ticks comparison in sched_pctcpu_update().
We may fail to reset the %CPU tracking window if a thread does not run
for over half of the ticks rollover period, resulting in a bogus %CPU
value for the thread until ticks fully rolls over. Handle this by comparing
the unsigned difference ticks - ts_ltick with SCHED_TICK_TARG instead.

Reviewed by:	cem, jeff
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-03-03 20:57:40 +00:00
Ed Maste
e052a8b932 kern_sig.c: ANSIfy and remove archaic register keyword
Sponsored by:	The FreeBSD Foundation
2017-03-02 22:17:53 +00:00
Konstantin Belousov
fe0a8a3994 Style.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-03-02 17:35:13 +00:00
Hans Petter Selasky
403f4a31ab Implement taskqueue_poll_is_busy() for use by the LinuxKPI.
Refer to comment above function for a detailed description.

Discussed with:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-03-02 12:20:23 +00:00
Sean Bruno
d945ed6472 Make gtaskqueue compatible with drm-next such that they can be used with the
linuxkpi tasklets.

Submitted by:	mmacy@nextbsd.org
Reported by:	hps
2017-03-01 18:37:35 +00:00
Konstantin Belousov
55b985b43b Use vm_map_insert() instead of vm_map_find() in elf_map_insert().
Elf_map_insert() needs to create mapping at the known fixed address.
Usage of vm_map_find() assumes, on the other hand, that any suitable
address space range above or equal the specified hint, is acceptable.
Due to operating on the fresh or cleared address space, vm_map_find()
usually creates mapping starting exactly at hint.

Switch to vm_map_insert() use to clearly request fixed mapping from
the VM.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-03-01 10:28:15 +00:00
Konstantin Belousov
e3d8f8fed4 When deallocating the vm object in elf_map_insert() due to
vm_map_insert() failure, drop the vnode lock around the call to
vm_object_deallocate().

Since the deallocated object is the vm object of the vnode, we might
get the vnode lock recursion there.  In fact, it is almost impossible
to make vm_map_insert() failing there on stock kernel.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-03-01 10:22:07 +00:00
Mateusz Guzik
a21018063b locks: ensure proper barriers are used with atomic ops when necessary
Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.

Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
  barriers are in the worst case provided by turnstile unlock

I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.

ps.
  .-'---`-.
,'          `.
|             \
|              \
\           _  \
,\  _    ,'-,/-)\
( * \ \,' ,' ,'-)
 `._,)     -',-')
   \/         ''/
    )        / /
   /       ,'-'

Hardware provided by: IBM LTC
2017-03-01 05:06:21 +00:00
Scott Long
38e41e66e5 Provide a comment on why stdio.h needs to be included. 2017-02-28 21:27:51 +00:00
Jung-uk Kim
c4e929946c Include stdio.h to fix libsbuf build.
Reviewed by:	scottl
2017-02-28 21:18:45 +00:00
Scott Long
388f3ce6c3 Implement sbuf_prf(), which takes an sbuf and outputs it
to stdout in the non-kernel case and to the console+log
in the kernel case.  For the kernel case it hooks the
putbuf() machinery underneath printf(9) so that the buffer
is written completely atomically and without a copy into
another temporary buffer.  This is useful for fixing
compound console/log messages that become broken and
interleaved when multiple threads are competing for the
console.

Reviewed by:	ken, imp
Sponsored by:	Netflix
2017-02-28 18:25:06 +00:00
Gleb Smirnoff
efe3b0de14 Remove SVR4 (System V Release 4) binary compatibility support.
UNIX System V Release 4 is operating system released in 1988. It ceased
to exist in early 2000-s.
2017-02-28 05:14:42 +00:00
Konstantin Belousov
aca4bb9112 Do not leak mount references for dying threads.
Thread might create a condition for delayed SU cleanup, which creates
a reference to the mount point in td_su, but exit without returning
through userret(), e.g. when terminating due to single-threading or
process exit.  In this case, td_su reference is not dropped and mount
point cannot be freed.

Handle the situation by clearing td_su also in the thread destructor
and in exit1().  softdep_ast_cleanup() has to receive the thread as
argument, since e.g. thread destructor is executed in different
context.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-02-25 10:38:18 +00:00
Konstantin Belousov
8cd5962571 Remove cpu_deepest_sleep variable.
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC.  This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set.  Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.

There is no reason to require that ACPI reported C2 and deeper states
to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from
init_TSC_tc() condition.  And since this is the only use of the
variable, remove it at all.

Reported and submitted by:	Jia-Shiun Li <jiashiun@gmail.com>
Suggested by:	jhb
MFC after:	2 weeks
2017-02-24 16:11:55 +00:00
Warner Losh
bbf6e5144e Cast values to (int) before comparing them to the range of the
enum. This ensures they are in range w/o the warnings.
2017-02-24 01:39:12 +00:00
Warner Losh
df1c30f6bd KDTRACE_HOOKS isn't guaranteed to be defined. Change to check to see
if it is defined or not rather than if it is non-zero.

Sponsored by: Netflix, Inc
2017-02-24 01:39:08 +00:00
Mateusz Guzik
dfaa7859d6 mtx: microoptimize lockstat handling in spin mutexes and thread lock
While here make the code compilablle on kernels with LOCK_PROFILING but without
KDTRACE_HOOKS.
2017-02-23 22:46:01 +00:00
Eric van Gyzen
b215ceaaec Add sem_clockwait_np()
This function allows the caller to specify the reference clock
and choose between absolute and relative mode.  In relative mode,
the remaining time can be returned.

The API is similar to clock_nanosleep(3).  Thanks to Ed Schouten
for that suggestion.

While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime.  Also add a reasonable timeout.

Reviewed by:	ed (userland)
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D9656
2017-02-23 19:36:38 +00:00
Jonathan T. Looney
c9cde8251c Fix a panic during boot caused by inadequate locking of some vt(4) driver
data structures.

vt_change_font() calls vtbuf_grow() to change some vt driver data
structures. It uses TF_MUTE to prevent the console from trying to use those
data structures while it changes them.

During the early stage of the boot process, the vt driver's tc_done routine
uses those data structures; however, it is currently called outside the
TF_MUTE check.

Move the tc_done routine inside the locked TF_MUTE check.

PR:		217282
Reviewed by:	ed, ray
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D9709
2017-02-23 01:18:47 +00:00
Warner Losh
6fec662c86 Make the code match the comments: If we have ANY buf's that failed
then return EAGAIN. The current code just returns that if the LAST buf
failed.

Reviewed by: kib@, trasz@
Differential Revision: https://reviews.freebsd.org/D9677
2017-02-21 18:56:06 +00:00
John Baldwin
150599be12 Consolidate statements to initialize files.
Previously, the first lines of various generated files from system call
tables were generated in two sections.  Some of the initialization was
done in BEGIN, and the rest was done when the first line was encountered.
The main reason for this split before r313564 was that most of the
initialization done in the second section depended on the $FreeBSD$ tag
extracted from the system call table.  Now that the $FreeBSD$ tag is no
longer used, consolidate all of the file initialization in the BEGIN
section.

This change was tested by confirming that the content of generated files
did not change.
2017-02-20 20:37:25 +00:00
Mateusz Guzik
13d2ef0f3a mtx: fix spin mutexes interaction with failed fcmpset
While doing so move recursion support down to the fallback routine.
2017-02-20 19:08:36 +00:00
Eric Badger
82a4538f31 Defer ptracestop() signals that cannot be delivered immediately
When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.

Add a number of tests for the new functionality. Several tests were authored
by jhb.

PR:		212607
Reviewed by:	kib
Approved by:	kib (mentor)
MFC after:	2 weeks
Sponsored by:	Dell EMC
In collaboration with:	jhb
Differential Revision:	https://reviews.freebsd.org/D9260
2017-02-20 15:53:16 +00:00
Konstantin Belousov
ecc6c515ab Apply noexec mount option for mmap(PROT_EXEC).
Right now the noexec mount option disallows image activators to try
execve the files on the mount point.  Also, after r127187, noexec
also limits max_prot map entries permissions for mappings of files
from such mounts, but not the actual mapping permissions.

As result, the API behaviour is inconsistent.  The files from noexec
mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution
permission, it cannot be re-enabled later.  Make this consistent
logically and aligned with behaviour of other systems, by disallowing
PROT_EXEC for mmap(2).

Note that this change only ensures aligned results from mmap(2) and
mprotect(2), it does not prevent actual code execution from files
coming from noexec mount.  Such files can always be read into
anonymous executable memory and executed from there.

Reported by:	shamaz.mazum@gmail.com
PR:	217062
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-02-19 20:51:04 +00:00
Mateusz Guzik
b247fd395d locks: make trylock routines check for 'unowned' value
Since fcmpset can fail without lock contention e.g. on arm, it was possible
to get spurious failures when the caller was expecting the primitive to succeed.

Reported by:	mmel
2017-02-19 16:28:46 +00:00
Hans Petter Selasky
316e092a77 Make sure the thread constructor and destructor eventhandlers are
called for all threads belonging to a procedure. Currently the first
thread in a procedure is kept around as an optimisation step and is
never freed. Because the first thread in a procedure is never freed
nor allocated, its destructor and constructor callbacks are never
called which means per thread structures allocated by dtrace and the
Linux emulation layers for example, might be present for threads which
don't need these structures.

This patch adds a thread construction and destruction call for the
first thread in a procedure.

Tested:			dtrace, linux emulation
Reviewed by:		kib @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-02-19 13:15:33 +00:00
Jason A. Harmening
e2a8d17887 Bring back r313037, with fixes for mips:
Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to
replace pcpu_find(curcpu) in MI code.

Reviewed by:	andreast, kan, lidl
Tested by:	lidl(mips, sparc64), andreast(powerpc)
Differential Revision:	https://reviews.freebsd.org/D9587
2017-02-19 02:03:09 +00:00
Mateusz Guzik
5c5df0d99b locks: clean up trylock primitives
In particular thius reduces accesses of the lock itself.
2017-02-18 22:06:03 +00:00
Bryan Drewery
8e31b510b0 Fix panic with unlocked vnode to vrecycle().
MFC after:	2 weeks
2017-02-18 05:07:53 +00:00
Mateusz Guzik
a24c8eb847 mtx: plug the 'opts' argument when not used 2017-02-18 01:52:10 +00:00