Commit Graph

15760 Commits

Author SHA1 Message Date
nwhitehorn
3cd3799a91 Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
are numbered densely from there to n_cpus.

MFC after:	1 month
2017-11-25 23:41:05 +00:00
mjg
16d5d58622 Add the missing lockstat check for thread lock. 2017-11-25 20:49:27 +00:00
mjg
67f9bba460 rwlock: fix up compilation of the previous change
committed the wrong version of the patch
2017-11-25 20:25:45 +00:00
mjg
c52edb8a9e rwlock: add __rw_try_{r,w}lock_int 2017-11-25 20:22:51 +00:00
mjg
2d0a05c617 sx: change sunlock to wake waiters up if it locked sleepq
sleepq is only locked if the curthread is the last reader. By the time
the lock gets acquired, new readers could have arrived. The previous code
would unlock and loop back. This results in spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.
2017-11-25 20:13:50 +00:00
mjg
c7ffe8c4e0 locks: retry turnstile/sleepq loops on failed cmpset
In order to go to sleep, threads set waiter flags, but that can spuriously
fail, e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.
2017-11-25 20:10:33 +00:00
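The retry logic described in this commit can be sketched in user space with C11 atomics. This is a minimal illustration, not the kernel's rwlock or sx code: the lock-word layout (`LK_WAITERS`, `LK_ONE_READER`) and `set_waiters_flag()` are hypothetical, and the sleep queue lock the caller would hold is not modelled. `atomic_compare_exchange_weak_explicit()` behaves like fcmpset in that a failed attempt writes the freshly observed value back into `v`.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical lock word layout, for illustration only:
 *   bit 0    - LK_WAITERS: at least one thread is (about to be) asleep
 *   bits 1.. - reader count, in units of LK_ONE_READER (0 = unlocked)
 */
#define LK_UNLOCKED	((uintptr_t)0)
#define LK_WAITERS	((uintptr_t)1)
#define LK_ONE_READER	((uintptr_t)2)

enum slow_result { SLOW_RETRY_ACQUIRE, SLOW_GO_TO_SLEEP };

/*
 * Set the waiters flag before blocking.  The caller is assumed to hold
 * the (not modelled) sleep queue lock for the whole call.  When the
 * compare-and-swap fails, e.g. because a new reader arrived, the loop
 * re-evaluates the value it just observed instead of dropping the sleep
 * queue lock and starting over.
 */
static enum slow_result
set_waiters_flag(_Atomic uintptr_t *lockp)
{
	uintptr_t v = atomic_load_explicit(lockp, memory_order_relaxed);

	for (;;) {
		/* The lock was released meanwhile: go back and try to take it. */
		if (v == LK_UNLOCKED)
			return (SLOW_RETRY_ACQUIRE);
		/* Another waiter already set the flag. */
		if ((v & LK_WAITERS) != 0)
			return (SLOW_GO_TO_SLEEP);
		/* Like fcmpset: on failure 'v' receives the current value. */
		if (atomic_compare_exchange_weak_explicit(lockp, &v,
		    v | LK_WAITERS, memory_order_acquire, memory_order_relaxed))
			return (SLOW_GO_TO_SLEEP);
	}
}
```

Because the failed exchange already delivers the new value, the loop can re-check it immediately rather than re-reading the lock word after releasing and reacquiring the sleep queue lock.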
mjg
d5c5e62cc7 rwlock: stop re-reading the owner when going to sleep 2017-11-25 20:08:11 +00:00
jhb
bac78aa2a4 Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), an empty array is denoted by a
  record whose array contains zero entries rather than by no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470
2017-11-25 04:49:12 +00:00
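A sketch of encoding and decoding a KTR_STRUCT_ARRAY-style payload, assuming the layout described above: a `size_t` element size, then the NUL-terminated structure name, then the array itself. The function names are hypothetical and this is not kdump's actual parser; the element count is derived from the remaining length divided by the element size, so an empty array is still a valid record.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical encoder for a payload laid out as
 *   [size_t elem_size][name, NUL-terminated][elem_size * count bytes]
 * Returns the number of bytes written, or 0 if 'buf' is too small.
 */
static size_t
encode_struct_array(unsigned char *buf, size_t buflen, const char *name,
    const void *data, size_t elem_size, size_t count)
{
	size_t namelen = strlen(name) + 1;	/* include the NUL */
	size_t total = sizeof(elem_size) + namelen + elem_size * count;

	if (total > buflen)
		return (0);
	memcpy(buf, &elem_size, sizeof(elem_size));
	memcpy(buf + sizeof(elem_size), name, namelen);
	if (count > 0)
		memcpy(buf + sizeof(elem_size) + namelen, data,
		    elem_size * count);
	return (total);
}

/*
 * Hypothetical decoder: returns the number of elements, or -1 if the
 * payload is malformed.
 */
static long
decode_struct_array(const unsigned char *buf, size_t len,
    const char **namep, const unsigned char **datap, size_t *elem_sizep)
{
	const unsigned char *name, *nul;
	size_t elem_size, datalen;

	if (len < sizeof(elem_size) + 1)
		return (-1);
	memcpy(&elem_size, buf, sizeof(elem_size));
	name = buf + sizeof(elem_size);
	nul = memchr(name, '\0', len - sizeof(elem_size));
	if (nul == NULL)
		return (-1);	/* structure name is not terminated */
	datalen = len - (size_t)(nul + 1 - buf);
	if (elem_size == 0 || datalen % elem_size != 0)
		return (-1);
	*namep = (const char *)name;
	*datap = nul + 1;
	*elem_sizep = elem_size;
	return ((long)(datalen / elem_size));
}
```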
markj
d88166d005 Have lockstat:::sx-release fire only after the lock state has changed.
MFC after:	1 week
2017-11-24 19:04:31 +00:00
markj
e32cd825f4 Add a missing lockstat:::sx-downgrade probe.
We were returning without firing the probe when the lock had no shared
waiters.

MFC after:	1 week
2017-11-24 19:02:06 +00:00
ed
9b06c6070c Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the time (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way to make cpu_set_syscall_retval() leave the registers the
way they need to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00
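A toy model of the return path described above, assuming an ARM-like register file where the first argument register doubles as the return-value register. `EJUSTRETURN` uses the value from FreeBSD's kernel `<sys/errno.h>`; every struct and function here is a hypothetical stand-in, not the real `exec_setregs()` or `cpu_set_syscall_retval()`.

```c
#include <stdint.h>
#include <string.h>

#define EJUSTRETURN	(-2)	/* don't modify regs, just return */

/* Hypothetical, heavily simplified machine state for illustration. */
struct frame {
	uint64_t r0;	/* first argument and return value register */
	uint64_t pc;
	uint64_t sp;
};

struct thread {
	struct frame tf;
	uint64_t retval;
};

/* Model of exec_setregs(): build the register state of the new image. */
static void
exec_setregs_sketch(struct thread *td, uint64_t entry, uint64_t stack,
    uint64_t argv)
{
	memset(&td->tf, 0, sizeof(td->tf));
	td->tf.pc = entry;
	td->tf.sp = stack;
	td->tf.r0 = argv;	/* argument register doubles as return register */
}

/* Model of sys_execve() after the change: report EJUSTRETURN on success. */
static int
sys_execve_sketch(struct thread *td)
{
	exec_setregs_sketch(td, 0x1000, 0x7fff0000, 0x7fff0100);
	return (EJUSTRETURN);
}

/* Model of cpu_set_syscall_retval(): the common syscall return path. */
static void
set_syscall_retval_sketch(struct thread *td, int error)
{
	switch (error) {
	case 0:
		td->tf.r0 = td->retval;	/* would clobber what exec set up */
		break;
	case EJUSTRETURN:
		break;			/* leave registers intact */
	default:
		td->tf.r0 = (uint64_t)error;	/* simplified error report */
		break;
	}
}
```

Returning 0 from the execve model would overwrite `r0` with `retval`; returning `EJUSTRETURN` keeps the state built by the exec model.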
kib
11c77eaad8 Kill all descendants of the reaper, even if they are descendants of a
subordinate reaper.

Also, mark reapers when listing pids.

Reported by:	Michael Zuo <muh.muhten@gmail.com>
PR:	223745
Reviewed by:	bapt
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13183
2017-11-23 11:25:11 +00:00
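A recursive sketch of the difference, on a hypothetical process-tree structure rather than the kernel's reaper lists. With `descend_into_subreapers` set, the walk no longer stops at a subordinate reaper, so its descendants are signalled as well; the `false` case is only a rough approximation of the earlier behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process-tree node; not the kernel's struct proc. */
struct node {
	int pid;
	bool is_reaper;
	bool killed;
	struct node *first_child;
	struct node *next_sibling;
};

/*
 * Mark every descendant of 'reaper' as killed and return how many were
 * marked.  With descend_into_subreapers false, the walk does not enter
 * subordinate reapers, so their descendants are skipped.
 */
static int
kill_descendants(struct node *reaper, bool descend_into_subreapers)
{
	int n = 0;

	for (struct node *c = reaper->first_child; c != NULL;
	    c = c->next_sibling) {
		c->killed = true;
		n++;
		if (!c->is_reaper || descend_into_subreapers)
			n += kill_descendants(c, descend_into_subreapers);
	}
	return (n);
}
```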
mjg
b01e4efe7a sx: unbreak debug after r326107
An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after entry to the function.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

Reported by:	Shawn Webb
2017-11-23 03:40:51 +00:00
mjg
77a5a9349b rwlock: unbreak WITNESS builds after r326110
Reported by:	Shawn Webb
2017-11-23 03:20:12 +00:00
mjg
41e06ccaac rwlock: don't check for curthread's read lock count in the fast path 2017-11-22 23:52:05 +00:00
mjg
c8a2652582 locks: pass the found lock value to unlock slow path
This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.
2017-11-22 22:04:04 +00:00
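A simplified sketch of handing the observed lock word to the unlock slow path, assuming a toy lock where the owner's thread pointer is stored in the word and other threads can only set a hypothetical `LK_WAITERS` bit. When the fast-path compare-and-swap fails, C11 stores the current value in `v`, which the slow path then consumes instead of reading the lock again. There is no turnstile here; `wakeups` merely counts what would be a wakeup, and the slow path takes no `tid` argument.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LK_UNOWNED	((uintptr_t)0)
#define LK_WAITERS	((uintptr_t)1)	/* hypothetical flag in the owner word */

static int wakeups;	/* counts simulated wakeups, for illustration */

/*
 * Slow path: 'v' is the lock word the fast path already observed, so no
 * separate read is needed to learn that waiters are present.
 */
static void
unlock_slow(_Atomic uintptr_t *lk, uintptr_t v)
{
	if ((v & LK_WAITERS) != 0)
		wakeups++;	/* stand-in for waking the blocked threads */
	atomic_store_explicit(lk, LK_UNOWNED, memory_order_release);
}

static void
lock_release(_Atomic uintptr_t *lk, uintptr_t tid)
{
	uintptr_t v = tid;

	/* Fast path: we own the lock and no waiter bits are set. */
	if (atomic_compare_exchange_strong_explicit(lk, &v, LK_UNOWNED,
	    memory_order_release, memory_order_relaxed))
		return;
	/* The failed exchange left the current lock word in 'v'; pass it on. */
	unlock_slow(lk, v);
}
```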
mjg
b1c2309fd4 locks: remove the file + line argument from internal primitives when not used
The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.
2017-11-22 21:51:17 +00:00
markj
06c0131e2d Clean up the SYSINIT_FLAGS definitions for rwlock(9) and rmlock(9).
Avoid duplication in their macro definitions, and document them. No
functional change intended.

MFC after:	1 week
2017-11-21 14:59:23 +00:00
scottl
6e64b87b4f Update a comment in brelse() to match reality. 2017-11-20 20:53:03 +00:00
pfg
4736ccfd9c sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open-source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open-source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
mjg
3fb5232780 locks: fix compilation issues without SMP or KDTRACE_HOOKS 2017-11-17 23:27:06 +00:00
mjg
0e14f7b1ec lockmgr: remove the ADAPTIVE_LOCKMGRS option
The code was never enabled and is very heavyweight.

A revamped adaptive spinning may show up at a later time.

Discussed with:	kib
2017-11-17 20:41:17 +00:00
cem
30b90e55c4 vfs_lookup: Allow PATH_MAX-1 symlinks
Previously, symlinks in FreeBSD were artificially limited to PATH_MAX-2.

Add a short test case to verify the change.

Submitted by:	Gaurav Gangalwar <ggangalwar AT isilon.com>
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12589
2017-11-17 19:25:39 +00:00
mjg
23988560e7 mtx: add missing parts of the diff in r325920
Fixes build breakage.
2017-11-17 02:59:28 +00:00
mjg
6dae3949a0 sched: move panic handling code out of choosethread
This avoids jumps in the common case of the kernel not being panicked.
2017-11-17 02:45:38 +00:00
mjg
4688365465 Check for PRS_NEW without locking the proc in sysctl_kern_proc 2017-11-17 02:29:06 +00:00
mjg
2cadb364c5 sx: perform a minor cleanup of the unlock slowpath
No functional changes.
2017-11-17 02:27:04 +00:00
mjg
45ffcf24ba rwlock: unlock before traversing threads to wake up
While here perform a minor cleanup of the unlock path.
2017-11-17 02:26:15 +00:00
mjg
24a0d3819f mtx: unlock before traversing threads to wake up
This shortens the lock hold time while not affecting correctness.
All the woken-up threads end up competing for the lock and can lose the
race against a completely unrelated thread acquiring it anyway.
2017-11-17 02:25:04 +00:00
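The same ordering idea in user space, as a sketch under stated assumptions: a hypothetical wait queue protected by a pthread mutex. The whole waiter list is detached under the mutex, the mutex is dropped, and only then is the list walked, so the hold time no longer depends on the number of waiters. This illustrates the ordering only; it is not the kernel's mtx or turnstile code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical waiter record and queue. */
struct waiter {
	struct waiter *next;
	int woken;
};

struct wq {
	pthread_mutex_t mtx;
	struct waiter *head;
};

/*
 * Detach the whole waiter list while holding the mutex, drop the mutex,
 * and only then walk the list.  Returns the number of waiters woken.
 */
static int
wake_all(struct wq *q)
{
	struct waiter *w, *next;
	int n = 0;

	pthread_mutex_lock(&q->mtx);
	w = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->mtx);

	for (; w != NULL; w = next) {
		next = w->next;	/* read before the waiter may go away */
		w->woken = 1;	/* stand-in for making the thread runnable */
		n++;
	}
	return (n);
}
```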
mjg
d5884672d5 locks: pull up PMC_SOFT_CALLs out of slow path loops 2017-11-17 02:22:51 +00:00
mjg
3b151c9e66 rwlock: avoid branches in the slow path if lockstat is disabled 2017-11-17 02:21:24 +00:00
mjg
72b0124a90 sx: avoid branches in the slow path if lockstat is disabled 2017-11-17 02:21:07 +00:00
gordon
2d36b76cdf Properly bzero kldstat structure to prevent kernel information leak.
Submitted by:	kib
Reported by:	TJ Corley
Security:	CVE-2017-1088
2017-11-15 22:30:21 +00:00
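A sketch of the pattern behind this kind of fix, using a hypothetical structure rather than the real kldstat structure: zero the whole object before filling in individual fields, so that padding and unused bytes (here, the tail of `name`) start out zeroed instead of carrying stale stack contents out to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result structure copied out to user space. */
struct stat_out {
	char flag;	/* usually followed by padding bytes */
	long size;
	char name[16];	/* only partly filled in below */
};

/*
 * Zero the whole structure first so that padding and the unused tail of
 * 'name' start out zeroed, then fill in the fields.
 */
static void
fill_stat(struct stat_out *out, const char *name, long size)
{
	size_t len = strlen(name);

	memset(out, 0, sizeof(*out));
	out->flag = 1;
	out->size = size;
	if (len >= sizeof(out->name))
		len = sizeof(out->name) - 1;
	memcpy(out->name, name, len);	/* the NUL comes from memset */
}
```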
emaste
e69667d9c9 disallow clock_settime too far in the future to avoid panic
clock_ts_to_ct has a KASSERT that the converted year fits into four
digits.  By default (sysctl debug.allow_insane_settime is 0) the kernel
disallows a time too far in the future, using a value of 9999 366-day
years.  However, clock_settime is epoch-relative and the assertion will
fail with a tv_sec corresponding to some 8030 years.

Avoid trying to be too clever, and just use a limit of 8000 365-day
years past the epoch.

Submitted by:	Heqing Yan <scottieyan@gmail.com>
Reported by:	Syzkaller (https://github.com/google/syzkaller)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2017-11-14 18:18:18 +00:00
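The arithmetic behind the new limit can be checked with a small sketch. `settime_allowed()` and the constants are hypothetical names; only the upper bound is modelled, and whether the real check is inclusive is an assumption. The first second of the year 10000 is 253402300800, so 8000 365-day years (252288000000 s) stays below it, while 9999 366-day years (316192377600 s) does not.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECS_PER_DAY		86400LL
/* Limit from the commit message: 8000 365-day years past the epoch. */
#define SETTIME_MAX_SECS	(8000LL * 365 * SECS_PER_DAY)
/* Previous limit: 9999 366-day years. */
#define OLD_MAX_SECS		(9999LL * 366 * SECS_PER_DAY)
/* 10000-01-01T00:00:00Z, the first second with a five-digit year. */
#define YEAR_10000_SECS		253402300800LL

/*
 * Sketch of the range check; 'allow_insane' stands in for the
 * debug.allow_insane_settime sysctl.  Only the upper bound is modelled.
 */
static bool
settime_allowed(int64_t tv_sec, bool allow_insane)
{
	return (allow_insane || tv_sec <= SETTIME_MAX_SECS);
}
```

The old bound admitted times past the start of the year 10000, which is where the four-digit-year KASSERT in clock_ts_to_ct would fire.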
imp
347a734090 Add two new tunables / sysctls to control reboot after panic:
kern.poweroff_on_panic which, when enabled, instructs a system to
power off on a panic instead of a reboot.

kern.powercycle_on_panic which, when enabled, instructs a system to
power cycle, if possible, on a panic instead of a reboot.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13042
2017-11-14 00:29:14 +00:00
jhb
6666fed955 Move loop to clear TDB_SUSPEND into PT_DETACH case.
The PT_DETACH case above the sendsig: label already looped over all
threads clearing flags in td_dbgflags.  Reuse this loop to clear
TDB_SUSPEND and move the logic out of the sendsig: block.
2017-11-13 21:22:33 +00:00
jhb
85bf6975fd Pull the PT_ATTACH case out of the 'sendsig:' block.
Most of the conditionals in the 'sendsig:' block are now only different
for PT_ATTACH vs other continue requests.  Pull the PT_ATTACH-specific
logic up into the PT_ATTACH case and simplify the 'sendsig:' block.  This
also permits moving the unlock of proctree_lock above the sendsig: label
since PT_KILL doesn't hold the lock and the other cases all fall
through to the label.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D13073
2017-11-13 21:09:08 +00:00
jhb
dd16f28153 Only clear a pending thread event if one is pending.
This fixes a panic when attaching to an already-stopped process after
r325028.  While here, clean up a few other things in the control flow
of the 'sendsig' section:
- Only check for P_STOPPED_TRACE rather than either of P_STOPPED_SIG
  or P_STOPPED_TRACE for most ptrace requests.  The signal handling
  code in kern_sig.c never sets just P_STOPPED_SIG for a traced
  process, so if P_STOPPED_SIG is set, P_STOPPED_TRACE should be
  set anyway.  Remove a related debug printf.  Assuming P_STOPPED_TRACE
  permits simplifications in the 'sendsig:' block.
- Move the block to clear the pending thread state up into a new
  block conditional on P_STOPPED_TRACE and handle delivering pending
  signals to the reporting thread and clearing the reporting thread's
  state in this block.
- Consolidate the code that sends a signal to the process into a single
  case for PT_ATTACH.  The only case that could have reached the else
  branch before was a PT_ATTACH where P_STOPPED_SIG was not set, so both
  instances of kern_psignal() collapse down to just PT_ATTACH.

Reported by:	pho, mmel
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12837
2017-11-13 19:58:58 +00:00
delphij
62fdd4ef88 Be more careful when doing calculation with request from userland.
MFC after:	2 weeks
2017-11-13 07:47:43 +00:00
mjg
b77ed9cba6 Use passed thread pointer instead of curthread in sys_sched_yield
No functional changes.
2017-11-12 02:34:33 +00:00
mjg
1e871cdd86 Avoid locking and refing in sysctl_kern_proc_args if possible.
It turns out the sysctl is called a lot, e.g. by pkg-static.
2017-11-11 22:39:33 +00:00
mjg
524a7f5fe0 sysctl: try to avoid malloc in name2oid
name2oid is called all the time and passed names are almost always very short
(< 16 characters).
2017-11-11 21:50:36 +00:00
mjg
cbe24d82fa Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue 2017-11-11 18:10:09 +00:00
mjg
2127391bd8 Add pfind_any
It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.
2017-11-11 18:04:39 +00:00
mjg
15a19385fd Avoid allproc lock in pfind if curproc->pid == pid 2017-11-11 18:03:26 +00:00
mjg
3aa5eba714 Remove useless proc lookup from sysctl_out_proc 2017-11-11 18:02:23 +00:00
mjg
41984bdd76 rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER 2017-11-11 09:34:11 +00:00
mjoras
9c18ca3bd2 Introduce EVENTHANDLER_LIST and some users.
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.

This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.

The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.

Reviewed by:	bdrewery, markj, mjg
Approved by:	rstone (mentor)
In collaboration with:  ian
MFC after:      4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12814
2017-11-09 22:51:48 +00:00
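A simplified sketch of the caching idea, not the EVENTHANDLER(9) implementation: a hypothetical registry of named lists protected by one mutex. Invoking by name pays for a locked registry traversal every time, while an invoker holding a cached pointer (obtained once, as the SYSINIT would) skips it. Registering before the cached pointer exists still works because the lookup creates the list on demand. Per-list locking is omitted, so this sketch is single-threaded only.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_LISTS	8
#define MAX_HANDLERS	4

typedef void (*handler_fn)(void *arg);

struct ev_list {
	const char *name;
	int nhandlers;
	handler_fn handlers[MAX_HANDLERS];
};

static pthread_mutex_t registry_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct ev_list registry[MAX_LISTS];
static int nlists;
static int lookups;	/* counts locked traversals of the registry */

/* Find a list by name, creating it if needed (caller holds registry_mtx). */
static struct ev_list *
find_or_create_locked(const char *name)
{
	lookups++;
	for (int i = 0; i < nlists; i++)
		if (strcmp(registry[i].name, name) == 0)
			return (&registry[i]);
	if (nlists == MAX_LISTS)
		return (NULL);
	registry[nlists].name = name;
	return (&registry[nlists++]);
}

static struct ev_list *
find_or_create(const char *name)
{
	struct ev_list *l;

	pthread_mutex_lock(&registry_mtx);
	l = find_or_create_locked(name);
	pthread_mutex_unlock(&registry_mtx);
	return (l);
}

/* Registering works whether or not a cached pointer to the list exists. */
static int
register_handler(const char *name, handler_fn fn)
{
	struct ev_list *l = find_or_create(name);

	if (l == NULL || l->nhandlers == MAX_HANDLERS)
		return (-1);
	l->handlers[l->nhandlers++] = fn;
	return (0);
}

/* Invoke through a cached list pointer: no registry traversal. */
static void
invoke_list(struct ev_list *l, void *arg)
{
	for (int i = 0; i < l->nhandlers; i++)
		l->handlers[i](arg);
}

/* Invoke by name: a locked traversal on every call. */
static void
invoke_by_name(const char *name, void *arg)
{
	struct ev_list *l = find_or_create(name);

	if (l != NULL)
		invoke_list(l, arg);
}

/* Example handler: counts how often it runs. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}
```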
kib
5633778c38 Zero whole struct ptrace_lwpinfo to not leak kernel stack data.
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Discussed with:	secteam
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D12796
2017-11-08 23:32:56 +00:00