15699 Commits

Author SHA1 Message Date
Ed Maste
81d606f52e disallow clock_settime too far in the future to avoid panic
clock_ts_to_ct has a KASSERT that the converted year fits into four
digits.  By default (sysctl debug.allow_insane_settime is 0) the kernel
disallows a time too far in the future, using a value of 9999 366-day
years.  However, clock_settime is epoch-relative and the assertion will
fail with a tv_sec corresponding to some 8030 years.

Avoid trying to be too clever, and just use a limit of 8000 365-day
years past the epoch.

Submitted by:	Heqing Yan <scottieyan@gmail.com>
Reported by:	Syzkaller (https://github.com/google/syzkaller)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2017-11-14 18:18:18 +00:00
Warner Losh
48f1a4921e Add two new tunables / sysctls to controll reboot after panic:
kern.poweroff_on_panic which, when enabled, instructs a system to
power off on a panic instead of a reboot.

kern.powercyle_on_panic which, when enabled, instructs a system to
power cycle, if possible, on a panic instead of a reboot.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13042
2017-11-14 00:29:14 +00:00
John Baldwin
7e3e36068b Move loop to clear TDB_SUSPEND into PT_DETACH case.
The PT_DETACH case above the sendsig: label already looped over all
threads clearing flags in td_dbgflags.  Reuse this loop to clear
TDB_SUSPEND and move the logic out of the sendsig: block.
2017-11-13 21:22:33 +00:00
John Baldwin
2a2b23cae2 Pull the PT_ATTACH case out of the 'sendsig:' block.
Most of the conditionals in the 'sendsig:' block are now only different
for PT_ATTACH vs other continue requests.  Pull the PT_ATTACH-specific
logic up into the PT_ATTACH case and simplify the 'sendsig:' block.  This
also permits moving the unlock of proctree_lock above the sendsig: label
since PT_KILL doesn't hold the lock and and the other cases all fall
through to the label.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D13073
2017-11-13 21:09:08 +00:00
John Baldwin
feeaec18d4 Only clear a pending thread event if one is pending.
This fixes a panic when attaching to an already-stopped process after
r325028.  While here, clean up a few other things in the control flow
of the 'sendsig' section:
- Only check for P_STOPPED_TRACE rather than either of P_STOPPED_SIG
  or P_STOPPED_TRACE for most ptrace requests.  The signal handling
  code in kern_sig.c never sets just P_STOPPED_SIG for a traced
  process, so if P_STOPPED_SIG is stopped, P_STOPPED_TRACE should be
  set anyway.  Remove a related debug printf.  Assuming P_STOPPED_TRACE
  permits simplifications in the 'sendsig:' block.
- Move the block to clear the pending thread state up into a new
  block conditional on P_STOPPED_TRACE and handle delivering pending
  signals to the reporting thread and clearing the reporting thread's
  state in this block.
- Consolidate case to send a signal to the process in a single case
  for PT_ATTACH.  The only case that could have been in the else before
  was a PT_ATTACH where P_STOPPED_SIG was not set, so both instances
  of kern_psignal() collapse down to just PT_ATTACH.

Reported by:	pho, mmel
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12837
2017-11-13 19:58:58 +00:00
Xin LI
712dda7fb0 Be more careful when doing calculation with request from userland.
MFC after:	2 weeks
2017-11-13 07:47:43 +00:00
Mateusz Guzik
fe7979a12c Use passed thread pointer instead of curthread in sys_sched_yield
No functional changes.
2017-11-12 02:34:33 +00:00
Mateusz Guzik
baaa6ec7ed Avoid locking and refing in sysctl_kern_proc_args if possible.
Turns out the sysctl is called a lot e.g. by pkg-static.
2017-11-11 22:39:33 +00:00
Mateusz Guzik
8b9817a443 sysctl: try to avoid malloc in name2oid
name2oid is called all the time and passed names are almost always very short
(< 16 characters).
2017-11-11 21:50:36 +00:00
Mateusz Guzik
537d0fb138 Use pfind_any in linux_rt_sigqueueinfo and kern_sigqueue 2017-11-11 18:10:09 +00:00
Mateusz Guzik
6e1619dae3 Add pfind_any
It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.
2017-11-11 18:04:39 +00:00
Mateusz Guzik
272640b7fc Avoid allproc lock in pfind if curproc->pid == pid 2017-11-11 18:03:26 +00:00
Mateusz Guzik
9b57bf75d0 Remove useless proc lookup from sysctl_out_proc 2017-11-11 18:02:23 +00:00
Mateusz Guzik
c7e4e92ecd rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER 2017-11-11 09:34:11 +00:00
Matt Joras
2ca45184dc Introduce EVENTHANDLER_LIST and some users.
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.

This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.

The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.

Reviewed by:	bdrewery, markj, mjg
Approved by:	rstone (mentor)
In collaboration with:  ian
MFC after:      4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12814
2017-11-09 22:51:48 +00:00
Konstantin Belousov
9acf7b136d Zero whole struct ptrace_lwpinfo to not leak kernel stack data.
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Discussed with:	secteam
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D12796
2017-11-08 23:32:56 +00:00
Jeff Roberson
8d6fbbb867 Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
Bartek Rutkowski
cee09850f7 Make sysctl_kern_proc_umask execute fast path when requested pid in
curproc->p_pid or 0, avoiding unnecessary locking. Update libc consumer
to skip calling getpid().

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by:	mjg, robak
Approved by:	mjg
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	D12972
2017-11-07 15:13:32 +00:00
Mateusz Guzik
db520fdd46 rwlock: fix up compilation without KDTRACE_HOOKS after r324787 2017-11-06 05:14:05 +00:00
Mateusz Guzik
ce80021f4e namecache: bump numcache after dropping all locks
This makes no difference correctness-wise, but shortens total hold time.
2017-11-05 22:29:45 +00:00
Mateusz Guzik
119b826a62 namecache: wlock buckets in cache_lookup_nomakeentry
Since the case of an empty chain was already covered, it si very likely
that the existing entry is matching. Skipping readlocking saves on lock
upgrade.
2017-11-05 22:28:39 +00:00
Mateusz Guzik
ba324b5946 namecache: skip locking in cache_lookup_nomakeentry if there is no entry 2017-11-05 21:59:39 +00:00
Ed Maste
80dc9f8888 ANSIfy sys/kern/md4c.c
PR:		223453
Submitted by:	ota@j.email.ne.jp
MFC After:	2 weeks
2017-11-05 19:49:44 +00:00
Mateusz Guzik
a52058f013 namecache: skip locking in cache_purge_negative if there are no entries 2017-11-05 08:31:25 +00:00
Pedro F. Giffuni
7aa472731e ANSI-fy exec_shell_imgact().
Fix a stray space while here.

PR:	223317
MFC after:	3 days
2017-11-04 15:41:08 +00:00
Konstantin Belousov
30c438723d Convert explicit panic() call to assert.
Based on github pull request:	#113
Submitted by:	pmarillo@github
MFC after:	1 week
2017-11-04 10:49:34 +00:00
Mateusz Guzik
a2c36a24b6 Special-case pget lookups where pid == curproc->pid
Saves on allproc_lock acquires during buildworld, poudriere etc.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	D12929
2017-11-03 19:21:36 +00:00
Mateusz Guzik
ac850e5a8d namecache: fix .. check broken after r324378
wtf by:	mjg
Diagnosed by:	avg
2017-11-01 08:40:04 +00:00
Mateusz Guzik
59e260f860 Fixup r325264, take #2
whack an unused variable
2017-11-01 06:46:58 +00:00
Mateusz Guzik
5644fffa25 namecache: ncnegfactor 16 -> 12
It is used on each new entry addition to decide whether to whack an existing
negative entry in order to prevent a blow out in size, but the parameter was
set years ago and never revisited.

Building with poudriere results in about 400 evictions per second which
unnecessarily grab entries from the hot list.

With the new parameter there are next to no evictions of the sort.
2017-11-01 06:45:41 +00:00
Mateusz Guzik
5d03f1e11f Fixup r325264
Accidentally committed an incomplete diff.
2017-11-01 06:38:46 +00:00
Mateusz Guzik
c0b5261b55 Save on loginclass list locking by checking if caller already uses the struct 2017-11-01 06:12:14 +00:00
Mateusz Guzik
5949c7e504 Save on uihash table locking by checking if the caller already uses the struct
In particular with poudriere this saves about 90% of lookups.
2017-11-01 05:51:20 +00:00
John Baldwin
e012fe34cb Discard the correct thread event reported for a ptrace stop.
When multiple threads wish to report a tracing event to a debugger,
both threads call ptracestop() and one thread will win the race to be
the reporting thread (p->p_xthread).  The debugger uses PT_LWPINFO
with the process ID to determine which thread / LWP is reporting an
event and the details of that event.  This event is cleared as a side
effect of the subsequent ptrace event that resumed the process
(PT_CONTINUE, PT_STEP, etc.).  However, ptrace() was clearing the
event identified by the LWP ID passed to the resume request even if
that wasn't the 'p_xthread'.  This could result in clearing an event
that had not yet been observed by the debugger and leaving the
existing event for 'p_thread' pending so that it was reported a second
time.

Specifically, if the debugger stopped due to a software breakpoint in
one thread, but then switched to another thread that was used to
resume (e.g. if the user switched to a different thread and issued a
step), the resume request (PT_STEP) cleared a pending event (if any)
for the thread being stepped.  However, the process immediately
stopped and the first thread reported it's breakpoint event a second
time.  The debugger decremented the PC for "both" breakpoint events
which resulted in the PC now pointing into the middle of an
instruction (on x86) and a SIGILL fault when the process was resumed a
second time.

To fix, always clear the pending event for 'p_xthread' when resuming a
process.  ptrace() still honors the requested LWP ID when enabling
single-stepping (PT_STEP) or setting a different PC (PT_CONTINUE).

Reported by:	GDB testsuite (gdb.threads/continue-pending-status.exp)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12794
2017-10-27 03:16:19 +00:00
Alan Somers
df485bdb3c Fix aio_suspend in 32-bit emulation
An off-by-one error has been present since the system call was first present
in 185878.  It additionally became a memory corruption bug after change
324941.  The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.

Reported by:	Coverity, cem
CID:		1382114
MFC after:	18 days
X-MFC-With:	324941
Sponsored by:	Spectra Logic Corp
2017-10-26 19:45:15 +00:00
Warner Losh
7d41b6f078 Handle RB_POWERCYCLE in the MI part of the kernel
Signal init with SIGWINCH in shutdown_nice for RB_POWERCYCLE.

Sponsored by: Netflix
2017-10-25 15:30:44 +00:00
Mark Johnston
64a16434d8 Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by:	cem (earlier version)
Discussed with:	def, rgrimes
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11723
2017-10-25 00:51:00 +00:00
Alan Somers
913b932900 Remove artificial restriction on lio_listio's operation count
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:

* To size a UMA zone, which globally limits the number of concurrent
  aio_suspend calls.

* To artifically limit the number of operations in a single lio_listio call.
  There doesn't seem to be any memory allocation associated with this limit.

This change does two things:

* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.

* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
  calls will now be limited by the more generous max_aio_queue_per_proc. The
  old p1003_1b.aio_listio_max is now an alias for
  vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
  _SC_AIO_LISTIO_MAX.

Reported by:	bde
Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12120
2017-10-23 23:12:01 +00:00
Mateusz Guzik
5132933a08 Bump WITNESS_PENDLIST to accomodate sleepq chain bump.
Reported by:	ngie
2017-10-23 01:00:35 +00:00
Mateusz Guzik
9e68989764 Make the sleepq chain hash size configurable per-arch and bump on amd64.
While here cache-align chains.

This shortens longest found chain during poudriere -j 80 from 32 to 16.

Pushing this higher up will probably require allocation on boot.
2017-10-22 20:43:50 +00:00
Mateusz Guzik
5a17c5524f sdt: make all sdt probe sites test one variable
This saves on cache misses at the expense of a slight grow of .text.

Note this is a bandaid for lack of hotpatching.

Discussed with:	markj
2017-10-22 20:22:23 +00:00
Mateusz Guzik
614e1868d6 Change kdb_active type to u_char.
Fixes warnings from gcc and keeps the small size. Perhaps nesting should be moved
to another variablle.

Reported by:	ngie
2017-10-22 13:42:56 +00:00
Enji Cooper
f2374e0cc5 Clean up trailing whitespace in kdb_thr_ctx(..)
MFC after:	1 week
2017-10-22 12:12:52 +00:00
Konstantin Belousov
456a73ef01 Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.

FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined.  The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.

Reported by:	Dmitry Vyukov <dvyukov@google.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-22 08:11:45 +00:00
Mateusz Guzik
be49509eea mtx: implement thread lock fastpath
MFC after:	1 week
2017-10-21 22:40:09 +00:00
Michal Meloun
904d8c492f Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
 - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
   same way as for AT_HWCAP.

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12699
2017-10-21 12:05:01 +00:00
Mark Johnston
a3e8a25a52 Avoid the nbp lookup in the final loop iteration in flushbuflist().
The end of the loop must re-lookup the next buf since the bufobj lock
is dropped in the loop body. If the lookup fails, the loop is restarted.
This mechanism non-obviously also terminates the loop when the end of
the buf list is reached. Split up the two loops termination cases to
make the code a bit less fragile. No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12730
2017-10-20 14:56:13 +00:00
Mateusz Guzik
62bf13cbf9 mtx: fix up UP build after r324778
Reported by:	Michael Butler
2017-10-20 14:04:01 +00:00
Mateusz Guzik
c48a94251d Mark kdb_active as __read_frequently and switch to bool to eat less space. 2017-10-20 04:02:53 +00:00
Mateusz Guzik
2567807c32 rwlock: reduce lockstat branches in the slowpath
MFC after:	1 week
2017-10-20 03:32:42 +00:00