Commit Graph

15689 Commits

Author SHA1 Message Date
Mateusz Guzik
6e1619dae3 Add pfind_any
It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.
2017-11-11 18:04:39 +00:00
Mateusz Guzik
272640b7fc Avoid allproc lock in pfind if curproc->pid == pid 2017-11-11 18:03:26 +00:00
Mateusz Guzik
9b57bf75d0 Remove useless proc lookup from sysctl_out_proc 2017-11-11 18:02:23 +00:00
Mateusz Guzik
c7e4e92ecd rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER 2017-11-11 09:34:11 +00:00
Matt Joras
2ca45184dc Introduce EVENTHANDLER_LIST and some users.
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.

This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.

The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.

Reviewed by:	bdrewery, markj, mjg
Approved by:	rstone (mentor)
In collaboration with:  ian
MFC after:      4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12814
2017-11-09 22:51:48 +00:00
Konstantin Belousov
9acf7b136d Zero whole struct ptrace_lwpinfo to not leak kernel stack data.
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Discussed with:	secteam
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D12796
2017-11-08 23:32:56 +00:00
Jeff Roberson
8d6fbbb867 Replace manyinstances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated.  This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons.  This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by:	alc, kib, markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2017-11-08 02:39:37 +00:00
Bartek Rutkowski
cee09850f7 Make sysctl_kern_proc_umask execute fast path when requested pid in
curproc->p_pid or 0, avoiding unnecessary locking. Update libc consumer
to skip calling getpid().

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by:	mjg, robak
Approved by:	mjg
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	D12972
2017-11-07 15:13:32 +00:00
Mateusz Guzik
db520fdd46 rwlock: fix up compilation without KDTRACE_HOOKS after r324787 2017-11-06 05:14:05 +00:00
Mateusz Guzik
ce80021f4e namecache: bump numcache after dropping all locks
This makes no difference correctness-wise, but shortens total hold time.
2017-11-05 22:29:45 +00:00
Mateusz Guzik
119b826a62 namecache: wlock buckets in cache_lookup_nomakeentry
Since the case of an empty chain was already covered, it si very likely
that the existing entry is matching. Skipping readlocking saves on lock
upgrade.
2017-11-05 22:28:39 +00:00
Mateusz Guzik
ba324b5946 namecache: skip locking in cache_lookup_nomakeentry if there is no entry 2017-11-05 21:59:39 +00:00
Ed Maste
80dc9f8888 ANSIfy sys/kern/md4c.c
PR:		223453
Submitted by:	ota@j.email.ne.jp
MFC After:	2 weeks
2017-11-05 19:49:44 +00:00
Mateusz Guzik
a52058f013 namecache: skip locking in cache_purge_negative if there are no entries 2017-11-05 08:31:25 +00:00
Pedro F. Giffuni
7aa472731e ANSI-fy exec_shell_imgact().
Fix a stray space while here.

PR:	223317
MFC after:	3 days
2017-11-04 15:41:08 +00:00
Konstantin Belousov
30c438723d Convert explicit panic() call to assert.
Based on github pull request:	#113
Submitted by:	pmarillo@github
MFC after:	1 week
2017-11-04 10:49:34 +00:00
Mateusz Guzik
a2c36a24b6 Special-case pget lookups where pid == curproc->pid
Saves on allproc_lock acquires during buildworld, poudriere etc.

Submitted by:	Pawel Biernacki <pawel.biernacki@gmail.com>
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	D12929
2017-11-03 19:21:36 +00:00
Mateusz Guzik
ac850e5a8d namecache: fix .. check broken after r324378
wtf by:	mjg
Diagnosed by:	avg
2017-11-01 08:40:04 +00:00
Mateusz Guzik
59e260f860 Fixup r325264, take #2
whack an unused variable
2017-11-01 06:46:58 +00:00
Mateusz Guzik
5644fffa25 namecache: ncnegfactor 16 -> 12
It is used on each new entry addition to decide whether to whack an existing
negative entry in order to prevent a blow out in size, but the parameter was
set years ago and never revisited.

Building with poudriere results in about 400 evictions per second which
unnecessarily grab entries from the hot list.

With the new parameter there are next to no evictions of the sort.
2017-11-01 06:45:41 +00:00
Mateusz Guzik
5d03f1e11f Fixup r325264
Accidentally committed an incomplete diff.
2017-11-01 06:38:46 +00:00
Mateusz Guzik
c0b5261b55 Save on loginclass list locking by checking if caller already uses the struct 2017-11-01 06:12:14 +00:00
Mateusz Guzik
5949c7e504 Save on uihash table locking by checking if the caller already uses the struct
In particular with poudriere this saves about 90% of lookups.
2017-11-01 05:51:20 +00:00
John Baldwin
e012fe34cb Discard the correct thread event reported for a ptrace stop.
When multiple threads wish to report a tracing event to a debugger,
both threads call ptracestop() and one thread will win the race to be
the reporting thread (p->p_xthread).  The debugger uses PT_LWPINFO
with the process ID to determine which thread / LWP is reporting an
event and the details of that event.  This event is cleared as a side
effect of the subsequent ptrace event that resumed the process
(PT_CONTINUE, PT_STEP, etc.).  However, ptrace() was clearing the
event identified by the LWP ID passed to the resume request even if
that wasn't the 'p_xthread'.  This could result in clearing an event
that had not yet been observed by the debugger and leaving the
existing event for 'p_thread' pending so that it was reported a second
time.

Specifically, if the debugger stopped due to a software breakpoint in
one thread, but then switched to another thread that was used to
resume (e.g. if the user switched to a different thread and issued a
step), the resume request (PT_STEP) cleared a pending event (if any)
for the thread being stepped.  However, the process immediately
stopped and the first thread reported it's breakpoint event a second
time.  The debugger decremented the PC for "both" breakpoint events
which resulted in the PC now pointing into the middle of an
instruction (on x86) and a SIGILL fault when the process was resumed a
second time.

To fix, always clear the pending event for 'p_xthread' when resuming a
process.  ptrace() still honors the requested LWP ID when enabling
single-stepping (PT_STEP) or setting a different PC (PT_CONTINUE).

Reported by:	GDB testsuite (gdb.threads/continue-pending-status.exp)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12794
2017-10-27 03:16:19 +00:00
Alan Somers
df485bdb3c Fix aio_suspend in 32-bit emulation
An off-by-one error has been present since the system call was first present
in 185878.  It additionally became a memory corruption bug after change
324941.  The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.

Reported by:	Coverity, cem
CID:		1382114
MFC after:	18 days
X-MFC-With:	324941
Sponsored by:	Spectra Logic Corp
2017-10-26 19:45:15 +00:00
Warner Losh
7d41b6f078 Handle RB_POWERCYCLE in the MI part of the kernel
Signal init with SIGWINCH in shutdown_nice for RB_POWERCYCLE.

Sponsored by: Netflix
2017-10-25 15:30:44 +00:00
Mark Johnston
64a16434d8 Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by:	cem (earlier version)
Discussed with:	def, rgrimes
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11723
2017-10-25 00:51:00 +00:00
Alan Somers
913b932900 Remove artificial restriction on lio_listio's operation count
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:

* To size a UMA zone, which globally limits the number of concurrent
  aio_suspend calls.

* To artifically limit the number of operations in a single lio_listio call.
  There doesn't seem to be any memory allocation associated with this limit.

This change does two things:

* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.

* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
  calls will now be limited by the more generous max_aio_queue_per_proc. The
  old p1003_1b.aio_listio_max is now an alias for
  vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
  _SC_AIO_LISTIO_MAX.

Reported by:	bde
Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12120
2017-10-23 23:12:01 +00:00
Mateusz Guzik
5132933a08 Bump WITNESS_PENDLIST to accomodate sleepq chain bump.
Reported by:	ngie
2017-10-23 01:00:35 +00:00
Mateusz Guzik
9e68989764 Make the sleepq chain hash size configurable per-arch and bump on amd64.
While here cache-align chains.

This shortens longest found chain during poudriere -j 80 from 32 to 16.

Pushing this higher up will probably require allocation on boot.
2017-10-22 20:43:50 +00:00
Mateusz Guzik
5a17c5524f sdt: make all sdt probe sites test one variable
This saves on cache misses at the expense of a slight grow of .text.

Note this is a bandaid for lack of hotpatching.

Discussed with:	markj
2017-10-22 20:22:23 +00:00
Mateusz Guzik
614e1868d6 Change kdb_active type to u_char.
Fixes warnings from gcc and keeps the small size. Perhaps nesting should be moved
to another variablle.

Reported by:	ngie
2017-10-22 13:42:56 +00:00
Enji Cooper
f2374e0cc5 Clean up trailing whitespace in kdb_thr_ctx(..)
MFC after:	1 week
2017-10-22 12:12:52 +00:00
Konstantin Belousov
456a73ef01 Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.

FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined.  The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.

Reported by:	Dmitry Vyukov <dvyukov@google.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-22 08:11:45 +00:00
Mateusz Guzik
be49509eea mtx: implement thread lock fastpath
MFC after:	1 week
2017-10-21 22:40:09 +00:00
Michal Meloun
904d8c492f Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
 - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
   same way as for AT_HWCAP.

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D12699
2017-10-21 12:05:01 +00:00
Mark Johnston
a3e8a25a52 Avoid the nbp lookup in the final loop iteration in flushbuflist().
The end of the loop must re-lookup the next buf since the bufobj lock
is dropped in the loop body. If the lookup fails, the loop is restarted.
This mechanism non-obviously also terminates the loop when the end of
the buf list is reached. Split up the two loops termination cases to
make the code a bit less fragile. No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12730
2017-10-20 14:56:13 +00:00
Mateusz Guzik
62bf13cbf9 mtx: fix up UP build after r324778
Reported by:	Michael Butler
2017-10-20 14:04:01 +00:00
Mateusz Guzik
c48a94251d Mark kdb_active as __read_frequently and switch to bool to eat less space. 2017-10-20 04:02:53 +00:00
Mateusz Guzik
2567807c32 rwlock: reduce lockstat branches in the slowpath
MFC after:	1 week
2017-10-20 03:32:42 +00:00
Mateusz Guzik
cbc2d7c218 mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes
There is nothing panic-breaking to do in the unlock case and the lock
case will fallback to the slow path doing the check already.

MFC after:	1 week
2017-10-20 00:34:25 +00:00
Mateusz Guzik
0d74fe267b mtx: clean up locking spin mutexes
1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing

MFC after:	1 week
2017-10-20 00:30:35 +00:00
Mateusz Guzik
9b8de76beb sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE
The previous limit of just one page is hit by ps.

The entire mechanism should be reworked, if not whacked. It seems the intent
is to reduce kernel dos-ability - some handlers wire the amount of memory
passed here. Handlers should probably stop wiring in the first place or in
the worst case indicate they are doing so so that the check is done only if
necessary. It should also probably be a counter, not a lock.

MFC after:	1 week
2017-10-19 01:38:31 +00:00
Mateusz Guzik
e6b645ae89 execve: avoid one proc lock/unlock trip unless PTRACE_EXEC is set
MFC after:	1 week
2017-10-19 00:46:15 +00:00
Mateusz Guzik
80a2397a38 Tidy up pmc support at execve.
The proc-specific check is inherently racy, so the code can just unlock
beforehand.

MFC after:	1 week
2017-10-19 00:38:14 +00:00
Mateusz Guzik
cb1c79008e sysvsem: check if semu_list has anything on it before grabbing the lock
This should get a process-specific support instead.

MFC after:	1 week
2017-10-19 00:31:00 +00:00
Mateusz Guzik
c69a1a50cd Don't take Giant for SMP status and cpu topology sysctls.
Not only this lock doesn't play any role here, dirtying it slows down
other things a little bit as giant-held checks (e.g. DROP_GIANT) are
spread all over the kernel.

MFC after:	1 week
2017-10-18 22:00:44 +00:00
Mark Johnston
46fcd1af63 Move kernel dump offset tracking into MI code.
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.

This is needed to implement support for kernel dump compression in the
MI kernel dump code.

Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11722
2017-10-18 15:38:05 +00:00
Brooks Davis
39ed7f250a Remove mbpool(9) now that it has no consumers.
mbpool existed to support NICs with memory interfaces and all remaining
comsumers were removed earlier this year with NATM.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D10513
2017-10-18 00:18:03 +00:00
Mark Johnston
fa00affd18 Fix a racy VI_DOOMED check in MNT_VNODE_FOREACH_ALL().
MNT_VNODE_FOREACH_ALL() is supposed to avoid returning doomed vnodes,
but the VI_DOOMED check it used was done without the vnode interlock
held, so it could race with a concurrent vgone().

Submitted by:	Don Morris <don.morris@isilon.com>
Reviewed by:	kib, mckusick
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12704
2017-10-17 19:41:45 +00:00