Commit Graph

15114 Commits

Author SHA1 Message Date
Marcel Moolenaar
07f862a769 Include <stdarg.h> instead of <machine/stdarg.h> when compiled as
part of libsbuf. The former is the standard header, and allows us
to compile libsbuf on macOS/linux.
2016-10-24 18:03:04 +00:00
Konstantin Belousov
835c2787be Handle broadcast NMIs.
On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs
reporting hardware errors are broadcasted to all CPUs.

When kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb.  All CPUs are
executing NMI handlers, which set the latches disabling the nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work.  One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".

Much more harming behaviour is that because all CPUs try to enter kdb,
and if ddb is used as debugger, all CPUs issue prompt on console and
race for the input, not to mention the simultaneous use of the ddb
shared state.

Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs.  If one core happens to enter NMI trap handler, other
cores see it and simulate reception of the IPI_STOP_HARD.  More,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the nmi handler on other cores
suspending and then restarting the CPU.

Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.

The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8249
2016-10-24 16:40:27 +00:00
Konstantin Belousov
55ee7a4c5f In the fueword64(9) wrapper for architectures which do not implemented
native fueword64(9) still, use proper type for local where fuword64()
result is stored.

Note that fueword64() is unused in the tree.

Submitted by:	Chunhui He <hchunhui@mail.ustc.edu.cn>
PR:	212520
MFC after:	1 week
2016-10-23 11:23:17 +00:00
Conrad Meyer
8798ef0679 ddb(4): Add sleepchains to "show allchains"
Reported by:	markj
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8320
2016-10-22 18:02:20 +00:00
Hiren Panchasara
9d71a3975e Rework r306337.
In sendit(), if mp->msg_control is present, then in sockargs() we are
allocating mbuf to store mp->msg_control. Later in kern_sendit(), call
to getsock_cap(), will check validity of file pointer passed, if this
fails EBADF is returned but mbuf allocated in sockargs() is not freed.
Made code changes to free the same.

Since freeing control mbuf in sendit() after checking (control != NULL)
may lead to double freeing of control mbuf in sendit(), we can free
control mbuf in kern_sendit() if there are any errors in the routine.

Submitted by:		    Lohith Bellad <lohith.bellad@me.com>
Reviewed by:		    glebius
MFC after:		    3 weeks
Differential Revision:	    https://reviews.freebsd.org/D8152
2016-10-21 18:27:30 +00:00
Mariusz Zaborski
4b83a77606 capsicum: perform copyout without the fildesc lock held in sys_cap_ioctls_get
Reviewed by:	pjd
2016-10-21 16:12:23 +00:00
Mateusz Guzik
bb697a20d7 cache: fix up a corner case in r307650
If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.

Fix the problem by initializing to NULL on entry.

Reported by:	glebius
2016-10-20 19:55:50 +00:00
Kevin Lo
61f481fb7e Remove register keyword.
Reviewed by:	kib
2016-10-20 01:21:10 +00:00
Kevin Lo
7c68685366 Remove a sentence about putting initialization in init_proc.c or kern_proc.c
and useless comment.

Reviewed by:	kib
2016-10-20 01:19:37 +00:00
Sean Bruno
026204b4c6 Resolve whitespace diff to NextBSD.
Check to see that the taskqueue thread count requires us to acutally
iterate over the thread count to bind to cpus.

Submitted by:	mmacy@nextbsd.org
2016-10-19 21:01:24 +00:00
Mateusz Guzik
53dc58f2dc Mark a bunch of mpsafe sysctls as such.
This gives me a sysctl Giant-free buildworld.
2016-10-19 19:42:01 +00:00
Mateusz Guzik
a45a1a25b8 cache: split negative entry LRU into multiple lists
This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.

Create N dedicated lists for new negative entries.

Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.

Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.

Reviewed by:	kib
2016-10-19 18:29:52 +00:00
Sean Bruno
abf38392c6 Assert that we're assigning a non-null taskqueue.
ref: 535865d02c

Fix cpu assignment by assuring stride is non-zero, assert that all tasks
have a valid taskqueue.
ref: db39817623

Start cpu assignment from zero.
ref: d99d39b6b6

Submitted by:	mmacy@nextbsd.org
2016-10-18 14:00:26 +00:00
Sean Bruno
12d1b8c9f3 Ensure that tasks with a specific cpu set prior to smp starting get
re-attached to a thread running on that cpu.

ref: fcc20e306b

Submitted by:	mmacy@nextbsd.org
2016-10-18 13:55:34 +00:00
Sean Bruno
dc35f36560 Tell gtask to what we've been bound.
ref: 54414984cf

Submitted by:	mmacy@nextbsd.org
2016-10-18 13:16:27 +00:00
Ed Maste
9e62195361 makesyscalls.sh: remove trailing space on the "created from" line
In r10905 and r10906 makesyscalls was modified to avoid emitting a
literal $Id$ string in the generated file, with:

    gsub("[$]Id: ", "", $0)
    gsub(" [$]", "", $0)

Then r11294 added some functionality and also tried to address the $Id$
problem in a different way, by removing every $:

    sed -e 's/\$//g ...

This rendered the gsub infeffective. The gsub was later updated to
track the $Id$ -> $FreeBSD$ switch, even though it did not do anything.

Revert the addition of the s/\$//g, and update the gsub to keep the
resulting format the same.

Discussed with:	bde
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-10-17 13:52:24 +00:00
Hans Petter Selasky
d3bf5efc1f Fix device delete child function.
When detaching device trees parent devices must be detached prior to
detaching its children. This is because parent devices can have
pointers to the child devices in their softcs which are not
invalidated by device_delete_child(). This can cause use after free
issues and panic().

Device drivers implementing trees, must ensure its detach function
detaches or deletes all its children before returning.

While at it remove now redundant device_detach() calls before
device_delete_child() and device_delete_children(), mostly in
the USB controller drivers.

Tested by:		Jan Henrik Sylvester <me@janh.de>
Reviewed by:		jhb
Differential Revision:	https://reviews.freebsd.org/D8070
MFC after:		2 weeks
2016-10-17 10:20:38 +00:00
Konstantin Belousov
5975e53d40 Fix a race in vm_page_busy_sleep(9).
Suppose that we have an exclusively busy page, and a thread which can
accept shared-busy page.  In this case, typical code waiting for the
page xbusy state to pass is
again:
	VM_OBJECT_WLOCK(object);
	...
	if (vm_page_xbusied(m)) {
		vm_page_lock(m);
 		VM_OBJECT_WUNLOCK(object);    <---1
		vm_page_busy_sleep(p, "vmopax");
 		goto again;
	}

Suppose that the xbusy state owner locked the object, unbusied the
page and unlocked the object after we are at the line [1], but before we
executed the load of the busy_lock word in vm_page_busy_sleep().  If it
happens that there is still no waiters recorded for the busy state,
the xbusy owner did not acquired the page lock, so it proceeded.

More, suppose that some other thread happen to share-busy the page
after xbusy state was relinquished but before the m->busy_lock is read
in vm_page_busy_sleep().  Again, that thread only needs vm_object lock
to proceed.  Then, vm_page_busy_sleep() reads busy_lock value equal to
the VPB_SHARERS_WORD(1).

In this case, all tests in vm_page_busy_sleep(9) pass and we are going
to sleep, despite the page being share-busied.

Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9)
to also accept shared-busy state if we only wait for the xbusy state to
pass.

Merge sequential if()s with the same 'then' clause in
vm_page_busy_sleep().

Note that the current code does not share-busy pages from parallel
threads, the only way to have more that one sbusy owner is right now
is to recurse.

Reported and tested by:	pho (previous version)
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D8196
2016-10-13 14:41:05 +00:00
Conrad Meyer
d9ce8a41ea kern_linker: Handle module-loading failures in preloaded .ko files
The runtime kernel loader, linker_load_file, unloads kernel files that
failed to load all of their modules. For consistency, treat preloaded
(loader.conf loaded) kernel files in the same way.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8200
2016-10-13 02:06:23 +00:00
Ed Maste
2a059700b6 Use correct size type in do_setopt_accept_filter
Submitted by:	ecturt@gmail.com
2016-10-12 00:56:49 +00:00
Oleksandr Tymoshenko
609b0fe966 INTRNG - fix MSI/MSIX release path
Use isrc in attached MSI data structure instead of using map's
isrc directly. map's isrc is set to NULL on IRQ deactivation
which happens prior to pci_release_msi so MSI_RELEASE_MSI
receives array of NULLs

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D8206
2016-10-11 17:00:29 +00:00
Sean Bruno
1ee17b070d Fix bug where malloc(.., M_NOWAIT) return value is not checked, Change to
M_WAITOK and move outside the mutex

Submitted by:	shurd
Reviewed by:	mmacy@nextbsd.org
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7649
2016-10-11 14:08:53 +00:00
Mateusz Guzik
45571f8886 vfs: assert empty tmp free list on unmount 2016-10-08 13:38:05 +00:00
Mateusz Guzik
c6c44ff7eb vfs: clear the tmp free list flag before taking the free vnode list lock
Safe access is already guaranteed because of the mnt_listmx lock.
2016-10-08 13:36:59 +00:00
Konstantin Belousov
f71d08566c Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

Reported and tested by:	jkim
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2016-10-07 11:38:28 +00:00
Bryan Drewery
32641585a9 vrefl: Assert that the interlock is held.
Sponsored by:	Dell EMC Isilon
MFC after:	2 weeks
2016-10-06 18:10:19 +00:00
Bryan Drewery
5a22c9582c Add vrecyclel() to vrecycle() a vnode with the interlock already held.
Obtained from:	OneFS
Sponsored by:	Dell EMC Isilon
MFC after:	2 weeks
2016-10-06 18:09:22 +00:00
Conrad Meyer
f43292ecf4 vfs_bio: Remove a leading space (style)
Introduced in r282085.

Sponsored by:	Dell EMC Isilon
2016-10-05 23:42:02 +00:00
Bryan Drewery
0617f64ec6 Correct some comments after r294299.
Sponsored by:	Dell EMC Isilon
2016-10-04 21:44:20 +00:00
Ed Maste
65eea7ede6 ANSIfy inflate.c
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D8143
2016-10-04 17:57:30 +00:00
Konstantin Belousov
5420f76b59 Style.
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-10-04 15:23:03 +00:00
Mateusz Guzik
4876636eb7 cache: ignore purgevfs requests for filesystems with few vnodes
purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.

In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.

Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.

The default limit is the number of bucket locks.

Reviewed by:	kib
2016-10-03 00:02:32 +00:00
Mateusz Guzik
5bb81f9b2d vfs: batch free vnodes in per-mnt lists
Previously free vnodes would always by directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.

syncer runs always return the batch regardless of its size.

While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.

Reviewed by:	kib
Tested by:	pho
2016-09-30 17:27:17 +00:00
Mateusz Guzik
8660b707ff vfs: remove the __bo_vnode field from struct vnode
The pointer can be obtained using __containerof instead.

Reviewed by:	kib
2016-09-30 17:11:03 +00:00
Gleb Smirnoff
7ed6b78b92 Provide kern.maxphys sysctl, which returns MAXPHYS. Naming matches NetBSD. 2016-09-29 23:07:28 +00:00
Allan Jude
0176ca2ed5 Allow reading the following sysctl MIBs in capability mode:
kern.hostname, kern.domainname, and kern.hostuuid

This allows sandboxed applications to read these sysctls

Submitted by:	cem (original version)
Reviewed by:	cem, jonathan, rwatson (original version)
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D8015
2016-09-29 16:29:49 +00:00
Hans Petter Selasky
99eca1b2b3 While draining a timeout task prevent the taskqueue_enqueue_timeout()
function from restarting the timer.

Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.

This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.

Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.

Update manual page documenting the return value of
taskqueue_enqueue_timeout().

Differential Revision:	https://reviews.freebsd.org/D8012
Reviewed by:	kib, trasz
MFC after:	1 week
2016-09-29 10:38:20 +00:00
Hiren Panchasara
7c9a4d09d6 Revert r306337. dhw@ reproted a panic which seems related to this and bde@ has
raised some issues.
2016-09-26 15:45:30 +00:00
Eric van Gyzen
310ab671b8 Make no assertions about mutex state when the scheduler is stopped.
This changes the assert path to match the lock and unlock paths.

MFC after:	1 week
Sponsored by:	Dell EMC
2016-09-26 15:30:30 +00:00
Hiren Panchasara
41bb1a25a9 In sendit(), if mp->msg_control is present, then in sockargs() we are allocating
mbuf to store mp->msg_control. Later in kern_sendit(), call to getsock_cap(),
will check validity of file pointer passed, if this fails EBADF is returned but
mbuf allocated in sockargs() is not freed. Fix this possible leak.

Submitted by:	Lohith Bellad <lohith.bellad@me.com>
Reviewed by:	adrian
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D7910
2016-09-26 10:13:58 +00:00
Julian Elischer
1c8260b61d Give the user a clue as to which process hit maxfiles.
MFC after:	1 week
Sponsored by:	Panzura
2016-09-24 22:56:13 +00:00
Konstantin Belousov
939457e3e0 Add the foundation copyrights to procctl kernel sources.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-23 12:32:20 +00:00
Mariusz Zaborski
ad5e83dd3c fd: fix up fget_cap
If the kernel is not compiled with the CAPABILITIES kernel options
fget_unlocked doesn't return the sequence number so fd_modify will
always report modification, in that case we got infinity loop.

Reported by:	br
Reviewed by:	mjg
Tested by:	br, def
2016-09-23 08:13:46 +00:00
Mateusz Guzik
deffc4a026 fd: fix up fgetvp_rights after r306184
fget_cap_locked returns a referenced file, but the fgetvp_rights does
not need it. Instead, due to the filedesc lock being held, it can
ref the vnode after the file was looked up.

Fix up fget_cap_locked to be consistent with other _locked helpers and not
ref the file.

This plugs a leak introduced in r306184.

Pointy hat to: mjg, oshogbo
2016-09-23 06:51:46 +00:00
Mateusz Guzik
1d2541fd1a cache: get rid of the global lock
Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

Discussed with:		kib
Tested by:	pho
2016-09-23 04:45:11 +00:00
Gleb Smirnoff
a2d8f9d2fc Fix regression from r297400, which truncates headers in case of low socket
buffer and put a small optimization for low socket buffer case:

- Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This
  fixes truncation of headers at low buffer.
- If headers ate all the space, jump right to the end of the cycle, to
  avoid doing single page I/O and allocating zero length mbuf.
- Clear hdr_uio only if space is positive, which indicates that all uio
  was copied in.

Reviewed by:	pluknet, jtl, emax, rrs, lstewart, emax, gallatin, scottl
2016-09-22 20:34:44 +00:00
Ruslan Bukin
30f3bfe58e Adjust the sopt_val pointer on bigendian systems (e.g. MIPS64EB).
sooptcopyin() checks if size of data provided by user is <= than we can
accept, else it strips down the size. On bigendian platforms we have to
move pointer as well so we copy the actual data.

Reviewed by:	gnn
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D7980
2016-09-22 12:41:53 +00:00
Mariusz Zaborski
6490bc6529 fd: simplify fgetvp_rights by using fget_cap_locked
Reviewed by:	mjg
2016-09-22 11:54:20 +00:00
Mariusz Zaborski
85b0f9de11 capsicum: propagate rights on accept(2)
Descriptor returned by accept(2) should inherits capabilities rights from
the listening socket.

PR:		201052
Reviewed by:	emaste, jonathan
Discussed with:	many
Differential Revision:	https://reviews.freebsd.org/D7724
2016-09-22 09:58:46 +00:00
Mark Johnston
bdaf6d6913 Regenerate syscall provider argument strings. 2016-09-22 04:50:03 +00:00