15755 Commits

Author SHA1 Message Date
Alexander Kabaev
6d41588b6b Reverse the check to allocate the buffer if cached pointer is NULL.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D13596
2017-12-23 17:55:19 +00:00
Alexander Kabaev
4daa09f343 Remove dead store to local variable. 2017-12-23 16:49:57 +00:00
Bruce Evans
da9fba5447 Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.
restart_cpus() worked well enough by accident.  Before this set of fixes,
resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to
restart) as restart_cpus().  resume_cpus() waited for the wrong cpuset
(stopped_cpus) to become empty, but since mixtures of stopped and suspended
CPUs are not close to working, stopped_cpus must be empty when resuming so
the wait is null -- restart_cpus just allows the other CPUs to restart and
returns without waiting.

Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and
add further kludges to try to keep it working for the XEN case.  It
was only used for XEN.  It waited on suspended_cpus.  This works for
XEN.  However, for ACPI, resuming is a 2-step process.  ACPI has already
woken up the other CPUs and removed them from suspended_cpus.  This
fix records the move by putting them in a new cpuset resuming_cpus.
Waiting on suspended_cpus would give the same null wait as waiting on
stopped_cpus.  Wait on resuming_cpus instead.

Add a cpuset toresume_cpus to map the CPUs being told to resume to keep
this separate from the cpuset started_cpus for mapping the CPUs being told
to restart.  Mixtures of stopped and suspended/resuming CPUs are still far
from working.  Describe new and some old cpusets in comments.

Add further kludges to cpususpend_handler() to try to avoid breaking it
for XEN.  XEN doesn't use resumectx(), so it doesn't use the second
return path for savectx(), and it goes from the suspended state directly
to the restarted state, while ACPI resume goes through the resuming state.
Enter the resuming state early for all cases so that resume_cpus can test
for being in this state and not have to worry about the intermediate
!suspended state for ACPI only.

Reviewed by:	kib
2017-12-21 09:17:48 +00:00
John Baldwin
b501cc5da6 Rework pathconf handling for FIFOs.
On the one hand, FIFOs should respect other variables not supported by
the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.).
These values are fs-specific and must come from a fs-specific method.
On the other hand, filesystems that support FIFOs are required to
support _PC_PIPE_BUF on directory vnodes that can contain FIFOs.
Given this latter requirement, once the fs-specific VOP_PATHCONF
method supports _PC_PIPE_BUF for directories, it is also suitable for
FIFOs permitting a single VOP_PATHCONF method to be used for both
FIFOs and non-FIFOs.

To that end, retire all of the FIFO-specific pathconf methods from
filesystems and change FIFO-specific vnode operation switches to use
the existing fs-specific VOP_PATHCONF method.  For fifofs, set it's
VOP_PATHCONF to VOP_PANIC since it should no longer be used.

While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that
only filesystems supporting FIFOs will report a value.  In addition,
only report a valid _PC_PIPE_BUF for directories and FIFOs.

Discussed with:	bde
Reviewed by:	kib (part of a larger patch)
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D12572
2017-12-19 22:39:05 +00:00
John Baldwin
599afe53a8 Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations.  Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
  permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs.  Now fail with EINVAL to
  indicate hard links aren't supported.

Requested by:	bde (though perhaps not this exact implementation)
Reviewed by:	kib (earlier version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 19:51:36 +00:00
John Baldwin
dd688800e1 Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly.  For all other
pathconf variables, the method passes the request down to the underlying
file descriptor.  This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf().  Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2017-12-19 18:20:38 +00:00
Konstantin Belousov
6f697994fd Use atomic_load(9) to read ppsinfo sequence numbers.
In this case volatile qualifiers enusre that a compiler does not
optimize the accesses out.

Reviewed by:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13534
2017-12-19 10:05:45 +00:00
Pedro F. Giffuni
62cf53fdac SPDX: some uses of the RSA-MD license. 2017-12-13 16:30:39 +00:00
Fedor Uporov
4ba058c0cf Fix kernel build if MAC is not defined.
Reported by:    Ravi Pokala, Andrew Turner
Approved by:    pfg (mentor)
MFC after:      1 week
2017-12-13 16:14:38 +00:00
Fedor Uporov
61b214f338 Move buffer size checks outside of the vnode locks.
Reviewed by:    kib, cem, pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      1 weeks

Differential Revision:    https://reviews.freebsd.org/D13405
2017-12-12 20:15:57 +00:00
Bruce Evans
fb3cc1c37d Move instantiation of msgbufp from 9 MD files to subr_prf.c.
This variable should be pure MI except possibly for reading it in MD
dump routines.  Its initialization was pure MD in 4.4BSD, but FreeBSD
changed this in r36441 in 1998.  There were many imperfections in
r36441.  This commit fixes only a small one, to simplify fixing the
others 1 arch at a time.  (r47678 added support for
special/early/multiple message buffer initialization which I want in
a more general form, but this was too fragile to use because hacking
on the msgbufp global corrupted it, and was only used for 5 hours in
-current...)
2017-12-07 07:55:38 +00:00
Mark Johnston
e1703ef5ae Plug a name cache lock leak.
Reviewed by:	mjg
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-12-01 22:51:02 +00:00
Konstantin Belousov
36bce27be9 Destroy seltd st_mtx and st_wait in seltdfini().
A correct destruction is important for WITNESS(4) and LOCK_PROFILING(9).

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
2017-12-01 11:18:19 +00:00
Pedro F. Giffuni
64de3fdd58 SPDX: use the Beerware identifier. 2017-11-30 20:33:45 +00:00
Hans Petter Selasky
1408b84a26 The sched_add() function is not only used when the thread is initially
started, but also by the turnstiles to mark a thread as runnable for
all locks, for instance sleepqueues do:
setrunnable()->sched_wakeup()->sched_add()

In r326218 code was added to allow booting from non-zero CPU numbers
by setting the ts_cpu field inside the ULE scheduler's sched_add()
function. This had an undesired side-effect that prior sched_pin() and
sched_bind() calls got disregarded. This patch fixes the
initialization of the ts_cpu field for the ULE scheduler to only
happen once when the initial thread is constructed during system
init. Forking will then later on ensure that a valid ts_cpu value gets
copied to all children.

Reviewed by:	jhb, kib
Discussed with:	nwhitehorn
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D13298
Sponsored by:	Mellanox Technologies
2017-11-29 23:28:40 +00:00
Alexey Dokuchaev
2c9ec07528 Fix several noticed style issues.
Reviewed by:	bde
Approved by:	bapt
2017-11-29 12:49:22 +00:00
Jeff Roberson
2e47807c21 Eliminate kmem_arena and kmem_object in preparation for further NUMA commits.
The arena argument to kmem_*() is now only used in an assert.  A follow-up
commit will remove the argument altogether before we freeze the API for the
next release.

This replaces the hard limit on kmem size with a soft limit imposed by UMA.  When
the soft limit is exceeded we periodically wakeup the UMA reclaim thread to
attempt to shrink KVA.  On 32bit architectures this should behave much more
gracefully as we exhaust KVA.  On 64bit the limits are likely never hit.

Reviewed by:	markj, kib (some objections)
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix / Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13187
2017-11-28 23:40:54 +00:00
Brooks Davis
5cd667e65f Disable vim syntax highlighting.
Vim's default pick doesn't understand that ';' is a comment character
and the result looks horrible.

Reviewed by:	emaste
2017-11-28 18:23:17 +00:00
Edward Tomasz Napierala
212ff84f4a Make kdb_reenter() silent when explicitly called from db_error().
This removes the useless backtrace on various ddb(4) user errors.

Reviewed by:	jhb@
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D13212
2017-11-28 12:53:55 +00:00
Nathan Whitehorn
51de47e3f8 Remove assertion that a CPU be present before returning a PCPU for it. It
is up to the caller to check for a NULL return value. The assert was meant
to catch buggy code that did not check the return value. Some code, however,
was smart and used the return value to see if a CPU existed, which this
broke.

Requested by:	jhb@
2017-11-28 05:39:48 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Mateusz Guzik
e57b2b1830 rw: fix runlock_hard when new readers show up
When waiters/writer spinner flags are set no new readers can show up unless
they already have a different rw rock read locked. The change in r326195 failed
to take that into account - in presence of new readers it would spin until
they all drain, which would be lead to trouble if e.g. they go off cpu and
can get scheduled because of this thread.

Reported by:	pho
2017-11-26 21:10:47 +00:00
Nathan Whitehorn
efe67753cc Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
are numbered densely from there to n_cpus.

MFC after:	1 month
2017-11-25 23:41:05 +00:00
Mateusz Guzik
2c50bafef5 Add the missing lockstat check for thread lock. 2017-11-25 20:49:27 +00:00
Mateusz Guzik
5ba6facfcd rwlock: fix up compilation of the previous change
commmitted wrong version of the patch
2017-11-25 20:25:45 +00:00
Mateusz Guzik
c1e1a7ec30 rwlock: add __rw_try_{r,w}lock_int 2017-11-25 20:22:51 +00:00
Mateusz Guzik
cec1747322 sx: change sunlock to wake waiters up if it locked sleepq
sleepq is only locked if the curhtread is the last reader. By the time
the lock gets acquired new ones could have arrived. The previous code
would unlock and loop back. This results spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.
2017-11-25 20:13:50 +00:00
Mateusz Guzik
93118b62f9 locks: retry turnstile/sleepq loops on failed cmpset
In order to go to sleep threads set waiter flags, but that can spuriously
fail e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.
2017-11-25 20:10:33 +00:00
Mateusz Guzik
2e106e0427 rwlock: stop re-reading the owner when going to sleep 2017-11-25 20:08:11 +00:00
John Baldwin
ffb6607984 Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays dumped for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), empty arrays are denoted by an
  entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470
2017-11-25 04:49:12 +00:00
Mark Johnston
dbe4541db2 Have lockstat:::sx-release fire only after the lock state has changed.
MFC after:	1 week
2017-11-24 19:04:31 +00:00
Mark Johnston
26d94f99af Add a missing lockstat:::sx-downgrade probe.
We were returning without firing the probe when the lock had no shared
waiters.

MFC after:	1 week
2017-11-24 19:02:06 +00:00
Ed Schouten
814629dd64 Don't let cpu_set_syscall_retval() clobber exec_setregs().
Upon successful completion, the execve() system call invokes
exec_setregs() to initialize the registers of the initial thread of the
newly executed process. What is weird is that when execve() returns, it
still goes through the normal system call return path, clobbering the
registers with the system call's return value (td->td_retval).

Though this doesn't seem to be problematic for x86 most of the times (as
the value of eax/rax doesn't matter upon startup), this can be pretty
frustrating for architectures where function argument and return
registers overlap (e.g., ARM). On these systems, exec_setregs() also
needs to initialize td_retval.

Even worse are architectures where cpu_set_syscall_retval() sets
registers to values not derived from td_retval. On these architectures,
there is no way cpu_set_syscall_retval() can set registers to the way it
wants them to be upon the start of execution.

To get rid of this madness, let sys_execve() return EJUSTRETURN. This
will cause cpu_set_syscall_retval() to leave registers intact. This
makes process execution easier to understand. It also eliminates the
difference between execution of the initial process and successive ones.
The initial call to sys_execve() is not performed through a system call
context.

Reviewed by:	kib, jhibbits
Differential Revision:	https://reviews.freebsd.org/D13180
2017-11-24 07:35:08 +00:00
Konstantin Belousov
ee50062cfb Kill all descendants of the reaper, even if they are descendants of a
subordinate reaper.

Also, mark reapers when listing pids.

Reported by:	Michael Zuo <muh.muhten@gmail.com>
PR:	223745
Reviewed by:	bapt
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D13183
2017-11-23 11:25:11 +00:00
Mateusz Guzik
2d96bd8812 sx: unbreak debug after r326107
An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after the entrance to the func.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

Reported by:	Shawn Webb
2017-11-23 03:40:51 +00:00
Mateusz Guzik
62b0676cde rwlock: unbreak WITNESS builds after r326110
Reported by:	Shawn Webb
2017-11-23 03:20:12 +00:00
Mateusz Guzik
70502e39d3 rwlock: don't check for curthread's read lock count in the fast path 2017-11-22 23:52:05 +00:00
Mateusz Guzik
b584eb2e90 locks: pass the found lock value to unlock slow path
This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.
2017-11-22 22:04:04 +00:00
Mateusz Guzik
013c0b493f locks: remove the file + line argument from internal primitives when not used
The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.
2017-11-22 21:51:17 +00:00
Mark Johnston
755230eb9f Clean up the SYSINIT_FLAGS definitions for rwlock(9) and rmlock(9).
Avoid duplication in their macro definitions, and document them. No
functional change intended.

MFC after:	1 week
2017-11-21 14:59:23 +00:00
Scott Long
cab229b2a6 Update a comment in brelse() to match reality. 2017-11-20 20:53:03 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Pedro F. Giffuni
df57947f08 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
Mateusz Guzik
284194f183 locks: fix compilation issues without SMP or KDTRACE_HOOKS 2017-11-17 23:27:06 +00:00
Mateusz Guzik
18f23540d8 lockmgr: remove the ADAPTIVE_LOCKMGRS option
The code was never enabled and is very heavy weight.

A revamped adaptive spinning may show up at a later time.

Discussed with:	kib
2017-11-17 20:41:17 +00:00
Conrad Meyer
38d84d683e vfs_lookup: Allow PATH_MAX-1 symlinks
Previously, symlinks in FreeBSD were artificially limited to PATH_MAX-2.

Add a short test case to verify the change.

Submitted by:	Gaurav Gangalwar <ggangalwar AT isilon.com>
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12589
2017-11-17 19:25:39 +00:00
Mateusz Guzik
2ccee9cc52 mtx: add missing parts of the diff in r325920
Fixes build breakage.
2017-11-17 02:59:28 +00:00
Mateusz Guzik
32aef9ff05 sched: move panic handling code out of choosethread
This avoids jumps in the common case of the kernel not being panicked.
2017-11-17 02:45:38 +00:00
Mateusz Guzik
997131646f Check for PRS_NEW without locking the proc in sysctl_kern_proc 2017-11-17 02:29:06 +00:00
Mateusz Guzik
bc24577c25 sx: perform a minor cleanup of the unlock slowpath
No functional changes.
2017-11-17 02:27:04 +00:00