17021 Commits

Author SHA1 Message Date
Hans Petter Selasky
cc79ea3a26 Restore important comment in RCU/EPOCH support in FreeBSD after r355784.
Sponsored by:	Mellanox Technologies
2019-12-18 09:30:32 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
3fd19ce7a5 mtx: eliminate recursion support from thread lock
Now that it is not used after schedlock changes got merged.

Note the unlock routine temporarily still checks for it on account of just using
regular spin unlock.

This is a prelude towards a general clean up.
2019-12-16 00:04:33 +00:00
Jeff Roberson
686bcb5c14 schedlock 4/4
Don't hold the scheduler lock while doing context switches.  Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency.  This means that mi_switch() is entered with the thread
locked but returns without.  This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with:	markj
Reviewed by:	kib
Tested by:	pho
Differential Revision: https://reviews.freebsd.org/D22778
2019-12-15 21:26:50 +00:00
Jeff Roberson
1c81a87efd schedlock 3/4
Eliminate lock recursion from turnstiles.  This was simply used to avoid
tracking the top-level turnstile lock.  explicitly check for it before
picking up and dropping locks.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22746
2019-12-15 21:19:41 +00:00
Jeff Roberson
045b7c6084 schedlock 2/4
Do all sleepqueue post-processing in sleepq_remove_thread() so that we
do not require the thread lock after a context switch.

Reviewed by:	jhb, kib
Differential Revision:	https://reviews.freebsd.org/D22745
2019-12-15 21:18:07 +00:00
Ian Lepore
b19c9dea3e Rewrite arm kernel stack unwind code to work when unwinding through modules.
The arm kernel stack unwinder has apparently never been able to unwind when
the path of execution leads through a kernel module. There was code that
tried to handle modules by looking for the unwind data in them, but it did
so by trying to find symbols which have never existed in arm kernel
modules. That caused the unwind code to panic, and because part of panic
handling calls into the unwind code, that just created a recursion loop.

Locating the unwind data in a loaded module requires accessing the Elf
section headers to find the SHT_ARM_EXIDX section. For preloaded modules
those headers are present in a metadata blob. For dynamically loaded
modules, the headers are present only while the loading is in progress; the
memory is freed once the module is ready to use. For that reason, there is
new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
unwind info while the headers are loaded. The values are saved into new
fields in the linker_file structure which are also conditional on __arm__.

In arm/unwind.c there is new code to locally cache the per-module info
needed to find the unwind tables. The local cache is crafted for lockless
read access, because the unwind code often needs to run in context where
sleeping is not allowed.  A large comment block describes the local cache
list, so I won't repeat it all here.
2019-12-15 21:16:35 +00:00
Jeff Roberson
61a74c5ccd schedlock 1/4
Eliminate recursion from most thread_lock consumers.  Return from
sched_add() without the thread_lock held.  This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks.  This will eventually allow for lockless remote adds.

Discussed with:	kib
Reviewed by:	jhb
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22626
2019-12-15 21:11:15 +00:00
Jeff Roberson
d29f674f2e Fix a mistake in r355765. We need to activate the page if it is not yet
on a pagequeue.

Reported by:	pho
2019-12-15 06:26:47 +00:00
Jeff Roberson
a808177864 Add a deferred free mechanism for freeing swap space that does not require
an exclusive object lock.

Previously swap space was freed on a best effort basis when a page that
had valid swap was dirtied, thus invalidating the swap copy.  This may be
done inconsistently and requires the object lock which is not always
convenient.

Instead, track when swap space is present.  The first dirty is responsible
for deleting space or setting PGA_SWAP_FREE which will trigger background
scans to free the swap space.

Simplify the locking in vm_fault_dirty() now that we can reliably identify
the first dirty.

Discussed with:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22654
2019-12-15 03:15:06 +00:00
Jeff Roberson
af00971419 Handle pagein clustering in vm_page_grab_valid() so that it can be used by
exec_map_first_page().  This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).

Discussed with:	alc
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D22731
2019-12-15 02:00:32 +00:00
Doug Moore
9f70442a04 Simplify the processing a leaf mask to find big-enough ranges of set
bits, by storing and modifying the complement of the original leaf
mask, and by avoiding some unnecessary intermediate variables in
computing the shift amounts. The logic is similar to what has recently
been committed to sys/sys/bitstring.h.

Compute better hint updates for the case when the cursor starts in
mid-leaf, and eliminates some otherwise viable solutions. Assume the
worst case, that all the eliminated offsets could have been solutions,
and you can still compute a better hint than we use now.

Eliminate some unnecessary conditional control flow.

Approved by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22666
2019-12-14 19:44:42 +00:00
Mateusz Guzik
6f836483ec Remove the useless return value from proc_set_cred 2019-12-14 00:43:17 +00:00
John Baldwin
4b28d96e5d Remove the deprecated timeout(9) interface.
All in-tree consumers have been converted to callout(9).

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22602
2019-12-13 21:03:12 +00:00
Warner Losh
b832a7e505 Create new wrapper function: bus_delayed_attach_children()
Delay the attachment of children, when requested, until after interrutps are
running. This is often needed to allow children to run transactions on i2c or
spi busses. It's a common enough idiom that it will be useful to have its own
wrapper.

Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D21465
2019-12-13 19:39:33 +00:00
John Baldwin
bf2276f378 Use callout(9) instead of deprecated timeout(9) for fail points.
Allocate the callout structure on-demand from
fail_point_use_timeout_path() since most fail points do not use
timeouts.

Reviewed by:	markj (earlier version), cem
Differential Revision:	https://reviews.freebsd.org/D22599
2019-12-13 19:26:04 +00:00
Edward Tomasz Napierala
34ad5ac242 Add kern_kill() and use it in Linuxulator. It's just a cleanup,
no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22645
2019-12-13 18:44:02 +00:00
Edward Tomasz Napierala
be2cfdbc86 Add kern_getsid() and use it in Linuxulator; no functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22647
2019-12-13 18:39:36 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Conrad Meyer
cd5650407e kern/subr_unit: Rip srandomdev, random(3) out of dead code
The simulation cannot be reproduced, so the value of using a deterministic PRNG
like random(3) is dubious.  The number of repitions used in the sample isn't a
problem for the Chacha implementation of arc4random we have today.  (Also, no
one actually runs this code; it was provided as an example of the work the
author did validating the implementation.  It's not even test code.)
2019-12-13 04:48:20 +00:00
Rick Macklem
ea9a16b252 r355677 requires that vop_stdioctl() be global so it can be called from NFS.
r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE)
for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl()
needs to be a global function.

Missed during the code merge for r355677.
2019-12-13 00:14:12 +00:00
Edward Tomasz Napierala
d6fee74a0c Add kern_sync(9), and make kernel code call it instead of going
via sys_sync(2).  Minor cleanup, no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19366
2019-12-12 18:45:31 +00:00
Mark Johnston
7789ab32b3 Rename tdq_ipipending and clear it in sched_switch().
This fixes a regression after r355311.  Specifically, sched_preempt()
may trigger a context switch by calling thread_lock(), since
thread_lock() calls critical_exit() in its slow path and the interrupted
thread may have already been marked for preemption.  This would happen
before tdq_ipipending is cleared, blocking further preemption IPIs.  The
CPU can be left in this state indefinitely if the interrupted thread
migrates.

Rename tdq_ipipending to tdq_owepreempt.  Any switch satisfies a remote
preemption request, so clear tdq_owepreempt in sched_switch() instead of
sched_preempt() to avoid subtle problems of the sort described above.

Reviewed by:	jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22758
2019-12-12 02:43:24 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
55eb92db8d fd: static-ize and devolatile openfiles
Almost all access is using atomics. The only read is sysctl which should use
a whole-int-at-a-time friendly read internally.
2019-12-11 23:09:12 +00:00
Andriy Gapon
64ebbdd54d add a sanity check to the system call registration code
A system call number should be at least reserved.
We do not expect an attempt to register a fixed number system call
when nothing at all is known about it.

MFC after:	3 weeks
Sponsored by:	Panzura
2019-12-11 15:52:29 +00:00
John Baldwin
a8a03706fb Add a callout_func_t typedef for functions used with callout_*().
This typedef is the same as timeout_t except that it is in the callout
namespace and header.

Use this typedef in various places of the callout implementation that
were either using the raw type or timeout_t.

While here, add <sys/callout.h> to the manpage.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22751
2019-12-10 21:58:30 +00:00
Mateusz Guzik
ff4486e827 vfs: refactor vhold and vdrop
No fuctional changes.
2019-12-10 00:08:05 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
791a24c7ea vfs: clean up vputx a little
1. replace hand-rolled macros for operation type with enum
2. unlock the vnode in vput itself, there is no need to branch on it. existence
of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT
to locking request.
3. remove the useless v_usecount assertion. few lines above the checks if
v_usecount > 0 and leaves. should the value be negative, refcount would fail.
4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the
vnode with holdcnt > 1. if the like should exist, it should be moved there
5. no need to error = 0 for everyone

Reviewed by:	kib, jeff (previous version)
Differential Revision:	https://reviews.freebsd.org/D22718
2019-12-08 21:13:07 +00:00
Mateusz Guzik
fd6e0c43a6 vfs: factor out vnode destruction out of vdrop
Sponsored by:	The FreeBSD Foundation
2019-12-08 21:11:25 +00:00
Jeff Roberson
c3cccf95bf Handle multiple clock interrupts simultaneously in sched_clock().
Reviewed by:	kib, markj, mav
Differential Revision:	https://reviews.freebsd.org/D22625
2019-12-08 01:17:38 +00:00
Konstantin Belousov
0cc9fb7551 Only return EPERM from kill(-pid) when no process was signalled.
As mandated by POSIX.  Also clarify the kill(2) manpage.

While there, restructure the code in killpg1() to use helper which
keeps overall state of the process list iteration in the killpg1_ctx
structued, later used to infer the error returned.

Reported by:	amdmi3
Reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22621
2019-12-07 18:07:49 +00:00
Mateusz Guzik
12e483e5f7 vfs: clean up delmntque similarly to vdrop r355414 2019-12-07 12:56:24 +00:00
Mateusz Guzik
4f4d9a086a vfs: catch vn_printf up with reality
- add the missing VV_VMSIZEVNLOCK and VV_READLINK flags
- add decoding v_mflag

While here sort flags.
2019-12-07 12:55:58 +00:00
Brooks Davis
af796bfa71 sysent: Reduce duplication and improve readability.
Use the power of variable to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
Alexander Motin
cb847b8152 Make devstat_end_transaction_bio() count BIO_ORDERED.
MFC after:	2 weeks
2019-12-06 18:39:05 +00:00
Bjoern A. Zeeb
173c062a56 Improve EPOCH_TRACE
Two changes to EPOCH_TRACE:
(1) add a sysctl to surpress the backtrace from epoch_trace_report().
    Sometimes the log line for the recursion is enough and the
    backtrace massively spams the console.
(2) In order to be able to go without the backtrace do not only
    print where the previous occurance happened, but also where
    the current one happens.  That way we have file:line information
    for both and can look at them without the need for getting line
    numbers from backtrace and a debugging tool.

Reviewed by:	glebius
Sponsored by:	Netflix (originally)
Differential Revision:	https://reviews.freebsd.org/D22641
2019-12-06 16:34:04 +00:00
Mateusz Guzik
befd3e35b3 sx: check for SX_LOCK_SHARED | SX_LOCK_WRITE_SPINNER when exclusive-locking
First, this removes a spurious difference compared to rw locks.
More importantly though this avoids a trip through sleepq code if the lock
happens to be caught in this state.
2019-12-05 13:43:44 +00:00
Mateusz Guzik
3eeb8a1fba vfs: remove 'active' variable from _vdrop
No functional changes.
2019-12-05 13:40:10 +00:00
Alexander Motin
61322a0a8a Mark some more hot global variables with __read_mostly.
MFC after:	1 week
2019-12-04 21:26:03 +00:00
Ryan Libby
30be9685a3 mbuf zones: take out the trash
The mbuf zones were explicitly specifying the uma trash procedures on
zcreate, conditionally on INVARIANTS, because that used to be necessary
in order to get use-after-free checking for uma zones with non-empty
constructors or destructors.  After r355137 uma automatically invokes
the trash constructor and destructor as long as no init and fini are
specified.  This now allows the mbuf zones to pass their constructors
and destructors without needing to add on the uma trash procedures
conditionally.

Reviewed by:	cem, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22583
2019-12-04 18:21:29 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Kirk McKusick
d00066a5f9 Currently the breadn_flags() and getblkx() interfaces are passed
the vnode, logical block number, and size of data block that is
being requested. They then use the VOP_BMAP function to calculate
the mapping from logical block number to physical block number from
which to access the data. This change expands the interface to also
pass the physical block number in cases where the VOP_MAP function
may no longer work, for example when a file is being truncated.

No functional change.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2019-12-03 23:07:09 +00:00
Jeff Roberson
9b78b1f433 Use a precise bit count for the slab free items in UMA. This significantly
shrinks embedded slab structures.

Reviewed by:	markj, rlibby (prior version)
Differential Revision:	https://reviews.freebsd.org/D22584
2019-12-02 22:44:34 +00:00
Jeff Roberson
4504268a1b Fix the last few cases that grab without busy or valid. The grab functions must
return the page in some held state for consistency elsewhere.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22610
2019-12-02 22:38:25 +00:00
Jeff Roberson
e15046952d Initialize the idle thread's lock sooner so it's not evaluated on every fork
exit and we can rely on it elsewhere.

Reviewed by:	mav, kib, jhb, markj
Differential Revision:	https://reviews.freebsd.org/D22624
2019-12-02 22:35:45 +00:00
Mateusz Guzik
5fe188b1e8 lockmgr: remove more remnants of adaptive spinning
Sponsored by:	The FreeBSD Foundation
2019-12-01 00:35:08 +00:00
Kyle Evans
1b50b999f9 tty: implement TIOCNOTTY
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.

I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.

Reviewed by:	bcr (manpages), kib
Differential Revision:	https://reviews.freebsd.org/D22572
2019-11-30 20:10:50 +00:00