Commit Graph

17011 Commits

Author SHA1 Message Date
Jeff Roberson
af00971419 Handle pagein clustering in vm_page_grab_valid() so that it can be used by
exec_map_first_page().  This will also enable pagein clustering for other
interested consumers (tmpfs, md, etc).

Discussed with:	alc
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D22731
2019-12-15 02:00:32 +00:00
Doug Moore
9f70442a04 Simplify the processing a leaf mask to find big-enough ranges of set
bits, by storing and modifying the complement of the original leaf
mask, and by avoiding some unnecessary intermediate variables in
computing the shift amounts. The logic is similar to what has recently
been committed to sys/sys/bitstring.h.

Compute better hint updates for the case when the cursor starts in
mid-leaf, and eliminates some otherwise viable solutions. Assume the
worst case, that all the eliminated offsets could have been solutions,
and you can still compute a better hint than we use now.

Eliminate some unnecessary conditional control flow.

Approved by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22666
2019-12-14 19:44:42 +00:00
Mateusz Guzik
6f836483ec Remove the useless return value from proc_set_cred 2019-12-14 00:43:17 +00:00
John Baldwin
4b28d96e5d Remove the deprecated timeout(9) interface.
All in-tree consumers have been converted to callout(9).

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22602
2019-12-13 21:03:12 +00:00
Warner Losh
b832a7e505 Create new wrapper function: bus_delayed_attach_children()
Delay the attachment of children, when requested, until after interrutps are
running. This is often needed to allow children to run transactions on i2c or
spi busses. It's a common enough idiom that it will be useful to have its own
wrapper.

Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D21465
2019-12-13 19:39:33 +00:00
John Baldwin
bf2276f378 Use callout(9) instead of deprecated timeout(9) for fail points.
Allocate the callout structure on-demand from
fail_point_use_timeout_path() since most fail points do not use
timeouts.

Reviewed by:	markj (earlier version), cem
Differential Revision:	https://reviews.freebsd.org/D22599
2019-12-13 19:26:04 +00:00
Edward Tomasz Napierala
34ad5ac242 Add kern_kill() and use it in Linuxulator. It's just a cleanup,
no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22645
2019-12-13 18:44:02 +00:00
Edward Tomasz Napierala
be2cfdbc86 Add kern_getsid() and use it in Linuxulator; no functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22647
2019-12-13 18:39:36 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Conrad Meyer
cd5650407e kern/subr_unit: Rip srandomdev, random(3) out of dead code
The simulation cannot be reproduced, so the value of using a deterministic PRNG
like random(3) is dubious.  The number of repitions used in the sample isn't a
problem for the Chacha implementation of arc4random we have today.  (Also, no
one actually runs this code; it was provided as an example of the work the
author did validating the implementation.  It's not even test code.)
2019-12-13 04:48:20 +00:00
Rick Macklem
ea9a16b252 r355677 requires that vop_stdioctl() be global so it can be called from NFS.
r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE)
for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl()
needs to be a global function.

Missed during the code merge for r355677.
2019-12-13 00:14:12 +00:00
Edward Tomasz Napierala
d6fee74a0c Add kern_sync(9), and make kernel code call it instead of going
via sys_sync(2).  Minor cleanup, no functional changes.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19366
2019-12-12 18:45:31 +00:00
Mark Johnston
7789ab32b3 Rename tdq_ipipending and clear it in sched_switch().
This fixes a regression after r355311.  Specifically, sched_preempt()
may trigger a context switch by calling thread_lock(), since
thread_lock() calls critical_exit() in its slow path and the interrupted
thread may have already been marked for preemption.  This would happen
before tdq_ipipending is cleared, blocking further preemption IPIs.  The
CPU can be left in this state indefinitely if the interrupted thread
migrates.

Rename tdq_ipipending to tdq_owepreempt.  Any switch satisfies a remote
preemption request, so clear tdq_owepreempt in sched_switch() instead of
sched_preempt() to avoid subtle problems of the sort described above.

Reviewed by:	jeff, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22758
2019-12-12 02:43:24 +00:00
Mateusz Guzik
c8b29d1212 vfs: locking primitives which elide ->v_vnlock and shared locking disablement
Both of these features are not needed by many consumers and result in avoidable
reads which in turn puts them on profiles due to cache-line ping ponging.

On top of that the current lockgmr entry point is slower than necessary
single-threaded. As an attempted clean up preparing for other changes,
provide new routines which don't support any of the aforementioned features.

With these patches in place vop_stdlock and vop_stdunlock disappear from
flamegraphs during -j 104 buildkernel.

Reviewed by:	jeff (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22665
2019-12-11 23:11:21 +00:00
Mateusz Guzik
55eb92db8d fd: static-ize and devolatile openfiles
Almost all access is using atomics. The only read is sysctl which should use
a whole-int-at-a-time friendly read internally.
2019-12-11 23:09:12 +00:00
Andriy Gapon
64ebbdd54d add a sanity check to the system call registration code
A system call number should be at least reserved.
We do not expect an attempt to register a fixed number system call
when nothing at all is known about it.

MFC after:	3 weeks
Sponsored by:	Panzura
2019-12-11 15:52:29 +00:00
John Baldwin
a8a03706fb Add a callout_func_t typedef for functions used with callout_*().
This typedef is the same as timeout_t except that it is in the callout
namespace and header.

Use this typedef in various places of the callout implementation that
were either using the raw type or timeout_t.

While here, add <sys/callout.h> to the manpage.

Reviewed by:	kib, imp
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22751
2019-12-10 21:58:30 +00:00
Mateusz Guzik
ff4486e827 vfs: refactor vhold and vdrop
No fuctional changes.
2019-12-10 00:08:05 +00:00
John Baldwin
d8010b1175 Copy out aux args after the argument and environment vectors.
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector.  Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack.  It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.

This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).

Reviewed by:	kib
Tested on:	amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22695
2019-12-09 19:17:28 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Mateusz Guzik
791a24c7ea vfs: clean up vputx a little
1. replace hand-rolled macros for operation type with enum
2. unlock the vnode in vput itself, there is no need to branch on it. existence
of VPUTX_VPUT remains significant in that the inactive variant adds LK_NOWAIT
to locking request.
3. remove the useless v_usecount assertion. few lines above the checks if
v_usecount > 0 and leaves. should the value be negative, refcount would fail.
4. the CTR return vnode %p to the freelist is incorrect as vdrop may find the
vnode with holdcnt > 1. if the like should exist, it should be moved there
5. no need to error = 0 for everyone

Reviewed by:	kib, jeff (previous version)
Differential Revision:	https://reviews.freebsd.org/D22718
2019-12-08 21:13:07 +00:00
Mateusz Guzik
fd6e0c43a6 vfs: factor out vnode destruction out of vdrop
Sponsored by:	The FreeBSD Foundation
2019-12-08 21:11:25 +00:00
Jeff Roberson
c3cccf95bf Handle multiple clock interrupts simultaneously in sched_clock().
Reviewed by:	kib, markj, mav
Differential Revision:	https://reviews.freebsd.org/D22625
2019-12-08 01:17:38 +00:00
Konstantin Belousov
0cc9fb7551 Only return EPERM from kill(-pid) when no process was signalled.
As mandated by POSIX.  Also clarify the kill(2) manpage.

While there, restructure the code in killpg1() to use helper which
keeps overall state of the process list iteration in the killpg1_ctx
structued, later used to infer the error returned.

Reported by:	amdmi3
Reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22621
2019-12-07 18:07:49 +00:00
Mateusz Guzik
12e483e5f7 vfs: clean up delmntque similarly to vdrop r355414 2019-12-07 12:56:24 +00:00
Mateusz Guzik
4f4d9a086a vfs: catch vn_printf up with reality
- add the missing VV_VMSIZEVNLOCK and VV_READLINK flags
- add decoding v_mflag

While here sort flags.
2019-12-07 12:55:58 +00:00
Brooks Davis
af796bfa71 sysent: Reduce duplication and improve readability.
Use the power of variable to avoid spelling out source and generated
files too many times.  The previous Makefiles were hard to read, hard to
edit, and badly formatted.

Reviewed by:	kevans, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22714
2019-12-06 23:59:23 +00:00
Alexander Motin
cb847b8152 Make devstat_end_transaction_bio() count BIO_ORDERED.
MFC after:	2 weeks
2019-12-06 18:39:05 +00:00
Bjoern A. Zeeb
173c062a56 Improve EPOCH_TRACE
Two changes to EPOCH_TRACE:
(1) add a sysctl to surpress the backtrace from epoch_trace_report().
    Sometimes the log line for the recursion is enough and the
    backtrace massively spams the console.
(2) In order to be able to go without the backtrace do not only
    print where the previous occurance happened, but also where
    the current one happens.  That way we have file:line information
    for both and can look at them without the need for getting line
    numbers from backtrace and a debugging tool.

Reviewed by:	glebius
Sponsored by:	Netflix (originally)
Differential Revision:	https://reviews.freebsd.org/D22641
2019-12-06 16:34:04 +00:00
Mateusz Guzik
befd3e35b3 sx: check for SX_LOCK_SHARED | SX_LOCK_WRITE_SPINNER when exclusive-locking
First, this removes a spurious difference compared to rw locks.
More importantly though this avoids a trip through sleepq code if the lock
happens to be caught in this state.
2019-12-05 13:43:44 +00:00
Mateusz Guzik
3eeb8a1fba vfs: remove 'active' variable from _vdrop
No functional changes.
2019-12-05 13:40:10 +00:00
Alexander Motin
61322a0a8a Mark some more hot global variables with __read_mostly.
MFC after:	1 week
2019-12-04 21:26:03 +00:00
Ryan Libby
30be9685a3 mbuf zones: take out the trash
The mbuf zones were explicitly specifying the uma trash procedures on
zcreate, conditionally on INVARIANTS, because that used to be necessary
in order to get use-after-free checking for uma zones with non-empty
constructors or destructors.  After r355137 uma automatically invokes
the trash constructor and destructor as long as no init and fini are
specified.  This now allows the mbuf zones to pass their constructors
and destructors without needing to add on the uma trash procedures
conditionally.

Reviewed by:	cem, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22583
2019-12-04 18:21:29 +00:00
John Baldwin
31174518d2 Use uintptr_t instead of register_t * for the stack base.
- Use ustringp for the location of the argv and environment strings
  and allow destp to travel further down the stack for the stackgap
  and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
  stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs.  This
  used to hold translated system call arguments, but hasn't been used
  since r159992.

Reviewed by:	kib
Tested on:	md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22501
2019-12-03 23:17:54 +00:00
Kirk McKusick
d00066a5f9 Currently the breadn_flags() and getblkx() interfaces are passed
the vnode, logical block number, and size of data block that is
being requested. They then use the VOP_BMAP function to calculate
the mapping from logical block number to physical block number from
which to access the data. This change expands the interface to also
pass the physical block number in cases where the VOP_MAP function
may no longer work, for example when a file is being truncated.

No functional change.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2019-12-03 23:07:09 +00:00
Jeff Roberson
9b78b1f433 Use a precise bit count for the slab free items in UMA. This significantly
shrinks embedded slab structures.

Reviewed by:	markj, rlibby (prior version)
Differential Revision:	https://reviews.freebsd.org/D22584
2019-12-02 22:44:34 +00:00
Jeff Roberson
4504268a1b Fix the last few cases that grab without busy or valid. The grab functions must
return the page in some held state for consistency elsewhere.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22610
2019-12-02 22:38:25 +00:00
Jeff Roberson
e15046952d Initialize the idle thread's lock sooner so it's not evaluated on every fork
exit and we can rely on it elsewhere.

Reviewed by:	mav, kib, jhb, markj
Differential Revision:	https://reviews.freebsd.org/D22624
2019-12-02 22:35:45 +00:00
Mateusz Guzik
5fe188b1e8 lockmgr: remove more remnants of adaptive spinning
Sponsored by:	The FreeBSD Foundation
2019-12-01 00:35:08 +00:00
Kyle Evans
1b50b999f9 tty: implement TIOCNOTTY
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.

I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.

Reviewed by:	bcr (manpages), kib
Differential Revision:	https://reviews.freebsd.org/D22572
2019-11-30 20:10:50 +00:00
Mateusz Guzik
e0a1a1e6cb smp: cast the read in quiesce_all_critical through void *
Fixes compilation on some 32-bit arm platforms.

Sponsored by:	The FreeBSD Foundation
2019-11-30 19:33:02 +00:00
Mateusz Guzik
3ac2ac2e08 lockprof: use IPI-injecetd fences to fix hangs on stat dump and reset
The previously used quiesce_all_cpus walks all CPUs and waits until curthread
can run on them. Even on contemporary machines this becomes a significant
problem under load when it can literally take minutes for the operation to
complete. With the patch the stall is normally less than 1 second.

Reviewed by:	kib, jeff (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21740
2019-11-30 17:24:42 +00:00
Mateusz Guzik
5032fe17a2 Add a way to inject fences using IPIs
A variant of this facility was already used by rmlocks where IPIs would
enforce ordering.

This allows to elide fences where they are rarely needed and the cost of
IPI (should it be necessary) is cheaper.

Reviewed by:	kib, jeff (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21740
2019-11-30 17:22:10 +00:00
Mateusz Guzik
a02cab334c devfs: introduce a per-dev lock to protect ->si_devsw
This allows bumping threadcount without taking the global devmtx lock.

In particular this eliminates contention on said lock while using bhyve
with multiple vms.

Reviewed by:	kib
Tested by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22548
2019-11-30 16:46:19 +00:00
Kyle Evans
9e387c3da2 tty_rel_gone: add locking assertion
We already assert the lock is held later during tty_rel_free(), but it is
arguably good form to clarify locking expectations here as well at the
top-level that other drivers use.
2019-11-29 14:46:13 +00:00
Konstantin Belousov
fdc6b10d44 Add a VN_OPEN_INVFS flag.
vn_open_cred() assumes that it is called from the top-level of a VFS
syscall.  Writers must call bwillwrite() before locking any VFS
resource to wait for cleanup of dirty buffers.

ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which
results in wait for unrelated buffers while owning ZFS vnode lock (and
ZFS does not use buffer cache).  VN_OPEN_INVFS allows caller to skip
bwillwrite.

Note that ZFS is still incorrect there, because it starts write on an
mp and locks a vnode while holding another vnode lock.

Reported by:	Willem Jan Withagen <wjw@digiware.nl>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-11-29 14:02:32 +00:00
Ryan Libby
815db2f6f8 ktls_session zone: don't need to specify uma trash
The use of the uma trash procedures is automatic, there's no need to
pass them explicitly here.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22582
2019-11-29 06:25:03 +00:00
Kyle Evans
cf29433090 tty_pts: don't rely on tty header pollution for sys/mutex.h
tty_pts.c relies on sys/tty.h for sys/mutex.h. Include it directly instead
of relying on this pollution to ease the diff for anyone that wants to try
converting the tty lock to anything other than a mutex.
2019-11-29 03:56:01 +00:00
Jeff Roberson
6d6a03d7a8 Handle large mallocs by going directly to kmem. Taking a detour through
UMA does not provide any additional value.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22563
2019-11-29 03:14:10 +00:00
Jeff Roberson
b476ae7f52 Fix DEBUG_REDZONE build after r355169 2019-11-28 08:56:14 +00:00