Commit Graph

17266 Commits

Author SHA1 Message Date
Mateusz Guzik
2f423bce54 vfs: stop taking additional refs on root vnode during lookup
They are spurious since introduction of struct pwd, which provides them
implicitly.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23885
2020-03-01 21:54:28 +00:00
Mateusz Guzik
8d03b99b9d fd: move vnodes out of filedesc into a dedicated structure
The new structure is copy-on-write. With the assumption that path lookups are
significantly more frequent than chdirs and chrooting this is a win.

This provides stable root and jail root vnodes without the need to reference
them on lookup, which in turn means less work on globally shared structures.
Note this also happens to fix a bug where jail vnode was never referenced,
meaning subsequent access on lookup could run into use-after-free.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23884
2020-03-01 21:53:46 +00:00
Mateusz Guzik
8243063f9b fd: make fgetvp_rights work without the filedesc lock
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23883
2020-03-01 21:50:13 +00:00
Mark Johnston
5aa5420ff2 Ensure that arm64 thread structures are allocated from the direct map.
Otherwise we can fail to handle translation faults on curthread, leading
to a panic.

Reviewed by:	alc, rlibby
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23895
2020-02-29 18:41:48 +00:00
Jeff Roberson
6be21eb778 Provide a lock free alternative to resolve bogus pages. This is not likely
to be much of a perf win, just a nice code simplification.

Reviewed by:	markj, kib
Differential Revision:	https://reviews.freebsd.org/D23866
2020-02-28 21:42:48 +00:00
Jeff Roberson
7aaf252c96 Convert a few trivial consumers to the new unlocked grab API.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23847
2020-02-28 20:34:30 +00:00
Jeff Roberson
f72eaaeb03 Use unlocked grab for uipc_shm/tmpfs.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23865
2020-02-28 20:33:28 +00:00
Mark Johnston
46994ec2b1 Fix standalone builds of systrace.ko after r357912.
Sponsored by:	The FreeBSD Foundation
2020-02-28 17:05:04 +00:00
Mark Johnston
c99d0c5801 Add a blocking counter KPI.
refcount(9) was recently extended to support waiting on a refcount to
drop to zero, as this was needed for a lockless VM object
paging-in-progress counter.  However, this adds overhead to all uses of
refcount(9) and doesn't really match traditional refcounting semantics:
once a counter has dropped to zero, the protected object may be freed at
any point and it is not safe to dereference the counter.

This change removes that extension and instead adds a new set of KPIs,
blockcount_*, for use by VM object PIP and busy.

Reviewed by:	jeff, kib, mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23723
2020-02-28 16:05:18 +00:00
Jeff Roberson
561af25fa7 Simplify lazy advance with a 64bit atomic cmpset.
This provides the potential to force a lazy (tick based) SMR to advance
when there are blocking waiters by decoupling the wr_seq value from the
ticks value.

Add some missing compiler barriers.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D23825
2020-02-27 19:05:26 +00:00
Warner Losh
729ea680be Remove trailing white space. 2020-02-26 16:22:28 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
6bc27f086a Generalize resources freeing in sendfile with different scenarios.
Now we execute sendfile_iodone() in all possible cases, which
guarantees that vm_object_pip_wakeup() is called and sfio structure
is freed.

At the beginning of sendfile, initialize sfio->m to NULL, which would
indicate that the mbuf chain either doesn't exist, or belongs to the
syscall (not to I/O completion).  Fill sfio->m only at a point when
we are positive that there are I/Os ongoing and before releasing
syscall's reference on sfio.

In sendfile_iodone() perform vm_object_pip_wakeup() once last
reference is released, then check for sfio->m.  NULL pointer
indicates that we need only to free the memory.

Reviewed by:	jtl, gallatin
2020-02-25 19:29:05 +00:00
Gleb Smirnoff
f85e1a806b Make ktls_frame() never fail. Caller must supply correct mbufs.
This makes sendfile code a bit simpler.
2020-02-25 19:26:40 +00:00
Gleb Smirnoff
69302907d6 When sendfile_swapin() sweeps through pages in search of a bogus page,
skip the first and last pages.  This is a micro optimisation.
2020-02-25 19:11:20 +00:00
Ryan Libby
fe20aaec0a sys/kern: quiet -Wwrite-strings
Quiet a variety of -Wwrite-strings warnings in sys/kern at low-impact
sites.  This patch avoids addressing certain others which would need to
plumb const through structure definitions.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23798
2020-02-23 03:32:16 +00:00
Ryan Libby
2782c00c04 vfs: quiet -Wwrite-strings
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23797
2020-02-23 03:32:11 +00:00
Ryan Libby
eaa17d4291 sys/vm: quiet -Wwrite-strings
Discussed with:	kib
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23796
2020-02-23 03:32:04 +00:00
Konstantin Belousov
04869b812b Add td_pflags2, yet another thread-private flags word.
There are no more free bits in td_pflags.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-22 20:43:04 +00:00
Jeff Roberson
226dd6db47 Add an atomic-free tick moderated lazy update variant of SMR.
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch.  On a recent AMD platform a read section cost
1ns vs 5ns for the default SMR.  On Xeon the numbers should be more like 1
ns vs 11.  The memory consumption should be proportional to the product
of the free rate and 2*1/hz while normal SMR consumption is proportional
to the product of free rate and maximum read section time.

While here refactor the code to make future additions more
straightforward.

Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly.  This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).

Discussed with:	rlibby, markj
2020-02-22 03:44:10 +00:00
Mateusz Guzik
721a81c369 vfs: stop duplicating vnode work in audit during path lookup
Duplicating the work was putting an avoidable requirement that the filedesc
lock is held across the entire operation (otherwise by the time audit reads
vnode pointers another thread in the same process can chdir somewhere else,
making audit log things using a different vnode than the one which will be
used for actual lookup).

Do the obvious thing and pass down vnodes which will be used.
2020-02-21 01:44:31 +00:00
Eric van Gyzen
3cd1f28e4a clamp kernel dump compression level when using gzip
If the configured compression level for kernel dumps
is outside the supported range, clamp it to the closest
supported level.  Previously, dumpon would fail.

zstd already does this internally, so the compressor
needs no change.

Reviewed by:	cem markj
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23765
2020-02-20 23:53:48 +00:00
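The clamping described in 3cd1f28e4a can be sketched as below. This is a minimal illustration, not the kernel's code: the helper name `clamp_level` and the idea of passing the bounds as arguments are assumptions made for the example.

```c
/*
 * Hypothetical helper: force a requested compression level into the
 * inclusive range [lo, hi] instead of rejecting it.
 */
static int
clamp_level(int level, int lo, int hi)
{
	if (level < lo)
		return (lo);
	if (level > hi)
		return (hi);
	return (level);
}
```

With bounds of 1 and 9, a request for level 12 would be served as 9 rather than failing.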
Konstantin Belousov
74cb9a5333 Fix a bug in r358168, do not call sigfastblock_setpend() under a mutex.
PR:	244250
Reported and tested by:	lwhsu
Sponsored by:	The FreeBSD Foundation
2020-02-20 21:25:12 +00:00
Mateusz Guzik
65cdfb4caa make sysent for r358172 ("vfs: add realpathat syscall") 2020-02-20 16:58:57 +00:00
Mateusz Guzik
0573d0a9b8 vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
2020-02-20 16:58:19 +00:00
Konstantin Belousov
a113b17f10 Do not read sigfastblock word on syscall entry.
On machines with SMAP, fueword executes two serializing instructions
which can be seen in microbenchmarks.

As a measure to restore microbenchmark numbers, only read the word on
the attempt to deliver signal in ast().  If the word is set, signal is
not delivered and word is kept, preventing interruption of
interruptible sleeps by signals until userspace calls
sigfastblock(UNBLOCK) which clears the word.

This way, the spurious EINTR that userspace can see while in critical
section is on first interruptible sleep, if a signal is pending, and
on signal posting.  It is believed that it is not important for rtld
and libthr critical sections.  It might be visible for the application
code e.g. for the callback of dl_iterate_phdr(3), but again the belief
is that the non-compliance is acceptable.  Most important is that the
retry of the sleeping syscall does not interrupt unless additional
signal is posted.

For now I added the knob kern.sigfastblock_fetch_always to enable the
word read on syscall entry to be able to diagnose possible issues due
to spurious EINTR.

While there, do some code restructuring to have all sigfastblock()
handling located in kern_sig.c.

Reviewed by:	jeff
Discussed with:	mjg
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D23622
2020-02-20 15:34:02 +00:00
Jeff Roberson
6c5f36ff30 Eliminate some unnecessary uses of UMA_ZONE_VM. Only zones involved in
virtual address or physical page allocation need to be marked with this
flag.

Reviewed by:	markj
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23712
2020-02-19 08:17:27 +00:00
Mateusz Guzik
d8a84f08e8 refcount: update comments about fencing when releasing counts after r357989
Requested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23719
2020-02-16 18:20:09 +00:00
Mateusz Guzik
3403d5245e vfs: fix vlrureclaim ->v_object access
The routine was checking for ->v_type == VBAD. Since vgone drops the interlock
early and sets this type only at the end of the process of dooming a vnode, this
opens a time window where it can clear the pointer while the interlock holder is
accessing it.

Another note is that the code was:
	   (vp->v_object != NULL &&
	   vp->v_object->resident_page_count > trigger)

With the compiler being fully allowed to emit another read to get the pointer,
and in fact it did on the kernel used by pho.

Use atomic_load_ptr and remember the result.

Note that this depends on type-safety of vm_object.

Reported by:	pho
2020-02-16 03:33:34 +00:00
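The "load once and remember the result" fix in 3403d5245e can be illustrated in userspace with C11 atomics. This is only a sketch under assumptions: `struct object`, `struct node` and `over_trigger` are hypothetical stand-ins, and `atomic_load_explicit` stands in for the kernel's atomic_load_ptr. As in the commit, it is only safe if the pointed-to memory stays valid (type-stable) after the pointer is cleared.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct object {
	int resident_page_count;
};

struct node {
	_Atomic(struct object *) obj;	/* may be cleared concurrently */
};

/*
 * Read the pointer exactly once into a local, then test and dereference
 * the local.  Writing n->obj twice in the expression would allow a
 * second read that can observe NULL after the check has passed.
 */
static bool
over_trigger(struct node *n, int trigger)
{
	struct object *o;

	o = atomic_load_explicit(&n->obj, memory_order_relaxed);
	return (o != NULL && o->resident_page_count > trigger);
}
```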
Mateusz Guzik
c615009461 vfs: check early for VCHR in vput_final to short-circuit in the common case
Otherwise the compiler inlines v_decr_devcount which keeps getting jumped over
in the common case of not dealing with a device.
2020-02-16 03:16:28 +00:00
Matt Macy
45035becfe Add zfree to zero allocation before free
Key and cookie management typically wants to
avoid information leaks by explicitly zeroing
before free. This routine simplifies that by
permitting consumers to do so without carrying
the size around.

Reviewed by:	jeff@, jhb@
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D22790
2020-02-16 00:12:53 +00:00
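The zero-before-free idea behind 45035becfe can be sketched in portable userspace C. Unlike the kernel routine described above, this sketch has to be told the size, because standard malloc does not expose it; `secure_zero` and `zero_and_free` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Zero a buffer through a volatile pointer so the compiler cannot
 * discard the stores as dead writes ahead of free().
 */
static void
secure_zero(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len-- > 0)
		*vp++ = 0;
}

/* Wipe the contents, then release the allocation.  NULL is a no-op. */
static void
zero_and_free(void *p, size_t len)
{
	if (p == NULL)
		return;
	secure_zero(p, len);
	free(p);
}
```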
Konstantin Belousov
a7b61c0af1 sem_remove(): fix the loop that compacts sem array on semaphores removal.
As written now, it copies random kernel memory from beyond the bounds
of the array.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23694
2020-02-15 23:19:23 +00:00
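A bounds-respecting compaction loop of the kind a7b61c0af1 is concerned with might look like the sketch below. It is not the sem_remove() code: the function name, the int element type and the calling convention are assumptions, and the caller is assumed to guarantee start + cnt <= n.

```c
#include <stddef.h>

/*
 * Remove cnt entries starting at index start from arr[0..n) by shifting
 * the tail down.  The loop condition i + cnt < n keeps every read inside
 * the original array.  Returns the new number of valid entries.
 */
static size_t
compact(int *arr, size_t n, size_t start, size_t cnt)
{
	size_t i;

	for (i = start; i + cnt < n; i++)
		arr[i] = arr[i + cnt];
	return (n - cnt);
}
```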
Konstantin Belousov
4cb6ea7e8e sem_remove(): add some asserts.
Assert that sema[idx] allocation from sem[] is sane.
Also assert that sem_mtx is owned, it protects the SEM_ALLOC flag.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23694
2020-02-15 23:18:02 +00:00
Konstantin Belousov
8095050846 Use designated initializers for seminfo.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23694
2020-02-15 23:15:42 +00:00
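For readers unfamiliar with the C99 feature named in 8095050846, a designated initializer sets members by name instead of by position. The structure, member names and values below are made up for illustration and are not the real seminfo definition.

```c
/* Hypothetical limits structure; not the kernel's struct seminfo. */
struct limits {
	int max_ids;
	int max_per_id;
	int max_value;
};

/*
 * Each member is named explicitly, so reordering or adding members in
 * the struct definition cannot silently shift values to the wrong field.
 */
static const struct limits default_limits = {
	.max_ids = 50,
	.max_per_id = 340,
	.max_value = 32767,
};
```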
Pawel Biernacki
e0d69c5a88 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (1 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked). Use it in
preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Reviewed by:	kib, trasz
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23640
2020-02-15 18:48:38 +00:00
Mateusz Guzik
074ad60a4c vfs: make write suspension mandatory
At the time opt-in was introduced, adding yourself as a writer was serializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.

The vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT, which
is done regardless.

Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.
2020-02-15 13:00:39 +00:00
Mateusz Guzik
eb40664d83 capsicum: use new helpers 2020-02-15 01:30:27 +00:00
Mateusz Guzik
445faddf7f kqueue: use new capsicum helpers 2020-02-15 01:30:13 +00:00
Mateusz Guzik
32a86c44ee fd: use new capsicum helpers 2020-02-15 01:28:55 +00:00
Mateusz Guzik
e126c5a3e8 vfs: use new capsicum helpers 2020-02-15 01:28:42 +00:00
Konstantin Belousov
6cf2362e2c Consolidate read code for timecounters and fix possible overflow in
bintime()/binuptime().

The algorithm to read the consistent snapshot of current timehand is
repeated in each accessor, including the details of proper rollup
detection and synchronization with the writer.  In fact there are only
two different kinds of readers: one for bintime()/binuptime() which has
to do the in-place calculation, and another kind which fetches some
member from struct timehand.

Extract the logic into type-checked macros, GETTHBINTIME() for bintime
calculation, and GETTHMEMBER() for safe read of a structure's member.
This way, the synchronization is only written in bintime_off() and
getthmember().

In bintime_off(), use overflow-safe calculation of th_scale *
delta(timecounter).  In tc_windup, pre-calculate the min delta value
which overflows and requires the slow algorithm, into the new timehands
th_large_delta member.

This part with overflow fix was written by Bruce Evans.

Reported by:	Mark Millard <marklmi@yahoo.com> (the overflow issue)
Tested by:	pho
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	3 weeks
2020-02-14 23:27:45 +00:00
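One way to multiply a 64-bit scale by a 32-bit delta without losing the high bits, in the spirit of the overflow fix in 6cf2362e2c, is to split the scale into two 32-bit halves. The sketch below is an assumption-laden illustration: `struct fixed`, `add_frac` and `add_scaled` are hypothetical names, the fraction is taken to be in units of 2^-64 seconds, and the code always takes the split path instead of switching on a precomputed threshold as the commit describes.

```c
#include <stdint.h>

/* Seconds plus a 64-bit binary fraction of a second. */
struct fixed {
	uint64_t sec;
	uint64_t frac;
};

/* Add x to the fraction, carrying into the seconds on wraparound. */
static void
add_frac(struct fixed *f, uint64_t x)
{
	uint64_t old = f->frac;

	f->frac += x;
	if (f->frac < old)
		f->sec++;
}

/*
 * Add scale * delta to *f.  Each partial product is 32 bits times
 * 32 bits, so it always fits in 64 bits.
 */
static void
add_scaled(struct fixed *f, uint64_t scale, uint32_t delta)
{
	uint64_t hi = (scale >> 32) * delta;
	uint64_t lo = (scale & 0xffffffffu) * delta;

	f->sec += hi >> 32;		/* whole seconds from the high part */
	add_frac(f, hi << 32);		/* remaining fraction of the high part */
	add_frac(f, lo);
}
```

A plain `scale * delta` would wrap for the same inputs once the true product exceeds 64 bits.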
Mateusz Guzik
df0d5a2a85 vfs: remove no longer needed atomic_load_ptr casts 2020-02-14 23:18:32 +00:00
Mateusz Guzik
8f86349f8b fd: remove no longer needed atomic_load_ptr casts 2020-02-14 23:18:22 +00:00
Mateusz Guzik
5bc6a91f54 kcov: remove no longer needed atomic_load_ptr casts 2020-02-14 23:18:03 +00:00
Mateusz Guzik
2f7292437d Merge audit and systrace checks
This further shortens the syscall routine by not having to re-check after
the system call.
2020-02-14 13:09:41 +00:00
Mateusz Guzik
0e84a878c0 Annotate branches in the syscall path
This in particular significantly shortens amd64_syscall, which otherwise
keeps jumping forward over 2KB of code in total.

Note some of these branches should be either eliminated altogether or
coalesced.
2020-02-14 13:08:46 +00:00
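Branch annotation of the kind mentioned in 0e84a878c0 is commonly built on the GCC/Clang builtin `__builtin_expect`. The sketch below assumes such a compiler; the macro `unlikely_cond` and the function `checked_div` are hypothetical names chosen for the example.

```c
/*
 * Tell the compiler the condition is expected to be false, so the
 * error path can be laid out away from the hot straight-line code.
 */
#define	unlikely_cond(x)	__builtin_expect((x) != 0, 0)

/* Divide a by b, storing the result in *out; returns -1 if b is zero. */
static int
checked_div(int a, int b, int *out)
{
	if (unlikely_cond(b == 0))
		return (-1);
	*out = a / b;
	return (0);
}
```

The annotation changes code layout only; the result of the function is the same with or without it.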
Mateusz Guzik
ba8dd40bb1 lockmgr: add a change missed in r357907 2020-02-14 11:56:50 +00:00
Mateusz Guzik
6ed30ea4c0 fd: annotate finstall with prediction branches 2020-02-14 11:22:12 +00:00
Mateusz Guzik
c1b57fa7d3 lockmgr: rename lock_fast_path to lock_flags
The routine is not much of a fast path and the flags name better describes
its purpose.
2020-02-14 11:21:28 +00:00
Mateusz Guzik
943c4932f3 lockmgr: retire the unused lockmgr_unlock_fast_path routine 2020-02-14 11:20:25 +00:00