Commit Graph

17154 Commits

Mateusz Guzik
921e7210f8 cache: return the total length from vn_fullpath1
This removes strlen from getcwd.
2020-02-01 20:37:11 +00:00
Mateusz Guzik
4511dd9d41 cache: remove vnode -> path lookup disablement
It seems to be of little to no use even when debugging.

Interested parties can resurrect it and gate compilation with a macro.
2020-02-01 20:36:35 +00:00
Mateusz Guzik
45757984f8 vfs: consistently use size_t for buflen around VOP_VPTOCNP 2020-02-01 20:34:43 +00:00
Mateusz Guzik
643656cfaf vfs: replace VOP_MARKATIME with VOP_MMAPPED
The routine is only provided by ufs and is only used on mmap and exec.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23422
2020-02-01 06:46:55 +00:00
Mateusz Guzik
90f4ec3328 vfs: save on atomics on the root vnode for absolute lookups
There are 2 back-to-back atomics on the vnode, but we can check upfront if one
is sufficient. Similarly we can handle relative lookups where current working
directory == root directory.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23427
2020-02-01 06:40:35 +00:00
Mateusz Guzik
21c4f1041e vfs: add vrefactn
Differential Revision:	https://reviews.freebsd.org/D23427
2020-02-01 06:39:49 +00:00
Jeff Roberson
915c367e8e Add two missing fences with comments describing them. These were found by
inspection and after a lengthy discussion with jhb and kib.  They have not
produced test failures.

Don't pointer chase through cpu0's smr.  Use the correct CPU's smr even when not
in a critical section to reduce the likelihood of false sharing.
2020-01-31 22:21:15 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delivered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
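
A hedged sketch of the simplified KPI in use: the helper below and its locking placement (caller holds the thread lock, per stack(9)) are illustrative assumptions, while the EOPNOTSUPP/EBUSY contract is as described above.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/stack.h>

    /*
     * Hypothetical caller: capture the kernel stack of a thread that
     * may be running on another CPU.  stack_save_td() now handles
     * running threads itself and reports failure via its return
     * value instead of a separate stack_save_td_running() entry point.
     */
    static int
    capture_stack_sketch(struct thread *td, struct stack *st)
    {
            int error;

            thread_lock(td);
            error = stack_save_td(st, td);
            thread_unlock(td);

            if (error == EOPNOTSUPP)
                    printf("running-thread capture unsupported here\n");
            else if (error == EBUSY)
                    printf("target was executing in user mode\n");
            return (error);
    }
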
Mateusz Guzik
0f4d8b77c0 vfs: revert the overzealous assert added in r357285 to vgone
The intent was to make it more likely to catch filesystems with custom
need_inactive routines which fail to call vn_need_pageq_flush (or do an
equivalent).

One immediate case which is missed is vgone called from inactive itself.

A better assertion may land later. The routine is not added to vputx because
it is of no use to tmpfs et al.

Reported by:	syzbot+5f697ec11f89b60941db@syzkaller.appspotmail.com
2020-01-31 11:31:14 +00:00
Mateusz Guzik
1a78ac2416 Add rms_try_rlock and rms_wowned. 2020-01-31 08:36:49 +00:00
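
A sketch of how the two new primitives slot into the rmslock KPI; rms_init()'s signature is an assumption, and the data being protected is invented.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/rmlock.h>

    static struct rmslock data_lock;    /* rms_init(&data_lock, "data") at boot */

    /*
     * Opportunistic reader: back off instead of sleeping if the
     * read lock is not immediately available.
     */
    static bool
    data_try_read_sketch(void)
    {
            if (!rms_try_rlock(&data_lock))
                    return (false);
            /* ... read shared state ... */
            rms_runlock(&data_lock);
            return (true);
    }

    /* Writer-side assertion made possible by rms_wowned(). */
    static void
    data_modify_sketch(void)
    {
            MPASS(rms_wowned(&data_lock));
            /* ... modify shared state ... */
    }
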
Mateusz Guzik
cedad2916e Remove an overzealous assert from rms_runlock. 2020-01-31 08:36:23 +00:00
Jeff Roberson
da6e9935e4 Don't use "All rights reserved" in new copyrights.
Requested by:	rgrimes
2020-01-31 02:08:09 +00:00
Jeff Roberson
d4665eaa66 Implement a safe memory reclamation feature that is tightly coupled with UMA.
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm.  This has 3x the performance of epoch in a write heavy
workload with less than half of the read side cost.  The memory overhead
is significantly lessened by limiting the free-to-use latency.  A synthetic
test uses 1/20th of the memory vs Epoch.  There is significant further
discussion in the comments and code review.

This code should be considered experimental.  I will write a man page after
it has settled.  After further validation the VM will begin using this
feature to permit lockless page lookups.

Both markj and cperciva tested on arm64 at large core counts to verify
fences on weaker ordering architectures.  I will commit a stress testing
tool in a follow-up.

Reviewed by:	mmacy, markj, rlibby, hselasky
Discussed with:	sbahara
Differential Revision:	https://reviews.freebsd.org/D22586
2020-01-31 00:49:51 +00:00
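
Since the interface is explicitly experimental, here is only a speculative sketch of the intended consumer pattern, using the smr(9) calls this commit introduces (smr_create(), smr_enter()/smr_exit(), smr_advance(), smr_poll()); the table helpers are invented placeholders.

    #include <sys/param.h>
    #include <sys/smr.h>

    static smr_t table_smr;             /* table_smr = smr_create("table"); */

    /* Reader: a cheap per-CPU section instead of a lock. */
    static void *
    table_lookup_sketch(uintptr_t key)
    {
            void *item;

            smr_enter(table_smr);
            item = table_walk_lockless(key);    /* hypothetical walk */
            smr_exit(table_smr);
            /* item must be revalidated before use outside the section */
            return (item);
    }

    /* Writer: unlink, then wait out readers before freeing. */
    static void
    table_remove_sketch(void *item)
    {
            smr_seq_t goal;

            table_unlink(item);                 /* hypothetical */
            goal = smr_advance(table_smr);
            smr_poll(table_smr, goal, true);    /* wait for the grace period */
            table_free(item);                   /* now unreachable and safe */
    }
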
Mateusz Guzik
3ff65f71cb Remove duplicated empty lines from kern/*.c
No functional changes.
2020-01-30 20:05:05 +00:00
Mateusz Guzik
2823710f05 Tidy up 2 comments in smp_rendezvous_cpus. 2020-01-30 20:02:14 +00:00
Mateusz Guzik
7ab99925fd Assert that smp_rendezvous_cpus is called with interrupts enabled. 2020-01-30 19:38:51 +00:00
Mateusz Guzik
d53d924f60 vfs: keep the mount point referenced across sys_quotactl
Otherwise we risk running into a use-after-free.

In particular this codepath ends up dropping all protection before
suspending writes:

ufs_quotactl -> quotaoff_inchange -> vfs_write_suspend_umnt

Reported by:	pho
2020-01-30 19:38:12 +00:00
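
The shape of the fix, sketched with the stock vfs_ref()/vfs_rel() mount reference KPI; this is not the literal sys_quotactl() diff.

    #include <sys/param.h>
    #include <sys/mount.h>

    /*
     * Hold a reference on the mount across the quotactl work, since
     * the ufs_quotactl -> quotaoff_inchange -> vfs_write_suspend_umnt
     * path drops all other protection while suspending writes.
     */
    static int
    quotactl_sketch(struct mount *mp, int cmd, int id, void *arg)
    {
            int error;

            vfs_ref(mp);
            error = VFS_QUOTACTL(mp, cmd, id, arg);
            vfs_rel(mp);
            return (error);
    }
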
John Baldwin
fbb9879c0c Fix use of an uninitialized variable.
ctx (and thus ctx.flags) is stack garbage at the start of this
function, so initialize ctx.flags to an explicit value instead of
using binary operations on the garbage.

Reported by:	gcc9
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D23368
2020-01-30 18:28:02 +00:00
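
The bug class in miniature; the struct and flag names here are invented, not the ones in the affected file.

    #include <stdint.h>

    struct ctx_sketch {
            uint32_t flags;
    };

    #define CTX_FLAG_A      0x01u

    void
    broken(struct ctx_sketch *out)
    {
            struct ctx_sketch ctx;          /* stack garbage */

            ctx.flags |= CTX_FLAG_A;        /* ORs into garbage: undefined */
            *out = ctx;
    }

    void
    fixed(struct ctx_sketch *out)
    {
            struct ctx_sketch ctx;

            ctx.flags = CTX_FLAG_A;         /* explicit initial value */
            *out = ctx;
    }
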
Mateusz Guzik
c2ef6aa3d5 vfs: assert that doomed vnodes don't need to call vm_object_page_clean
... after the optional inactive processing.
2020-01-30 04:59:08 +00:00
Mateusz Guzik
07c6e2f4ab vfs: unlazy before dooming the vnode
With this change, holding the listmtx lock postpones dooming the vnode.
Use this fact to simplify iteration over the lazy list. It also allows
filters to safely access ->v_data.

Reviewed by:	kib (early version)
Differential Revision:	https://reviews.freebsd.org/D23397
2020-01-30 02:12:52 +00:00
Gleb Smirnoff
79674264df Fix text format definition for kern.maxvnodes, vfs.wantfreevnodes. This
is a regression from r356642, r356645.
2020-01-30 00:18:00 +00:00
Conrad Meyer
07a65f9d38 hwpstate_intel(4): Silence/fix Coverity reports
These were all introduced in the initial import of hwpstate_intel(4).

Reported by:	Coverity
CIDs:		1413161, 1413164, 1413165, 1413167
X-MFC-With:	r357002
2020-01-29 03:15:34 +00:00
Warner Losh
42ec4f05a3 Make mqueue objects work across a fork again.
In r110908 (2003) alfred added DFLAG_PASSABLE to tag those types of FD
that can be passed via unix domain sockets, but mqueuefs didn't exist
yet. Later, in r152825 (2005) davidxu neglected to include
DFLAG_PASSABLE since people don't normally pass these things via unix
sockets (it's a FreeBSD implementation detail that it's a file
descriptor, nobody noticed). Then r223866 (2011) by jonathan used the
new flag in fdcopy, which fork uses. Due to that, r223866 actually
broke mqueue objects being propagated by fork. No mention of mqueuefs
was made in r223866, so I think it was an unintended consequence.

Fix this by tagging mqueuefs as passable as well. They were prior to
alfred's change (and it's clear there's no intent in his change to
change this behavior), and POSIX requires this to be the case as well.

PR: 243103
Reviewed by: kib@, jilles@
Differential Revision: https://reviews.freebsd.org/D23038
2020-01-27 22:36:54 +00:00
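
The fix reduces to one bit in the mqueuefs fileops; a sketch of the shape, with most members elided (DFLAG_PASSABLE itself is the real flag from sys/file.h).

    #include <sys/param.h>
    #include <sys/file.h>

    /*
     * Marking the file type passable lets fdcopy() (used by fork())
     * and SCM_RIGHTS handling treat mqueue descriptors like any
     * other passable descriptor.
     */
    static struct fileops mqueueops_sketch = {
            /* ... fo_read, fo_write, fo_ioctl, etc. elided ... */
            .fo_flags = DFLAG_PASSABLE,
    };
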
John Baldwin
425e5f9dcf Revert accidental change from r357146. 2020-01-26 14:23:27 +00:00
John Baldwin
c73222d0e6 Fix some misleading indentation warnings reported by recent clang.
These should not be any functional change.  While the change in
emu10kx-pcm.c looks like a real bug fix (as opposed to inconsistent
whitespace), the extra statements were not harmful.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23363
2020-01-26 14:20:57 +00:00
Mateusz Guzik
1513f80391 vfs: do an unlocked check before iterating the lazy list
For most filesystems it is expected to be empty most of the time.
2020-01-26 07:06:18 +00:00
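
A sketch of the double-checked pattern this implies, assuming the post-r356672 mount fields (mnt_lazyvnodelist, mnt_listmtx); a racing insertion is tolerated the same way losing the race just after unlocking would be.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>

    static void
    lazy_iterate_sketch(struct mount *mp)
    {
            /* Unlocked peek: empty most of the time, so skip the lock. */
            if (TAILQ_EMPTY(&mp->mnt_lazyvnodelist))
                    return;

            mtx_lock(&mp->mnt_listmtx);
            /* ... iterate, re-checking entries under the lock ... */
            mtx_unlock(&mp->mnt_listmtx);
    }
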
Mateusz Guzik
cd0e46c66b vfs: remove vop loop from vop_sigdefer
All ops are guaranteed to be present since r357131.
2020-01-26 07:05:06 +00:00
Mateusz Guzik
6d69e665dd vfs: fix freevnodes count update race against preemption
vdbatch_process leaves the critical section too early, opening a time
window where another thread can get scheduled and modify vd->freevnodes.
Once the preempted thread gets back, it overrides the value with 0.

Just move critical_exit to the end of the function.
2020-01-26 00:40:27 +00:00
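
The race and its fix in outline; the batch structure is reduced to the one field that matters.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>

    struct vdbatch_sketch {
            int freevnodes;
    };

    static void
    vdbatch_process_sketch(struct vdbatch_sketch *vd)
    {
            critical_enter();
            /* ... flush the per-CPU batch, read vd->freevnodes ... */
            /*
             * critical_exit() here would be too early: a preempting
             * thread on this CPU could bump vd->freevnodes, and the
             * store below would erase its update.
             */
            vd->freevnodes = 0;
            critical_exit();        /* exit only after the final store */
    }
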
Mateusz Guzik
dc9a1cb60b vfs: predict vn_lock failure as unlikely in vget 2020-01-26 00:34:57 +00:00
Jason A. Harmening
a9aa06f7b1 Implement cycle-detecting garbage collector for AF_UNIX sockets
The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS messages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.

This change rewrites the garbage collector to precisely detect cycles:

1. The existing check of msgcount==f_count is still used to determine
   whether the socket is potentially in a cycle.
2. The socket is now placed on a local "dead list", which is used to
   reduce iteration time (and therefore contention on the global
   unp_link_rwlock).
3. The first pass through the dead list removes each potentially-dead
   socket's outgoing references from the graph of potentially-dead
   sockets, using a gc-specific copy of the original reference count.
4. The second series of passes through the dead list removes from the
   list any socket whose remaining gc refcount is non-zero, as this
   indicates the socket is actually accessible outside of any possible
   cycle.  Iteration is repeated until no further sockets are removed
   from the dead list.
5. Sockets remaining in the dead list are destroyed as before.

PR:		227285
Submitted by:	jan.kokemueller@gmail.com (prior version)
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D23142
2020-01-25 08:57:26 +00:00
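
A condensed sketch of the pass structure above; the list linkage, the gc_refs field, and the scan helpers are approximations of the uipc_usrreq.c code, not verbatim excerpts.

    #include <sys/param.h>
    #include <sys/queue.h>

    struct unp_sketch {
            int gc_refs;                    /* gc-private copy of refs */
            LIST_ENTRY(unp_sketch) dead;
    };
    LIST_HEAD(dead_head, unp_sketch);

    /* Hypothetical: apply op to every socket this one holds in flight. */
    void scan_outgoing(struct unp_sketch *, void (*op)(struct unp_sketch *));
    void gc_ref(struct unp_sketch *);
    void gc_unref(struct unp_sketch *);
    void destroy(struct unp_sketch *);

    static void
    unp_gc_sketch(struct dead_head *deadhead)
    {
            struct unp_sketch *u, *tmp;
            bool removed;

            /* Pass 1: drop each candidate's outgoing refs from the graph. */
            LIST_FOREACH(u, deadhead, dead)
                    scan_outgoing(u, gc_unref);

            /* Pass 2..n: refs left over mean the socket is reachable. */
            do {
                    removed = false;
                    LIST_FOREACH_SAFE(u, deadhead, dead, tmp) {
                            if (u->gc_refs > 0) {
                                    LIST_REMOVE(u, dead);
                                    scan_outgoing(u, gc_ref); /* restore */
                                    removed = true;
                            }
                    }
            } while (removed);

            /* Whatever remains is an unreachable cycle: destroy it. */
            LIST_FOREACH_SAFE(u, deadhead, dead, tmp) {
                    LIST_REMOVE(u, dead);
                    destroy(u);
            }
    }
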
Mark Johnston
a89c2c8c34 Revert r357050.
It seems to have introduced a couple of regressions.

Reported by:	cy, pho
2020-01-24 14:58:02 +00:00
Edward Tomasz Napierala
b3fb13eb55 Add kern_unmount() and use in Linuxulator. No functional changes.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22646
2020-01-24 11:57:55 +00:00
Mateusz Guzik
28eb39a5ab vfs: allow v_usecount to transition 0->1 without the interlock
There is nothing to do but to bump the count even during said transition.
There are 2 places which can do it:
- vget only does this after locking the vnode, meaning there is no change in
  contract versus inactive or reclamation
- vref only ever did it with the interlock held which did not protect against
  either (that is, it would always succeed)

VCHR vnodes retain special casing due to the need to maintain dev use count.

Reviewed by:	jeff, kib
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D23185
2020-01-24 07:47:44 +00:00
Mateusz Guzik
d93762b94d vfs: stop handling VI_OWEINACT in vget
vget is almost always called with LK_SHARED, meaning the flag (if present) is
almost guaranteed to get cleared. Stop handling it in the first place and
instead let the thread which wanted to do inactive handle the bumped usecount.

Reviewed by:	jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23184
2020-01-24 07:45:59 +00:00
Mateusz Guzik
74c4b7cc60 vfs: stop unlocking the vnode upfront in vput
Doing so runs into races with filesystems which make half-constructed vnodes
visible to other users, while depending on the chain vput -> vinactive ->
vrecycle to be executed without dropping the vnode lock.

Impediments for making this work got cleared up (notably vop_unlock_post now
does not do anything and lockmgr stops touching the lock after the final
write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can
now be eliminated.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23344
2020-01-24 07:44:25 +00:00
Mateusz Guzik
c00115f108 lockmgr: don't touch the lock past unlock
This evens it up with other locking primitives.

Note lock profiling still touches the lock, which again is in line with the
rest.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23343
2020-01-24 07:42:57 +00:00
Mark Johnston
1bfca40c57 Set td_oncpu before dropping the thread lock during a switch.
After r355784 we no longer hold a thread's thread lock when switching it
out.  Preserve the previous synchronization protocol for td_oncpu by
setting it together with td_state, before dropping the thread lock
during a switch.

Reported and tested by:	pho
Reviewed by:	kib
Discussed with:	jeff
Differential Revision:	https://reviews.freebsd.org/D23270
2020-01-23 16:24:51 +00:00
Jeff Roberson
91e31c3c08 Consistently use busy and vm_page_valid() rather than touching page bits
directly.  This improves API compliance, asserts, etc.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23283
2020-01-23 04:54:49 +00:00
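
The substitution in miniature, as a sketch: mark a busied page fully valid through the API so its assertions fire, instead of writing the bits directly.

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    static void
    page_mark_valid_sketch(vm_page_t m)
    {
            vm_page_assert_busied(m);       /* the API expects the page busied */
            vm_page_valid(m);               /* was: m->valid = VM_PAGE_BITS_ALL */
    }
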
Jeff Roberson
1eb13fce84 Block the thread lock in sched_throw() and use cpu_switch() to unblock
it.  The introduction of lockless switch in r355784 created a race to
re-use the exiting thread that was only possible to hit on a hypervisor.

Reported/Tested by:	rlibby
Discussed with:	rlibby, jhb
2020-01-23 03:36:50 +00:00
Gleb Smirnoff
ad3980121b DEVICE_POLLING is an alternative to network interrupts and also
needs to enter the epoch.  Assert that in netisr_poll() and do
the work for the idle poll routine.
2020-01-23 01:30:50 +00:00
Gleb Smirnoff
511d1afb6b Enter the network epoch for interrupt handlers of INTR_TYPE_NET.
Provide a tunable to limit how many times handlers may be executed
without reentering epoch.

Differential Revision:	https://reviews.freebsd.org/D23242
2020-01-23 01:24:47 +00:00
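
The per-handler effect, sketched with the stock NET_EPOCH_ENTER()/NET_EPOCH_EXIT() macros; the actual change lives in the intr_event dispatch loop together with the batching tunable, which is omitted here, and the include list is approximate.

    #include <sys/param.h>
    #include <sys/epoch.h>
    #include <net/if_var.h>

    /*
     * An INTR_TYPE_NET handler now runs within the network epoch,
     * so ifnets and other network objects it dereferences stay
     * stable without per-packet refcounting.
     */
    static void
    net_handler_sketch(void *arg)
    {
            struct epoch_tracker et;

            NET_EPOCH_ENTER(et);
            /* ... receive processing ... */
            NET_EPOCH_EXIT(et);
    }
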
Gleb Smirnoff
c4eb66309f Add ie_hflags to struct intr_event, which accumulates flags from all
handlers on this event.  For now handle only IH_ENTROPY in that manner.
2020-01-23 01:20:59 +00:00
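
In outline, the accumulation is a fold over the handlers when one is added or removed; the handler-list type and field spellings below are approximations.

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/interrupt.h>

    /*
     * Sketch: recompute the event-wide flag summary so the dispatch
     * path can test a single word, e.g. for IH_ENTROPY, instead of
     * walking every handler.
     */
    static void
    intr_event_update_hflags_sketch(struct intr_event *ie)
    {
            struct intr_handler *ih;
            int hflags = 0;

            /* list traversal macro approximated */
            TAILQ_FOREACH(ih, &ie->ie_handlers, ih_next)
                    hflags |= ih->ih_flags;
            ie->ie_hflags = hflags;
    }
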
Conrad Meyer
4577cf3744 cpufreq(4): Add support for Intel Speed Shift
Intel Speed Shift is Intel's technology to control frequency in hardware,
with hints from software.

Let's get a working version of this in the tree and we can refine it from
here.

Submitted by:	bwidawsk, scottph
Reviewed by:	bcr (manpages), myself
Discussed with:	jhb, kib (earlier versions)
With feedback from:	Greg V, gallatin, freebsdnewbie AT freenet.de
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D18028
2020-01-22 23:28:42 +00:00
Hans Petter Selasky
1f69a50940 Make sure the VNET is properly set when calling tcp_drop() from
the ktls taskqueue callback function.

A valid VNET is needed when updating statistics.

panic()
tcp_state_change()
tcp_drop()
ktls_reset_send_tag()
taskqueue_run_locked()
taskqueue_thread_loop()

Sponsored by:	Mellanox Technologies
2020-01-21 11:43:25 +00:00
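
The fix pattern, sketched with the standard CURVNET_SET()/CURVNET_RESTORE() macros around the callback body; the callback name and the use of so_vnet are assumptions about the shape of the code, not a quote of it.

    #include <sys/param.h>
    #include <sys/socketvar.h>
    #include <net/vnet.h>

    /*
     * A taskqueue thread has no implicit vnet context, so set it
     * before entering the tcp_drop()/tcp_state_change() path, which
     * updates VNET statistics.
     */
    static void
    ktls_task_sketch(void *context, int pending)
    {
            struct socket *so = context;    /* illustrative */

            CURVNET_SET(so->so_vnet);
            /* ... ktls_reset_send_tag() error path, tcp_drop() ... */
            CURVNET_RESTORE();
    }
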
Mateusz Guzik
6403455301 cache: revert r352613 now that vhold does not take locks 2020-01-20 19:52:23 +00:00
Mateusz Guzik
8bba93c7e0 cache: make numcachehv use counter(9) on all archs
Requested by:	kib
2020-01-20 14:42:11 +00:00
Jeff Roberson
d6e13f3b4d Don't hold the object lock while calling getpages.
The vnode pager does not want the object lock held.  Moving this out allows
further object lock scope reduction in callers.  While here add some missing
paging in progress calls and an assert.  The object handle is now protected
explicitly with pip.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D23033
2020-01-19 23:47:32 +00:00
Mateusz Guzik
a9099e5b10 vfs: switch vop_stdunlock to call lockmgr_unlock
Since the flags argument is now always 0, the new call provides the same
behavior.
2020-01-19 21:41:34 +00:00
Jeff Roberson
811d05fcb7 Provide an API for interlocked refcount sleeps.
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D22908
2020-01-19 18:18:17 +00:00
Mateusz Guzik
28479aaae2 vfs: allow v_holdcnt to transition 0->1 without the interlock
Since r356672 ("vfs: rework vnode list management") there is nothing to do
apart from altering freevnodes count, but this much can be safely done based
on the result of atomic_fetchadd.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23186
2020-01-19 17:47:04 +00:00
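
A sketch of the lockless transition; the accounting hook is a hypothetical stand-in for the freevnodes bookkeeping described above.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/vnode.h>

    static void freevnodes_adjust_sketch(int);  /* hypothetical */

    static void
    vhold_sketch(struct vnode *vp)
    {
            int old;

            /*
             * The value returned by the fetchadd tells us whether
             * this was the 0->1 transition, no interlock needed.
             */
            old = atomic_fetchadd_int(&vp->v_holdcnt, 1);
            if (old == 0)
                    freevnodes_adjust_sketch(-1);
    }
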