17486 Commits

Author SHA1 Message Date
Mateusz Guzik
31ad4050fe lockmgr: add adaptive spinning
It is very conservative: spinning happens only when LK_ADAPTIVE is passed,
only on an exclusive lock and never when any waiters are present. The buffer
cache remains non-spinning.

This reduces total sleep times during buildworld etc., but it does not shorten
total real time (culprits are contention in the vm subsystem along with slock +
upgrade which is not covered).

For microbenchmarks: open3_processes -t 52 (open/close of the same file for
writing) ops/s:
before: 258845
after: 801638

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25753
2020-07-22 12:30:31 +00:00
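The conditions listed in 31ad4050fe can be read as a single predicate that decides between spinning and sleeping. The sketch below is only an illustration under stated assumptions: the lock-word layout, the `SK_*` names and `sk_can_spin()` are hypothetical and do not mirror the real lockmgr code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag and lock-word bits; not the real lockmgr layout. */
#define	SK_ADAPTIVE	0x1u	/* caller opted in (stands in for LK_ADAPTIVE) */
#define	SK_UNLOCKED	((uintptr_t)0)
#define	SK_SHARED	((uintptr_t)0x1)	/* lock is held in shared mode */
#define	SK_WAITERS	((uintptr_t)0x2)	/* at least one thread is sleeping */

/*
 * Decide whether a contending thread may spin instead of going to sleep.
 * Every condition from the commit message has to hold.
 */
static bool
sk_can_spin(uintptr_t lockval, unsigned flags)
{
	if ((flags & SK_ADAPTIVE) == 0)
		return (false);		/* caller did not ask for spinning */
	if (lockval == SK_UNLOCKED)
		return (false);		/* nothing to wait for, just acquire */
	if ((lockval & SK_SHARED) != 0)
		return (false);		/* only exclusive holders are spun on */
	if ((lockval & SK_WAITERS) != 0)
		return (false);		/* somebody already sleeps, join them */
	return (true);
}
```

A caller would re-read the lock word and re-evaluate the predicate on each iteration, falling back to the sleeping path as soon as it returns false.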
Mitchell Horne
dc42509049 INTRNG: only shuffle for !EARLY_AP_STARTUP
During device attachment, all interrupt sources will bind to the BSP,
as it is the only processor online. This means interrupts must be
redistributed ("shuffled") later, during SI_SUB_SMP.

For the EARLY_AP_STARTUP case, this is no longer true. SI_SUB_SMP will
execute much earlier, meaning APs will be online and available before
devices begin attachment, and there will therefore be nothing to
shuffle.

All PIC-conforming interrupt controllers will handle this early
distribution properly, except for RISC-V's PLIC. Make the necessary
tweak to the PLIC driver.

While here, convert irq_assign_cpu from a boolean_t to a bool.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D25693
2020-07-21 22:47:02 +00:00
Mateusz Guzik
4aff9f5d99 lockmgr: denote recursion with a bit in lock value
This reduces excessive reads from the lock.

Tested by:	pho
2020-07-21 14:42:22 +00:00
Mateusz Guzik
f6b091fbbd lockmgr: rewrite upgrade to stop always dropping the lock
This matches rw and sx locks.
2020-07-21 14:41:25 +00:00
Mateusz Guzik
bdb6d824f4 lockmgr: add a helper for reading the lock value 2020-07-21 14:39:20 +00:00
Adrian Chadd
f7d38a13a8 [net80211] Add new privileges; restrict what can be done in a jail.
Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.

+ VAP_MANAGE is everything but setting the MAC and creating a VAP.
+ VAP_SETMAC is setting the MAC address of the VAP.
  Typically you wouldn't want the jail to be able to modify this.
+ CREATE_VAP is to create a new VAP. Again, you don't want to be doing
  this in a jail, but this DOES stop being able to run some corner
  cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this
  bit out later.

This allows me to run wpa_supplicant in a jail after transferring
a STA VAP into it. I unfortunately can't currently set the wlan
debugging inside the jail; that would be super useful!

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D25630
2020-07-19 15:16:27 +00:00
Mateusz Guzik
7cd4443fb1 Short-circuit tdfind when looking for the calling thread.
Common occurrence with cpuset and other places.
2020-07-18 00:14:43 +00:00
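The short-circuit in 7cd4443fb1 amounts to comparing the requested ID with the caller's own before doing a real lookup. A minimal userspace sketch, assuming a hypothetical thread table (`threads`, `curthread_sketch`, `slow_lookup()` and `find_thread()` are illustrative names, not kernel ones):

```c
#include <stddef.h>

struct thread_sketch {
	int	tid;
};

/* Hypothetical thread table and "current thread" pointer. */
static struct thread_sketch threads[] = { { 100 }, { 101 }, { 102 } };
static struct thread_sketch *curthread_sketch = &threads[1];
static unsigned slow_lookups;	/* counts how often the slow path runs */

/* Stand-in for the full lookup, which is assumed to be expensive. */
static struct thread_sketch *
slow_lookup(int tid)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof(threads) / sizeof(threads[0]); i++)
		if (threads[i].tid == tid)
			return (&threads[i]);
	return (NULL);
}

static struct thread_sketch *
find_thread(int tid)
{
	/* Short-circuit: the caller is asking about itself. */
	if (curthread_sketch->tid == tid)
		return (curthread_sketch);
	return (slow_lookup(tid));
}
```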
Mateusz Guzik
3ea3fbe685 vfs: fix vn_poll performance with either MAC or AUDIT
The code would unconditionally lock the vnode to audit or call the
MAC hook, even if neither wants to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.

Note this code should not normally be executed anyway, as vnodes
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself,
as the vnode handler is supposed to do almost nothing.

This in particular fixes poll2 which passes 128 fds.

$ ./poll2_processes -s 10
before: 134411
after:  271572
2020-07-16 14:09:18 +00:00
Mateusz Guzik
ab06a30517 vfs: fix MAC/AUDIT mismatch in vn_poll
Auditing would not be performed without MAC compiled in.
2020-07-16 14:04:28 +00:00
Mateusz Guzik
b1607c8727 poll: factor fd lookup out of scan and rescan 2020-07-15 10:24:39 +00:00
Mateusz Guzik
d8bc2a17a5 fd: remove fd_lastfile
It keeps getting recalculated far more often than needed.

Provide a routine (fdlastfile) to get it if necessary.

Consumers may be better off with a bitmap iterator instead.
2020-07-15 10:24:04 +00:00
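Computing the last used descriptor on demand, as d8bc2a17a5 proposes, can be pictured as a reverse scan of an in-use bitmap. This is a sketch under assumptions: the bitmap representation and the name `last_used_fd()` are hypothetical and are not taken from the real fdlastfile routine.

```c
#include <limits.h>
#include <stddef.h>

#define	BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the highest set bit in the bitmap, or -1 if no bit
 * is set.  Nothing is cached, so there is no value to keep up to date on
 * every open and close.
 */
static int
last_used_fd(const unsigned long *map, size_t nwords)
{
	for (size_t w = nwords; w-- > 0;) {
		if (map[w] == 0)
			continue;	/* skip whole empty words */
		for (int b = (int)BITS_PER_WORD - 1; b >= 0; b--)
			if ((map[w] & (1UL << b)) != 0)
				return ((int)(w * BITS_PER_WORD) + b);
	}
	return (-1);
}
```

The trade-off is that rare consumers pay for a scan, while the common allocation and close paths no longer maintain the field.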
Mateusz Guzik
7177149a4d fd: add obvious branch predictions to fdalloc 2020-07-15 10:14:00 +00:00
Mateusz Guzik
29f3e5ea41 cache: make negative shrinker round robin on all lists every time
Previously it would check 4, 3, 2, 1 lists. In practice by the time
it is getting called all lists have some elements and consequently
this does not result in new evictions.

Nonetheless, the code is clearer.

Tested by:	pho
2020-07-14 21:19:33 +00:00
Mateusz Guzik
a110fa2ee1 cache: remove numcalls
The counter is not very useful and if necessary the value can be
found by summing up other counters.
2020-07-14 21:17:46 +00:00
Mateusz Guzik
4516c7eed9 cache: count dropped entries 2020-07-14 21:17:08 +00:00
Mateusz Guzik
654e644e80 cache: remove neg_locked argument from cache_zap_locked
Tested by:	pho
2020-07-14 21:16:48 +00:00
Mateusz Guzik
ffb0abddf1 cache: remove a useless argument from cache_negative_insert 2020-07-14 21:16:07 +00:00
Mateusz Guzik
9f8d452173 cache: create a dedicated struct for negative entries
.. and stuff it into the unused target vnode field

This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued.

Reported by:	zeising
Tested by:	pho, zeising
Fixes:	r362828 ("cache: lockless forward lookup with smr")
2020-07-14 21:14:59 +00:00
Mateusz Guzik
373278a7f6 fd: stop looping in pwd_hold
We don't expect to fail acquiring the reference unless running into a corner
case. Just in case ensure forward progress by taking the lock.

Reviewed by:	kib, markj
Differential Revision: https://reviews.freebsd.org/D25616
2020-07-11 21:57:03 +00:00
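The pattern described in 373278a7f6 is one lockless attempt followed by a locked fallback that cannot fail. The sketch below assumes a hypothetical refcounted object published through a pointer; all names are illustrative, and the protection the kernel relies on to keep the object alive during the lockless attempt is deliberately left out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj_sketch {
	atomic_uint	refs;	/* 0 means the object is going away */
};

struct holder_sketch {
	pthread_mutex_t			lock;	/* serializes replacing 'cur' */
	_Atomic(struct obj_sketch *)	cur;	/* holds one reference itself */
};

/* Take a reference unless the count has already dropped to zero. */
static bool
ref_acquire_if_not_zero(struct obj_sketch *o)
{
	unsigned old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure 'old' is reloaded and re-checked. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return (true);
	}
	return (false);
}

static struct obj_sketch *
hold_current(struct holder_sketch *h)
{
	struct obj_sketch *o;

	/* Fast path: a single lockless attempt, no retry loop. */
	o = atomic_load(&h->cur);
	if (ref_acquire_if_not_zero(o))
		return (o);

	/*
	 * Slow path: with the lock held 'cur' cannot be replaced, so the
	 * reference can be taken unconditionally and progress is guaranteed.
	 */
	pthread_mutex_lock(&h->lock);
	o = atomic_load(&h->cur);
	atomic_fetch_add(&o->refs, 1);
	pthread_mutex_unlock(&h->lock);
	return (o);
}
```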
Mateusz Guzik
74f61caed5 vfs: fix early termination of kern_getfsstat
The kernel would unlock an already unlocked mutex if the buffer got filled up
before the mount list ended.

Reported by:	pho
Fixes:	r363069 ("vfs: depessimize getfsstat when only the count is requested")
2020-07-10 09:24:27 +00:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interfere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
6c69e69724 vfs: depessimize getfsstat when only the count is requested
This avoids relocking mountlist_mtx for each entry.
2020-07-10 06:47:58 +00:00
Mateusz Guzik
8c1f410c19 vfs: avoid spurious memcpy in vfs_statfs
It is quite often called for the very same buffer.
2020-07-10 06:46:42 +00:00
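The change in 8c1f410c19 can be pictured as a pointer comparison in front of the copy. A minimal sketch, assuming a hypothetical stats structure and a `fill_stats()` helper; the `copies` counter exists only to make the effect observable:

```c
#include <string.h>

struct stats_sketch {
	long	blocks;
	long	files;
};

static unsigned copies;	/* counts how many real copies were made */

/*
 * Copy the cached stats into the caller's buffer, unless the caller
 * passed the cached buffer itself, in which case there is nothing to do.
 */
static void
fill_stats(const struct stats_sketch *cached, struct stats_sketch *out)
{
	if (out != cached) {
		memcpy(out, cached, sizeof(*out));
		copies++;
	}
}
```

Besides saving the work, the check also avoids calling `memcpy()` with overlapping source and destination, which the C standard leaves undefined.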
Kyle Evans
3f07b9d9f8 shm_open2: Implement SHM_GROW_ON_WRITE
Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests,
so go ahead and implement it. A future change will make memfd_create always
set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests
on -CURRENT.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:43:45 +00:00
Mark Johnston
fe59cb6ba2 Apply the logic from r363051 to semctl(2) and __sem_base field.
Reported by:	Jeffball <jeffball@grimm-co.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 18:34:54 +00:00
Mark Johnston
f4f16af1d3 Avoid copying out kernel pointers from msgctl(IPC_STAT).
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace.  Thus, NULL them out before
copying out, and remove references to them from the manual page.

Reported by:	Jeffball <jeffball@grimm-co.com>
Reviewed by:	emaste, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 17:26:49 +00:00
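The approach in f4f16af1d3 is to scrub the internal pointers in a temporary copy before anything is handed to userspace. The sketch below uses a hypothetical structure, with `memcpy()` standing in for the kernel's copyout; none of the field names come from the real msgctl(2) structure.

```c
#include <stddef.h>
#include <string.h>

struct msg_sketch;	/* opaque internal message type */

/* Hypothetical user-visible structure that also carries internal state. */
struct queue_info_sketch {
	struct msg_sketch	*first;	/* internal list head */
	struct msg_sketch	*last;	/* internal list tail */
	unsigned long		qnum;	/* number of queued messages */
};

static void
export_queue_info(const struct queue_info_sketch *kern,
    struct queue_info_sketch *user)
{
	struct queue_info_sketch tmp;

	tmp = *kern;		/* work on a copy, never the live object */
	tmp.first = NULL;	/* pointers are meaningless to userspace */
	tmp.last = NULL;
	memcpy(user, &tmp, sizeof(tmp));	/* stands in for copyout */
}
```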
Mark Johnston
866a5d1298 Regenerate.
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:49 +00:00
Mark Johnston
bdfe61e05e Permit cpuset_(get|set)domain() in capability mode.
These system calls already perform validation of their parameters when
called in capability mode, identical to cpuset_(get|set)affinity().

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:29 +00:00
Pawel Biernacki
e94fdc3833 kern.tty_info_kstacks: set compact format as default 2020-07-06 16:34:15 +00:00
Mark Johnston
69b565d7c0 Allow accesses of the caller's CPU and domain sets in capability mode.
cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling
thread or process' CPU and domain set in capability mode, but only when
the thread or process ID is specified as -1.  Extend this to cover the
case where the ID actually matches the caller's TID or PID, since some
code, such as our pthread_attr_get_np() implementation, always provides
an explicit ID.

It was not and still is not permitted to access CPU and domain sets for
other threads in the same process when the process is in capability
mode.  This might change in the future.

Submitted by:	Greg V <greg@unrelenting.technology> (original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25552
2020-07-06 16:34:09 +00:00
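The rule introduced by 69b565d7c0 reduces to a small predicate: in capability mode an ID is acceptable if it is -1 or if it names the caller itself. The sketch below assumes hypothetical types and names (`caller_sketch`, `which_sketch`, `capmode_access_ok()`) and leaves out every other check the real system calls perform.

```c
#include <stdbool.h>

enum which_sketch { WHICH_TID, WHICH_PID };

struct caller_sketch {
	long	tid;	/* calling thread's ID */
	long	pid;	/* calling process's ID */
};

/* May a caller in capability mode access the set identified by 'id'? */
static bool
capmode_access_ok(const struct caller_sketch *c, enum which_sketch which,
    long id)
{
	if (id == -1)
		return (true);		/* "myself", always permitted */
	if (which == WHICH_TID)
		return (id == c->tid);	/* explicit, but still the caller */
	return (id == c->pid);
}
```

Other threads of the same process are still rejected, matching the second paragraph of the commit message.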
Pawel Biernacki
cd1c083d80 kern.tty_info_kstacks: add a compact format
Add a more compact display format for kern.tty_info_kstacks inspired by
procstat -kk. Set it as the default.

# sysctl kern.tty_info_kstacks=1
kern.tty_info_kstacks: 0 -> 1
# sleep 2
^T
load: 0.17  cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
#0 0xffffffff80c4443e at mi_switch+0xbe
#1 0xffffffff80c98044 at sleepq_catch_signals+0x494
#2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
#3 0xffffffff80c43af3 at _sleep+0x193
#4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
#5 0xffffffff80c5119b at sys_nanosleep+0x3b
#6 0xffffffff810ffc69 at amd64_syscall+0x119
#7 0xffffffff810d5520 at fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
# sysctl kern.tty_info_kstacks=2
kern.tty_info_kstacks: 1 -> 2
# sleep 2
^T
load: 0.24  cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12
_sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b
amd64_syscall+0x119 fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C

Suggested by:	avg
Reviewed by:	mjg
Relnotes:	yes
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25487
2020-07-06 16:33:28 +00:00
Mark Johnston
9eb997cb48 Lift cpuset Capsicum checks into a subroutine.
Otherwise the same checks are duplicated across four different system
call implementations, cpuset_(get|set)(affinity|domain)().  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:33:21 +00:00
Mateusz Guzik
9b0c2e5909 vfs: expand on vhold_smr comment 2020-07-06 02:00:35 +00:00
Mateusz Guzik
d363fa4127 lockf: elide avoidable locking in lf_advlockasync
While here assert on ls_threads state.
2020-07-05 23:07:54 +00:00
Konstantin Belousov
4543c1c329 Fix typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-07-05 20:54:01 +00:00
Andrew Turner
fcf7a48191 Rerun kernel ifunc resolvers after all CPUs have started
On architectures that use RELA relocations it is safe to rerun the ifunc
resolvers after all CPUs have started, but while they are still parked.

On arm64 with big.LITTLE this is needed as some SoCs have shipped with
different ID register values on the big and little clusters, meaning we were
unable to rely on the register values from the boot CPU.

Add support for rerunning the resolvers on arm64 and amd64 as these are
both RELA using architectures.

Reviewed by:	kib
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25455
2020-07-05 14:38:22 +00:00
Mateusz Guzik
dc3c991598 Add char and short types to kcsan 2020-07-04 06:22:05 +00:00
Mateusz Guzik
58199a7052 ifdef out pg_jobc assertions added in r361967
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
2020-07-03 09:23:11 +00:00
Mateusz Guzik
a2de789ebb cred: add a prediction to crfree for td->td_realucred == cr
This matches crhold and eliminates an assembly maze in the common case.
2020-07-02 12:58:07 +00:00
Mateusz Guzik
d23850207b cache: add missing call to cache_ncp_invalid for negative hits
Note the dtrace probe can fire even if the entry is gone, but I don't think that's
worth fixing.
2020-07-02 12:56:20 +00:00
Mateusz Guzik
d129e0eba0 cache: fix misplaced fence in cache_ncp_invalidate
The intent was to mark the entry as invalid before cache_zap starts messing
with it.

While here add some comments.
2020-07-02 12:54:50 +00:00
Konstantin Belousov
4bc5ce2c74 Use tdfind() in pget().
Reviewed by:	jhb, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25532
2020-07-02 10:40:47 +00:00
Andrew Turner
ecc8ccb441 Simplify the flow when getting/setting an isrc
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.

Sponsored by:	Innovate UK
2020-07-01 12:07:28 +00:00
Mateusz Guzik
5d1c042d32 cache: lockless forward lookup with smr
This eliminates the need to take bucket locks in the common case.

Concurrent lookup utilizing the same vnodes is still bottlenecked on referencing
and locking path components; this will be taken care of separately.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:59:08 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Andrew Gallatin
46cac10b3b Fix a panic when unloading firmware
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.

This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.

Reviewed by:	jhb, markj
Sponsored by:	Netflix
2020-06-29 21:35:50 +00:00
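The fix in 46cac10b3b restarts the scan from the head whenever the lock has been dropped, because a saved "next" pointer may be stale by then. The sketch below is single-threaded and hypothetical: a hand-rolled list replaces the queue(3) macros, the lock is only indicated in comments, and `unload_one()` simulates another thread removing an entry while the lock is not held.

```c
#include <stddef.h>

struct fw_sketch {
	struct fw_sketch	*next;
	int			refs;		/* referenced entries are kept */
	int			unloaded;
};

static struct fw_sketch *head;
static struct fw_sketch *racer_victim;	/* entry the simulated racer removes */

static void
list_remove(struct fw_sketch *fp)
{
	for (struct fw_sketch **pp = &head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == fp) {
			*pp = fp->next;
			fp->next = NULL;
			return;
		}
	}
}

/* Work done with the lock dropped; the list may change underneath us. */
static void
unload_one(struct fw_sketch *fp)
{
	fp->unloaded = 1;
	if (racer_victim != NULL) {
		list_remove(racer_victim);	/* simulated concurrent removal */
		racer_victim = NULL;
	}
}

static void
unload_all(void (*unlocked_work)(struct fw_sketch *))
{
	struct fw_sketch *fp;

restart:
	for (fp = head; fp != NULL; fp = fp->next) {
		if (fp->refs > 0)
			continue;
		list_remove(fp);
		/* The lock would be dropped here ... */
		unlocked_work(fp);
		/* ... and re-taken here; no saved pointer can be trusted. */
		goto restart;
	}
}
```

Restarting makes the scan quadratic in the worst case, which the commit message accepts because this is not a critical path.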
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Enji Cooper
d6701b6c8c Add kern.features.witness
Adding `kern.features.witness` helps expose whether or not the kernel has
`options WITNESS` enabled, so the `feature_present(3)` API can be used
to query whether or not witness(9) is built into the kernel.

This support is helpful with userspace applications (generally speaking,
tests), as it can be queried to determine whether or not tests related
to WITNESS should be run.

MFC after:	1 week
Reviewed by: cem, darrick.freebsd_gmail.com
Differential Revision: https://reviews.freebsd.org/D25302
Sponsored by:	DellEMC Isilon
2020-06-24 18:51:01 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
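The idea in f270658873 can be illustrated by keeping one "expected next offset" and one sequential counter per direction, so that interleaved reads and writes on the same descriptor do not reset each other. This is a sketch under assumptions: `seq_state`, `io_dir` and `seq_update()` are hypothetical names, and the counting rule is a simplification rather than the kernel's actual heuristic.

```c
#include <stdint.h>

enum io_dir { IO_READ = 0, IO_WRITE = 1 };

/* Per-descriptor state, indexed by direction. */
struct seq_state {
	int64_t	next_off[2];	/* offset where the next access is expected */
	int	seq_count[2];	/* consecutive sequential accesses seen */
};

/*
 * Record an access and return the updated sequential count for its
 * direction.  An access elsewhere resets only its own direction.
 */
static int
seq_update(struct seq_state *s, enum io_dir dir, int64_t off, int64_t len)
{
	if (off == s->next_off[dir])
		s->seq_count[dir]++;
	else
		s->seq_count[dir] = 0;
	s->next_off[dir] = off + len;
	return (s->seq_count[dir]);
}
```

With a single shared tracker, a reader at offset 1100 followed by a writer at offset 100 would look like random access on every call; with two trackers both streams keep counting up.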