This provides the potential to force a lazy (tick based) SMR to advance
when there are blocking waiters by decoupling the wr_seq value from the
ticks value.
Add some missing compiler barriers.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23825
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch. On a recent AMD platform a read section cost
1ns vs 5ns for the default SMR. On Xeon the numbers should be more like 1
ns vs 11. The memory consumption should be proportional to the product
of the free rate and 2*1/hz while normal SMR consumption is proportional
to the product of free rate and maximum read section time.
While here refactor the code to make future additions more
straightforward.
Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly. This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).
Discussed with: rlibby, markj
This was relatively harmless but surprising to see in counters. The
race occurred when rd_seq was read after the goal was updated and we
incorrectly calculated the delta between them.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23464
counters. In my stress test there is only one poll for every 15,000
frees. This means we are effectively amortizing the cache coherency
overhead even with very high write rates (3M/s/core).
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23463
inspection and after a lengthy discussion with jhb and kib. They have not
produced test failures.
Don't pointer chase through cpu0's smr. Use cpu correct smr even when not
in a critical section to reduce the likelihood of false sharing.
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm. This has 3x the performance of epoch in a write heavy
workload with less than half of the read side cost. The memory overhead
is significantly lessened by limiting the free-to-use latency. A synthetic
test uses 1/20th of the memory vs Epoch. There is significant further
discussion in the comments and code review.
This code should be considered experimental. I will write a man page after
it has settled. After further validation the VM will begin using this
feature to permit lockless page lookups.
Both markj and cperciva tested on arm64 at large core counts to verify
fences on weaker ordering architectures. I will commit a stress testing
tool in a follow-up.
Reviewed by: mmacy, markj, rlibby, hselasky
Discussed with: sbahara
Differential Revision: https://reviews.freebsd.org/D22586