Commit Graph

12 Commits

Author SHA1 Message Date
Mark Johnston
d869a17e62 Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.
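
A hedged sketch of the pattern (COUNTER_U64_DEFINE_EARLY() is the
sys/counter.h macro this change switches to; the counter name and the
function around it are illustrative):

    #include <sys/counter.h>

    /*
     * One line replaces the old pattern of a counter_u64_t variable
     * plus a SYSINIT that called counter_u64_alloc(); the counter is
     * usable from very early boot.
     */
    COUNTER_U64_DEFINE_EARLY(example_failures);

    static void
    record_failure(void)
    {
        counter_u64_add(example_failures, 1);
    }
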
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23978
2020-03-06 19:10:00 +00:00
Jeff Roberson
561af25fa7 Simplify lazy advance with a 64-bit atomic cmpset.
This makes it possible to force a lazy (tick-based) SMR to advance when
there are blocking waiters, by decoupling the wr_seq value from the
ticks value.
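
A rough sketch of the idea (the packed layout, field names, and helper
are illustrative; the real code lives in subr_smr.c):

    /*
     * Ticks and wr_seq share one 64-bit word, so a single cmpset
     * advances both.  A blocked waiter can therefore push the
     * sequence forward instead of waiting for the next tick.
     */
    union s_wr_ex {
        struct {
            uint32_t    ticks;      /* tick timestamp */
            uint32_t    seq;        /* wr_seq */
        };
        uint64_t    _pair;
    };

    static void
    wr_advance_ex(volatile union s_wr_ex *wr, uint32_t now)
    {
        union s_wr_ex old, new;

        old._pair = atomic_load_acq_64(&wr->_pair);
        new.ticks = now;
        new.seq = old.seq + 2;      /* SMR_SEQ_INCR in the real code */
        /* Losing the race is harmless: someone else advanced. */
        atomic_cmpset_64(&wr->_pair, old._pair, new._pair);
    }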

Add some missing compiler barriers.

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D23825
2020-02-27 19:05:26 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
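
For illustration, the annotation amounts to adding one of the two flags
to each node (the nodes and the sysctl_example handler here are
hypothetical):

    /* A handler audited as safe without Giant: */
    SYSCTL_PROC(_kern, OID_AUTO, example_mpsafe,
        CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, 0,
        sysctl_example, "I", "an MPSAFE node");

    /* One that has not been audited keeps Giant for now: */
    SYSCTL_PROC(_kern, OID_AUTO, example_giant,
        CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_example, "I", "a node that still runs under Giant");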

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Jeff Roberson
226dd6db47 Add an atomic-free tick moderated lazy update variant of SMR.
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch.  On a recent AMD platform a read section cost
1ns vs 5ns for the default SMR.  On Xeon the numbers should be closer to
1ns vs 11ns.  The memory consumption should be proportional to the product
of the free rate and 2*1/hz (two clock ticks), while normal SMR consumption
is proportional to the product of the free rate and the maximum read
section time.
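
As a rough worked example with hypothetical numbers: at hz = 1000,
2*1/hz is 2ms, so a free rate of 1M items/s keeps on the order of
1,000,000 * 0.002 = 2,000 items in the free-to-use window, independent
of how long the slowest reader stays in its section.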

While here, refactor the code to make future additions more
straightforward.

Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly.  This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).

Discussed with:	rlibby, markj
2020-02-22 03:44:10 +00:00
Jeff Roberson
1f2a6b8501 Since r357804 pcpu zones are required to use zalloc_pcpu(). Prior to this
it was only required if you were zeroing.  Switch to these interfaces.
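
The interfaces in question, sketched with a hypothetical zone:

    static void
    example_pcpu(void)
    {
        uma_zone_t zone;
        uint64_t *p;

        zone = uma_zcreate("example pcpu", sizeof(uint64_t), NULL,
            NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_PCPU);
        /* pcpu zones must use the pcpu variants, zeroing or not. */
        p = uma_zalloc_pcpu(zone, M_WAITOK | M_ZERO);
        uma_zfree_pcpu(zone, p);
        uma_zdestroy(zone);
    }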

Reviewed by:	mjg
2020-02-13 21:10:17 +00:00
Jeff Roberson
a4d50e49da Add more precise SMR entry asserts.
2020-02-13 20:50:21 +00:00
Jeff Roberson
a40068e524 Fix a race in smr_advance() that could result in unnecessary poll calls.
This was relatively harmless but surprising to see in counters.  The
race occurred when rd_seq was read after the goal was updated and we
incorrectly calculated the delta between them.
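
The shape of the fix, loosely (field names follow subr_smr.c but the
snippet is illustrative):

    static smr_seq_t
    smr_advance_sketch(smr_t smr, smr_shared_t s)
    {
        smr_seq_t goal, rd_seq;

        /*
         * Snapshot rd_seq BEFORE publishing the new goal; reading it
         * afterwards can pair a fresher rd_seq with the new wr_seq
         * and miscompute the delta, causing needless polls.
         */
        rd_seq = atomic_load_acq_int(&s->s_rd_seq);
        goal = atomic_fetchadd_int(&s->s_wr_seq, SMR_SEQ_INCR) +
            SMR_SEQ_INCR;
        if (SMR_SEQ_DELTA(goal, rd_seq) >= SMR_SEQ_MAX_ADVANCE)
            smr_wait(smr, goal - SMR_SEQ_MAX_ADVANCE);
        return (goal);
    }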

Reviewed by:	rlibby
Differential Revision:	https://reviews.freebsd.org/D23464
2020-02-06 20:51:46 +00:00
Jeff Roberson
8d7f16a5db Add some global counters for SMR. These may eventually become per-smr
counters.  In my stress test there is only one poll for every 15,000
frees.  This means we are effectively amortizing the cache coherency
overhead even with very high write rates (3M/s/core).
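
At those rates that is roughly 3,000,000 / 15,000 = 200 polls per
second per core, so the poll-side cacheline traffic is a rounding error
next to the frees it covers.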

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23463
2020-02-06 20:10:21 +00:00
Jeff Roberson
bc6509845d Implement a deferred write advancement feature that can be used to further
amortize shared cacheline writes.
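
A concept sketch of the amortization (c_deferred, the field names, and
the helper are illustrative):

    /*
     * Only every 'limit'-th call pays for a real advance; the rest
     * return the goal the pending advance will reach and write only
     * per-CPU state.
     */
    static smr_seq_t
    smr_advance_batched(smr_t smr, int limit)
    {
        smr_t self;
        smr_seq_t goal;

        critical_enter();
        self = zpcpu_get(smr);
        if (++self->c_deferred < limit) {
            goal = atomic_load_int(&self->c_shared->s_wr_seq) +
                SMR_SEQ_INCR;
        } else {
            self->c_deferred = 0;
            goal = smr_advance(smr);    /* one shared write per batch */
        }
        critical_exit();
        return (goal);
    }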

Discussed with: rlibby
Differential Revision:	https://reviews.freebsd.org/D23462
2020-02-04 02:44:52 +00:00
Jeff Roberson
915c367e8e Add two missing fences with comments describing them. These were found by
inspection and after a lengthy discussion with jhb and kib.  They have not
produced test failures.

Don't pointer chase through cpu0's smr.  Use the CPU-correct smr even when
not in a critical section to reduce the likelihood of false sharing.
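
A sketch of the pointer-chasing point (the helper is illustrative):

    static smr_shared_t
    smr_shared_sketch(smr_t smr)
    {
        /*
         * Every CPU's c_shared points at the same structure, but
         * chasing it through our own CPU's smr avoids loading through
         * cpu0's cacheline and the false sharing that invites.
         */
        return (zpcpu_get(smr)->c_shared);
    }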
2020-01-31 22:21:15 +00:00
Jeff Roberson
da6e9935e4 Don't use "All rights reserved" in new copyrights.
Requested by:	rgrimes
2020-01-31 02:08:09 +00:00
Jeff Roberson
d4665eaa66 Implement a safe memory reclamation feature that is tightly coupled with UMA.
This is in the same family of algorithms as Epoch/QSBR/RCU/PARSEC but is
a unique algorithm.  This has 3x the performance of epoch in a write-heavy
workload with less than half of the read side cost.  The memory overhead
is significantly lessened by limiting the free-to-use latency.  A synthetic
test uses 1/20th of the memory vs Epoch.  There is significant further
discussion in the comments and code review.
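
A minimal sketch of the usage pattern (smr_create(), smr_enter(),
smr_exit(), smr_advance(), and smr_poll() are the interfaces this
commit adds; the item bookkeeping around them is invented for
illustration):

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/smr.h>

    struct item {
        int         key;
        smr_seq_t   goal;
    };

    static smr_t table_smr;         /* smr_create("table") at init */

    /* Reader: no atomics or shared-cacheline writes on this path. */
    static int
    table_lookup_key(struct item *volatile *slot)
    {
        struct item *it;
        int key;

        smr_enter(table_smr);
        it = *slot;
        key = (it != NULL) ? it->key : -1;  /* use inside the section */
        smr_exit(table_smr);
        return (key);
    }

    /* Writer: unlink, record a goal, free once readers catch up. */
    static void
    table_remove(struct item *volatile *slot, struct item *it)
    {
        *slot = NULL;
        it->goal = smr_advance(table_smr);
        if (smr_poll(table_smr, it->goal, true))
            free(it, M_TEMP);       /* no reader can still observe it */
    }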

This code should be considered experimental.  I will write a man page after
it has settled.  After further validation the VM will begin using this
feature to permit lockless page lookups.

Both markj and cperciva tested on arm64 at large core counts to verify
fences on weakly ordered architectures.  I will commit a stress-testing
tool in a follow-up.

Reviewed by:	mmacy, markj, rlibby, hselasky
Discussed with:	sbahara
Differential Revision:	https://reviews.freebsd.org/D22586
2020-01-31 00:49:51 +00:00