The code is pretty much guaranteed not to be able to unlock.
This is a minor nit. The code still performs way too many reads.
The altered exclusive-locked condition is also supposed to always be
true; it will be cleaned up at a later date.
There are risks associated with waiting on a preemptible epoch section.
Change the names so that they are not the default and document the issue
under CAVEATS.
Reported by: markj
adds:
- epoch_enter_critical() - can be called inside a different epoch,
starts a section that will not acquire any MTX_DEF mutexes or do
anything that might sleep.
- epoch_exit_critical() - corresponding exit call
- epoch_wait_critical() - wait variant that guarantees that any
threads in a section are running.
- epoch_global_critical - an epoch_wait_critical-safe epoch instance
(a usage sketch follows this list)
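A hypothetical usage sketch for the API listed above (not code from the
commit): a reader dereferences a shared pointer inside a non-preemptible
section, and a writer swaps the pointer and calls epoch_wait_critical()
before freeing the old object. struct cfg, cur_cfg and the function names
are made up, and pointer-publication memory-ordering details are elided
for brevity.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/epoch.h>

struct cfg {
        int     limit;
};

static struct cfg *cur_cfg;

static int
cfg_read_limit(void)
{
        struct cfg *c;
        int limit;

        /* Must not sleep or take MTX_DEF mutexes inside this section. */
        epoch_enter_critical(epoch_global_critical);
        c = cur_cfg;
        limit = (c != NULL) ? c->limit : 0;
        epoch_exit_critical(epoch_global_critical);
        return (limit);
}

static void
cfg_update(struct cfg *newc)
{
        struct cfg *oldc;

        oldc = cur_cfg;
        cur_cfg = newc;
        /*
         * Readers are in non-preemptible sections, so they are running
         * and the wait is short; afterwards the old object is unreachable.
         */
        epoch_wait_critical(epoch_global_critical);
        free(oldc, M_TEMP);
}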
Requested by: markj
Approved by: sbruno
This simplifies the use of the path variable by making it NUL
terminated. This is a prerequisite for further cleanups.
Reviewed by: imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15467
This change moves to using a reference count across lock drop / reacquire
to guarantee liveness.
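A generic sketch of the pattern described above, not the actual uipc
code: take a reference before dropping the lock so the object stays alive
across the drop/reacquire window, then release the reference afterwards.
struct obj, its fields and the function names are illustrative.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/refcount.h>
#include <sys/malloc.h>

struct obj {
        struct mtx      o_lock;
        volatile u_int  o_refs;
};

static void
obj_release(struct obj *o)
{
        if (refcount_release(&o->o_refs)) {
                mtx_destroy(&o->o_lock);
                free(o, M_TEMP);
        }
}

static void
obj_process(struct obj *o)
{
        mtx_lock(&o->o_lock);
        refcount_acquire(&o->o_refs);   /* keep 'o' alive across the drop */
        mtx_unlock(&o->o_lock);

        /* Work that must not (or need not) hold the lock goes here. */

        mtx_lock(&o->o_lock);
        /* ... continue under the lock; 'o' is guaranteed still valid ... */
        mtx_unlock(&o->o_lock);

        obj_release(o);                 /* drop the temporary reference */
}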
Currently, sends on unix sockets contend heavily on read-locking the list lock.
Throughput in the unix1_processes test in will-it-scale peaks at 6 processes and then declines.
With this change I get a substantial improvement in number of operations per second
with 96 processes:
x before
+ after
N Min Max Median Avg Stddev
x 11 1688420 1696389 1693578 1692766.3 2971.1702
+ 10 63417955 71030114 70662504 69576423 2374684.6
Difference at 95.0% confidence
6.78837e+07 +/- 1.49463e+06
4010.22% +/- 88.4246%
(Student's t, pooled s = 1.63437e+06)
Even the 2-process case shows a ~18% improvement.
"Small" iron changes (1, 2, and 4 processes):
x before1
+ after1.2
(ministat distribution plot for before1 vs. after1.2 elided; see the summary statistics below)
N Min Max Median Avg Stddev
x 10 1131648 1197750 1197138.5 1190369.3 20651.839
+ 10 1203840 1205056 1204919 1204827.9 353.27404
Difference at 95.0% confidence
14458.6 +/- 13723
1.21463% +/- 1.16683%
(Student's t, pooled s = 14605.2)
x before2
+ after2.2
(ministat distribution plot for before2 vs. after2.2 elided; see the summary statistics below)
N Min Max Median Avg Stddev
x 10 1972843 2045866 2038186.5 2030443.8 21367.694
+ 10 2400853 2402196 2401043.5 2401172.7 385.40024
Difference at 95.0% confidence
370729 +/- 14198.9
18.2585% +/- 0.826943%
(Student's t, pooled s = 15111.7)
x before4
+ after4.2
N Min Max Median Avg Stddev
x 10 3986994 3991728 3990137.5 3989985.2 1300.0164
+ 10 4799990 4806664 4806116.5 4805194 1990.6625
Difference at 95.0% confidence
815209 +/- 1579.64
20.4314% +/- 0.0421713%
(Student's t, pooled s = 1681.19)
Tested by: pho
Reported by: mjg
Approved by: sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15430
Add an epoch section to struct thread. We can use this to
enable the epoch counter to advance even if a section is
perpetually occupied by a thread.
Approved by: sbruno
This implements per-thread counters for PMC sampling. The thread
descriptors are stored in a list attached to the process descriptor.
These thread descriptors can store any per-thread information necessary
for current or future features. For the moment, they just store the counters
for sampling.
The thread descriptors are created when the process descriptor is created.
Additionally, thread descriptors are created or freed when threads
are started or stopped. Because the thread exit function is called in a
critical section, we can't directly free the thread descriptors. Hence,
they are freed to a cache, which is also used as a source of allocations
when needed for new threads.
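An illustrative sketch of the free-to-cache pattern described above; the
type, field and function names are hypothetical, not the actual hwpmc
code. Because the thread-exit hook runs in a critical section, the
descriptor is pushed onto a cache list protected by a spin mutex instead
of being freed, and allocations try the cache before falling back to
malloc().

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/malloc.h>

struct thr_desc {
        SLIST_ENTRY(thr_desc)   td_next;
        uint64_t                td_counters[8]; /* per-thread sample counters */
};

static SLIST_HEAD(, thr_desc) td_cache = SLIST_HEAD_INITIALIZER(td_cache);
/* Initialized elsewhere with mtx_init(&td_cache_mtx, "td cache", NULL, MTX_SPIN). */
static struct mtx td_cache_mtx;

static struct thr_desc *
thr_desc_alloc(void)
{
        struct thr_desc *td;

        mtx_lock_spin(&td_cache_mtx);
        td = SLIST_FIRST(&td_cache);
        if (td != NULL)
                SLIST_REMOVE_HEAD(&td_cache, td_next);
        mtx_unlock_spin(&td_cache_mtx);
        if (td == NULL)
                /* Thread creation may sleep, so M_WAITOK is fine here. */
                td = malloc(sizeof(*td), M_TEMP, M_WAITOK | M_ZERO);
        return (td);
}

static void
thr_desc_free(struct thr_desc *td)
{
        /* Safe in a critical section: no sleeping, only a spin lock. */
        mtx_lock_spin(&td_cache_mtx);
        SLIST_INSERT_HEAD(&td_cache, td, td_next);
        mtx_unlock_spin(&td_cache_mtx);
}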
Approved by: sbruno
Obtained from: jtl
Sponsored by: Juniper Networks, Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15335
... to process input, instead of inside each smaller operation such as
appending a character or moving the cursor forward.
In other words, before we were doing (oversimplified):
    teken_input()
        <for each input character>
            vtterm_putchar()
                VTBUF_LOCK()
                VTBUF_UNLOCK()
            vtterm_cursor_position()
                VTBUF_LOCK()
                VTBUF_UNLOCK()
Now, we are doing:
    vtterm_pre_input()
        VTBUF_LOCK()
    teken_input()
        <for each input character>
            vtterm_putchar()
            vtterm_cursor_position()
    vtterm_post_input()
        VTBUF_UNLOCK()
The situation was even worse when the vtterm_copy() and vtterm_fill()
callbacks were involved.
The new callbacks are:
* struct terminal_class->tc_pre_input()
* struct terminal_class->tc_post_input()
They are called in teken_input(), surrounding the while() loop.
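A simplified sketch of how the new hooks bracket the per-character loop
(not the actual subr_terminal.c code). Only tc_pre_input and
tc_post_input come from the commit; the struct layout, include paths and
the function below are assumptions shown for illustration.

#include <sys/types.h>
#include <teken/teken.h>

struct terminal;

struct terminal_class {
        void    (*tc_pre_input)(struct terminal *);
        void    (*tc_post_input)(struct terminal *);
        /* ... putchar, cursor, fill, copy, ... */
};

static void
example_terminal_input(struct terminal *tm, const struct terminal_class *tc,
    teken_t *tek, const void *buf, size_t len)
{
        if (tc->tc_pre_input != NULL)
                tc->tc_pre_input(tm);           /* vt(4): VTBUF_LOCK() */

        teken_input(tek, buf, len);             /* per-character processing */

        if (tc->tc_post_input != NULL)
                tc->tc_post_input(tm);          /* vt(4): VTBUF_UNLOCK() */
}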
The goal is to improve input processing speed of vt(4). As a benchmark,
here is the time taken to write a text file of 360 000 lines (26 MiB) on
`ttyv0`:
* vt(4), unmodified: 1500 ms
* vt(4), with this patch: 1200 ms
* syscons(4): 700 ms
This is on a Haswell laptop with a GENERIC-NODEBUG kernel.
At the same time, the locking is changed in the vt_flush() function,
which is responsible for drawing the text on screen. Instead of
(indirectly) taking VTBUF_LOCK() just to read and reset the dirty area
of the internal buffer, the lock is now held for almost the entire
function, including the drawing part.
The change is mostly visible while content is scrolling fast: before,
lines could appear garbled while scrolling because the internal buffer
was accessed without locks (once the scrolling was finished, the output
was correct). Now, the scrolling appears correct.
In the end, the locking model is closer to what syscons(4) does.
Differential Revision: https://reviews.freebsd.org/D15302
- fix load/unload race by allocating the per-domain list structure at boot
- fix a long-extant vm map LOR by replacing the pmc_sx sx_slock with global_epoch
to protect the liveness of elements of the pmc_ss_owners list (sketched below)
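A rough before/after sketch of the second item, as if added inside
sys/dev/hwpmc/hwpmc_mod.c (which already has the needed headers and list
declarations). The CK list traversal macro and the po_ssnext link-field
name are assumptions, not taken from the commit message; only pmc_sx,
global_epoch and pmc_ss_owners are named above.

static void
example_walk_ss_owners(void)
{
        struct pmc_owner *po;

        /* Before: sx_slock(&pmc_sx); ... walk ...; sx_sunlock(&pmc_sx); */
        epoch_enter(global_epoch);
        CK_LIST_FOREACH(po, &pmc_ss_owners, po_ssnext) {
                /* deliver samples / callchains to this system-scope owner */
        }
        epoch_exit(global_epoch);
}

The epoch section keeps the list elements live without taking the
sleepable sx lock, which is what removes the vm map lock order reversal.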
Reported by: pho
Approved by: sbruno
The INVARIANTS checks in epoch_wait() were intended to
prevent the block handler from returning with locks held.
What they in fact did was prevent anything except Giant
from being held across it. Instead, check that the number
of locks held has not changed.
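A minimal sketch of the corrected assertion, assuming the per-thread lock
count in curthread->td_locks is the counter being compared; the function
and parameter names below are illustrative, not the actual epoch_wait()
source.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/proc.h>

static void
example_epoch_wait(void (*block_fn)(void *), void *arg)
{
#ifdef INVARIANTS
        int locks_before = curthread->td_locks;
#endif

        block_fn(arg);          /* block handler: may take and drop locks */

        /* The handler must release everything it acquired. */
        MPASS(curthread->td_locks == locks_before);
}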
Approved by: sbruno@