Commit Graph

47 Commits

Author SHA1 Message Date
Mateusz Guzik
27dcd3d90b cam: clean up empty lines in .c and .h files 2020-09-01 22:13:48 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Warner Losh
ece56614c8 Revert r355833
While it works on nda, it fails on ada and/or da for at least zfs with a modify
after free issue on a trim BIO. Revert while I rework it to fix those devices.
2019-12-17 21:53:22 +00:00
Warner Losh
0d83f8dc1f Implement bio_speedup
React to the BIO_SPEED command in the cam io scheduler by completing
as successful BIO_DELETE commands that are pending, up to the length
passed down in the BIO_SPEEDUP cmomand. The length passed down is a
hint for how much space on the drive needs to be recovered. By
completing the BIO_DELETE comomands, this allows the upper layers to
allocate and write to the blocks that were about to be trimmed. Since
FreeBSD implements TRIMSs as advisory, we can eliminliminate them and
go directly to writing.

The biggest benefit from TRIMS coomes ffrom the drive being able t
ooptimize its free block pool inthe log run. There's little nto no
bene3efit in the shoort term. , sepeciall whn the trim is followed by
a write. Speedup lets  us make this tradeoff.

Reviewed by: kirk, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18351
2019-12-17 00:13:45 +00:00
Warner Losh
7918ea40a5 Eliminate the TRIM_ACTIVE flag.
Rather than a trim active flag, have a counter that can be used to
have a absolute limit on the number of trims in flight independent of
any I/O limiting factors.

Sponsored by: Netflix
2019-12-17 00:13:30 +00:00
Warner Losh
3aba1d47c8 Tweak the ddb show cam iosched command a bit.
For each of the different queue types, list the name of the
queue. While it can be worked out from context, this makes it more
useful and clearer.

Sponsored by: Netflix
2019-12-17 00:13:26 +00:00
Warner Losh
c6171b4440 Add rate limiters to TRIM.
Add rate limiters to trims. Trims are a bit different than reads or
writes in that they can be combined, so some care needs to be taken
where we rate limit them. Additional work will be needed to push the
working rate limit below the I/O quanta rate for things like IOPS.

Sponsored by: Netflix
2019-12-17 00:13:21 +00:00
Warner Losh
d900ade516 NVME trim clocking
Add the ability to set two goals for trims in the I/O scheduler. The
first goal is the number of BIO_DELETEs to accumulate
(kern.cam.XX.U.trim_goal). When non-zero, this many trims will be
accumulated before we start to transfer them to lower layers. This is
useful for devices that like to get lots of trims all at once in one
transaction (not all devices are like this, and some vary by workload).

The second is a number of ticks to defer trims. If you've set a trim
goal, then kern.cam.XX.U.trim_ticks controls how long the system will
defer those trims before timing out and sending them anyway. It has no
effect when trim_goal is 0.

In any event, a BIO_FLUSH will cause all the TRIMs to be released to
the periph drivers. This may be a minor overloading of what BIO_FLUSH
is supposed to mean, but it's useful to preserve other ordering
semantics that users of BIO_FLUSH reply on.

Sponsored by: Netflix, Inc
2018-11-27 00:36:35 +00:00
Warner Losh
1759fd7798 Minor tweaks to the formatting
Tweak the format of the trim + read bias code. Add similar debug to
the read + writes case.

Spondored by: Netflix
2018-11-26 22:50:30 +00:00
Warner Losh
e5436ab5af Add cam_iosched_set_latfcn to set a latency callback for high latency.
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.

One use-case is as a debugging aide when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.

Sponsored by: Netflix, Inc
2018-11-15 16:02:45 +00:00
Warner Losh
74cc33ce57 Flesh out a comment about what we're doing with read bias and trims.
Sponsored by: Netflix
2018-08-15 00:15:40 +00:00
Warner Losh
62c94a0551 For the dynamic I/O scheduler, make the TRIM stuff also count against
read bias so we do reads in preference to TRIMs. This helps a lot when
many trims are delivered at once from the upper layers as they tend to
delay READs due to priority inversion in the code today.

The non iosched case will be fixed when the trim comibing changes
needed for nvme come in later this year.

Sponsored by: Netflix
2018-07-26 22:55:51 +00:00
Warner Losh
041f49aece Remove the 'All Rights Reserved' clause from some of the stuff I've
done for Netflix, since I'm in the neighborhood.
2018-05-09 20:32:23 +00:00
Warner Losh
157cb465c4 Fix inverted logic that counted all completions as errors, except when
they were actual errors.

Sponsored by: Netflix
2018-03-14 16:44:57 +00:00
Warner Losh
8a3de7bc34 Allow NULL ccb to cam_iosched_bio_complete
When the ccb is NULL to cam_iosched_bio_complete, just update the
other statistics, but not the time. If many operations are collapsed
together, this is needed to keep stats properly for the grouped bp.
This should fix trim accounting.

Sponsored by: Netflix
2018-03-14 16:44:16 +00:00
Warner Losh
2d87718fda Use bool instead of int for predicate functions relating to work
available.
2018-02-23 16:06:54 +00:00
Warner Losh
07e5967a22 Revert r329814 as well. It should have been in r329819. 2018-02-22 11:51:50 +00:00
Warner Losh
0028abe633 Backout r329818, r329816 and r329815.
These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.
2018-02-22 11:18:33 +00:00
Warner Losh
91acaad987 Fix typo in last commit after last rebase before commit... 2018-02-22 10:55:23 +00:00
Warner Losh
c5fe3ae9b8 Introduce capacity flags for periphs
Introduce flags word to describe the capacities of the peripheral.
First bit will describe if the periph driver allows multiple
outstanding TRIMS to be active in a device.

Modify the I/O scheduler so that the nda driver can queue trims
for a while after the first one arrives. We'll queue until we see
a I/O scheduler tick, then we'll schedule as many TRIMs as allowed
by other factors (currently this is slocts in the NVMe controller).
This mariginally helps the read latency issues we see with reads,
but sets the stage for the nda driver to do TRIM collapsing like the
da and ada drivers do today.

Sponsored by: Netflix
2018-02-22 05:43:55 +00:00
Warner Losh
c9878d6d63 Note when we tick.
To help implement a policy of 'queue all trims until next I/O sched
tick' policy to help coalesce them, note when we tick so we can do
something special on the first call after the tick to get more work.

Sponsored by: Netflix
2018-02-22 05:43:50 +00:00
Warner Losh
f2b9885036 Wrap an extra long line
This debugging line is too big for even my largest xterm. wrap it at
about 80 columns.

Sponsored by: Netflix
2018-02-22 05:43:45 +00:00
Warner Losh
97f8aa050e Don't sort TRIMs.
While the code for ada and da both assume that the trim list is
ordered when doing the coaleascing the TRIMs, it turns out that
creating the sorted list uses more resources than are saved by having
slightly fewer trims sent to the device.

Sponsored by: Netflix
2018-02-22 05:43:20 +00:00
Warner Losh
c4b72d8b37 Keep a counter for number of requests completed with an error.
Sponsored by: Netflix
2018-02-06 23:21:08 +00:00
Pedro F. Giffuni
f24882eca5 SPDX: finish tagging sys/cam. 2018-01-16 23:19:57 +00:00
Warner Losh
6ca2fb6623 Treat a 'current' value of 0 as unlimited as a failsfe.
When limiting I/O, a value of 0 makes no sense as a limit. No progress
can be made. Trade the possibility that someone might be doing
something clever to achieve ultra-low I/O limits vs the damage of not
ever making progress on an I/O in favor of making progress. Now the
machine won't be useless if this accidentally gets requested.

Sponsored by: Netflix
2017-10-24 02:25:42 +00:00
Warner Losh
78ed811e6c cam iosched: Bettar account IOPS for smoother performance
Prevent cam_iosched_iops_tick() from discarding 'unspent' ios unless
it's a new accounting interval.

Previously ios that weren't used between ticks were lost, as a result
the iops limiter could enforce a limit below the configured maximum.

Obtained from: ElectroBSD
Submitted by: Fabian Keil
PR: 221974
2017-09-22 02:36:36 +00:00
Warner Losh
f777123b83 cam iosched: Enforce iop limits below the quanta value
Previously the iops limiter would always allow at least
quanta ios per second as cam_iosched_iops_tick() never set
ios->l_value1 below 1.

Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD
PR: 221974
2017-09-22 02:36:32 +00:00
Warner Losh
89d26636f3 cam iosched: Call cam_iosched_limiter_init() after ios->current is set to the default
Previously ios->current was set to 0 until the first
cam_iosched_cl_maybe_steer() call.

PR: 221954
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12349
2017-09-20 21:26:01 +00:00
Warner Losh
3028dd8dd5 cam iosched: Schedule cam_iosched_ticker() quanta times per second
Previously callout_reset() was called with a "ticks" value that was
off by one.  As a result cam_iosched_ticker() was called a bit too
frequently: On systems with hz=1000 a quanta value of 200 resulted in
~250 calls and a value of 100 in ~111 calls.

For the "queue_depth" and "bandwidth" limiters the difference doesn't
matter but the "iops" limiter depends on the scheduling to enforce the
correct maximum.

PR: 221956
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12350
2017-09-20 21:25:56 +00:00
Warner Losh
2d22619adc cam iosched: Add a handler for the quanta sysctl to enforce valid values
Invalid values can result in devision-by-zero panics or other
undefined behaviour so lets not allow them.

PR: 221957
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12351
2017-09-20 21:19:53 +00:00
Warner Losh
84c12dcdd0 cam iosched: Use the write queue for BIO_ZONE commands
Use the write queue for BIO_ZONE commands so they can't get executed
ahead of writes that were sent after them. More generally, since they
introduce strong ordering into the list, they need to go to the write
queue (which is the only queue that BIO_ORDERED is honored for at the
moment). In fact, fix mismatch between queueing and dequeueing code by
changing this to queue all non-reads (and non-trims) to the write
queue.

As a side effect this prevents the kernel message:
kernel: Found bio_cmd = 0x9
which cam_iosched_next_bio() emits when finding commands
other than BIO_READ in the read queue.

PR: 221973
Obtained from: ElectroBSD
Submitted by: Fabian Keil
Differential Revision: https://reviews.freebsd.org/D12353
2017-09-20 21:13:20 +00:00
Warner Losh
55c770b40a Update comments on what the CAM_IOSCHED_FLAG_TRIM_ACTIVE means.
It's intended only for those situations where the periph driver
ones to limit the number of trims active to one and only one.
Also update comments on associated functions.

Sponsored by: Netflix
2017-09-15 20:15:55 +00:00
Warner Losh
d7fa1ab02d cam iosched: Limit the quanta default to hz if it's below 200
The cam_iosched_ticker() can't be scheduled more than once per tick.
Some limiters depend on quanta matching the number of calls per second
to enforce the proper limits. Limit the quanta to no faster than 1 per
clock tick. This fixes some features when running in VMs where the
default HZ is 100.

PR: 221953
Obtained from: ElectroBSD
Differential Revision: https://reviews.freebsd.org/D12337
Submitted by: Fabian Keil
2017-09-12 23:46:33 +00:00
Warner Losh
08fc2f23b3 Expand the latency tracking array from 1.024s to 8.192s to help track
extreme outliers from dodgy drives. Adjust comments to reflect this,
and make sure that the number of latency buckets match in the two
places where it matters.
2017-08-24 22:11:10 +00:00
Warner Losh
e4c9cba71f Fix 32-bit overflow on latency measurements
o Allow I/O scheduler to gather times on 32-bit systems. We do this by shifting
  the sbintime_t over by 8 bits and truncating to 32-bits. This gives us 8.24
  time. This is sufficient both in range (256 seconds is about 128x what current
  users need) and precision (60ns easily meets the 1ms smallest bucket size
  measurements). 64-bit systems are unchanged. Centralize all the time math so
  it's easy to tweak tha range / precision tradeoffs in the future.
o While I'm here, the I/O scheduler should be using periph_data rather than
  sim_data since it is operating on behalf of the periph.

Differential Review: https://reviews.freebsd.org/D12119
2017-08-24 22:10:58 +00:00
Ed Maste
0b4060b073 cam iosched: fix typos in comments
PR:		220947
Submitted by:	Fabian Keil
Obtained from:	ElectroBSD
2017-08-18 16:38:33 +00:00
Warner Losh
79d80af216 Implement moving SD.
From the paper "Incremental calculation of weighted mean and variance"
by Tony Finch Februrary 2009, retrieved from
http://people.ds.cam.ac.uk/fanf2/hermes/doc/antiforgery/stats.pdf
converted to use shifting.
2017-03-22 19:18:47 +00:00
Warner Losh
e831795b8a Remove nested #ifdef that can't possibly be false. 2017-01-27 08:30:43 +00:00
Warner Losh
b20c0a07fb Preening pass to fix up trailing white space and other minor style(9)
nits (though I'm sure others remain).

MFC After: 3 days
2017-01-25 02:05:08 +00:00
Warner Losh
cf3ec15167 Compute two new metrics. Disk load, the average number of transactions
we have queued up normaliazed to the queue size. Also compute buckets
of latency to help compute, in userland, estimates of Median, P90, P95
and P99 values.

Sponsored by: Netflix, Inc
2016-09-30 17:49:04 +00:00
Warner Losh
035ec48e55 Tidy up loose ends from Netflix I/O sched rename to dynamic I/O sched.
Rename kern.cam.do_netflix_iosched sysctl to
kern.cam.do_dynamic_iosched.

Approved by: re (kib@)
2016-07-07 20:31:35 +00:00
Warner Losh
df2362478e Rename CAM_NETFLIX_IOSCHED to CAM_IOSCHED_DYNAMIC to better reflect
its nature.

Approved by: re
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D6811
2016-06-23 23:20:58 +00:00
Xin LI
b97b6d27f2 Fix tinderbox LINT build. 2016-04-18 08:24:13 +00:00
Warner Losh
2b5c19f196 Do the intmax_t dance for debug so CAM_NETFLIX_IOSCHED builds on
i386.

Sponsored by: Netflix, Inc
2016-04-17 21:29:44 +00:00
Warner Losh
5ede5b8cb5 Put function only used by CAM_NETFLIX_IOSCHED under that ifdef. 2016-04-15 05:10:32 +00:00
Warner Losh
ba6c22ce93 Add in missing files from r298002. 2016-04-14 22:13:44 +00:00