In r304602, I mistakenly removed the ioat_process_events check that we weren't
processing events before the hardware had completed the descriptor
("last_seen"). Reinstate that logic.
Keep the defensive loop condition and additionally make sure we've actually
completed a descriptor before blindly chasing the ring around.
In reset, queue and finish the startup command before allowing any event
processing or submission to occur. Avoid potential missed callouts by
requeueing the poll later.
is_completion_pending governs whether or not a callout will be scheduled
when new work is queued on the IOAT device. If true, a callout is
already scheduled, so we do not need a new one. If false, we schedule
one and set it true. Because resetting the hardware completed all
outstanding work but failed to clear is_completion_pending, no new
callout could be scheduled after a reset with pending work.
This resulted in a driver hang for polled-only work.
Fix the race between ioat_reset_hw and ioat_process_events.
HW reset isn't protected by a lock because it can sleep for a long time
(40.1 ms). This resulted in a race where we would process bogus parts
of the descriptor ring as if it had completed. This looked like
duplicate completions on old events, if your ring had looped at least
once.
Block callout and interrupt work while reset runs so the completion end
of things does not observe indeterminate state and process invalid parts
of the ring.
Start the channel with a manually implemented ioat_null() to keep other
submitters quiesced while we wait for the channel to start (100 us).
r295605 may have made the race between ioat_reset_hw and
ioat_process_events wider, but I believe it already existed before that
revision. ioat_process_events can be invoked by two asynchronous
sources: callout (softclock) and device interrupt. Those could race
each other, to the same effect.
Reviewed by: markj
Approved by: re
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7097
Add CRC/MOVECRC operations, as well as the TEST and STORE variants.
With these operations, a CRC32C can be computed over one or more
descriptors' source data. When the STORE operation is encountered, the
accumulated CRC32C is emitted to memory. A TEST operations triggers an
IOAT channel error if the accumulated CRC32C does not match one in
memory.
These operations are not exposed through any API yet.
Sponsored by: EMC / Isilon Storage Division
The IOAT engine can only address the low 40 bits (1 TB) of physmem via
the 'next descriptor' pointer. Restrict acceptable range given to
bus_dma_tag_create to match.
Sponsored by: EMC / Isilon Storage Division
The I/OAT HW reset process may sleep, so it is invalid to perform a
channel reset from the software interrupt thread.
Sponsored by: EMC / Isilon Storage Division
Some classes of IOAT hardware prefetch reads. DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.
(E.g., I've hit this personally on Broadwell-EP. The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)
Sponsored by: EMC / Isilon Storage Division
ioat_acquire_reserve() is an extended version of ioat_acquire(). It
allows users to reserve space in the channel for some number of
descriptors. If this succeeds, it guarantees that at least submission
of N valid descriptors will succeed.
Sponsored by: EMC / Isilon Storage Division
Different revisions support different operations. Refer to Intel
External Design Specifications to figure out what your hardware
supports.
Sponsored by: EMC / Isilon Storage Division
The new flag, -c <period>, sets the interrupt coalescing period in
microseconds through the new ioat(4) API ioat_set_interrupt_coalesce().
Also add a -z flag to zero ioat statistics before tests, to make it easy
to measure results.
Sponsored by: EMC / Isilon Storage Division
In I/OAT, this is done through the INTRDELAY register. On supported
platforms, this register can coalesce interrupts in a set period to
avoid excessive interrupt load for small descriptor workflows. The
period is configurable anywhere from 1 microsecond to 16.38
milliseconds, in microsecond granularity.
Sponsored by: EMC / Isilon Storage Division
The hardware supports descriptors with two non-contiguous pages. This
allows issuing one descriptor for an 8k copy from/to non-contiguous but
otherwise page-aligned memory.
Sponsored by: EMC / Isilon Storage Division
Certain invalid operations trigger hardware error conditions. Error
conditions that only halt one channel can be detected and recovered by
resetting the channel. Error conditions that halt the whole device are
generally not recoverable.
Add a sysctl to inject channel-fatal HW errors,
'dev.ioat.<N>.force_hw_error=1'.
When a halt due to a channel error is detected, ioat(4) blocks new
operations from being queued on the channel, completes any outstanding
operations with an error status, and resets the channel before allowing
new operations to be queued again.
Update ioat.4 to document error recovery; document blockfill introduced
in r290021 while we are here; document ioat_put_dmaengine() added in
r289907; document DMA_NO_WAIT added in r289982.
Sponsored by: EMC / Isilon Storage Division