Commit Graph

45 Commits

Author SHA1 Message Date
Conrad Meyer
374b05e1ff ioat(4): On error detected in ithread, defer HW reset to taskqueue
The I/OAT HW reset process may sleep, so it is invalid to perform a
channel reset from the software interrupt thread.

Sponsored by:	EMC / Isilon Storage Division
2016-02-13 22:51:25 +00:00
Conrad Meyer
d2c55e5ad0 ioat(4): Also check for errors if the channel is suspended
Sponsored by:	EMC / Isilon Storage Division
2016-02-13 22:51:17 +00:00
Conrad Meyer
564af7a654 ioat(4): Decode/define more capabilities, operations
These are defined in the Intel Haswell EDS volume 2 (registers) (507849
v2.1).

Sponsored by:	EMC / Isilon Storage Division
2016-02-13 19:01:56 +00:00
Conrad Meyer
007a703036 ioat(4): Recheck status register on zero-descriptor wakeups
Errors that halt the channel don't necessarily result in a completion
update, apparently.

Sponsored by:	EMC / Isilon Storage Division
2016-02-13 02:55:45 +00:00
Conrad Meyer
6ca07079af ioat(4): Add support for 'fence' bit with DMA_FENCE flag
Some classes of IOAT hardware prefetch reads.  DMA operations that
depend on the result of prior DMA operations must use the DMA_FENCE flag
to prevent stale reads.

(E.g., I've hit this personally on Broadwell-EP.  The Broadwell-DE has a
different IOAT unit that is documented to not pipeline DMA operations.)

Sponsored by:	EMC / Isilon Storage Division
2016-01-15 01:34:43 +00:00
Conrad Meyer
1502e36346 ioat(4): Add ioat_acquire_reserve() KPI
ioat_acquire_reserve() is an extended version of ioat_acquire().  It
allows users to reserve space in the channel for some number of
descriptors.  If this succeeds, it guarantees that submission of at
least N valid descriptors will succeed.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 23:02:15 +00:00
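The reservation check described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the ring is a power-of-two array with one slot kept empty, and the struct and function names are hypothetical, not the driver's. The real KPI may also grow the ring, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a descriptor ring with 2^order slots. */
struct sketch_ring {
	uint32_t head;	/* next slot to be filled by software */
	uint32_t tail;	/* oldest slot not yet completed */
	uint32_t order;	/* log2 of the ring size */
};

/* Free slots, keeping one slot empty to distinguish full from empty. */
static uint32_t
sketch_ring_space(const struct sketch_ring *r)
{
	uint32_t size = (uint32_t)1 << r->order;

	assert(r->order < 32);
	return (size - ((r->head - r->tail) & (size - 1)) - 1);
}

/* True if n further descriptors are guaranteed to fit right now. */
static bool
sketch_reserve(const struct sketch_ring *r, uint32_t n)
{
	return (sketch_ring_space(r) >= n);
}
```

A caller would check `sketch_reserve()` once up front and then queue its N descriptors without handling a ring-full failure in the middle of the sequence.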
Conrad Meyer
bd81fe68ee ioat(4): Add ioat_get_max_io_size() KPI
Consumers need to know the permitted IO size to send maximally sized
chunks to the hardware.

Sponsored by:	EMC / Isilon Storage Division
2016-01-05 20:42:19 +00:00
Conrad Meyer
31bf2875ea ioat(4): Add an API to get HW revision
Different revisions support different operations.  Refer to Intel
External Design Specifications to figure out what your hardware
supports.

Sponsored by:	EMC / Isilon Storage Division
2015-12-17 23:21:37 +00:00
Conrad Meyer
d37872da34 ioatcontrol(8): Add support for interrupt coalescing
The new flag, -c <period>, sets the interrupt coalescing period in
microseconds through the new ioat(4) API ioat_set_interrupt_coalesce().

Also add a -z flag to zero ioat statistics before tests, to make it easy
to measure results.

Sponsored by:	EMC / Isilon Storage Division
2015-12-14 22:02:01 +00:00
Conrad Meyer
5ca9fc2a8d ioat(4): Add support for interrupt coalescing
In I/OAT, this is done through the INTRDELAY register.  On supported
platforms, this register can coalesce interrupts in a set period to
avoid excessive interrupt load for small descriptor workflows.  The
period is configurable anywhere from 1 microsecond to 16.38
milliseconds, in microsecond granularity.

Sponsored by:	EMC / Isilon Storage Division
2015-12-14 22:01:52 +00:00
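The 16.38 millisecond ceiling is consistent with a 14-bit delay field counted in microseconds (2^14 - 1 = 16383). The following sketch range-checks a requested period under that assumption. The field width is inferred from the numbers above rather than taken from the register specification, and the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: a 14-bit delay field in microseconds (maximum 16383 us). */
#define	SKETCH_INTRDELAY_US_MASK	0x3fffu

/*
 * Hypothetical helper: validate a requested coalescing period and
 * produce the field value.  Returns false if the period does not fit.
 * Treating 0 as "coalescing disabled" is an assumption of this sketch.
 */
static bool
sketch_intrdelay_encode(uint32_t period_us, uint16_t *field)
{
	assert(field != NULL);
	if (period_us > SKETCH_INTRDELAY_US_MASK)
		return (false);
	*field = (uint16_t)period_us;
	return (true);
}
```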
Conrad Meyer
01fbbc8886 ioat(4): Gather and expose DMA statistics via sysctl
Organize the dev.ioat sysctl node into a tree while we're here.

Sponsored by:	EMC / Isilon Storage Division
2015-12-14 22:00:07 +00:00
Conrad Meyer
6a301ac85a ioat(4): Add ioatcontrol(8) testing for copy_8k
Add -E ("Eight k") and -m ("Memcpy") modes to the ioatcontrol(8) tool.

Prompted by:	rpokala
Sponsored by:	EMC / Isilon Storage Division
2015-12-10 02:05:35 +00:00
Conrad Meyer
5afc2508e1 ioat(4): Add Broadwell-EP PCI IDs
Sponsored by:	EMC / Isilon Storage Division
2015-12-09 22:46:00 +00:00
Conrad Meyer
9950fde08d ioat(4): Add ioat_copy_8k_aligned KPI
The hardware supports descriptors with two non-contiguous pages.  This
allows issuing one descriptor for an 8k copy from/to non-contiguous but
otherwise page-aligned memory.

Sponsored by:	EMC / Isilon Storage Division
2015-12-09 22:45:51 +00:00
Conrad Meyer
c2b69205ac ioat(4): Add MODULE_VERSION so MODULE_DEPEND works
Suggested by:	jhb
Review in progress:	cc
Sponsored by:	EMC / Isilon Storage Division
2015-12-04 23:31:32 +00:00
Conrad Meyer
faefad9c12 ioat: Handle channel-fatal HW errors safely
Certain invalid operations trigger hardware error conditions.  Error
conditions that only halt one channel can be detected and recovered by
resetting the channel.  Error conditions that halt the whole device are
generally not recoverable.

Add a sysctl to inject channel-fatal HW errors,
'dev.ioat.<N>.force_hw_error=1'.

When a halt due to a channel error is detected, ioat(4) blocks new
operations from being queued on the channel, completes any outstanding
operations with an error status, and resets the channel before allowing
new operations to be queued again.

Update ioat.4 to document error recovery;  document blockfill introduced
in r290021 while we are here;  document ioat_put_dmaengine() added in
r289907;  document DMA_NO_WAIT added in r289982.

Sponsored by:	EMC / Isilon Storage Division
2015-10-31 20:38:06 +00:00
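The recovery sequence described above (block new operations, fail outstanding ones, reset, unblock) can be modelled as a small state machine. This is a single-threaded userspace sketch with hypothetical names. It omits locking and the actual hardware reset, both of which the driver has to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel state for the sketch. */
struct sketch_chan {
	bool	 quiescing;	/* set while recovering from a halt */
	unsigned outstanding;	/* operations queued but not completed */
	unsigned failed;	/* operations completed with an error */
};

/* Queue one operation; refused while the channel is recovering. */
static bool
sketch_submit(struct sketch_chan *c)
{
	if (c->quiescing)
		return (false);
	c->outstanding++;
	return (true);
}

/* Halt detected: block submitters and fail everything in flight. */
static void
sketch_halt_detected(struct sketch_chan *c)
{
	c->quiescing = true;
	c->failed += c->outstanding;
	c->outstanding = 0;
}

/* Channel reset finished: allow new operations again. */
static void
sketch_reset_done(struct sketch_chan *c)
{
	assert(c->quiescing && c->outstanding == 0);
	c->quiescing = false;
}
```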
Conrad Meyer
1ffae6e80a ioat_test: Handle forced hardware resets gracefully
Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:52 +00:00
Conrad Meyer
5f77bd3e24 ioat: Drain/quiesce the device less racily
On detach and during a forced HW reset.

Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:39 +00:00
Conrad Meyer
e9497f9bbd ioatcontrol(8): Add and document "raw" testing mode
Allows DMA from/to arbitrary KVA or physical address.  /dev/ioat_test
must be enabled by root and is only R/W root, so this is approximately
as dangerous as /dev/mem and /dev/kmem.

Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:16 +00:00
Conrad Meyer
1693d27b71 ioat: Define DMACAPABILITY bits
Check for BFILL capability before initiating blockfill operations.

Sponsored by:	EMC / Isilon Storage Division
2015-10-28 02:37:24 +00:00
Conrad Meyer
2a4fd6b17a ioat: Add support for Block Fill operations
The IOAT hardware supports writing a 64-bit pattern to some destination
buffer.  The same limitations on buffer length apply as for copy
operations.  Throughput is a bit higher (probably because fill does not
have to spend bandwidth reading from a source in memory).

Support for testing Block Fill has been added to ioatcontrol(8) and the
ioat_test device.  ioatcontrol(8) accepts the '-f' flag, which tests
Block Fill.  (If the flag is omitted, the tool tests copy by default.)
The '-V' flag, in conjunction with '-f', verifies that buffers are
filled in the expected pattern.

Tested on:	Broadwell DE (Xeon D-1500)
Sponsored by:	EMC / Isilon Storage Division
2015-10-26 19:34:12 +00:00
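As an illustration of what a fill followed by a '-V' style check amounts to, here is a software stand-in that repeats a 64-bit pattern across a buffer and then verifies it. It is a sketch only: the function names are hypothetical, the hardware is not involved, and requiring the length to be a multiple of eight bytes is a simplification made here, not a documented hardware rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software stand-in for the fill: repeat a 64-bit pattern over dst. */
static void
sketch_blockfill(void *dst, uint64_t pattern, size_t len)
{
	unsigned char *p = dst;

	assert(len % sizeof(pattern) == 0);
	for (size_t off = 0; off < len; off += sizeof(pattern))
		memcpy(p + off, &pattern, sizeof(pattern));
}

/* Check that every 8-byte word of buf matches the expected pattern. */
static bool
sketch_verify_fill(const void *buf, uint64_t pattern, size_t len)
{
	const unsigned char *p = buf;

	if (len % sizeof(pattern) != 0)
		return (false);
	for (size_t off = 0; off < len; off += sizeof(pattern)) {
		if (memcmp(p + off, &pattern, sizeof(pattern)) != 0)
			return (false);
	}
	return (true);
}
```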
Conrad Meyer
9e3bbf26a9 ioat: Dedupe operation enqueue logic
Add generic hw descriptor struct and generic control flags struct, in
preparation for other kinds of IOAT operation.

Sponsored by:	EMC / Isilon Storage Division
2015-10-26 19:34:00 +00:00
Conrad Meyer
59acd4badb ioat: Add %b format string for CHANERR codes
Sponsored by:	EMC / Isilon Storage Division
2015-10-26 03:30:50 +00:00
Conrad Meyer
bf8553ea38 ioat: Allocate memory for ring resize sanely
Add a new flag for DMA operations, DMA_NO_WAIT.  It behaves much like
other NOWAIT flags -- if queueing an operation would sleep, abort and
return NULL instead.

When growing the internal descriptor ring, the memory allocation is
performed outside of all locks.  A lock-protected flag is used to avoid
duplicated work.  Threads that cannot sleep and attempt to queue
operations when the descriptor ring is full allocate a larger ring with
M_NOWAIT, or bail if that fails.

ioat_reserve_space() could become an external API if it is important to
callers that they have room for a sequence of operations, or that those
operations succeed each other directly in the hardware ring.

This patch splits the internal head index (->head) from the hardware's
head-of-chain (DMACOUNT) register (->hw_head).  In the future, for
simplicity's sake, we could drop the 'ring' array entirely and just use
a linked list (with head and tail pointers rather than indices).

Suggested by:	Witness
Sponsored by:	EMC / Isilon Storage Division
2015-10-26 03:30:38 +00:00
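The resize pattern described above (allocate outside the lock, with a lock-protected flag to avoid duplicated work) can be sketched in userspace with a pthread mutex. Everything here is an assumption for illustration: the names are hypothetical, the ring holds plain integers, and calloc(3) stands in for the kernel allocator. A caller that cannot sleep would use a non-blocking allocation at the marked point and bail on failure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable ring; slots are ints purely for illustration. */
struct sketch_growring {
	pthread_mutex_t	 lock;
	bool		 resize_pending;	/* protected by lock */
	size_t		 nslots;
	int		*slots;
};

/* Double the ring.  Returns false if a resize is pending or alloc fails. */
static bool
sketch_grow(struct sketch_growring *r)
{
	int *newslots, *oldslots = NULL;
	size_t newn;

	pthread_mutex_lock(&r->lock);
	if (r->resize_pending) {
		/* Someone else is already growing the ring. */
		pthread_mutex_unlock(&r->lock);
		return (false);
	}
	r->resize_pending = true;
	newn = r->nslots * 2;
	pthread_mutex_unlock(&r->lock);

	/* The allocation may block, so it happens with no lock held. */
	newslots = calloc(newn, sizeof(*newslots));

	pthread_mutex_lock(&r->lock);
	if (newslots != NULL) {
		memcpy(newslots, r->slots, r->nslots * sizeof(*newslots));
		oldslots = r->slots;
		r->slots = newslots;
		r->nslots = newn;
	}
	r->resize_pending = false;
	pthread_mutex_unlock(&r->lock);

	free(oldslots);
	return (newslots != NULL);
}
```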
Conrad Meyer
65e4f8adce ioat: Expose more softc members in sysctls
Kill some unused softc variables while we're here.

Sponsored by:	EMC / Isilon Storage Division
2015-10-26 02:21:32 +00:00
Conrad Meyer
43fc184751 ioat: Introduce KTR probes
Sponsored by:	EMC / Isilon Storage Division
2015-10-26 02:21:19 +00:00
Conrad Meyer
cea5b880c3 ioat: Actually bring the hardware back online after reset
We need to reset the chancmp and chainaddr MMIO registers to bring the
device back to a working state.

Name the chanerr bits while we're here.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:46:32 +00:00
Conrad Meyer
e88e14b9f0 ioat: Use bus_alloc_resource_any(9)
Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:46:20 +00:00
Conrad Meyer
8f27463708 ioat: Extract halted error-debugging to a function
Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:46:08 +00:00
Conrad Meyer
4becebdf9e ioat: Always re-arm interrupts in process_events
It doesn't hurt, even if there is nothing to do.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:45:56 +00:00
Conrad Meyer
f7157235b8 ioat: Add sysctl to force hw reset
To enable controlled testing.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:45:45 +00:00
Conrad Meyer
466b3540ff ioat: refcnt users so we can drain them at detach
We only need to borrow a mutex for the drain sleep and the 0->1
transition, so just reuse an existing one for now.

The wchan is arbitrary.  Using refcount itself would have required
__DEVOLATILE(), so use the lock's address instead.

Different uses are tagged by kind, although we only do anything with
that information in INVARIANTS builds.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:45:33 +00:00
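A userspace analogue of draining reference holders at detach, using a pthread mutex and condition variable in place of the kernel's sleep and wakeup on a wait channel. This is a sketch with hypothetical names. It takes the lock on every operation for simplicity, whereas the commit above only needs the mutex for the drain sleep and the 0->1 transition. Refusing new references once a drain has started is an assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical reference tracker for one device instance. */
struct sketch_ref {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	unsigned	refcnt;
	bool		draining;
};

/* Take a reference; refused once a drain has started. */
static bool
sketch_get(struct sketch_ref *r)
{
	bool ok;

	pthread_mutex_lock(&r->lock);
	ok = !r->draining;
	if (ok)
		r->refcnt++;
	pthread_mutex_unlock(&r->lock);
	return (ok);
}

/* Drop a reference and wake a waiting drainer on the last release. */
static void
sketch_put(struct sketch_ref *r)
{
	pthread_mutex_lock(&r->lock);
	assert(r->refcnt > 0);
	if (--r->refcnt == 0)
		pthread_cond_broadcast(&r->cv);
	pthread_mutex_unlock(&r->lock);
}

/* Block until every outstanding reference has been released. */
static void
sketch_drain(struct sketch_ref *r)
{
	pthread_mutex_lock(&r->lock);
	r->draining = true;
	while (r->refcnt > 0)
		pthread_cond_wait(&r->cv, &r->lock);
	pthread_mutex_unlock(&r->lock);
}
```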
Conrad Meyer
09f49f249a ioat: When queueing operations, assert the submit lock
Callers should have acquired this lock when they invoked ioat_acquire()
before issuing operations.  Assert it is held.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:45:21 +00:00
Conrad Meyer
f46011ae19 ioat: Don't use sleeping allocation in lock path
This is still the worst possible way to allocate memory if it will ever
be under pressure, but at least it won't deadlock.

Suggested by:	WITNESS
Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:45:10 +00:00
Conrad Meyer
fe720f5ae0 ioat: Pull out timer callout delay into a constant
Pull out the timer callout delay into IOAT_INTR_TIMO and shorten it
considerably (5s -> 100ms).  Single operations do not take 5-10 seconds
and when interrupts aren't working, waiting 100ms sucks a lot less than
5s.

Sponsored by:	EMC / Isilon Storage Division
2015-10-24 23:44:58 +00:00
Conrad Meyer
592fe72d77 ioat_test: Add a colon (':') for style
Missed in r289776.

Sponsored by:	EMC / Isilon Storage Division
2015-10-22 23:08:08 +00:00
Conrad Meyer
1c25420ea8 ioat: Clean up logging
Replace custom Linux-like logging with a thin shim around
device_printf(), when the softc is available.

In ioat_test, shim around printf(9) instead.

Sponsored by:	EMC / Isilon Storage Division
2015-10-22 23:03:33 +00:00
Conrad Meyer
7afbb2638e ioat: Fix some attach/detach issues
Don't run the selftest until after we've enabled bus mastering, or the
DMA engine can't copy anything for our test.

Create the ioat_test device on attach, if so tuned.  Destroy the
ioat_test device on teardown.

Replace deprecated 'CALLOUT_MPSAFE' with correct '1' in callout_init().

Sponsored by:	EMC / Isilon Storage Division
2015-10-22 16:46:21 +00:00
Conrad Meyer
7c69db50df Improve flexibility of ioat_test / ioatcontrol(8)
The test logic now preallocates memory before running the test.

The buffer size is now configurable.  Post-copy verification is
configurable.  The number of copies to chain into one transaction (one
interrupt) is configurable.

A 'duration' mode is added, which repeats the test until the duration
has elapsed, reporting the B/s and transactions completed.

ioatcontrol.8 has been updated to document the new arguments.

Initial limits (on this particular Broadwell-DE) (and when the
interrupts are working) seem to be: 256 interrupts/sec or ~6 GB/s,
whichever limit is more restrictive.

Unfortunately, it seems the interrupt-reset handling on Broadwell isn't
working as intended.  That will be fixed in a later commit.

Sponsored by:	EMC / Isilon Storage Division
2015-10-22 04:38:05 +00:00
Conrad Meyer
b81eee4a22 ioat: Define IOAT_XFERCAP_VALID_MASK and use in ioat_read_xfercap
Instead of ANDing a magic constant later.

Sponsored by:	EMC / Isilon Storage Division
2015-10-22 04:33:05 +00:00
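A sketch of masking at read time rather than at each use. It assumes the transfer capability register encodes the maximum transfer size as a power-of-two exponent in its low five bits. The mask value and the function names below are illustrative assumptions, not copied from the driver.

```c
#include <stdint.h>

/* Assumed: only the low five bits of the register are meaningful. */
#define	SKETCH_XFERCAP_VALID_MASK	0x1fu

/* Apply the mask once, where the raw register value is read. */
static uint8_t
sketch_read_xfercap(uint8_t raw)
{
	return (raw & SKETCH_XFERCAP_VALID_MASK);
}

/* Maximum transfer size in bytes, as 2^exponent. */
static uint64_t
sketch_max_xfer_size(uint8_t raw)
{
	return ((uint64_t)1 << sketch_read_xfercap(raw));
}
```

Callers then never see the reserved high bits, so no later AND with a magic constant is needed.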
Conrad Meyer
f3e30f9721 ioat: Use correct macro, fix build on i386
Sponsored by:	EMC / Isilon Storage Division
2015-10-13 19:46:12 +00:00
Conrad Meyer
0d1a05d9e6 ioat(4): pci_save/restore_state to persist MSI-X registers over BDXDE reset
Also for BWD devices, per jimharris@.

Reviewed by:	jhb
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3552
2015-09-02 22:48:41 +00:00
Conrad Meyer
4253ea5083 ioat: re-initialize interrupts after resetting hw on BDXDE
Resetting some generations of the I/OAT hardware (just BDXDE for now)
resets the corresponding MSI-X registers.  So, teardown and
re-initialize interrupts after resetting the hardware.

Reviewed by:	jimharris
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3549
2015-09-02 16:48:03 +00:00
Conrad Meyer
8c8e848710 ioat(4): Minor style cleanups
Suggested by:	ngie
Reviewed by:	jimharris
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3481
2015-08-25 17:39:03 +00:00
Conrad Meyer
e974f91c38 Import ioat(4) driver
I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on some Intel server platforms.

This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.

For now, this driver is only built on AMD64 platforms.  It may be ported
to work on i386 later, if that is desired.  The hardware is exclusive to
x86.

Further documentation on ioat(4), including API documentation and usage,
can be found in the new manual page.

Bring in a test tool, ioatcontrol(8), in tools/tools/ioat.  The test
tool is not hooked up to the build and is not intended for end users.

Submitted by:	jimharris, Carl Delsey <carl.r.delsey@intel.com>
Reviewed by:	jimharris (reviewed my changes)
Approved by:	markj (mentor)
Relnotes:	yes
Sponsored by:	Intel
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3456
2015-08-24 19:32:03 +00:00