mechanism.
Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.
This simplifies the controller and namespace setup code, so that it
reads straight through rather than being broken up into several
separate callback functions.
Sponsored by: Intel
Reviewed by: carl
start or reset. Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.
This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state. To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.
Sponsored by: Intel
Reviewed by: carl
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyway.
Sponsored by: Intel
Reviewed by: carl
start process.
The spec indicates the OS driver should use Set Features (Software
Progress Marker) to set the pre-boot software load count to 0
after the OS driver has successfully been initialized. This allows
pre-boot software to determine if there have been any issues with the
OS loading.
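
A minimal sketch of the command described above. The struct and
function names are hypothetical and the layout is simplified; it only
shows Set Features (admin opcode 0x09) with the Software Progress
Marker feature identifier (0x80) and a pre-boot software load count
of 0.

```c
#include <stdint.h>
#include <string.h>

#define OPC_SET_FEATURES		0x09	/* admin opcode */
#define FEAT_SOFTWARE_PROGRESS_MARKER	0x80	/* feature identifier */

/* Simplified, hypothetical command layout: only the fields used here. */
struct admin_cmd {
	uint8_t		opc;
	uint32_t	cdw10;	/* feature identifier in bits 7:0 */
	uint32_t	cdw11;	/* pre-boot software load count in bits 7:0 */
};

/* Build a Set Features command that resets the load count to 0. */
static void
build_clear_load_count(struct admin_cmd *cmd)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->opc = OPC_SET_FEATURES;
	cmd->cdw10 = FEAT_SOFTWARE_PROGRESS_MARKER;
	cmd->cdw11 = 0;
}
```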
Sponsored by: Intel
Reviewed by: carl
This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this. This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.
Sponsored by: Intel
Reviewed by: carl
The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to
allow the controller to specify the maximum I/O data transfer size. nvme(4)
already provides a default maximum, but make sure it does not exceed what
MDTS reports.
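
The clamp can be sketched as follows. This is an illustration with a
hypothetical helper name, not the driver code. It assumes MDTS is a
power-of-two multiplier of the controller's minimum memory page size,
with 0 meaning the controller reports no limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp the driver's default maximum transfer size
 * to the limit the controller advertises through MDTS.
 */
static uint64_t
clamp_max_xfer_size(uint64_t driver_max, uint8_t mdts, uint64_t min_page_size)
{
	uint64_t ctrlr_max;

	assert(min_page_size > 0);
	/*
	 * 0 means no limit. The upper bound is a shift guard for this
	 * sketch; such a limit is assumed to exceed any driver default.
	 */
	if (mdts == 0 || mdts >= 32)
		return (driver_max);
	ctrlr_max = min_page_size << mdts;
	return (driver_max < ctrlr_max ? driver_max : ctrlr_max);
}
```

With a 128 KiB default and 4 KiB pages, an MDTS of 3 lowers the limit
to 32 KiB, while an MDTS of 8 (1 MiB) leaves the default untouched.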
Sponsored by: Intel
Reviewed by: carl
that if a specific I/O repeatedly times out, we don't retry it indefinitely.
The default number of retries is 4, but it can be adjusted using
hw.nvme.retry_count.
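
A rough sketch of a bounded retry check. The struct, field and function
names are hypothetical, and it assumes the limit counts resubmissions
rather than total attempts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request bookkeeping; the field name is illustrative. */
struct io_request {
	uint32_t retries;	/* times this request has been resubmitted */
};

/* Default from the text above; stands in for hw.nvme.retry_count. */
static uint32_t retry_count = 4;

/* Returns true if the request may be resubmitted, false if it must fail. */
static bool
request_should_retry(struct io_request *req)
{
	if (req->retries >= retry_count)
		return (false);
	req->retries++;
	return (true);
}
```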
Sponsored by: Intel
Reviewed by: carl
specified log page.
This satisfies the spec condition that future async events of the same type
will not be sent until the associated log page is fetched.
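
To fetch the right log page, the driver needs the log page identifier
from the async event completion. A sketch of decoding completion dword
0, assuming the layout of event type in bits 2:0, event information in
bits 15:8 and log page identifier in bits 23:16. The struct and
function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of an async event completion. */
struct async_event {
	uint8_t type;		/* asynchronous event type */
	uint8_t info;		/* asynchronous event information */
	uint8_t log_page_id;	/* log page to fetch to re-arm this event type */
};

static struct async_event
decode_async_event(uint32_t cdw0)
{
	struct async_event ev;

	ev.type = cdw0 & 0x7;
	ev.info = (cdw0 >> 8) & 0xff;
	ev.log_page_id = (cdw0 >> 16) & 0xff;
	return (ev);
}
```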
Sponsored by: Intel
Reviewed by: carl
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.
While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.
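
A simplified sketch of the idea, not the actual header. The status
layout is modeled on the NVMe completion status field, and
status_is_error() is a hypothetical stand-in for
nvme_completion_is_error(). The other struct members are illustrative
only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status broken out so several structures can embed it. */
struct status {
	unsigned int p    : 1;	/* phase tag */
	unsigned int sc   : 8;	/* status code */
	unsigned int sct  : 3;	/* status code type */
	unsigned int rsvd : 2;
	unsigned int m    : 1;	/* more */
	unsigned int dnr  : 1;	/* do not retry */
};

struct completion {
	uint32_t	cdw0;
	struct status	status;
};

struct error_log_entry {
	uint64_t	error_count;
	struct status	status;
};

/* An all-zero sc/sct pair means success; anything else is an error. */
static bool
status_is_error(const struct status *st)
{
	return (st->sc != 0 || st->sct != 0);
}
```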
Sponsored by: Intel
Reviewed by: carl
This protects against cases where a controller crashes with multiple
I/O outstanding, each timing out and requesting controller resets
simultaneously.
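
One way to collapse simultaneous reset requests into a single reset is
an atomic compare-and-swap on a flag. The sketch below uses C11 atomics
and hypothetical names; the driver's own mechanism may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical controller state for this sketch. */
struct ctrlr {
	atomic_int	is_resetting;	/* 1 while a reset is in progress */
	int		resets_started;
};

/* Returns true only for the caller that actually starts the reset. */
static bool
ctrlr_request_reset(struct ctrlr *c)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&c->is_resetting, &expected, 1))
		return (false);	/* a reset is already in progress */
	c->resets_started++;
	return (true);
}

/* Called once the reset has finished, allowing future resets. */
static void
ctrlr_reset_done(struct ctrlr *c)
{
	atomic_store(&c->is_resetting, 0);
}
```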
While here, remove a debugging printf from a previous commit, and add
more logging around I/Os that need to be resubmitted after a controller
reset.
Sponsored by: Intel
Reviewed by: carl
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work. NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.
Sponsored by: Intel
Reviewed by: carl
On any I/O timeout, check for csts.cfs==1. If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.
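
The timeout decision above can be sketched as follows, assuming CFS is
bit 1 of the CSTS register. The enum and function names are
hypothetical.

```c
#include <stdint.h>

#define CSTS_CFS	(1u << 1)	/* controller fatal status */

enum timeout_action {
	TIMEOUT_ABORT_COMMAND,
	TIMEOUT_RESET_CONTROLLER
};

/* Choose what to do about a timed out I/O given the current CSTS value. */
static enum timeout_action
timeout_action_for(uint32_t csts)
{
	if (csts & CSTS_CFS)
		return (TIMEOUT_RESET_CONTROLLER);
	return (TIMEOUT_ABORT_COMMAND);
}
```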
This changeset also includes deferring the controller start portion
of the reset to a separate task. This ensures we are always performing
a controller start operation from a consistent context.
Sponsored by: Intel
Reviewed by: carl
invoke it from nvmecontrol(8).
Controller reset will be performed in cases where I/Os are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.
Sponsored by: Intel
Reviewed by: carl
Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.
Sponsored by: Intel
function.
This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist. This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.
Sponsored by: Intel
the submit action assuming the qpair lock has already been acquired.
Also change nvme_qpair_submit_request to just lock/unlock the mutex
around a call to this new function.
This fixes a recursive mutex acquisition in the retry path.
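
The locked/unlocked split looks roughly like this. It is a sketch with
a toy non-recursive lock and hypothetical names, not the driver's
mutex or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy queue pair with a non-recursive lock modeled as a flag. */
struct qpair {
	bool	locked;
	int	submitted;
};

static void
qpair_lock(struct qpair *q)
{
	assert(!q->locked);	/* recursive acquisition would trip here */
	q->locked = true;
}

static void
qpair_unlock(struct qpair *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Does the actual submit; the caller must already hold the lock. */
static void
qpair_submit_locked(struct qpair *q)
{
	assert(q->locked);
	q->submitted++;
}

/* Public entry point: just lock/unlock around the locked variant. */
static void
qpair_submit(struct qpair *q)
{
	qpair_lock(q);
	qpair_submit_locked(q);
	qpair_unlock(q);
}

/* Retry path: runs with the lock held, so it calls the locked variant. */
static void
qpair_retry(struct qpair *q)
{
	qpair_submit_locked(q);
}
```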
Sponsored by: Intel
/usr/src/sys/modules/nvme/../../dev/nvme/nvme.c:211: warning: format '%qx' expects type 'long unsigned int', but argument 9 has type 'long long unsigned int' [-Wformat]
This change was originally intended to account for test kthreads under
the nvmecontrol process, but jhb indicated it may not be safe to
associate kthreads with userland processes and this could have
unintended consequences.
I did not observe any problems with this change, but my testing didn't
exhaust the kinds of corner cases that could cause problems. It is not
that important to account for these test threads under nvmecontrol, so I
am just reverting this change for now.
On a related note, the part of this patch for <= 7.x fails to compile,
so reverting this fixes that too.
Suggested by: jhb
current CPU and not always CPU 0.
This has the added benefit of reducing a huge amount of spinlock
contention on the callout_cpu spinlock for CPU 0.
Sponsored by: Intel
This eliminates the need to manage queue depth at the nvd(4) level for
Chatham prototype board workarounds, and also adds the ability to
accept a number of requests on a single qpair that is much larger
than the number of trackers allocated.
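
A toy model of accepting more requests than there are trackers,
assuming excess requests wait in a pending queue until a tracker
frees up. The names and the counter-based bookkeeping are hypothetical.

```c
#include <stdint.h>

/* Counter-only model of a queue pair; no real requests are tracked. */
struct qpair_model {
	uint32_t free_trackers;	/* trackers available for new submissions */
	uint32_t queued;	/* accepted requests waiting for a tracker */
	uint32_t in_flight;	/* requests currently holding a tracker */
};

/* Accept a request: submit it if a tracker is free, otherwise queue it. */
static void
model_submit(struct qpair_model *q)
{
	if (q->free_trackers == 0) {
		q->queued++;
		return;
	}
	q->free_trackers--;
	q->in_flight++;
}

/* Complete one request and hand its tracker to a queued request, if any. */
static void
model_complete(struct qpair_model *q)
{
	if (q->in_flight == 0)
		return;
	q->in_flight--;
	q->free_trackers++;
	if (q->queued > 0) {
		q->queued--;
		model_submit(q);
	}
}
```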
Sponsored by: Intel
nvme_ctrlr_submit_io_request().
While here, also fix the case where a uio may have more than one iovec.
NVMe's definition of SGEs (called PRPs) only allows for the first SGE to
start on a non-page boundary. The simplest way to handle this is to
construct a temporary uio for each iovec, and submit an NVMe request
for each.
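
A userland sketch of the per-iovec split. The kernel's struct uio is
not used here; each iovec is passed to a hypothetical submit callback
instead, and the tally callback exists only for illustration.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical callback that submits one request for one buffer. */
typedef int (*submit_fn)(void *base, size_t len, void *arg);

/*
 * Issue one request per iovec, so that every request starts at its own
 * buffer address. Stops at the first error and returns it.
 */
static int
submit_per_iovec(const struct iovec *iov, int iovcnt, submit_fn submit,
    void *arg)
{
	int error, i;

	for (i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len == 0)
			continue;	/* nothing to transfer */
		error = submit(iov[i].iov_base, iov[i].iov_len, arg);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example callback: counts requests and bytes instead of doing I/O. */
struct tally {
	int	requests;
	size_t	bytes;
};

static int
tally_submit(void *base, size_t len, void *arg)
{
	struct tally *t = arg;

	(void)base;
	t->requests++;
	t->bytes += len;
	return (0);
}
```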
Sponsored by: Intel