Commit Graph

57 Commits

Author SHA1 Message Date
Warner Losh
4d5475613e Implement nvme suspend / resume for pci attachment
When we suspend, we need to properly shutdown the NVME controller. The
controller may go into D3 state (or may have the power removed), and
to properly flush the metadata to non-volatile RAM, we must complete a
normal shutdown. This consists of deleting the I/O queues and setting
the shutodown bit. We have to do some extra stuff to make sure we
reset the software state of the queues as well.

On resume, we have to reset the card twice, for reasons described in
the attach funcion. Once we've done that, we can restart the card. If
any of this fails, we'll fail the NVMe card, just like we do when a
reset fails.

Set is_resetting for the duration of the suspend / resume. This keeps
the reset taskqueue from running a concurrent reset, and also is
needed to prevent any hw completions from queueing more I/O to the
card. Pass resetting flag to nvme_ctrlr_start. It doesn't need to get
that from the global state of the ctrlr. Wait for any pending reset to
finish. All queued I/O will get sent to the hardware as part of
nvme_ctrlr_start(), though the upper layers shouldn't send any
down. Disabling the qpairs is the other failsafe to ensure all I/O is
queued.

Rename nvme_ctrlr_destory_qpairs to nvme_ctrlr_delete_qpairs to avoid
confusion with all the other destroy functions.  It just removes the
queues in hardware, while the other _destroy_ functions tear down
driver data structures.

Split parts of the hardware reset function up so that I can
do part of the reset in suspsend. Split out the software disabling
of the qpairs into nvme_ctrlr_disable_qpairs.

Finally, fix a couple of spelling errors in comments related to
this.

Relnotes: Yes
MFC After: 1 week
Reviewed by: scottl@ (prior version)
Differential Revision: https://reviews.freebsd.org/D21493
2019-09-03 15:26:11 +00:00
Warner Losh
5f9e856e3a It turns out the duplication is only mostly harmless.
While it worked with the kenrel, it wasn't working with the loader.
It failed to handle dependencies correctly. The reason for that is
that we never created a nvme module with the DRIVER_MODULE, but
instead a nvme_pci and nvme_ahci module. Create a real nvme module
that nvd can be dependent on so it can import the nvme symbols it
needs from there.

Arguably, nvd should just be a simple child of nvme, but transitioning
to that (and winning that argument given why it was done this way) is
beyond the scope of this change.

Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D21382
2019-08-23 22:52:58 +00:00
Warner Losh
acc48026b3 Remove stray line that was duplicated.
Noticed by: rpokala@
2019-08-22 02:53:51 +00:00
Warner Losh
f182f928db Separate the pci attachment from the rest of nvme
Nvme drives can be attached in a number of different ways. Separate out the PCI
attachment so that we can have other attachment types, like ahci and various
types of NVMeoF.

Submitted by: cognet@
2019-08-21 22:17:55 +00:00
Alexander Motin
51b92c1af6 Formalize NVMe controller consumer life cycle.
This fixes possible double call of fail_fn, for example on hot removal.
It also allows ctrlr_fn to safely return NULL cookie in case of failure
and not get useless ns_fn or fail_fn call with NULL cookie later.

MFC after:	2 weeks
2019-08-21 02:17:39 +00:00
Warner Losh
1071b50a65 Use sysctl + CTLRWTUN for hw.nvme.verbose_cmd_dump.
Also convert it to a bool. While the rest of the driver isn't yet bool clean,
this will help.

Reviewed by: cem@
Differential Revision: https://reviews.freebsd.org/D20988
2019-07-19 00:32:56 +00:00
Warner Losh
c75bdc044d Provide new tunable hw.nvme.verbose_cmd_dump
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.

Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
2019-07-18 21:58:51 +00:00
Warner Losh
204498d7c2 Remove now-obsolete comment. 2019-07-17 20:43:14 +00:00
Warner Losh
14343799dc Remove do-nothing nvme_modevent.
nvme_modevent no longer does anything interesting, remove it.

Sponsored by: Netflix
2018-11-16 16:51:44 +00:00
Warner Losh
09efa3dfb2 Put a workaround in for command timeout malfunctioning
At least one NVMe drive has a bug that makeing the Command Time Out
PCIe feature unreliable. The workaround is to disable this
feature. The driver wouldn't deal correctly with a timeout anyway.
Only do this for drives that are known bad.

Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D17708
2018-10-26 14:27:37 +00:00
Chuck Tuffli
9544e6dcf1 Make NVMe compatible with the original API
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).

This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.

This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.

Bump __FreeBSD_version to 1200081 due to API change

Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
2018-08-22 04:29:24 +00:00
Alexander Motin
f439e3a4ff Refactor NVMe CAM integration.
- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
 - Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
 - Add initial support for namespace change async events.  So far only
in CAM mode, but it allows run-time namespace arrival and departure.
 - Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-05-25 03:34:33 +00:00
Wojciech Macek
0d787e9b35 NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
Warner Losh
29077eb456 Use atomic load and stores to ensure that the compiler doesn't
optimize away these loops. Change boolean to int to match what atomic
API supplies. Remove wmb() since the atomic_store_rel() on status.done
ensure the prior writes to status. It also fixes the fact that there
wasn't a rmb() before reading done. This should also be more efficient
since wmb() is fairly heavy weight.

Sponsored by: Netflix
Reviewed by: kib@, jim harris
Differential Revision: https://reviews.freebsd.org/D14053
2018-01-29 00:00:52 +00:00
Warner Losh
ce1ec9c178 When we're disabling the nvme device, some drives have a controller
bug that requires 'hands off' for a period of time (2.3s) before we
check the RDY bit. Sicne this is a very odd quirk for a very limited
selection of drives, do this as a quirk. This prevented a successful
reset of the card when the card wedged.

Also, make sure that we comply with the advice from section 3.1.5 of
the 1.3 spec says that transitioning CC.EN from 0 to 1 when CSTS.RDY
is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has
undefined results". Short circuit when EN == RDY == desired state.

Finally, fail the reset if the disable fails. This will lead to a
failed device, which is what we want. (note: nda device needs
work for coping with a failed device).

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13389
2017-12-18 18:38:00 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
Konstantin Belousov
5a21cd1941 The nvme module should explicitly declare dependency on the cam.
If both nvme and cam are compiled as modules, nvme cannot be kldloaded
otherwise.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-31 14:21:32 +00:00
Warner Losh
abb61405a6 Enable bus mastering on the device before resetting the device. The
card has to do PCIe transactions to complete the reset process, but
can't do them, per the PCIe spec, unless bus mastering is enabled.

Submitted by: Kinjal Patel
PR: 22166
2017-08-25 03:15:18 +00:00
Nathan Whitehorn
c670f31f19 Move NVME controller shutdown from being called as part of module unloading
to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
the NVME controller gets shut down before the device and bus disappear
and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.

PR:		kern/211852
Reviewed by:	imp
MFC after:	2 weeks
2017-08-12 22:13:06 +00:00
Warner Losh
a8a18dd590 Make multi-namespace nvme drives more robust.
Fix assumptions about name spaces in NVME driver. First, it assumes
cdata.nn is the number of configured devices. However, it is the
number of supported name spaces. Second, it assumes that there will
never be more than 16 name spaces supported, but a certain drive I'm
testing reports 1024. It assumes that name spaces are a tightly packed
namespace, but the standard seems to indicate otherwise. Finally, it
assumes that an error would be generated when quearying an
unconfigured namespace. Instead, it succeeds but the identify data is
all zeros.

Fix these by limiting the number of name spaces we probe to 16. Remove
aborting when we find one in error. When the size of the name space is
zero, ignore it.

This is admittedly a bandaide. The long term fix will be to
participate in the enumeration and name space change protocols
definfed in the NVNe standard.

Sponsored by: Netflix
2017-03-07 21:47:54 +00:00
Jim Harris
2b647da7a0 nvme: do not revert o single I/O queue when per-CPU queues not possible
Previously nvme(4) would revert to a signle I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core.  This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:18:32 +00:00
Jim Harris
0e1fd2dda3 nvme: do not notify a consumer about failures that occur during initialization
MFC after:	3 days
Sponsored by:	Intel
2015-07-29 21:29:50 +00:00
Jim Harris
36b0e4ee1f nvme: remove CHATHAM related code
Chatham was an internal NVMe prototype board used for
early driver development.

MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:52:06 +00:00
Jim Harris
eb4929fb41 nvme: add device strings for Intel DC series NVMe SSDs
MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:50:45 +00:00
Jim Harris
496a27520d nvme: Close hole where nvd(4) would not be notified of all nvme(4)
instances if modules loaded during boot.

Sponsored by:	Intel
MFC after:	3 days
2014-03-18 18:09:08 +00:00
Jim Harris
7aa27dbac5 Do not leak resources during attach if nvme_ctrlr_construct() or the initial
controller resets fail.

Sponsored by:	Intel
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 16:01:43 +00:00
Jim Harris
086d23cfd3 If a controller fails to initialize, do not notify consumers (nvd) of its
namespaces.

Sponsoredy by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-08-13 21:49:32 +00:00
Jim Harris
56183abc2b Send a shutdown notification in the driver unload path, to ensure
notification gets sent in cases where system shuts down with driver
unloaded.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-08-13 21:47:08 +00:00
Jim Harris
38441bd9a9 Add message when nvd disks are attached and detached.
As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis().  This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers.  This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-19 21:40:57 +00:00
Jim Harris
e9efbc134f Update copyright dates.
MFC after:	3 days
2013-07-09 21:22:17 +00:00
Jim Harris
eb32b874f6 Add pci_enable_busmaster() and pci_disable_busmaster() calls in
nvme_attach() and nvme_detach() respectively.

Sponsored by:	Intel
MFC after:	3 days
2013-07-09 21:02:45 +00:00
Jim Harris
ca269f32ef Move the busdma mapping functions to nvme_qpair.c.
This removes nvme_uio.c completely.

Sponsored by:	Intel
2013-04-12 17:48:45 +00:00
Jim Harris
e2b9900498 Do not panic when a busdma mapping operation fails.
Instead, print an error message and fail the associated command with
DATA_TRANSFER_ERROR NVMe completion status.

Sponsored by:	Intel
2013-04-12 17:34:49 +00:00
Jim Harris
955910a916 Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:09:51 +00:00
Jim Harris
232e2edb6c Add the ability to internally mark a controller as failed, if it is unable to
start or reset.  Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:58:38 +00:00
Jim Harris
be34f21609 Remove the is_started flag from struct nvme_controller.
This flag was originally added to communicate to the sysctl code
which oids should be built, but there are easier ways to do this.  This
needs to be cleaned up prior to adding new controller states - for example,
controller failure.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:19:26 +00:00
Jim Harris
cb5b7c1304 Cap the number of retry attempts to a configurable number. This ensures
that if a specific I/O repeatedly times out, we don't retry it indefinitely.

The default number of retries will be 4, but is adjusted using hw.nvme.retry_count.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:14:51 +00:00
Jim Harris
0d7e13ecb2 Pass associated log page data to async event consumers, if requested.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:08:32 +00:00
Jim Harris
cf81529ce3 Create struct nvme_status.
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:00:18 +00:00
Jim Harris
b846efd7ec Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer.  Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:50:46 +00:00
Jim Harris
038a5ee403 Add an interface for nvme shim drivers (i.e. nvd) to register for
notifications when new nvme controllers are added to the system.

Sponsored by:	Intel
2013-03-26 18:39:54 +00:00
Jim Harris
990e741c18 Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct()
function.

Sponsored by:	Intel
2013-03-26 18:34:19 +00:00
David E. O'Brien
4b52061e17 Fix GCC build:
/usr/src/sys/modules/nvme/../../dev/nvme/nvme.c:211: warning: format '%qx' expects type 'long unsigned int', but argument 9 has type 'long long unsigned int' [-Wformat]
2013-03-07 22:54:28 +00:00
Jim Harris
91fe20e34d Map BAR 4/5, because NVMe spec says devices may place the MSI-X table
behind BAR 4/5, rather than in BAR 0/1 with the control/doorbell registers.

Sponsored by:	Intel
2012-12-18 23:27:18 +00:00
Jim Harris
e1e84e74c1 Simplify module definition by adding nvme_modevent to DRIVER_MODULE()
definition.

Submitted by:   Carl Delsey <carl.r.delsey@intel.com>
2012-12-18 22:10:40 +00:00
Jim Harris
4d6abcb19f Do not use taskqueue to defer completion work when using INTx. INTx now
matches MSI-X behavior.

Sponsored by:	Intel
2012-12-18 21:50:48 +00:00
Jim Harris
38ce9496fe Add PCI device ID for 8-channel IDT NVMe controller, and clarify that the
previously defined IDT PCI device ID was for a 32-channel controller.

Submitted by:	Joe Golio <joseph.golio@isilon.com>
2012-12-06 15:36:24 +00:00
Jim Harris
5fa5cc5f12 Cleanup uio-related code to use struct nvme_request and
nvme_ctrlr_submit_io_request().

While here, also fix case where a uio may have more than 1 iovec.
NVMe's definition of SGEs (called PRPs) only allows for the first SGE to
start on a non-page boundary.  The simplest way to handle this is to
construct a temporary uio for each iovec, and submit an NVMe request
for each.

Sponsored by:	Intel
2012-10-18 00:40:40 +00:00
Jim Harris
d281e8fbbd Add nvme_ctrlr_submit_[admin|io]_request functions which consolidates
code for allocating nvme_tracker objects and making calls into
bus_dmamap_load for commands which have payloads.

Sponsored by:	Intel
2012-10-18 00:39:29 +00:00
Jim Harris
ad697276ce Add struct nvme_request object which contains all of the parameters passed
from an NVMe consumer.

This allows us to mostly build NVMe command buffers without holding the
qpair lock, and also allows for future queueing of nvme_request objects
in cases where the submission queue is full and no nvme_tracker objects
are available.

Sponsored by:	Intel
2012-10-18 00:38:28 +00:00