61 Commits

Author SHA1 Message Date
chuck
fa895cb8d2 Make NVMe compatible with the original API
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).

This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.

This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.

Bump __FreeBSD_version to 1200081 due to API change

Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
2018-08-22 04:29:24 +00:00
mav
a8d82e59ae Refactor NVMe CAM integration.
- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
 - Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
 - Add initial support for namespace change async events.  So far only
in CAM mode, but it allows run-time namespace arrival and departure.
 - Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-05-25 03:34:33 +00:00
imp
8d536b9f2e Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.

Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.

Sponsored by: Netflix
2018-03-20 03:37:14 +00:00
imp
1ddb7299f0 Implement trim collapsing in nda
When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix
2018-03-14 16:44:50 +00:00
mav
a7ab51623b Print fuses and fna fields in identify data.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-03-12 16:31:25 +00:00
mav
73b7aa323b Add new opcodes and statuses from NVMe 1.3a.
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-03-11 06:30:09 +00:00
mav
559bce3bae Add new identify data structures fields from NVMe 1.3a.
Some of them are already supported by existing hardware, so reporting
them `nvmecontrol identify` can be useful.
2018-03-11 05:09:02 +00:00
kevans
ba4f462a09 nvme: Unbreak LE builds after r329824
The parameter 'p' is unused if _BYTE_ORDER == _LITTLE_ENDIAN. Add in a
(void)p to fix the build.
2018-02-22 16:16:49 +00:00
wma
2858f9ff6e NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
imp
56151d954c Backout r329818, r329816 and r329815.
These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.
2018-02-22 11:18:33 +00:00
imp
25e0879e2e Combine BIO_DELETE requests for nda devices
Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it make sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.

Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.

Sponsored by: Netflix
2018-02-22 05:44:00 +00:00
pfg
1537078d8f sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 14:52:40 +00:00
imp
c00b8f3c13 Provide link speed data in XPT_GET_TRAN_SETTINGS. Provide full version
information for that and XPT_PATH_INQ. Provide macros to encode/decode
major/minor versions.  Read the link speed and lane count to compute
the base_transfer_speed for XPT_PATH_INQ.

Sponsored by: Netflix
2017-11-14 05:05:16 +00:00
imp
d5eb569d3d Closer examination shows that nvme and CAM both normally zero-fill
allocations (for req and ccb, which ultimately contain the
nvme_cmd). As such, we can micro-optimize these routines. Add a
comment to this effect, and bzero the ccb used to make the requests
for the nda dump rotuine so it more closely matches a ccb allocated
with xpt_get_ccb().

Sponsored by: Netflix
2017-10-15 23:53:55 +00:00
imp
1edcf8dd07 Explicitly set reserved fields and 'fuse' to 0. This prevents us from
acidentally sending bogus values in these fields, which some drives
may reject with an error or worse (undefined behavior).

This is especially needed for the ndadump routine which allocates the
cmd from stack garbage....

Sponsored by: Netflix
2017-10-15 16:17:59 +00:00
imp
20c4a6f953 Fix a few overlooked spots where the coded uses 16-bit NSIDs. Chuck
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work and this brings in the bits I missed from that patch.

PR: 220267
Submitted by: Chuck Tuffli
2017-08-29 15:46:34 +00:00
imp
64912a07d3 Fill in reserved areas from NVMe spec in the IDENTIFY structure
(struct nvme_controller_data) as defined in the NVM Express
specification, revsion 1.3.

Sponsored by: Netflix
2017-08-25 21:38:43 +00:00
imp
2fd4069941 Add feature codes from NVMe 1.3 specification:
o Automomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config

Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.

Sponsored by: Netflix
2017-08-25 21:38:29 +00:00
imp
1c9cc9fd2e Use _Static_assert
These files are compiled in userland too, so we can't use sys/systm.h
and rely on CTASSERT. Switch to using _Static_assert instead.

MFC After: 3 days
Sponsored by: Netflix
2017-08-25 04:33:06 +00:00
imp
81eb7962f0 Sanity check sizes
Add compile time sanity checks to make sure that packed structures are
the proper size, typically as defined in the NVMe standard.
2017-08-25 04:05:53 +00:00
imp
b001b92dc8 Make nvd vs nda choice boot-time rather than build-time
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.

Differential Revision: https://reviews.freebsd.org/D11825
2017-08-04 03:40:01 +00:00
imp
7e0ae72db7 Add new definitions for namespaces.
Sponsored by: Netflix
Submitted by: Matt Williams (via D11330)
2017-06-27 20:24:39 +00:00
imp
6884a6e482 cwd10 takes the low 32-bits and cwd11 takes the upper 32-bits of the
lba. Rather than do a cast to uint64_t, which clang warns might be
unaligned, do the stores 32-bits at a time.

Sponsored by: Netflix
2017-03-07 23:02:59 +00:00
imp
026202c479 Implement HGST Log page 0xc1, as documented in the HGST SN100 and
SN150 product manuals. Subpage 0x32 is documented, but not implemented.

Sponsored by: Netflix, Inc
2016-11-19 17:13:08 +00:00
imp
cc89adf060 Print Intel's expanded Temperature log page.
Sponsored by: Netflix, Inc
2016-11-19 17:13:03 +00:00
imp
ef178f671d Add log pages that Intel SSDs provide. It turns out that many of these
are widely implemented beyond just Intel drives.

Sponsored by: Netflix, Inc
2016-11-19 17:12:58 +00:00
imp
99472d4580 Add log pages defined through NVM Express 1.2.1.
Sponsored by: Netflix, Inc
2016-11-19 17:12:53 +00:00
imp
69c12da21d Expand the SMART / Health Information Log Page (Page 02) printout
based on NVM Express 1.2.1 Standard.

Sponsored by: Netflix, Inc
2016-11-19 17:12:49 +00:00
scottl
ff355c7557 Implement crashdump support on NVME
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2016-07-19 03:13:51 +00:00
imp
b93dab6663 Commit the bits of nda that were missed. This should fix the build.
Approved by: re@
2016-06-10 06:04:53 +00:00
mav
5c07241f1a Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K
I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made
unrelated driver to lie in reported parameters to workaround that.

Alternative solution for this problem from ZFS side was committed at
r296615.

Discussed with:	smh
2016-03-10 17:13:10 +00:00
imp
b58cf84475 Implement power command to list all power modes, find out the power
mode we're in and to set the power mode.
2016-01-30 22:48:06 +00:00
smh
0026debd97 Limit stripesize reported from nvd(4) to 4K
Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.

This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.

This can be controller by the new sysctl kern.nvme.max_optimal_sectorsize.

MFC after:	1 week
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4446
2015-12-11 02:06:03 +00:00
jimharris
b93e62f3e6 nvd, nvme: report stripesize through GEOM disk layer
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:35:18 +00:00
jimharris
bb66cfd2ae Extend some 32-bit fields and variables to 64-bit to prevent overflow
when calculating stats in nvmecontrol perftest.

Sponsored by:	Intel
Reported by:	Joe Golio <joseph.golio@emc.com>
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:47:22 +00:00
jimharris
509a795193 Add driver-assisted striping for upcoming Intel NVMe controllers that can
benefit from it.

Sponsored by:	Intel
Reviewed by:	kib (earlier version), carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:44:04 +00:00
jimharris
3f846da35a Send a shutdown notification in the driver unload path, to ensure
notification gets sent in cases where system shuts down with driver
unloaded.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-08-13 21:47:08 +00:00
jimharris
52bfa150c7 Add message when nvd disks are attached and detached.
As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis().  This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers.  This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-19 21:40:57 +00:00
jimharris
8281445679 Define constants for the lengths of the serial number, model number
and firmware revision in the controller's identify structure.

Also modify consumers of these fields to ensure they only use the
specified number of bytes for their respective fields.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-17 23:23:38 +00:00
jimharris
ae0660c354 Fix a poorly worded comment in nvme(4).
MFC after:	3 days
2013-07-11 15:02:38 +00:00
jimharris
d7c0528dab Update copyright dates.
MFC after:	3 days
2013-07-09 21:22:17 +00:00
jimharris
c15f698fb4 Add firmware replacement and activation support to nvmecontrol(8) through
a new firmware command.

NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions.  This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image.  It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.

Newly activated firmware does not take effect until the new controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.

Submitted by:	Joe Golio <joseph.golio@emc.com>
Obtained from:	EMC / Isilon Storage Division
MFC after:	3 days
2013-06-27 00:08:25 +00:00
jimharris
8579bd1923 Use MAXPHYS to specify the maximum I/O size for nvme(4).
Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed.  The larger I/O size is very
beneficial for upcoming firmware download support.  This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.

Sponsored by:	Intel
MFC after:	3 days
2013-06-26 23:27:17 +00:00
jimharris
c0e542217e Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by:	Intel
2013-04-12 17:56:47 +00:00
jimharris
eee11f2f3d Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by:	Intel
2013-04-12 17:52:17 +00:00
jimharris
c4799f93b1 Add unmapped bio support to nvme(4) and nvd(4).
Sponsored by:	Intel
2013-04-01 16:23:34 +00:00
jimharris
69d2e13801 Add the ability to internally mark a controller as failed, if it is unable to
start or reset.  Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:58:38 +00:00
jimharris
18a3a60fb4 Pass associated log page data to async event consumers, if requested.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:08:32 +00:00
jimharris
79d7c4eec2 Add structure definitions and controller command function for firmware
log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:03:03 +00:00
jimharris
de4e1d0695 Add structure definitions and a controller command function for
error log pages.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:01:53 +00:00