Commit Graph

205 Commits

Author SHA1 Message Date
Warner Losh
223a9b93ac Add feature codes from NVMe 1.3 specification:
o Automomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config

Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.

Sponsored by: Netflix
2017-08-25 21:38:29 +00:00
Warner Losh
0012e436e3 Use _Static_assert
These files are compiled in userland too, so we can't use sys/systm.h
and rely on CTASSERT. Switch to using _Static_assert instead.

MFC After: 3 days
Sponsored by: Netflix
2017-08-25 04:33:06 +00:00
Warner Losh
0c26c1992f Sanity check sizes
Add compile time sanity checks to make sure that packed structures are
the proper size, typically as defined in the NVMe standard.
2017-08-25 04:05:53 +00:00
Warner Losh
abb61405a6 Enable bus mastering on the device before resetting the device. The
card has to do PCIe transactions to complete the reset process, but
can't do them, per the PCIe spec, unless bus mastering is enabled.

Submitted by: Kinjal Patel
PR: 22166
2017-08-25 03:15:18 +00:00
Nathan Whitehorn
c670f31f19 Move NVME controller shutdown from being called as part of module unloading
to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
the NVME controller gets shut down before the device and bus disappear
and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.

PR:		kern/211852
Reviewed by:	imp
MFC after:	2 weeks
2017-08-12 22:13:06 +00:00
Warner Losh
d0e75394cf Use the correct queue depth for nda devices.
Submitted by: Matt Williams
2017-08-08 16:06:16 +00:00
Warner Losh
8a5d94f94d Make nvd vs nda choice boot-time rather than build-time
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.

Differential Revision: https://reviews.freebsd.org/D11825
2017-08-04 03:40:01 +00:00
Warner Losh
df4245150a This adds CAM pass(4) support for NVMe IO's. Applications indicate
the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or
XPT_NVME_IO.

Submitted by:   Chuck Tuffli <chuck@tuffli.net>
Differential Revision:  https://reviews.freebsd.org/D10247
2017-07-14 14:52:20 +00:00
Warner Losh
594ffc03cd Add new definitions for namespaces.
Sponsored by: Netflix
Submitted by: Matt Williams (via D11330)
2017-06-27 20:24:39 +00:00
Warner Losh
824073fbd6 Avoid dereferencing unintialized elements in the error path.
Some drives sometimes have errors for things like setting the number
of queue entries in the submission queue. The error paths taken for
these drives ensure a panic dereferencing uninialized data.

Sponsored by: Netflix
2017-03-07 23:06:41 +00:00
Warner Losh
05ee702af6 cwd10 takes the low 32-bits and cwd11 takes the upper 32-bits of the
lba. Rather than do a cast to uint64_t, which clang warns might be
unaligned, do the stores 32-bits at a time.

Sponsored by: Netflix
2017-03-07 23:02:59 +00:00
Warner Losh
a8a18dd590 Make multi-namespace nvme drives more robust.
Fix assumptions about name spaces in NVME driver. First, it assumes
cdata.nn is the number of configured devices. However, it is the
number of supported name spaces. Second, it assumes that there will
never be more than 16 name spaces supported, but a certain drive I'm
testing reports 1024. It assumes that name spaces are a tightly packed
namespace, but the standard seems to indicate otherwise. Finally, it
assumes that an error would be generated when quearying an
unconfigured namespace. Instead, it succeeds but the identify data is
all zeros.

Fix these by limiting the number of name spaces we probe to 16. Remove
aborting when we find one in error. When the size of the name space is
zero, ignore it.

This is admittedly a bandaide. The long term fix will be to
participate in the enumeration and name space change protocols
definfed in the NVNe standard.

Sponsored by: Netflix
2017-03-07 21:47:54 +00:00
Warner Losh
adc8145e6f Remove obsolete comment after prior rev. 2017-02-19 17:38:17 +00:00
Alexander Motin
950c5aca4a Remove dead mentions of CAM target mode APIs from drivers.
This makes grepping kernel for target mode implementation much easier.
2017-02-19 17:27:58 +00:00
Warner Losh
a3a6c48d66 Ensure that the passthrough request will fit in MAXPHYS bytes after it
has been rounded to full pages. This avoids a panic in
vm_fault_quick_hold_pages due to this off-by-one error passing one
page too many into vmapbuf.
2017-02-02 23:04:06 +00:00
Ravi Pokala
d3c06026c2 In the same vein as r311350, fix whitespace in handling of XPT_PATH_INQ in
several more drivers.

Sponsored by:	Panasas
2017-01-05 03:08:57 +00:00
Alan Somers
4195c7de24 Always null-terminate ccb_pathinq.(sim_vid|hba_vid|dev_name)
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".

This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.

PR:		215474
Reported by:	Coverity
CID:		1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID:		1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID:		1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID:		1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID:		1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID:		1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID:		1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by:	imp, sephe, slm
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9037
Differential Revision:	https://reviews.freebsd.org/D9038
2017-01-04 20:26:42 +00:00
Warner Losh
0cf14228c4 Implement HGST Log page 0xc1, as documented in the HGST SN100 and
SN150 product manuals. Subpage 0x32 is documented, but not implemented.

Sponsored by: Netflix, Inc
2016-11-19 17:13:08 +00:00
Warner Losh
ab1dd0917b Print Intel's expanded Temperature log page.
Sponsored by: Netflix, Inc
2016-11-19 17:13:03 +00:00
Warner Losh
d01f26f590 Add log pages that Intel SSDs provide. It turns out that many of these
are widely implemented beyond just Intel drives.

Sponsored by: Netflix, Inc
2016-11-19 17:12:58 +00:00
Warner Losh
aea528795a Add log pages defined through NVM Express 1.2.1.
Sponsored by: Netflix, Inc
2016-11-19 17:12:53 +00:00
Warner Losh
dc58cdf95e Expand the SMART / Health Information Log Page (Page 02) printout
based on NVM Express 1.2.1 Standard.

Sponsored by: Netflix, Inc
2016-11-19 17:12:49 +00:00
Scott Long
a965389b5a Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a
bunch of safery belts and error handling in related codepaths.

Reviewed by:	jimharris
Obtained from:	Netflix
Differential Revision:	D8453
2016-11-08 00:24:49 +00:00
Warner Losh
34dc8f1bb4 Kill a few stray debug printfs. 2016-07-28 22:40:31 +00:00
Warner Losh
3a31c31c22 Actually import nvme_sim so the CAM attachment for NVME (nda) actually
works.

MFC after: 1 week
2016-07-21 03:11:39 +00:00
Scott Long
49e20d2420 Supporting flushing the dump before returning, and simplify/combine the
logic.  Switch to a 5us delay since most NVME devices can easily do 200,000
iops.

Submitted by:	imp
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2016-07-19 19:09:23 +00:00
Scott Long
a498975ef7 Implement crashdump support on NVME
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2016-07-19 03:13:51 +00:00
Warner Losh
f24c011beb Commit the bits of nda that were missed. This should fix the build.
Approved by: re@
2016-06-10 06:04:53 +00:00
Alexander Motin
ee7f4d8187 Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K
I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made
unrelated driver to lie in reported parameters to workaround that.

Alternative solution for this problem from ZFS side was committed at
r296615.

Discussed with:	smh
2016-03-10 17:13:10 +00:00
Jim Harris
361e1fb408 nvme: fix intx handler to not dereference ioq during initialization
This was a regression from r293328, which deferred allocation
of the controller's ioq array until after interrupts are enabled
during boot.

PR:		207432
Reported and tested by: Andy Carrel <wac@google.com>
MFC after:	3 days
Sponsored by:	Intel
2016-02-24 00:01:10 +00:00
Justin Hibbits
43cd61606b Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any()
Since these calls only use default arguments, bus_alloc_resource_any() is the
right call.

Differential Revision: https://reviews.freebsd.org/D5306
2016-02-19 03:37:56 +00:00
Jim Harris
7b036d7790 nvme: avoid duplicate SET_NUM_QUEUES commands
nvme(4) issues a SET_NUM_QUEUES command during device
initialization to ensure enough I/O queues exists for each
of the MSI-X vectors we have allocated.  The SET_NUM_QUEUES
command is then issued again during nvme_ctrlr_start(), to
ensure that is properly set after any controller reset.

At least one NVMe drive exists which fails this second
SET_NUM_QUEUES command during device initialization.  So
change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES
command when it is coming out of a reset - avoiding the
duplicate SET_NUM_QUEUES during device initialization.

Reported by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-02-11 17:32:41 +00:00
Warner Losh
038659e7dd Implement power command to list all power modes, find out the power
mode we're in and to set the power mode.
2016-01-30 22:48:06 +00:00
Jim Harris
9c6b5d40eb nvme: replace NVME_CEILING macro with howmany()
Suggested by:	rpokala
MFC after:	3 days
2016-01-07 20:35:26 +00:00
Jim Harris
50dea2da12 nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue.  This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues.  Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 20:32:04 +00:00
Jim Harris
2b647da7a0 nvme: do not revert o single I/O queue when per-CPU queues not possible
Previously nvme(4) would revert to a signle I/O queue if it could not
allocate enought interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core.  This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:18:32 +00:00
Jim Harris
d400f790b1 nvme: break out interrupt setup code into a separate function
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:12:42 +00:00
Jim Harris
e5af5854ff nvme: do not pre-allocate MSI-X IRQ resources
The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:11:31 +00:00
Jim Harris
c75ad8ce5a nvme: remove per_cpu_io_queues from struct nvme_controller
Instead just use num_io_queues to make this determination.

This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:09:56 +00:00
Jim Harris
d85f84abb8 nvme: simplify some of the nested ifs in interrupt setup code
This prepares for some follow-up commits which do more work in
this area.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:08:04 +00:00
Steven Hartland
fdf16a68ab Limit stripesize reported from nvd(4) to 4K
Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.

This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.

This can be controller by the new sysctl kern.nvme.max_optimal_sectorsize.

MFC after:	1 week
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4446
2015-12-11 02:06:03 +00:00
Jim Harris
fdbd3d8068 nvd, nvme: report stripesize through GEOM disk layer
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:35:18 +00:00
Jim Harris
e7e7bad3d7 nvme: fix race condition in split bio completion path
Fixes race condition observed under following circumstances:

1) I/O split on 128KB boundary with Intel NVMe controller.
   Current Intel controllers produce better latency when
   I/Os do not span a 128KB boundary - even if the I/O size
   itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
   simultaneously.
5) ithread for child I/O A increments bio_inbed, then
   immediately is preempted (rendezvous IPI, higher priority
   interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
   parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
   or gpart bio (including setting bio_children to 1 and
   clearing bio_driver1).
8) ithread for child I/O A resumes processing.  bio_children
   for what it thinks is the parent bio is set to 1, so it
   thinks it needs to complete the parent bio.

Result is either calling a NULL callback function, or double freeing
the bio to its uma zone.

PR:		203746
Reported by:	Drew Gallatin <gallatin@netflix.com>,
		Marc Goroff <mgoroff@quorum.net>
Tested by:	Drew Gallatin <gallatin@netflix.com>
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:06:34 +00:00
Jim Harris
0e1fd2dda3 nvme: do not notify a consumer about failures that occur during initialization
MFC after:	3 days
Sponsored by:	Intel
2015-07-29 21:29:50 +00:00
Jeff Roberson
fade8dd714 Refactor unmapped buffer address handling.
- Use pointer assignment rather than a combination of pointers and
   flags to switch buffers between unmapped and mapped.  This eliminates
   multiple flags and generally simplifies the logic.
 - Eliminate b_saveaddr since it is only used with pager bufs which have
   their b_data re-initialized on each allocation.
 - Gather up some convenience routines in the buffer cache for
   manipulating buf space and buf malloc space.
 - Add an inline, buf_mapped(), to standardize checks around unmapped
   buffers.

In collaboration with: mlaier
Reviewed by:	kib
Tested by:	pho (many small revisions ago)
Sponsored by:	EMC / Isilon Storage Division
2015-07-23 19:13:41 +00:00
Jim Harris
cbdec09c1c nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable
PR:		200458
MFC after:	3 days
Sponsored by:	Intel
2015-07-23 15:50:39 +00:00
Jim Harris
de9a58f4ee nvme: properly handle case where pci_alloc_msix does not alloc all vectors
Reported by: Sean Kelly <smkelly@smkelly.org>
MFC after:	3 days
Sponsored by:	Intel
2015-07-23 15:35:08 +00:00
Jim Harris
3345ed9a55 nvme: use BUS_SPACE_MAXSIZE for bus_dma_tag_create maxsize parameter
This fixes i386 PAE build fallout from r281281.

Reported by:	bz
MFC after:	1 week
2015-04-09 00:37:55 +00:00
Jim Harris
36b0e4ee1f nvme: remove CHATHAM related code
Chatham was an internal NVMe prototype board used for
early driver development.

MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:52:06 +00:00
Jim Harris
eb4929fb41 nvme: add device strings for Intel DC series NVMe SSDs
MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:50:45 +00:00
Jim Harris
a6e3096392 nvme: create separate DMA tag for non-payload DMA buffers
Submission and completion queue memory need to use a
separate DMA tag for mappings than payload buffers,
to ensure mappings remain contiguous even with DMAR
enabled.

Submitted by:	kib
MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:49:45 +00:00
Jim Harris
e5ce537999 nvme: fall back to a smaller MSI-X vector allocation if necessary
Previously, if per-CPU MSI-X vectors could not be allocated,
nvme(4) would fall back to INTx with a single I/O queue pair.
This change will still fall back to a single I/O queue pair, but
allocate MSI-X vectors instead of reverting to INTx.

MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:46:18 +00:00
Jim Harris
2efb5fb1ec Use bitwise OR instead of logical OR when constructing value for
SET_FEATURES/NUMBER_OF_QUEUES command.

Sponsored by:	Intel
MFC after:	3 days
2014-06-10 21:40:43 +00:00
Jim Harris
f42ca756b9 nvme: Allocate all MSI resources up front so that we can fall back to
INTx if necessary.

Sponsored by:	Intel
MFC after:	3 days
2014-03-18 18:10:35 +00:00
Jim Harris
496a27520d nvme: Close hole where nvd(4) would not be notified of all nvme(4)
instances if modules loaded during boot.

Sponsored by:	Intel
MFC after:	3 days
2014-03-18 18:09:08 +00:00
Jim Harris
1416ef361e nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8).
Sponsored by:	Intel
MFC after:	3 days
2014-03-17 22:37:17 +00:00
Jim Harris
2b26030cbc nvme: Remove the software progress marker SET_FEATURE command during
controller initialization.

The spec says OS drivers should send this command after controller
initialization completes successfully, but other NVMe OS drivers are
not sending this command.  This change will therefore reduce differences
between the FreeBSD and other OS drivers.

Sponsored by:	Intel
MFC after:	3 days
2014-03-17 22:36:04 +00:00
Jim Harris
448cffc859 For IDENTIFY passthrough commands to Chatham prototype controllers, copy
the spoofed identify data into the user buffer rather than issuing the
command to the controller, since Chatham IDENTIFY data is always spoofed.

While here, fix a bug in the spoofed data for Chatham submission and
completion queue entry sizes.

Sponsored by:	Intel
MFC after:	3 days
2014-01-06 23:51:26 +00:00
Jim Harris
d603c3d73b Create a unique unit number for each controller and namespace cdev.
Sponsored by:	Intel
MFC after:	3 days
2013-11-01 23:30:54 +00:00
Jim Harris
8a959ae073 Fix the LINT build.
Approved by:	re (implicit)
MFC after:	1 week
2013-10-08 23:23:04 +00:00
Jim Harris
7aa27dbac5 Do not leak resources during attach if nvme_ctrlr_construct() or the initial
controller resets fail.

Sponsored by:	Intel
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 16:01:43 +00:00
Jim Harris
bb2f67fd72 Log and then disable asynchronous notification of persistent events after
they occur.

This prevents repeated notifications of the same event.

Status of these events may be viewed at any time by viewing the
SMART/Health Info Page using nvmecontrol, whether or not asynchronous
events notifications for those events are enabled.  This log page can
be viewed using:

    nvmecontrol logpage -p 2 <ctrlr id>

Future enhancements may re-enable these notifications on a periodic basis
so that if the notified condition persists, it will continue to be logged.

Sponsored by:	Intel
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 16:00:12 +00:00
Jim Harris
d5fc982133 Do not enable temperature threshold as an asynchronous event notification
on NVMe controllers that do not support it.

Sponsored by:	Intel
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:49:14 +00:00
Jim Harris
992db80f1d Extend some 32-bit fields and variables to 64-bit to prevent overflow
when calculating stats in nvmecontrol perftest.

Sponsored by:	Intel
Reported by:	Joe Golio <joseph.golio@emc.com>
Reviewed by:	carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:47:22 +00:00
Jim Harris
a40e72a695 Add driver-assisted striping for upcoming Intel NVMe controllers that can
benefit from it.

Sponsored by:	Intel
Reviewed by:	kib (earlier version), carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:44:04 +00:00
Kenneth D. Merry
ce625ec719 Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver.  The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't.  The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot.  The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h:	Remove the D_UNMAPPED_IO cdevsw flag and replace it
		with an SI_UNMAPPED cdev flag.

kern_physio.c:	Look at the cdev SI_UNMAPPED flag to determine
		whether or not a particular driver can handle
		unmapped I/O.

geom_dev.c:	Set the SI_UNMAPPED flag for all GEOM cdevs.
		Since GEOM will create a temporary mapping when
		needed, setting SI_UNMAPPED unconditionally will
		work.

		Remove the D_UNMAPPED_IO flag.

nvme_ns.c:	Set the SI_UNMAPPED flag on cdevs created here
		if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c:	In aio_qphysio(), check the SI_UNMAPPED flag on a
		cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h:	Bump __FreeBSD_version to 1000045 for the switch from
		setting the D_UNMAPPED_IO flag in the cdevsw to setting
		SI_UNMAPPED in the cdev.

Reviewed by:	kib, jimharris
MFC after:	1 week
Sponsored by:	Spectra Logic
2013-08-15 22:52:39 +00:00
Jim Harris
086d23cfd3 If a controller fails to initialize, do not notify consumers (nvd) of its
namespaces.

Sponsoredy by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-08-13 21:49:32 +00:00
Jim Harris
56183abc2b Send a shutdown notification in the driver unload path, to ensure
notification gets sent in cases where system shuts down with driver
unloaded.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-08-13 21:47:08 +00:00
Jim Harris
38441bd9a9 Add message when nvd disks are attached and detached.
As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis().  This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers.  This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-19 21:40:57 +00:00
Jim Harris
2fb37e8f1a Fix nvme(4) and nvd(4) to support non 512-byte sector sizes.
Recent testing with QEMU that has variable sector size support for
NVMe uncovered some of these issues.  Chatham prototype boards supported
only 512 byte sectors.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-19 21:33:24 +00:00
Jim Harris
8e0ac13f5a Use pause() instead of DELAY() when polling for completion of admin
commands during controller initialization.

DELAY() does not work here during config_intrhook context - we need to
explicitly relinquish the CPU for the admin command completion to
get processed.

Sponsored by:	Intel
Reported by:	Adam Brooks <adam.j.brooks@intel.com>
Reviewed by:	carl
MFC after:	3 days
2013-07-17 23:26:56 +00:00
Jim Harris
e8f25c6266 Define constants for the lengths of the serial number, model number
and firmware revision in the controller's identify structure.

Also modify consumers of these fields to ensure they only use the
specified number of bytes for their respective fields.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-17 23:23:38 +00:00
Jim Harris
66619178b5 Fix a poorly worded comment in nvme(4).
MFC after:	3 days
2013-07-11 15:02:38 +00:00
Jim Harris
bd6b0ac5be Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h
if not already defined elsewhere.

Requested by:	attilio
MFC after:	3 days
2013-07-09 21:24:19 +00:00
Jim Harris
e9efbc134f Update copyright dates.
MFC after:	3 days
2013-07-09 21:22:17 +00:00
Jim Harris
ec526ea90b Do not retry failed async event requests.
Sponsored by:	Intel
MFC after:	3 days
2013-07-09 21:03:39 +00:00
Jim Harris
eb32b874f6 Add pci_enable_busmaster() and pci_disable_busmaster() calls in
nvme_attach() and nvme_detach() respectively.

Sponsored by:	Intel
MFC after:	3 days
2013-07-09 21:02:45 +00:00
Jim Harris
49fac6101d Add firmware replacement and activation support to nvmecontrol(8) through
a new firmware command.

NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions.  This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image.  It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.

Newly activated firmware does not take effect until the new controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.

Submitted by:	Joe Golio <joseph.golio@emc.com>
Obtained from:	EMC / Isilon Storage Division
MFC after:	3 days
2013-06-27 00:08:25 +00:00
Jim Harris
bbd412dd05 Remove remaining uio-related code.
The nvme_physio() function was removed quite a while ago, which was the
only user of this uio-related code.

Sponsored by:	Intel
MFC after:	3 days
2013-06-26 23:37:11 +00:00
Jim Harris
7b68ae1e5e Fail any passthrough command whose transfer size exceeds the controller's
max transfer size.  This guards against rogue commands coming in from
userspace.

Also add KASSERTS for the virtual address and unmapped bio cases, if the
transfer size exceeds the controller's max transfer size.

Sponsored by:	Intel
MFC after:	3 days
2013-06-26 23:32:45 +00:00
Jim Harris
8d09e3c400 Use MAXPHYS to specify the maximum I/O size for nvme(4).
Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed.  The larger I/O size is very
beneficial for upcoming firmware download support.  This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.

Sponsored by:	Intel
MFC after:	3 days
2013-06-26 23:27:17 +00:00
Jim Harris
5076698e19 Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by:	Intel
2013-04-12 17:56:47 +00:00
Jim Harris
7c3f19d7bb Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by:	Intel
2013-04-12 17:52:17 +00:00
Jim Harris
ca269f32ef Move the busdma mapping functions to nvme_qpair.c.
This removes nvme_uio.c completely.

Sponsored by:	Intel
2013-04-12 17:48:45 +00:00
Jim Harris
611060cab5 Remove the NVMe-specific physio and associated routines.
These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio.  Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.

Sponsored by:	Intel
2013-04-12 17:44:55 +00:00
Jim Harris
97fafe2580 Add a mutex to each namespace, for general locking operations on the namespace.
Sponsored by:	Intel
2013-04-12 17:41:24 +00:00
Jim Harris
a90b810492 Rename the controller's fail_req_lock, so that it can be used for other
locking operations on the controller.

Sponsored by:	Intel
2013-04-12 17:36:48 +00:00
Jim Harris
e2b9900498 Do not panic when a busdma mapping operation fails.
Instead, print an error message and fail the associated command with
DATA_TRANSFER_ERROR NVMe completion status.

Sponsored by:	Intel
2013-04-12 17:34:49 +00:00
Jim Harris
5fdf9c3c8e Add unmapped bio support to nvme(4) and nvd(4).
Sponsored by:	Intel
2013-04-01 16:23:34 +00:00
Jim Harris
1e526bc478 Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or
NULL. This simplifies decisions around if/how requests are routed through
busdma.  It also paves the way for supporting unmapped bios.

Sponsored by:	Intel
2013-03-29 20:34:28 +00:00
Jim Harris
64432b473b Remove obsolete comment. This code has now been tested with the QEMU
NVMe device emulator.
2013-03-28 16:57:48 +00:00
Jim Harris
bb852ae89b Delete extra IO qpairs allocated based on number of MSI-X vectors, but
later found to not be usable because the controller doesn't support the
same number of queues.

This is not the normal case, but does occur with the Chatham prototype
board.

Sponsored by:	Intel
2013-03-28 16:54:19 +00:00
Jim Harris
bdd1fd402c Fix printf format issue on i386.
Reported by:	bz
2013-03-27 00:37:00 +00:00
Jim Harris
547d523eb8 Clean up debug prints.
1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
    human-readable.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:17:10 +00:00
Jim Harris
dd433dd0fb Move common code from the different nvme_allocate_request functions into a
separate function.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:13:07 +00:00
Jim Harris
237d2019e5 Change a number of malloc(9) calls to use M_WAITOK instead of
M_NOWAIT.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:11:34 +00:00
Jim Harris
955910a916 Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:09:51 +00:00
Jim Harris
43a3725688 Abort and do not retry any outstanding admin commands left over after
a controller reset.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:06:05 +00:00
Jim Harris
232e2edb6c Add the ability to internally mark a controller as failed, if it is unable to
start or reset.  Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state.  To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:58:38 +00:00
Jim Harris
3d7eb41c1b Just disable the controller instead of deleting IO queues during detach.
This is just as effective, and removes the need for a bunch of admin commands
to a controller that's going to be disabled shortly anyways.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:48:41 +00:00