Without this, if both nvme and cam are compiled as modules, nvme
cannot be kldloaded.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work; this brings in the bits I missed from that patch.
PR: 220267
Submitted by: Chuck Tuffli
This adds support in pass(4) for data to be described with a
scatter-gather list (sglist) to augment the existing (single) virtual
address.
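Purely as an illustration of the data description involved (this is a
generic sketch, not the pass(4) or sglist(9) interface), a
scatter-gather list breaks one transfer into multiple address/length
segments instead of a single contiguous virtual buffer:

    #include <stddef.h>

    /* Generic illustration only: one transfer described as several
     * (address, length) segments rather than one virtual buffer. */
    struct sg_segment {
        void   *addr;   /* virtual address of this segment */
        size_t  len;    /* length of this segment in bytes */
    };

    struct sg_list {
        struct sg_segment *segs;
        int                nseg;
    };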
Differential Revision: https://reviews.freebsd.org/D11361
Submitted by: Chuck Tuffli
Reviewed by: imp@, scottl@, kenm@
Provided a better estimate for the number of transactions that can be
pending at one time. This will be the number of queues * the number of
trackers / 4, as suggested by Jim Harris. This gives a better estimate
of the number of transactions that CAM should queue before applying
back pressure. This should be revisited when we have real multi-queue
support in CAM and the upper layers of the I/O stack.
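As a rough sketch of the arithmetic described above (the parameter
names are illustrative, not the driver's exact fields):

    #include <stdint.h>

    /* Estimate of how many transactions CAM may keep outstanding
     * before applying back pressure: queues * trackers / 4. */
    uint32_t
    estimated_max_pending(uint32_t num_io_queues, uint32_t trackers_per_queue)
    {
        return (num_io_queues * trackers_per_queue / 4);
    }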
Sponsored by: Netflix
o Autonomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config
Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.
Sponsored by: Netflix
These files are compiled in userland too, so we can't use sys/systm.h
and rely on CTASSERT. Switch to using _Static_assert instead.
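For example, a compile-time check of this form builds both in the
kernel and in userland without pulling in sys/systm.h (the struct and
size here are made up):

    #include <stdint.h>

    struct example_entry {
        uint32_t lo;
        uint32_t hi;
    };

    /* C11 _Static_assert needs no kernel headers, unlike CTASSERT(). */
    _Static_assert(sizeof(struct example_entry) == 8,
        "example_entry must be exactly 8 bytes");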
MFC After: 3 days
Sponsored by: Netflix
card has to do PCIe transactions to complete the reset process, but
can't do them, per the PCIe spec, unless bus mastering is enabled.
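A minimal sketch of the idea, assuming a kernel context;
pci_enable_busmaster(9) is the real newbus helper, but the surrounding
function is a placeholder:

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <dev/pci/pcivar.h>

    /* Placeholder: run before asking the controller to reset itself. */
    static void
    example_pre_reset(device_t dev)
    {
        /* The card must be able to issue PCIe (DMA) transactions to
         * finish its reset sequence, so make sure bus mastering is
         * enabled first. */
        pci_enable_busmaster(dev);
    }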
Submitted by: Kinjal Patel
PR: 22166
to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
the NVME controller gets shut down before the device and bus disappear
and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.
PR: kern/211852
Reviewed by: imp
MFC after: 2 weeks
Introduce the hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting that cannot be
changed after boot time. However, it will allow easier A/B testing.
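As an illustration of how a boot-time-only tunable like this is
typically consumed (the variable name and default below are assumptions
for the sketch):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>

    static int nvme_use_nvd = 1;    /* assumed default, illustration only */

    static void
    example_fetch_tunable(void)
    {
        /* Read once at boot; the setting cannot change afterwards. */
        TUNABLE_INT_FETCH("hw.nvme.use_nvd", &nvme_use_nvd);
    }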
Differential Revision: https://reviews.freebsd.org/D11825
the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or
XPT_NVME_IO.
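Roughly, the consumer side dispatches on the CCB function code;
XPT_NVME_ADMIN and XPT_NVME_IO are real CAM op-codes, while the helper
below is only a sketch:

    #include <cam/cam.h>
    #include <cam/cam_ccb.h>

    /* Sketch: route a CCB to the admin queue or an NVM I/O queue based
     * on its XPT op-code. */
    static int
    example_ccb_is_admin(union ccb *ccb)
    {
        switch (ccb->ccb_h.func_code) {
        case XPT_NVME_ADMIN:
            return (1);     /* admin command set */
        case XPT_NVME_IO:
            return (0);     /* NVM command set */
        default:
            return (-1);    /* not an NVMe CCB */
        }
    }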
Submitted by: Chuck Tuffli <chuck@tuffli.net>
Differential Revision: https://reviews.freebsd.org/D10247
Some drives sometimes return errors for things like setting the number
of queue entries in the submission queue. The error paths taken for
these drives lead to a panic from dereferencing uninitialized data.
Sponsored by: Netflix
Fix assumptions about name spaces in the NVMe driver. First, it assumes
cdata.nn is the number of configured devices. However, it is the
number of supported name spaces. Second, it assumes that there will
never be more than 16 name spaces supported, but a certain drive I'm
testing reports 1024. Third, it assumes that name spaces are tightly
packed, but the standard seems to indicate otherwise. Finally, it
assumes that an error would be generated when querying an
unconfigured name space. Instead, the query succeeds but the identify
data is all zeros.
Fix these by limiting the number of name spaces we probe to 16. Stop
aborting the probe when we find one in error. When the size of a name
space is zero, ignore it.
This is admittedly a band-aid. The long-term fix will be to
participate in the enumeration and name space change protocols
defined in the NVMe standard.
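A hedged sketch of the resulting probe loop; the types and helpers
below are illustrative stand-ins, not the driver's real identifiers:

    #include <stdint.h>
    #include <stdio.h>

    struct ns_ident { uint64_t nsze; };     /* namespace size, in blocks */

    /* Stand-in for IDENTIFY NAMESPACE: an unconfigured name space
     * succeeds but returns all-zero data (pretend only nsid 1 is
     * configured here). */
    static struct ns_ident
    identify_ns(uint32_t nsid)
    {
        struct ns_ident id = { .nsze = (nsid == 1) ? 1000000 : 0 };
        return (id);
    }

    #define MAX_PROBED_NAMESPACES 16

    int
    main(void)
    {
        uint32_t nn = 1024;     /* cdata.nn: supported, not configured */
        uint32_t num = nn < MAX_PROBED_NAMESPACES ? nn : MAX_PROBED_NAMESPACES;

        for (uint32_t nsid = 1; nsid <= num; nsid++) {
            struct ns_ident id = identify_ns(nsid);
            if (id.nsze == 0)
                continue;       /* zero-sized: skip, don't abort */
            printf("attach nsid %u\n", nsid);
        }
        return (0);
    }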
Sponsored by: Netflix
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".
This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.
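As a userland illustration of the difference (buffer size chosen to
match the 16-character fields; strlcpy(3) is one way to guarantee
termination):

    #include <stdio.h>
    #include <string.h>

    #define VID_LEN 16      /* same length as the ccb_pathinq fields */

    int
    main(void)
    {
        char hba_vid[VID_LEN];
        const char *vendor = "A vendor string longer than 16";

        /* strncpy() leaves the buffer unterminated when the source is
         * at least VID_LEN long, so later reads run past the field. */
        strncpy(hba_vid, vendor, VID_LEN);

        /* strlcpy() always NUL-terminates (truncating if needed) and
         * is available in both FreeBSD libc and the kernel. */
        strlcpy(hba_vid, vendor, sizeof(hba_vid));
        printf("%s\n", hba_vid);
        return (0);
    }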
PR: 215474
Reported by: Coverity
CID: 1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID: 1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID: 1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID: 1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID: 1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID: 1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID: 1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by: imp, sephe, slm
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9037
Differential Revision: https://reviews.freebsd.org/D9038
I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made an
unrelated driver lie in its reported parameters to work around that.
An alternative solution for this problem on the ZFS side was
committed as r296615.
Discussed with: smh
This was a regression from r293328, which deferred allocation
of the controller's ioq array until after interrupts are enabled
during boot.
PR: 207432
Reported and tested by: Andy Carrel <wac@google.com>
MFC after: 3 days
Sponsored by: Intel
nvme(4) issues a SET_NUM_QUEUES command during device
initialization to ensure enough I/O queues exist for each
of the MSI-X vectors we have allocated. The SET_NUM_QUEUES
command is then issued again during nvme_ctrlr_start(), to
ensure that it is properly set after any controller reset.
At least one NVMe drive exists which fails this second
SET_NUM_QUEUES command during device initialization. So
change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES
command when it is coming out of a reset - avoiding the
duplicate SET_NUM_QUEUES during device initialization.
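A simplified sketch of the resulting control flow (the structure and
function names below are stand-ins, not the driver's actual code):

    #include <stdbool.h>
    #include <stdio.h>

    struct ctrlr { int num_io_queues; };    /* illustrative only */

    static void
    set_num_queues(struct ctrlr *c)
    {
        printf("SET_NUM_QUEUES for %d I/O queues\n", c->num_io_queues);
    }

    /* SET_NUM_QUEUES is issued once during device initialization; in
     * ctrlr_start() it is repeated only when coming out of a reset. */
    static void
    ctrlr_start(struct ctrlr *c, bool is_resetting)
    {
        if (is_resetting)
            set_num_queues(c);
        /* ...create I/O queues, probe name spaces, etc... */
    }

    int
    main(void)
    {
        struct ctrlr c = { .num_io_queues = 4 };

        ctrlr_start(&c, false);     /* initial start: skip the command */
        ctrlr_start(&c, true);      /* after a reset: issue it */
        return (0);
    }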
Reported by: gallatin
MFC after: 3 days
Sponsored by: Intel
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.
This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue. This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues. Ideally the PR referenced
above will eventually be fixed, making the mechanism implemented
here obsolete.
While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.
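The arithmetic described above, sketched with illustrative names (the
tunable name and the driver's exact policy may differ):

    #include <stdio.h>

    /* One MSI-X vector is reserved for the admin queue, so only
     * (vectors - 1) are usable for I/O queues. */
    static int
    io_queues_for(int ncpus, int msix_vectors, int min_cpus_per_ioq)
    {
        int ioq = ncpus / min_cpus_per_ioq;

        if (ioq > msix_vectors - 1)
            ioq = msix_vectors - 1;
        if (ioq < 1)
            ioq = 1;
        return (ioq);
    }

    int
    main(void)
    {
        /* e.g. 32 CPUs with at least 4 CPUs per I/O queue and 33
         * vectors granted -> 8 I/O queues rather than 32. */
        printf("%d I/O queues\n", io_queues_for(32, 33, 4));
        return (0);
    }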
Reviewed by: gallatin
MFC after: 3 days
Sponsored by: Intel
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core. This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.
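For example, an even split of cores over a smaller number of queues can
be sketched like this (illustrative only):

    #include <stdio.h>

    int
    main(void)
    {
        int ncpus = 6, nqueues = 4;

        /* Integer mapping: per-queue CPU counts differ by at most one
         * when the CPUs cannot be divided evenly across queues. */
        for (int cpu = 0; cpu < ncpus; cpu++)
            printf("cpu %d -> io queue %d\n",
                cpu, cpu * nqueues / ncpus);
        return (0);
    }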
MFC after: 3 days
Sponsored by: Intel
Instead just use num_io_queues to make this determination.
This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.
MFC after: 3 days
Sponsored by: Intel
Intel NVMe controllers have a slow path for I/Os that span a 128KB
stripe boundary. ZFS limits ashift, which is derived from d_stripesize,
to 13 (8KB), so we limit the stripesize reported to geom(8) to 4KB.
This may result in a small number of additional I/Os requiring
splitting in nvme(4); however, the NVMe I/O path is very efficient, so
these additional I/Os will cause very minimal (if any) difference in
performance or CPU utilisation.
This can be controlled by the new sysctl
kern.nvme.max_optimal_sectorsize.
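A sketch of the cap described above (the constant mirrors the sysctl's
intent; the names are illustrative, not the driver's):

    #include <stdio.h>

    #define MAX_OPTIMAL_SECTORSIZE  4096    /* illustrative default */

    /* Cap the stripe size reported to geom(8) so ZFS does not derive
     * an oversized ashift from it. */
    static unsigned int
    reported_stripesize(unsigned int ctrlr_stripesize)
    {
        if (ctrlr_stripesize > MAX_OPTIMAL_SECTORSIZE)
            return (MAX_OPTIMAL_SECTORSIZE);
        return (ctrlr_stripesize);
    }

    int
    main(void)
    {
        /* Intel controllers advertise a 128KB stripe boundary. */
        printf("%u\n", reported_stripesize(128 * 1024));
        return (0);
    }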
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4446
Fixes a race condition observed under the following circumstances:
1) I/O split on 128KB boundary with Intel NVMe controller.
Current Intel controllers produce better latency when
I/Os do not span a 128KB boundary - even if the I/O size
itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
simultaneously.
5) ithread for child I/O A increments bio_inbed, then
immediately is preempted (rendezvous IPI, higher priority
interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
or gpart bio (including setting bio_children to 1 and
clearing bio_driver1).
8) ithread for child I/O A resumes processing. bio_children
for what it thinks is the parent bio is set to 1, so it
thinks it needs to complete the parent bio.
The result is either calling a NULL callback function or double-freeing
the bio to its uma zone.
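One way to close the window in steps 5)-8), sketched here with C11
atomics rather than the kernel's primitives: snapshot the (immutable)
child count before publishing this child's completion, and never touch
the parent again unless this thread's increment was the one that
completed it.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Simplified stand-in for struct bio's parent bookkeeping. */
    struct parent_io {
        int        children;    /* set once, before children are submitted */
        atomic_int inbed;       /* number of completed children */
    };

    static void
    child_done(struct parent_io *parent)
    {
        int children = parent->children;
        int done = atomic_fetch_add(&parent->inbed, 1) + 1;

        if (done == children) {
            printf("last child in: complete and free the parent\n");
            return;
        }
        /* Not the last child: *parent may already have been completed,
         * freed, and reused by another thread, so do not touch it. */
    }

    int
    main(void)
    {
        struct parent_io p = { .children = 2 };

        atomic_init(&p.inbed, 0);
        child_done(&p);
        child_done(&p);
        return (0);
    }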
PR: 203746
Reported by: Drew Gallatin <gallatin@netflix.com>,
Marc Goroff <mgoroff@quorum.net>
Tested by: Drew Gallatin <gallatin@netflix.com>
MFC after: 3 days
Sponsored by: Intel