freebsd-nq

Author	SHA1	Message	Date
Chuck Tuffli	9544e6dcf1	Make NVMe compatible with the original API The original NVMe API used bit-fields to represent fields in data structures defined by the specification (e.g. the op-code in the command data structure). The implementation targeted x86_64 processors and defined the bit fields for little endian dwords (i.e. 32 bits). This approach does not work as-is for big endian architectures and was changed to use a combination of bit shifts and masks to support PowerPC. Unfortunately, this changed the NVMe API and forces #ifdef's based on the OS revision level in user space code. This change reverts to something that looks like the original API, but it uses bytes instead of bit-fields inside the packed command structure. As a bonus, this works as-is for both big and little endian CPU architectures. Bump __FreeBSD_version to 1200081 due to API change Reviewed by: imp, kbowling, smh, mav Approved by: imp (mentor) Differential Revision: https://reviews.freebsd.org/D16404	2018-08-22 04:29:24 +00:00
Justin Hibbits	2e0090af65	nvme(4): Add bus_dmamap_sync() at the end of the request path Summary: Some architectures, in this case powerpc64, need explicit synchronization barriers vs device accesses. Prior to this change, when running 'make buildworld -j72' on an 18-core (72-thread) POWER9, I would see controller resets often. With this change, I don't see these reset messages, though another tester still does, for yet to be determined reasons, so this may not be a complete fix. Additionally, I see a ~5-10% speed up in buildworld times, likely due to not needing to reset the controller. Reviewed By: jimharris Differential Revision: https://reviews.freebsd.org/D16570	2018-08-03 20:04:06 +00:00
Alexander Motin	f439e3a4ff	Refactor NVMe CAM integration. - Remove layering violation, when NVMe SIM code accessed CAM internal device structures to set pointers on controller and namespace data. Instead make NVMe XPT probe fetch the data directly from hardware. - Cleanup NVMe SIM code, fixing support for multiple namespaces per controller (reporting them as LUNs) and adding controller detach support and run-time namespace change notifications. - Add initial support for namespace change async events. So far only in CAM mode, but it allows run-time namespace arrival and departure. - Add missing nvme_notify_fail_consumers() call on controller detach. Together with previous changes this allows NVMe device detach/unplug. Non-CAM mode still requires a lot of love to stay on par, but at least CAM mode code should not stay in the way so much, becoming much more self-sufficient. Reviewed by: imp MFC after: 1 month Sponsored by: iXsystems, Inc.	2018-05-25 03:34:33 +00:00
Warner Losh	041f49aece	Remove the 'All Rights Reserved' clause from some of the stuff I've done for Netflix, since I'm in the neighborhood.	2018-05-09 20:32:23 +00:00
Alexander Motin	c252f63740	Fix LOR between controller and queue locks. Admin pass-through requests took controller lock before the queue lock, but in case of request submission to a failed controller controller lock was taken after the queue lock. Fix that by reducing the lock scopes and switching to mtx_pool locks to track pass-through request completion. Sponsored by: iXsystems, Inc.	2018-05-02 20:13:03 +00:00
Alexander Motin	e134ecdcfc	Improve nvme(4) attach/detach sequences. This change allows clean device detach on attach failures and driver unload, while previous code tried to talk to an already shut down controller, or even accessed resources that failed to allocate. Sponsored by: iXsystems, Inc.	2018-04-30 23:05:57 +00:00
Alexander Motin	c6c70c0746	Fix use-after-free in nvme_qpair_destroy(). dma_tag_payload should not be destroyed before payload_dma_map, and seems it should be used there instead of dma_tag to match creation. Sponsored by: iXsystems, Inc.	2018-04-30 21:28:10 +00:00
Alexander Motin	e4c7e3a1b9	Set si_drv1 for nvmeXnsY in a new race-free way. r332897 switched to new KPI, but hasn't used its main benefit. Sponsored by: iXsystems, Inc.	2018-04-30 19:21:20 +00:00
Warner Losh	76583d573d	Migrate to make_dev_s interface to populate /dev/nvmeX entries Submitted by: Michael Hordijk Differential Revision: https://reviews.freebsd.org/D15162	2018-04-23 22:30:17 +00:00
Warner Losh	e8bef32ce2	Reword comment to remove awkward constructs, including an "it's" that shouldn't have been there at all (it wasn't a typo for its, rather a left-over from an older revision of the comment). Noticed by: many	2018-04-19 16:05:48 +00:00
Warner Losh	b3e85e7a79	Intel drives have an optimal alignment for I/O. While they honor I/Os that cross this boundary, they perform better when this isn't the case. Intel uses the 3rd byte in the vendor specific area for this. The DC P3500 was previously listed without any explanation. Add the DC P3520 and DC P4500 to the list. There won't be any other drives needing this quirk. Intel has standardized a field in the namespace data in 1.3 (noiob). A future patch will use that if it exists, with fallback to this method. Submitted by: Keith Busch Reviewed by: jimharris@	2018-04-19 15:39:20 +00:00
Warner Losh	afdbfe1e1b	Starting LBA is a 64bit number, so use htole64 instead of htole32. The latter casts the LBA to a 32-bit number before assigning it to the 64 bit structure entity. This works fine on the first 2TB of TRIMs, but terribly beyond that due to truncation. Also, add an assert to make sure we don't send too many DSM TRIM entries in one request. Sponsored by: Netflix	2018-03-20 03:37:14 +00:00
Warner Losh	d85d964829	Try polling the qpairs on timeout. On some systems, we're getting timeouts when we use multiple queues on drives that work perfectly well on other systems. On a hunch, Jim Harris suggested I poll the completion queue when we get a timeout. This patch polls the completion queue if no fatal status was indicated. If it had pending I/O, we complete that request and return. Otherwise, if aborts are enabled and no fatal status, we abort the command and return. Otherwise we reset the card. This may clear up the problem, or we may see it result in lots of timeouts and a performance problem. Either way, we'll know the next step. We may also need to pay attention to the fatal status bit of the controller. PR: 211713 Suggested by: Jim Harris Sponsored by: Netflix	2018-03-16 05:23:48 +00:00
Warner Losh	5d7fd8f726	Fix error messages in cut and pasted code. Also, fix an unnecessary deref to get ctrlr. Noticed by: rpokala@ Sponsored by: Netflix	2018-03-14 23:28:28 +00:00
Warner Losh	8b1e6ebe0e	When tearing down a queue pair, also delete the queue entries. The NVME standard has required in section 7.2.6, since at least 1.1, that a clean shutdown is signalled by deleting the submission and the completion queues before setting the shutdown bit in CC. The 1.0 standard, apparently, did not and many of the early Intel cards didn't care. Some newer cards care, at least one whose beta firmware can scramble the card on an unclean shutdown. Linux has done this for some time. To make it possible to move forward with an evaluation of this pre-release card with wonky firmware, delete the queues on the card when we delete the qpair structures. Sponsored by: Netflix	2018-03-14 23:01:18 +00:00
Warner Losh	d61cf64d0e	Don't make the namespace devices eternal. We'll need to delete namespaces soon, so go ahead and stop making these devices eternal. It doesn't help much, and will be getting in the way soon. Sponsored by: Netflix	2018-03-14 23:01:04 +00:00
Warner Losh	807e94b2c3	Implement trim collapsing in nda When multiple trims are in the queue, collapse them as much as possible. At present, this usually results in only a few trims being collapsed together, but more work on that will make it possible to do hundreds (up to some configurable max). Sponsored by: Netflix	2018-03-14 16:44:50 +00:00
Alexander Motin	01c1be35e0	Print fuses and fna fields in identify data. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2018-03-12 16:31:25 +00:00
Alexander Motin	6b1a96b16b	Add new opcodes and statuses from NVMe 1.3a. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2018-03-11 06:30:09 +00:00
Alexander Motin	3fa5467a06	Add new identify data structures fields from NVMe 1.3a. Some of them are already supported by existing hardware, so reporting them in `nvmecontrol identify` can be useful.	2018-03-11 05:09:02 +00:00
Kyle Evans	afdc2600c2	nvme: Unbreak LE builds after r329824 The parameter 'p' is unused if _BYTE_ORDER == _LITTLE_ENDIAN. Add in a (void)p to fix the build.	2018-02-22 16:16:49 +00:00
Wojciech Macek	0d787e9b35	NVMe: Add big-endian support Remove bitfields from defined structures as they are not portable. Instead use shift and mask macros in the driver and nvmecontrol application. NVMe is now working on powerpc64 host. Submitted by: Michal Stanek <mst@semihalf.com> Obtained from: Semihalf Reviewed by: imp, wma Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D13916	2018-02-22 13:32:31 +00:00
Warner Losh	0028abe633	Backout r329818, r329816 and r329815. These aren't the commits I thought I was testing prior to commit. Revert until I can sort out what happened and fix it.	2018-02-22 11:18:33 +00:00
Warner Losh	4d87e27125	Combine BIO_DELETE requests for nda devices Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler, it makes sense to try to combine as many as possible into a single request to send down to hardware. Hopefully, lots of larger requests like this are better than lots of individual transactions. Note for future: need to limit based on total size of the trim request. Should also collapse adjacent ranges where possible to increase the size of the max payload. Sponsored by: Netflix	2018-02-22 05:44:00 +00:00
Warner Losh	29077eb456	Use atomic load and stores to ensure that the compiler doesn't optimize away these loops. Change boolean to int to match what atomic API supplies. Remove wmb() since the atomic_store_rel() on status.done ensures the prior writes to status complete first. It also fixes the fact that there wasn't a rmb() before reading done. This should also be more efficient since wmb() is fairly heavy weight. Sponsored by: Netflix Reviewed by: kib@, jim harris Differential Revision: https://reviews.freebsd.org/D14053	2018-01-29 00:00:52 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Warner Losh	7e5f6f2588	Move setting of CAM_SIM_QUEUED to before we actually submit it to the hardware. Setting it after is racy, and we can lose the race on a heavily loaded system. Reviewed by: scottl@, gallatin@ Sponsored by: Netflix	2018-01-17 17:08:26 +00:00
Pedro F. Giffuni	26c1d774b5	dev: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these is likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values.	2018-01-13 22:30:30 +00:00
Warner Losh	4484c8f5d2	Return domain, bus, slot, and function for the transport settings in PATH_INQ requests for nvme. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13546	2017-12-20 19:13:55 +00:00
Warner Losh	989c7f0b7c	Although we only have one quirk at the moment, guard against the day we have more than one by checking the actual quirk bit before delaying the reset. Noticed by: rpokala@	2017-12-18 20:11:21 +00:00
Warner Losh	ce1ec9c178	When we're disabling the nvme device, some drives have a controller bug that requires 'hands off' for a period of time (2.3s) before we check the RDY bit. Since this is a very odd quirk for a very limited selection of drives, do this as a quirk. This prevented a successful reset of the card when the card wedged. Also, make sure that we comply with the advice from section 3.1.5 of the 1.3 spec, which says that transitioning CC.EN from 0 to 1 when CSTS.RDY is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has undefined results". Short circuit when EN == RDY == desired state. Finally, fail the reset if the disable fails. This will lead to a failed device, which is what we want. (note: nda device needs work for coping with a failed device). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13389	2017-12-18 18:38:00 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 14:52:40 +00:00
Warner Losh	eab9d0a85b	Inline pcie_link_{status,caps} where needed. Remove them as they aren't really needed and I don't want to document them. Suggested by: jhb@ Sponsored by: Netflix	2017-11-15 02:24:47 +00:00
Warner Losh	4e3b274457	Provide link speed data in XPT_GET_TRAN_SETTINGS. Provide full version information for that and XPT_PATH_INQ. Provide macros to encode/decode major/minor versions. Read the link speed and lane count to compute the base_transfer_speed for XPT_PATH_INQ. Sponsored by: Netflix	2017-11-14 05:05:16 +00:00
Warner Losh	fa271a5d09	Closer examination shows that nvme and CAM both normally zero-fill allocations (for req and ccb, which ultimately contain the nvme_cmd). As such, we can micro-optimize these routines. Add a comment to this effect, and bzero the ccb used to make the requests for the nda dump routine so it more closely matches a ccb allocated with xpt_get_ccb(). Sponsored by: Netflix	2017-10-15 23:53:55 +00:00
Warner Losh	29431e54b9	Use nvme_ctrlr_poll instead of nvme_ctrlr_intx_handler since it is more general and doesn't try to access registers that may be undefined when the card is in MSIX mode. This change, along with r324630, r324631, r324632, makes nda crash dumps work again. Previously, they only worked on CPU 0 when the stack garbage was just so. Sponsored by: Netflix Suggested by: scottl@ (who provided earlier version of the patch)	2017-10-15 16:19:09 +00:00
Warner Losh	bb1c7be429	Create general polling function for the nvme controller. Use it when we're doing the various pin-based interrupt modes. Adjust nvme_ctrlr_intx_handler to use nvme_ctrlr_poll. Sponsored by: Netflix Suggested by: scottl@	2017-10-15 16:18:08 +00:00
Warner Losh	fbed8df259	Explicitly set reserved fields and 'fuse' to 0. This prevents us from accidentally sending bogus values in these fields, which some drives may reject with an error or worse (undefined behavior). This is especially needed for the ndadump routine which allocates the cmd from stack garbage.... Sponsored by: Netflix	2017-10-15 16:17:59 +00:00
Warner Losh	cfb43eb12e	Tweak performance of nda completions Use xpt_done_direct in preference to xpt_done when completing a successful I/O. Continue to use xpt_done when there's an error, or for completion of the submission of a CCB. This eliminates a context switch to the cam_doneq thread. Sponsored by: Netflix Suggested by: scottl@	2017-09-28 01:27:00 +00:00
Warner Losh	5fff95cc1d	Fix queue depth for nda. 1/4 of the number of queues times queue entries is too limiting. It works up to about 4k IOPS / 3.0GB/s for hardware that can do 4.4k/3.2GB/s with nvd. 3/4 works better, though it highlights issues in the fairness of nda's choice of TRIM vs READ. That will be fixed separately.	2017-09-20 21:42:25 +00:00
Konstantin Belousov	5a21cd1941	The nvme module should explicitly declare dependency on the cam. If both nvme and cam are compiled as modules, nvme cannot be kldloaded otherwise. Reviewed by: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-31 14:21:32 +00:00
Warner Losh	c2005bba77	Fix a few overlooked spots where the code uses 16-bit NSIDs. Chuck Tuffli had submitted a more thorough patch that I was unaware of when I did my work and this brings in the bits I missed from that patch. PR: 220267 Submitted by: Chuck Tuffli	2017-08-29 15:46:34 +00:00
Warner Losh	519772814d	Add CAM/NVMe support for CAM_DATA_SG This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address. Differential Revision: https://reviews.freebsd.org/D11361 Submitted by: Chuck Tuffli Reviewed by: imp@, scottl@, kenm@	2017-08-29 15:29:57 +00:00
Warner Losh	850564b948	Add new compile-time option NVME_USE_NVD that sets the default value of the runtime hw.nvme.use_nvd tunable. We still default to nvd unless otherwise requested. Sponsored by: Netflix	2017-08-28 23:54:25 +00:00
Warner Losh	c02565f9fa	Set the max transactions for NVMe drives better. Provided a better estimate for the number of transactions that can be pending at one time. This will be number of queues * number of trackers / 4, as suggested by Jim Harris. This gives a better estimate of the number of transactions that CAM should queue before applying back pressure. This should be revisited when we have real multi-queue support in CAM and the upper layers of the I/O stack. Sponsored by: Netflix	2017-08-28 23:54:20 +00:00
Warner Losh	030edcce02	Fill in reserved areas from NVMe spec in the IDENTIFY structure (struct nvme_controller_data) as defined in the NVM Express specification, revision 1.3. Sponsored by: Netflix	2017-08-25 21:38:43 +00:00
Warner Losh	696c950297	NVME Namespace ID is 32-bits, so widen interface to reflect that. Sponsored by: Netflix	2017-08-25 21:38:38 +00:00
Warner Losh	223a9b93ac	Add feature codes from NVMe 1.3 specification: o Autonomous Power State Transition o Host Memory Buffer o Timestamp o Keep Alive Timer o Host Controlled Thermal Management o Non-Operational Power State Config Also note that feature codes 0x78-0x7f are reserved for the NVMe Management Interface. Sponsored by: Netflix	2017-08-25 21:38:29 +00:00
Warner Losh	0012e436e3	Use _Static_assert These files are compiled in userland too, so we can't use sys/systm.h and rely on CTASSERT. Switch to using _Static_assert instead. MFC After: 3 days Sponsored by: Netflix	2017-08-25 04:33:06 +00:00
Warner Losh	0c26c1992f	Sanity check sizes Add compile time sanity checks to make sure that packed structures are the proper size, typically as defined in the NVMe standard.	2017-08-25 04:05:53 +00:00
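
Note on afdbfe1e1b (starting LBA width): the truncation described in that commit comes down to integer width. The following is only a minimal sketch, assuming 512-byte logical blocks. The helper names `lba_through_32bit` and `lba_to_bytes` are hypothetical and are not taken from the driver; a plain cast stands in for the narrowing that a 32-bit conversion applies to a 64-bit argument.

```c
#include <stdint.h>

/* Assumed logical block size; real namespaces report their own. */
#define SECTOR_SIZE 512ULL

/*
 * Hypothetical helper: models passing a 64-bit starting LBA through a
 * 32-bit conversion. The cast keeps only the low 32 bits.
 */
uint64_t
lba_through_32bit(uint64_t lba)
{
	return ((uint32_t)lba);
}

/* Hypothetical helper: byte offset of an LBA at the assumed block size. */
uint64_t
lba_to_bytes(uint64_t lba)
{
	return (lba * SECTOR_SIZE);
}
```

Under these assumptions, any LBA below 2^32 (byte offsets below 2 TiB) passes through the narrowing unchanged, while LBA 2^32 + 8 comes back as 8, so a TRIM aimed just past the 2 TiB mark would land near the start of the drive instead.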

202 Commits