freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	a6d222eb68	Add more random bits from NVMe 1.4. MFC after: 2 weeks	2019-08-03 02:36:35 +00:00
Alexander Motin	90dfa8f0ac	Add more new fields and values from NVMe 1.4. MFC after: 2 weeks	2019-08-02 03:43:24 +00:00
Warner Losh	5e83c2ffaa	Keep track of the number of commands that exhaust their retry limit. While we print failure messages on the console, sometimes logs are lost or overwhelmed. Keeping a count of how many times we've failed retriable commands helps get a magnitude of the problem.	2019-07-19 18:39:24 +00:00
Warner Losh	c37fc318c4	Keep track of the number of retried commands. Retried commands can indicate a performance degredation of an nvme drive. Keep track of the number of retries and report it out via sysctl, just like number of commands an interrupts.	2019-07-19 18:39:18 +00:00
Warner Losh	c75bdc044d	Provide new tunable hw.nvme.verbose_cmd_dump The nvme drive dumps only the most relevant details about a command when it fails. However, there are times this is not sufficient (such as debugging weird issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1 in loader.conf will enable more complete debugging information about each command that fails. Reviewed by: rpokala Sponsored by: Netflix Differential Version: https://reviews.freebsd.org/D20988	2019-07-18 21:58:51 +00:00
Warner Losh	d0aaeffdb4	Since a fatal trap can happen at aribtrary times, don't panic when the completions are not in a consistent state. Cope with the different places the normal I/O completion polling thread can be interrupted and then re-entered during a kernel panic + dump. Reviewed by: jhb and markj (both prior versions) Differential Revision: https://reviews.freebsd.org/D20478	2019-06-01 15:37:44 +00:00
Warner Losh	2ffd6fce5b	Don't print all the I/O we abort on a reset, unless we're out of retries. When resetting the controller, we abort I/O. Prior to this fix, we printed a ton of abort messages for I/O that we're going to retry. This imparts no useful information. Stop printing them unless our retry count is exhausted. Clarify code for when we don't retry, and remove useless arg to a routine that's always called with it as 'true'. All the other debug is still printed (including multiple reset messages if we have multiple timeouts before the taskqueue runs the actual reset) so that we know when we reset. Reviewed by: jimharris@, chuck@ Differential Revision: https://reviews.freebsd.org/D19431	2019-03-09 01:18:16 +00:00
Warner Losh	95108cadbc	Add ABORTED_BY_REQUEST to the list of things we look at DNR bit and tell why to comment (code already does this)	2019-03-03 03:36:33 +00:00
Warner Losh	45d7e233a5	Unconditionally support unmapped BIOs. This was another shim for supporting older kernels. However, all supported versions of FreeBSD have unmapped I/Os (as do several that have gone EOL), remove it. It's unlikely the driver would work on the older kernels anyway at this point.	2019-02-27 22:16:59 +00:00
Warner Losh	d706306d49	Remove #ifdef code to support FreeBSD versions that haven't been supported in years. A number of changes have been made to the driver that likely wouldn't work on those older versions that aren't properly ifdef'd and it's project policy to GC such code once it is stale.	2019-02-27 22:05:01 +00:00
Alexander Motin	a646135771	Add descriptions to NVMe interrupts. MFC after: 1 month	2018-12-26 23:41:52 +00:00
Chuck Tuffli	9544e6dcf1	Make NVMe compatible with the original API The original NVMe API used bit-fields to represent fields in data structures defined by the specification (e.g. the op-code in the command data structure). The implementation targeted x86_64 processors and defined the bit fields for little endian dwords (i.e. 32 bits). This approach does not work as-is for big endian architectures and was changed to use a combination of bit shifts and masks to support PowerPC. Unfortunately, this changed the NVMe API and forces #ifdef's based on the OS revision level in user space code. This change reverts to something that looks like the original API, but it uses bytes instead of bit-fields inside the packed command structure. As a bonus, this works as-is for both big and little endian CPU architectures. Bump __FreeBSD_version to 1200081 due to API change Reviewed by: imp, kbowling, smh, mav Approved by: imp (mentor) Differential Revision: https://reviews.freebsd.org/D16404	2018-08-22 04:29:24 +00:00
Justin Hibbits	2e0090af65	nvme(4): Add bus_dmamap_sync() at the end of the request path Summary: Some architectures, in this case powerpc64, need explicit synchronization barriers vs device accesses. Prior to this change, when running 'make buildworld -j72' on a 18-core (72-thread) POWER9, I would see controller resets often. With this change, I don't see these resets messages, though another tester still does, for yet to be determined reasons, so this may not be a complete fix. Additionally, I see a ~5-10% speed up in buildworld times, likely due to not needing to reset the controller. Reviewed By: jimharris Differential Revision: https://reviews.freebsd.org/D16570	2018-08-03 20:04:06 +00:00
Alexander Motin	c6c70c0746	Fix use-after-free in nvme_qpair_destroy(). dma_tag_payload should not be destroyed before payload_dma_map, and seems it should be used there instead of dma_tag to match creation. Sponsored by: iXsystems, Inc.	2018-04-30 21:28:10 +00:00
Warner Losh	d85d964829	Try polling the qpairs on timeout. On some systems, we're getting timeouts when we use multiple queues on drives that work perfectly well on other systems. On a hunch, Jim Harris suggested I poll the completion queue when we get a timeout. This patch polls the completion queue if no fatal status was indicated. If it had pending I/O, we complete that request and return. Otherwise, if aborts are enabled and no fatal status, we abort the command and return. Otherwise we reset the card. This may clear up the problem, or we may see it result in lots of timeouts and a performance problem. Either way, we'll know the next step. We may also need to pay attention to the fatal status bit of the controller. PR: 211713 Suggested by: Jim Harris Sponsored by: Netflix	2018-03-16 05:23:48 +00:00
Alexander Motin	6b1a96b16b	Add new opcodes and statuses from NVMe 1.3a. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2018-03-11 06:30:09 +00:00
Wojciech Macek	0d787e9b35	NVMe: Add big-endian support Remove bitfields from defined structures as they are not portable. Instead use shift and mask macros in the driver and nvmecontrol application. NVMe is now working on powerpc64 host. Submitted by: Michal Stanek <mst@semihalf.com> Obtained from: Semihalf Reviewed by: imp, wma Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D13916	2018-02-22 13:32:31 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 14:52:40 +00:00
Warner Losh	519772814d	Add CAM/NVMe support for CAM_DATA_SG This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address. Differential Revision: https://reviews.freebsd.org/D11361 Submitted by: Chuck Tuffli Reviewed by: imp@, scottl@, kenm@	2017-08-29 15:29:57 +00:00
Warner Losh	824073fbd6	Avoid dereferencing unintialized elements in the error path. Some drives sometimes have errors for things like setting the number of queue entries in the submission queue. The error paths taken for these drives ensure a panic dereferencing uninialized data. Sponsored by: Netflix	2017-03-07 23:06:41 +00:00
Scott Long	a965389b5a	Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a bunch of safery belts and error handling in related codepaths. Reviewed by: jimharris Obtained from: Netflix Differential Revision: D8453	2016-11-08 00:24:49 +00:00
Jim Harris	e5af5854ff	nvme: do not pre-allocate MSI-X IRQ resources The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:11:31 +00:00
Jim Harris	3345ed9a55	nvme: use BUS_SPACE_MAXSIZE for bus_dma_tag_create maxsize parameter This fixes i386 PAE build fallout from r281281. Reported by: bz MFC after: 1 week	2015-04-09 00:37:55 +00:00
Jim Harris	36b0e4ee1f	nvme: remove CHATHAM related code Chatham was an internal NVMe prototype board used for early driver development. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:52:06 +00:00
Jim Harris	a6e3096392	nvme: create separate DMA tag for non-payload DMA buffers Submission and completion queue memory need to use a separate DMA tag for mappings than payload buffers, to ensure mappings remain contiguous even with DMAR enabled. Submitted by: kib MFC after: 1 week Sponsored by: Intel	2015-04-08 21:49:45 +00:00
Jim Harris	f42ca756b9	nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:10:35 +00:00
Jim Harris	1416ef361e	nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8). Sponsored by: Intel MFC after: 3 days	2014-03-17 22:37:17 +00:00
Jim Harris	e9efbc134f	Update copyright dates. MFC after: 3 days	2013-07-09 21:22:17 +00:00
Jim Harris	bbd412dd05	Remove remaining uio-related code. The nvme_physio() function was removed quite a while ago, which was the only user of this uio-related code. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:37:11 +00:00
Jim Harris	7b68ae1e5e	Fail any passthrough command whose transfer size exceeds the controller's max transfer size. This guards against rogue commands coming in from userspace. Also add KASSERTS for the virtual address and unmapped bio cases, if the transfer size exceeds the controller's max transfer size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:32:45 +00:00
Jim Harris	8d09e3c400	Use MAXPHYS to specify the maximum I/O size for nvme(4). Also allow admin commands to transfer up to this maximum I/O size, rather than the artificial limit previously imposed. The larger I/O size is very beneficial for upcoming firmware download support. This has the added benefit of simplifying the code since both admin and I/O commands now use the same maximum I/O size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:27:17 +00:00
Jim Harris	ca269f32ef	Move the busdma mapping functions to nvme_qpair.c. This removes nvme_uio.c completely. Sponsored by: Intel	2013-04-12 17:48:45 +00:00
Jim Harris	e2b9900498	Do not panic when a busdma mapping operation fails. Instead, print an error message and fail the associated command with DATA_TRANSFER_ERROR NVMe completion status. Sponsored by: Intel	2013-04-12 17:34:49 +00:00
Jim Harris	5fdf9c3c8e	Add unmapped bio support to nvme(4) and nvd(4). Sponsored by: Intel	2013-04-01 16:23:34 +00:00
Jim Harris	1e526bc478	Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or NULL. This simplifies decisions around if/how requests are routed through busdma. It also paves the way for supporting unmapped bios. Sponsored by: Intel	2013-03-29 20:34:28 +00:00
Jim Harris	bdd1fd402c	Fix printf format issue on i386. Reported by: bz	2013-03-27 00:37:00 +00:00
Jim Harris	547d523eb8	Clean up debug prints. 1) Consistently use device_printf. 2) Make dump_completion and dump_command into something more human-readable. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:17:10 +00:00
Jim Harris	237d2019e5	Change a number of malloc(9) calls to use M_WAITOK instead of M_NOWAIT. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:11:34 +00:00
Jim Harris	43a3725688	Abort and do not retry any outstanding admin commands left over after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:06:05 +00:00
Jim Harris	232e2edb6c	Add the ability to internally mark a controller as failed, if it is unable to start or reset. Also add a notifier for NVMe consumers for controller fail conditions and plumb this notifier for nvd(4) to destroy the associated GEOM disks when a failure occurs. This requires a bit of work to cover the races when a consumer is sending I/O requests to a controller that is transitioning to the failed state. To help cover this condition, add a task to defer completion of I/Os submitted to a failed controller, so that the consumer will still always receive its completions in a different context than the submission. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:58:38 +00:00
Jim Harris	3d7eb41c1b	Just disable the controller instead of deleting IO queues during detach. This is just as effective, and removes the need for a bunch of admin commands to a controller that's going to be disabled shortly anyways. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:48:41 +00:00
Jim Harris	cb5b7c1304	Cap the number of retry attempts to a configurable number. This ensures that if a specific I/O repeatedly times out, we don't retry it indefinitely. The default number of retries will be 4, but is adjusted using hw.nvme.retry_count. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:14:51 +00:00
Jim Harris	cf81529ce3	Create struct nvme_status. NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures. While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:00:18 +00:00
Jim Harris	f37c22a3bd	Make nvme_ctrlr_reset a nop if a reset is already in progress. This protects against cases where a controller crashes with multiple I/O outstanding, each timing out and requesting controller resets simultaneously. While here, remove a debugging printf from a previous commit, and add more logging around I/O that need to be resubmitted after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:56:58 +00:00
Jim Harris	48ce317898	By default, always escalate to controller reset when an I/O times out. While aborts are typically cleaner than a full controller reset, many times an I/O timeout indicates other controller-level issues where aborts may not work. NVMe drivers for other operating systems are also defaulting to controller reset rather than aborts for timed out I/O. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:32:57 +00:00
Jim Harris	941433323c	Add a tunable for the I/O timeout interval. Default is still 30 seconds, but can be adjusted between a min/max of 5 and 120 seconds. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:02:35 +00:00
Jim Harris	12d191ec12	Add handling for controller fatal status (csts.cfs). On any I/O timeout, check for csts.cfs==1. If set, the controller is reporting fatal status and we reset the controller immediately, rather than trying to abort the timed out command. This changeset also includes deferring the controller start portion of the reset to a separate task. This ensures we are always performing a controller start operation from a consistent context. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:58:17 +00:00
Jim Harris	b846efd7ec	Add controller reset capability to nvme(4) and ability to explicitly invoke it from nvmecontrol(8). Controller reset will be performed in cases where I/O are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:50:46 +00:00
Jim Harris	65c2474e6d	Keep a doubly-linked list of outstanding trackers. This enables in-order re-submission of I/O after a controller reset. Sponsored by: Intel	2013-03-26 18:45:16 +00:00
Jim Harris	0a0b08cc30	Enable asynchronous event requests on non-Chatham devices. Also add logic to clean up all outstanding asynchronous event requests when resetting or shutting down the controller, since these requests will not be explicitly completed by the controller itself. Sponsored by: Intel	2013-03-26 18:37:36 +00:00

1 2

68 Commits