freebsd-dev

Author	SHA1	Message	Date
Jim Harris	f42ca756b9	nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:10:35 +00:00
Jim Harris	496a27520d	nvme: Close hole where nvd(4) would not be notified of all nvme(4) instances if modules loaded during boot. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:09:08 +00:00
Jim Harris	2b26030cbc	nvme: Remove the software progress marker SET_FEATURE command during controller initialization. The spec says OS drivers should send this command after controller initialization completes successfully, but other NVMe OS drivers are not sending this command. This change will therefore reduce differences between the FreeBSD and other OS drivers. Sponsored by: Intel MFC after: 3 days	2014-03-17 22:36:04 +00:00
Jim Harris	448cffc859	For IDENTIFY passthrough commands to Chatham prototype controllers, copy the spoofed identify data into the user buffer rather than issuing the command to the controller, since Chatham IDENTIFY data is always spoofed. While here, fix a bug in the spoofed data for Chatham submission and completion queue entry sizes. Sponsored by: Intel MFC after: 3 days	2014-01-06 23:51:26 +00:00
Jim Harris	d603c3d73b	Create a unique unit number for each controller and namespace cdev. Sponsored by: Intel MFC after: 3 days	2013-11-01 23:30:54 +00:00
Jim Harris	bb2f67fd72	Log and then disable asynchronous notification of persistent events after they occur. This prevents repeated notifications of the same event. Status of these events may be viewed at any time by viewing the SMART/Health Info Page using nvmecontrol, whether or not asynchronous event notifications for those events are enabled. This log page can be viewed using: nvmecontrol logpage -p 2 <ctrlr id> Future enhancements may re-enable these notifications on a periodic basis so that if the notified condition persists, it will continue to be logged. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 16:00:12 +00:00
Jim Harris	d5fc982133	Do not enable temperature threshold as an asynchronous event notification on NVMe controllers that do not support it. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 15:49:14 +00:00
Jim Harris	56183abc2b	Send a shutdown notification in the driver unload path, to ensure notification gets sent in cases where system shuts down with driver unloaded. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-08-13 21:47:08 +00:00
Jim Harris	8e0ac13f5a	Use pause() instead of DELAY() when polling for completion of admin commands during controller initialization. DELAY() does not work here during config_intrhook context - we need to explicitly relinquish the CPU for the admin command completion to get processed. Sponsored by: Intel Reported by: Adam Brooks <adam.j.brooks@intel.com> Reviewed by: carl MFC after: 3 days	2013-07-17 23:26:56 +00:00
Jim Harris	e9efbc134f	Update copyright dates. MFC after: 3 days	2013-07-09 21:22:17 +00:00
Jim Harris	ec526ea90b	Do not retry failed async event requests. Sponsored by: Intel MFC after: 3 days	2013-07-09 21:03:39 +00:00
Jim Harris	7b68ae1e5e	Fail any passthrough command whose transfer size exceeds the controller's max transfer size. This guards against rogue commands coming in from userspace. Also add KASSERTS for the virtual address and unmapped bio cases, if the transfer size exceeds the controller's max transfer size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:32:45 +00:00
Jim Harris	8d09e3c400	Use MAXPHYS to specify the maximum I/O size for nvme(4). Also allow admin commands to transfer up to this maximum I/O size, rather than the artificial limit previously imposed. The larger I/O size is very beneficial for upcoming firmware download support. This has the added benefit of simplifying the code since both admin and I/O commands now use the same maximum I/O size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:27:17 +00:00
Jim Harris	5076698e19	Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace them with the NVMe passthrough equivalent. Sponsored by: Intel	2013-04-12 17:56:47 +00:00
Jim Harris	7c3f19d7bb	Add support for passthrough NVMe commands. This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than separate IOCTLs for each. Sponsored by: Intel	2013-04-12 17:52:17 +00:00
Jim Harris	a90b810492	Rename the controller's fail_req_lock, so that it can be used for other locking operations on the controller. Sponsored by: Intel	2013-04-12 17:36:48 +00:00
Jim Harris	1e526bc478	Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or NULL. This simplifies decisions around if/how requests are routed through busdma. It also paves the way for supporting unmapped bios. Sponsored by: Intel	2013-03-29 20:34:28 +00:00
Jim Harris	bb852ae89b	Delete extra IO qpairs allocated based on number of MSI-X vectors, but later found to not be usable because the controller doesn't support the same number of queues. This is not the normal case, but does occur with the Chatham prototype board. Sponsored by: Intel	2013-03-28 16:54:19 +00:00
Jim Harris	547d523eb8	Clean up debug prints. 1) Consistently use device_printf. 2) Make dump_completion and dump_command into something more human-readable. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:17:10 +00:00
Jim Harris	237d2019e5	Change a number of malloc(9) calls to use M_WAITOK instead of M_NOWAIT. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:11:34 +00:00
Jim Harris	955910a916	Replace usages of mtx_pool_find used for admin commands with a polling mechanism. Now that all requests are timed, we are guaranteed to get a completion notification, even if it is an abort status due to a timed out admin command. This has the effect of simplifying the controller and namespace setup code, so that it reads straight through rather than broken up into a bunch of different callback functions. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:09:51 +00:00
Jim Harris	232e2edb6c	Add the ability to internally mark a controller as failed, if it is unable to start or reset. Also add a notifier for NVMe consumers for controller fail conditions and plumb this notifier for nvd(4) to destroy the associated GEOM disks when a failure occurs. This requires a bit of work to cover the races when a consumer is sending I/O requests to a controller that is transitioning to the failed state. To help cover this condition, add a task to defer completion of I/Os submitted to a failed controller, so that the consumer will still always receive its completions in a different context than the submission. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:58:38 +00:00
Jim Harris	3d7eb41c1b	Just disable the controller instead of deleting IO queues during detach. This is just as effective, and removes the need for a bunch of admin commands to a controller that's going to be disabled shortly anyways. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:48:41 +00:00
Jim Harris	74019d4b67	Set Pre-boot Software Load Count to 0 at the end of the controller start process. The spec indicates the OS driver should use Set Features (Software Progress Marker) to set the pre-boot software load count to 0 after the OS driver has successfully been initialized. This allows pre-boot software to determine if there have been any issues with the OS loading. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:42:53 +00:00
Jim Harris	be34f21609	Remove the is_started flag from struct nvme_controller. This flag was originally added to communicate to the sysctl code which oids should be built, but there are easier ways to do this. This needs to be cleaned up prior to adding new controller states - for example, controller failure. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:19:26 +00:00
Jim Harris	02e3348484	Ensure the controller's MDTS is accounted for in max_xfer_size. The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to allow the controller to specify the maximum I/O data transfer size. nvme(4) already provides a default maximum, but make sure it does not exceed what MDTS reports. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:16:53 +00:00
Jim Harris	cb5b7c1304	Cap the number of retry attempts to a configurable number. This ensures that if a specific I/O repeatedly times out, we don't retry it indefinitely. The default number of retries is 4, but it can be adjusted using hw.nvme.retry_count. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:14:51 +00:00
Jim Harris	0d7e13ecb2	Pass associated log page data to async event consumers, if requested. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:08:32 +00:00
Jim Harris	2868353a57	When an asynchronous event request is completed, automatically fetch the specified log page. This satisfies the spec condition that future async events of the same type will not be sent until the associated log page is fetched. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:05:15 +00:00
Jim Harris	cf81529ce3	Create struct nvme_status. NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures. While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:00:18 +00:00
Jim Harris	f37c22a3bd	Make nvme_ctrlr_reset a nop if a reset is already in progress. This protects against cases where a controller crashes with multiple I/O outstanding, each timing out and requesting controller resets simultaneously. While here, remove a debugging printf from a previous commit, and add more logging around I/O that need to be resubmitted after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:56:58 +00:00
Jim Harris	48ce317898	By default, always escalate to controller reset when an I/O times out. While aborts are typically cleaner than a full controller reset, many times an I/O timeout indicates other controller-level issues where aborts may not work. NVMe drivers for other operating systems are also defaulting to controller reset rather than aborts for timed out I/O. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:32:57 +00:00
Jim Harris	941433323c	Add a tunable for the I/O timeout interval. Default is still 30 seconds, but can be adjusted between a min/max of 5 and 120 seconds. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:02:35 +00:00
Jim Harris	12d191ec12	Add handling for controller fatal status (csts.cfs). On any I/O timeout, check for csts.cfs==1. If set, the controller is reporting fatal status and we reset the controller immediately, rather than trying to abort the timed out command. This changeset also includes deferring the controller start portion of the reset to a separate task. This ensures we are always performing a controller start operation from a consistent context. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:58:17 +00:00
Jim Harris	dbba74428b	Add API for nvme consumers to access controller and namespace identify data. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:52:57 +00:00
Jim Harris	b846efd7ec	Add controller reset capability to nvme(4) and ability to explicitly invoke it from nvmecontrol(8). Controller reset will be performed in cases where I/O are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:50:46 +00:00
Jim Harris	038a5ee403	Add an interface for nvme shim drivers (i.e. nvd) to register for notifications when new nvme controllers are added to the system. Sponsored by: Intel	2013-03-26 18:39:54 +00:00
Jim Harris	0a0b08cc30	Enable asynchronous event requests on non-Chatham devices. Also add logic to clean up all outstanding asynchronous event requests when resetting or shutting down the controller, since these requests will not be explicitly completed by the controller itself. Sponsored by: Intel	2013-03-26 18:37:36 +00:00
Jim Harris	990e741c18	Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct() function. Sponsored by: Intel	2013-03-26 18:34:19 +00:00
David E. O'Brien	4b52061e17	Fix GCC build: /usr/src/sys/modules/nvme/../../dev/nvme/nvme.c:211: warning: format '%qx' expects type 'long unsigned int', but argument 9 has type 'long long unsigned int' [-Wformat]	2013-03-07 22:54:28 +00:00
Jim Harris	91fe20e34d	Map BAR 4/5, because NVMe spec says devices may place the MSI-X table behind BAR 4/5, rather than in BAR 0/1 with the control/doorbell registers. Sponsored by: Intel	2012-12-18 23:27:18 +00:00
Jim Harris	4d6abcb19f	Do not use taskqueue to defer completion work when using INTx. INTx now matches MSI-X behavior. Sponsored by: Intel	2012-12-18 21:50:48 +00:00
Jim Harris	21b6da584b	Preallocate a limited number of nvme_tracker objects per qpair, rather than dynamically creating them at runtime. Sponsored by: Intel	2012-10-18 00:44:39 +00:00
Jim Harris	5ae9ed6811	Create nvme_qpair_submit_request() which eliminates all of the code duplication between the admin and io controller-level submit functions. Sponsored by: Intel	2012-10-18 00:43:25 +00:00
Jim Harris	c2e83b404f	Simplify how the qpair lock is acquired and released. Sponsored by: Intel	2012-10-18 00:41:31 +00:00
Jim Harris	5fa5cc5f12	Cleanup uio-related code to use struct nvme_request and nvme_ctrlr_submit_io_request(). While here, also fix case where a uio may have more than 1 iovec. NVMe's definition of SGEs (called PRPs) only allows for the first SGE to start on a non-page boundary. The simplest way to handle this is to construct a temporary uio for each iovec, and submit an NVMe request for each. Sponsored by: Intel	2012-10-18 00:40:40 +00:00
Jim Harris	d281e8fbbd	Add nvme_ctrlr_submit_[admin|io]_request functions which consolidates code for allocating nvme_tracker objects and making calls into bus_dmamap_load for commands which have payloads. Sponsored by: Intel	2012-10-18 00:39:29 +00:00
Jim Harris	8a382371f1	Add #if 0 around nvme_async_event_cb() until NVMe AER functionality can be tested. This fixes a build warning found only with clang.	2012-09-18 18:23:21 +00:00
Jim Harris	bb0ec6b359	This is the first of several commits which will add NVM Express (NVMe) support to FreeBSD. A full description of the overall functionality being added is below. nvmexpress.org defines NVM Express as "an optimized register interface, command set and feature set for PCI Express (PCIe)-based Solid-State Drives (SSDs)." This commit adds nvme(4) and nvd(4) driver source code and Makefiles to the tree. Full NVMe functionality description: Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe) device support. There will continue to be ongoing work on NVM Express support, but there is more than enough to allow for evaluation of pre-production NVM Express devices as well as soliciting feedback. Questions and feedback are welcome. nvme(4) implements NVMe hardware abstraction and is a provider of NVMe namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN. nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks. nvmecontrol(8) is used for NVMe configuration and management. 
The following are currently supported: nvme(4) - full mandatory NVM command set support - per-CPU IO queues (enabled by default but configurable) - per-queue sysctls for statistics and full command/completion queue dumps for debugging - registration API for NVMe namespace consumers - I/O error handling (except for timeouts; see below) - compilation switches for support back to stable-7 nvd(4) - BIO_DELETE and BIO_FLUSH (if supported by controller) - proper BIO_ORDERED handling nvmecontrol(8) - devlist: list NVMe controllers and their namespaces - identify: display controller or namespace identify data in human-readable or hex format - perftest: quick and dirty performance test to measure raw performance of NVMe device without userspace/physio/GEOM overhead The following are still work in progress and will be completed over the next 3-6 months in rough priority order: - complete man pages - firmware download and activation - asynchronous error requests - command timeout error handling - controller resets - nvmecontrol(8) log page retrieval This has been primarily tested on amd64, with light testing on i386. I would be happy to provide assistance to anyone interested in porting this to other architectures, but am not currently planning to do this work myself. Big-endian and dmamap sync for command/completion queues are the main areas that would need to be addressed. The nvme(4) driver currently has references to Chatham, which is an Intel-developed prototype board which is not fully spec compliant. These references will all be removed over time. Sponsored by: Intel Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>	2012-09-17 19:23:01 +00:00

49 Commits
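
Commit 02e3348484 above says the driver's default maximum transfer size must not exceed what the controller's MDTS field reports. The following is a minimal sketch of that clamp, not the code from the commit. It assumes the NVMe specification's encoding of MDTS: a power-of-two exponent in units of the controller's minimum memory page size, with 0 meaning no limit. The function name `clamp_max_xfer_size` and its parameters are hypothetical and do not come from nvme(4).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from nvme(4)): return the driver's default
 * maximum transfer size, reduced if the controller's MDTS is smaller.
 *
 * default_max   - driver default maximum transfer size, in bytes
 * mdts          - MDTS exponent from IDENTIFY data; 0 means "no limit"
 * min_page_size - controller minimum memory page size, in bytes
 */
uint32_t
clamp_max_xfer_size(uint32_t default_max, uint8_t mdts, uint32_t min_page_size)
{
	uint64_t limit;

	assert(min_page_size != 0);

	/* MDTS == 0: the controller imposes no transfer size limit. */
	if (mdts == 0)
		return (default_max);

	/*
	 * With a nonzero page size, an exponent of 32 or more gives a
	 * limit of at least 2^32 bytes, which can never be below a
	 * 32-bit default. Returning early also keeps the shift below
	 * within 64 bits.
	 */
	if (mdts >= 32)
		return (default_max);

	/* Controller limit in bytes: min_page_size * 2^mdts. */
	limit = (uint64_t)min_page_size << mdts;

	return (limit < default_max ? (uint32_t)limit : default_max);
}
```

For instance, with a 128 KiB default and a 4 KiB minimum page size, an MDTS of 3 lowers the maximum to 32 KiB, while an MDTS of 0 or 5 leaves the default unchanged.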