freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	a6d222eb68	Add more random bits from NVMe 1.4. MFC after: 2 weeks	2019-08-03 02:36:35 +00:00
Alexander Motin	6c99d1325e	Decode few more NVMe log pages. In particular: Changed Namespace List, Commands Supported and Effects, Reservation Notification, Sanitize Status. Add few new arguments to `nvmecontrol log` subcommand. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-08-02 20:16:21 +00:00
Alexander Motin	a7bf63be69	Add IOCTL to translate nvdX into nvmeY and NSID. While very useful by itself, it also makes `nvmecontrol` not depend on hardcoded device names parsing, that in its turn makes simple to take nvdX (and potentially any other) device names as arguments. Also added IOCTL bypass from nvdX to respective nvmeYnsZ makes them interchangeable for management purposes. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-08-01 21:44:07 +00:00
Warner Losh	08a607e0f3	Widen the type for to. The timeout field in the CAPS register is defined to be 8 bits, so its type was uint8_t. We recently started adding 1 to it to cope with rogue devices that listed 0 timeout time (which is impossible). However, in so doing, other devices that list 0xff (for a 2 minute timeout) were broken when adding 1 overflowed. Widen the type to be uint32_t like its source register to avoid the issue. Reported by: bapt@	2019-07-25 20:26:21 +00:00
Warner Losh	62d2cf1847	Provide macros to extract the sub-fields of the CAP_LO and CAP_HI registers. These macros make places where we extract these easier to read. The shift and mask stuff is also a bit tedious and error prone. Start with the CAP_LO and CAP_HI registers since their scope is somewhat constrained. This is style chagne only, no functional changes. Reviewed by: chuck Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20979	2019-07-18 15:41:10 +00:00
Warner Losh	dc9df3a59d	Assume that the timeout value from the capacity is 1-based Neither the 1.3 or 1.4 standards say this number is 1's based, but adding 1 costs little and copes with those NVMe drives that report '0' in this field cheaply. This is consistent with what the Linux driver does as well.	2019-07-16 22:55:30 +00:00
Warner Losh	9835d216d8	rename nvme_ctrlr_destroy_qpair to nvme_ctrlr_destroy_qpairs Maintain symmetry with nvme_ctrlr_create_qpairs, making it easier to match init/uninit scenarios. Signed-off-by: John Meneghini <johnm@netapp.com> Submitted by: Michael Hordijk <hordijk@netapp.com> Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D19781	2019-05-08 20:18:11 +00:00
Warner Losh	2ffd6fce5b	Don't print all the I/O we abort on a reset, unless we're out of retries. When resetting the controller, we abort I/O. Prior to this fix, we printed a ton of abort messages for I/O that we're going to retry. This imparts no useful information. Stop printing them unless our retry count is exhausted. Clarify code for when we don't retry, and remove useless arg to a routine that's always called with it as 'true'. All the other debug is still printed (including multiple reset messages if we have multiple timeouts before the taskqueue runs the actual reset) so that we know when we reset. Reviewed by: jimharris@, chuck@ Differential Revision: https://reviews.freebsd.org/D19431	2019-03-09 01:18:16 +00:00
Warner Losh	45d7e233a5	Unconditionally support unmapped BIOs. This was another shim for supporting older kernels. However, all supported versions of FreeBSD have unmapped I/Os (as do several that have gone EOL), remove it. It's unlikely the driver would work on the older kernels anyway at this point.	2019-02-27 22:16:59 +00:00
Gleb Smirnoff	756a541279	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho	2019-01-15 01:02:16 +00:00
Warner Losh	91182bcfb6	Even though they are reserved, cdw2 and cdw3 can be set via nvme-cli (and soon nvmecontrol). Go ahead and copy them into rsvd2 and rsvd3. Sponsored by: Netflix	2018-12-07 21:58:08 +00:00
Chuck Tuffli	9544e6dcf1	Make NVMe compatible with the original API The original NVMe API used bit-fields to represent fields in data structures defined by the specification (e.g. the op-code in the command data structure). The implementation targeted x86_64 processors and defined the bit fields for little endian dwords (i.e. 32 bits). This approach does not work as-is for big endian architectures and was changed to use a combination of bit shifts and masks to support PowerPC. Unfortunately, this changed the NVMe API and forces #ifdef's based on the OS revision level in user space code. This change reverts to something that looks like the original API, but it uses bytes instead of bit-fields inside the packed command structure. As a bonus, this works as-is for both big and little endian CPU architectures. Bump __FreeBSD_version to 1200081 due to API change Reviewed by: imp, kbowling, smh, mav Approved by: imp (mentor) Differential Revision: https://reviews.freebsd.org/D16404	2018-08-22 04:29:24 +00:00
Alexander Motin	f439e3a4ff	Refactor NVMe CAM integration. - Remove layering violation, when NVMe SIM code accessed CAM internal device structures to set pointers on controller and namespace data. Instead make NVMe XPT probe fetch the data directly from hardware. - Cleanup NVMe SIM code, fixing support for multiple namespaces per controller (reporting them as LUNs) and adding controller detach support and run-time namespace change notifications. - Add initial support for namespace change async events. So far only in CAM mode, but it allows run-time namespace arrival and departure. - Add missing nvme_notify_fail_consumers() call on controller detach. Together with previous changes this allows NVMe device detach/unplug. Non-CAM mode still requires a lot of love to stay on par, but at least CAM mode code should not stay in the way so much, becoming much more self-sufficient. Reviewed by: imp MFC after: 1 month Sponsored by: iXsystems, Inc.	2018-05-25 03:34:33 +00:00
Alexander Motin	c252f63740	Fix LOR between controller and queue locks. Admin pass-through requests took controller lock before the queue lock, but in case of request submission to a failed controller controller lock was taken after the queue lock. Fix that by reducing the lock scopes and switching to mtx_pool locks to track pass-through request completion. Sponsored by: iXsystems, Inc.	2018-05-02 20:13:03 +00:00
Alexander Motin	e134ecdcfc	Improve nvme(4) attach/detach sequences. This change allows clean device detach on attach failures and driver unload, while previous code tried to talk to already shut down controller, or even accessed resources failed to allocate. Sponsored by: iXsystems, Inc.	2018-04-30 23:05:57 +00:00
Warner Losh	5d7fd8f726	Fix error messages in cut and pasted code. Also, fix an unnecessary deref to get ctrlr. Noticed by: rpokala@ Sponsored by: Netflix	2018-03-14 23:28:28 +00:00
Warner Losh	8b1e6ebe0e	When tearing down a queue pair, also delete the queue entries. The NVME standard has required in section 7.2.6, since at least 1.1, that a clean shutdown is signalled by deleting the subission and the completion queues before setting the shutdown bit in CC. The 1.0 standard, apparently, did not and many of the early Intel cards didn't care. Some newer cards care, at least one whose beta firmware can scramble the card on an unclean shutdown. Linux has done this for some time. To make it possible to move forward with an evaluation of this pre-release card with wonky firmware, delete the queues on the card when we delete the qpair structures. Sponsored by: Netflix	2018-03-14 23:01:18 +00:00
Wojciech Macek	0d787e9b35	NVMe: Add big-endian support Remove bitfields from defined structures as they are not portable. Instead use shift and mask macros in the driver and nvmecontrol application. NVMe is now working on powerpc64 host. Submitted by: Michal Stanek <mst@semihalf.com> Obtained from: Semihalf Reviewed by: imp, wma Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D13916	2018-02-22 13:32:31 +00:00
Warner Losh	29077eb456	Use atomic load and stores to ensure that the compiler doesn't optimize away these loops. Change boolean to int to match what atomic API supplies. Remove wmb() since the atomic_store_rel() on status.done ensure the prior writes to status. It also fixes the fact that there wasn't a rmb() before reading done. This should also be more efficient since wmb() is fairly heavy weight. Sponsored by: Netflix Reviewed by: kib@, jim harris Differential Revision: https://reviews.freebsd.org/D14053	2018-01-29 00:00:52 +00:00
Warner Losh	989c7f0b7c	Although we only have one quirk at the moment, guard against the day we have more than one by checking the actual quirk bit before delaying the reset. Noticed by: rpokala@	2017-12-18 20:11:21 +00:00
Warner Losh	ce1ec9c178	When we're disabling the nvme device, some drives have a controller bug that requires 'hands off' for a period of time (2.3s) before we check the RDY bit. Sicne this is a very odd quirk for a very limited selection of drives, do this as a quirk. This prevented a successful reset of the card when the card wedged. Also, make sure that we comply with the advice from section 3.1.5 of the 1.3 spec says that transitioning CC.EN from 0 to 1 when CSTS.RDY is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has undefined results". Short circuit when EN == RDY == desired state. Finally, fail the reset if the disable fails. This will lead to a failed device, which is what we want. (note: nda device needs work for coping with a failed device). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13389	2017-12-18 18:38:00 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 14:52:40 +00:00
Warner Losh	bb1c7be429	Create general polling function for the nvme controller. Use it when we're doing the various pin-based interrupt modes. Adjust nvme_ctrlr_intx_handler to use nvme_ctrlr_poll. Sponsored by: Netflix Suggested by: scottl@	2017-10-15 16:18:08 +00:00
Warner Losh	5fff95cc1d	Fix queue depth for nda. 1/4 of the number of queues times queue entries is too limiting. It works up to about 4k IOPS / 3.0GB/s for hardware that can do 4.4k/3.2GB/s with nvd. 3/4 works better, though it highlights issues in the fairness of nda's choice of TRIM vs READ. That will be fixed separately.	2017-09-20 21:42:25 +00:00
Warner Losh	c02565f9fa	Set the max transactions for NVMe drives better. Provided a better estimate for the number of transactions that can be pending at one time. This will be number of queues * number of trackers / 4, as suggested by Jim Harris. This gives a better estimate of the number of transactions that CAM should queue before applying back pressure. This should be revisted when we have real multi-queue support in CAM and the upper layers of the I/O stack. Sponsored by: Netflix	2017-08-28 23:54:20 +00:00
Warner Losh	696c950297	NVME Namespace ID is 32-bits, so widen interface to reflect that. Sponsored by: Netflix	2017-08-25 21:38:38 +00:00
Warner Losh	824073fbd6	Avoid dereferencing unintialized elements in the error path. Some drives sometimes have errors for things like setting the number of queue entries in the submission queue. The error paths taken for these drives ensure a panic dereferencing uninialized data. Sponsored by: Netflix	2017-03-07 23:06:41 +00:00
Warner Losh	a8a18dd590	Make multi-namespace nvme drives more robust. Fix assumptions about name spaces in NVME driver. First, it assumes cdata.nn is the number of configured devices. However, it is the number of supported name spaces. Second, it assumes that there will never be more than 16 name spaces supported, but a certain drive I'm testing reports 1024. It assumes that name spaces are a tightly packed namespace, but the standard seems to indicate otherwise. Finally, it assumes that an error would be generated when quearying an unconfigured namespace. Instead, it succeeds but the identify data is all zeros. Fix these by limiting the number of name spaces we probe to 16. Remove aborting when we find one in error. When the size of the name space is zero, ignore it. This is admittedly a bandaide. The long term fix will be to participate in the enumeration and name space change protocols definfed in the NVNe standard. Sponsored by: Netflix	2017-03-07 21:47:54 +00:00
Warner Losh	a3a6c48d66	Ensure that the passthrough request will fit in MAXPHYS bytes after it has been rounded to full pages. This avoids a panic in vm_fault_quick_hold_pages due to this off-by-one error passing one page too many into vmapbuf.	2017-02-02 23:04:06 +00:00
Scott Long	a965389b5a	Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a bunch of safery belts and error handling in related codepaths. Reviewed by: jimharris Obtained from: Netflix Differential Revision: D8453	2016-11-08 00:24:49 +00:00
Warner Losh	f24c011beb	Commit the bits of nda that were missed. This should fix the build. Approved by: re@	2016-06-10 06:04:53 +00:00
Jim Harris	361e1fb408	nvme: fix intx handler to not dereference ioq during initialization This was a regression from r293328, which deferred allocation of the controller's ioq array until after interrupts are enabled during boot. PR: 207432 Reported and tested by: Andy Carrel <wac@google.com> MFC after: 3 days Sponsored by: Intel	2016-02-24 00:01:10 +00:00
Justin Hibbits	43cd61606b	Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any() Since these calls only use default arguments, bus_alloc_resource_any() is the right call. Differential Revision: https://reviews.freebsd.org/D5306	2016-02-19 03:37:56 +00:00
Jim Harris	7b036d7790	nvme: avoid duplicate SET_NUM_QUEUES commands nvme(4) issues a SET_NUM_QUEUES command during device initialization to ensure enough I/O queues exists for each of the MSI-X vectors we have allocated. The SET_NUM_QUEUES command is then issued again during nvme_ctrlr_start(), to ensure that is properly set after any controller reset. At least one NVMe drive exists which fails this second SET_NUM_QUEUES command during device initialization. So change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES command when it is coming out of a reset - avoiding the duplicate SET_NUM_QUEUES during device initialization. Reported by: gallatin MFC after: 3 days Sponsored by: Intel	2016-02-11 17:32:41 +00:00
Jim Harris	9c6b5d40eb	nvme: replace NVME_CEILING macro with howmany() Suggested by: rpokala MFC after: 3 days	2016-01-07 20:35:26 +00:00
Jim Harris	50dea2da12	nvme: add hw.nvme.min_cpus_per_ioq tunable Due to FreeBSD system-wide limits on number of MSI-X vectors (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321), it may be desirable to allocate fewer than the maximum number of vectors for an NVMe device, in order to save vectors for other devices (usually Ethernet) that can take better advantage of them and may be probed after NVMe. This tunable is expressed in terms of minimum number of CPUs per I/O queue instead of max number of queues per controller, to allow for a more even distribution of CPUs per queue. This avoids cases where some number of CPUs have a dedicated queue, but other CPUs need to share queues. Ideally the PR referenced above will eventually be fixed and the mechanism implemented here becomes obsolete anyways. While here, fix a bug in the CPUs per I/O queue calculation to properly account for the admin queue's MSI-X vector. Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel	2016-01-07 20:32:04 +00:00
Jim Harris	2b647da7a0	nvme: do not revert o single I/O queue when per-CPU queues not possible Previously nvme(4) would revert to a signle I/O queue if it could not allocate enought interrupt vectors or NVMe submission/completion queues to have one I/O queue per core. This patch determines how to utilize a smaller number of available interrupt vectors, and assigns (as closely as possible) an equal number of cores to each associated I/O queue. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:18:32 +00:00
Jim Harris	d400f790b1	nvme: break out interrupt setup code into a separate function MFC after: 3 days Sponsored by: Intel	2016-01-07 16:12:42 +00:00
Jim Harris	e5af5854ff	nvme: do not pre-allocate MSI-X IRQ resources The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:11:31 +00:00
Jim Harris	c75ad8ce5a	nvme: remove per_cpu_io_queues from struct nvme_controller Instead just use num_io_queues to make this determination. This prepares for some future changes enabling use of multiple queues when we do not have enough queues or MSI-X vectors for one queue per CPU. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:09:56 +00:00
Jim Harris	d85f84abb8	nvme: simplify some of the nested ifs in interrupt setup code This prepares for some follow-up commits which do more work in this area. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:08:04 +00:00
Jeff Roberson	fade8dd714	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division	2015-07-23 19:13:41 +00:00
Jim Harris	cbdec09c1c	nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable PR: 200458 MFC after: 3 days Sponsored by: Intel	2015-07-23 15:50:39 +00:00
Jim Harris	de9a58f4ee	nvme: properly handle case where pci_alloc_msix does not alloc all vectors Reported by: Sean Kelly <smkelly@smkelly.org> MFC after: 3 days Sponsored by: Intel	2015-07-23 15:35:08 +00:00
Jim Harris	36b0e4ee1f	nvme: remove CHATHAM related code Chatham was an internal NVMe prototype board used for early driver development. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:52:06 +00:00
Jim Harris	e5ce537999	nvme: fall back to a smaller MSI-X vector allocation if necessary Previously, if per-CPU MSI-X vectors could not be allocated, nvme(4) would fall back to INTx with a single I/O queue pair. This change will still fall back to a single I/O queue pair, but allocate MSI-X vectors instead of reverting to INTx. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:46:18 +00:00
Jim Harris	f42ca756b9	nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:10:35 +00:00
Jim Harris	496a27520d	nvme: Close hole where nvd(4) would not be notified of all nvme(4) instances if modules loaded during boot. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:09:08 +00:00
Jim Harris	2b26030cbc	nvme: Remove the software progress marker SET_FEATURE command during controller initialization. The spec says OS drivers should send this command after controller initialization completes successfully, but other NVMe OS drivers are not sending this command. This change will therefore reduce differences between the FreeBSD and other OS drivers. Sponsored by: Intel MFC after: 3 days	2014-03-17 22:36:04 +00:00
Jim Harris	448cffc859	For IDENTIFY passthrough commands to Chatham prototype controllers, copy the spoofed identify data into the user buffer rather than issuing the command to the controller, since Chatham IDENTIFY data is always spoofed. While here, fix a bug in the spoofed data for Chatham submission and completion queue entry sizes. Sponsored by: Intel MFC after: 3 days	2014-01-06 23:51:26 +00:00

1 2

95 Commits