freebsd-skq

Author	SHA1	Message	Date
imp	02feab2c54	nvme: Remove a wmb() that's not necessary. bus_dmamap_sync() ensures that memory that's prepared for PREWRITE can be DMA'd immediately after it returns. The details differ, but this mirrors atomic thread release semantics, at least for the buffers synced. For non-x86 platforms, bus_dmamap_sync() has the right syncing and fences. So in the past, wmb() had been omitted for them. For x86 platforms, the memory ordering is already strong enough to ensure DMA to the device sees the current contents. As such, we don't need the wmb() here. It translates to an sfence which is only needed for writes to regions that have the write combining attribute set or when some exotic opcodes are used. The nvme driver does neither of these. Since bus_dmamap_sync() includes atomic_thread_fence_rel, we can be assured any optimizer won't reorder the bus_dmamap_sync and the bus_space_write operations. The wmb() was a vestige of the pre-busdma version initially committed to the tree. Reviewed by: kib@, gallatin@, chuck@, mav@ Differential Revision: https://reviews.freebsd.org/D27448	2020-12-04 21:34:48 +00:00
mmel	970d081572	NVME: Multiple busdma related fixes. - in nvme_qpair_process_completions() do dma sync before completion buffer is used. - in nvme_qpair_submit_tracker(), don't do explicit wmb() also for arm and arm64. Bus_dmamap_sync() on these architectures is sufficient to ensure that all CPU stores are visible to external (including DMA) observers. - Allocate completion buffer as BUS_DMA_COHERENT. On not-DMA coherent systems, buffers continuously owned (and accessed) by DMA must be allocated with this flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems (or coherent buses in mixed systems). MFC after: 4 weeks Reviewed by: mav, imp Differential Revision: https://reviews.freebsd.org/D27446	2020-12-02 16:54:24 +00:00
chuck	aae2f76c43	nvme: Fix typo in definition Change occurrences of "selt test" to "self test" in the NVMe header file. Reviewed by: imp, mav MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27439	2020-12-02 15:59:08 +00:00
mmel	135aeef1ec	Always use the __unused attribute even for potentially unused parameters. Requested by: ian, imp MFC with: r368167	2020-12-01 08:52:13 +00:00
mmel	7ce0a5ef8a	Unbreak r368167 in userland. Decorate unused arguments. Reported by: kp, tuexen, jenkins, and many others MFC with: r368167	2020-11-30 14:51:48 +00:00
mmel	985c152bf8	NVME: Don't try to swap data on little endian machines. These swapping functions violate BUSDMA contract - we cannot write to armed (by bus_dmamap_sync(PRE_..)) buffers. Remove them at least from little endian machines until a better solution will be developed. Reviewed by: imp MFC after: 3 weeks	2020-11-30 07:01:12 +00:00
mav	b1a18877d1	Remove alignment requirements for passthrough buffer. After r368124 vmapbuf() should happily map misaligned maxphys-sized buffers thanks to extra page added to pbuf_zone.	2020-11-29 00:57:19 +00:00
mav	cc27bf440d	Increase nvme(4) maximum transfer size from 1MB to 2MB. With 4KB page size the 2MB is the maximum we can address with one page PRP. Going further would require chaining, that would add some more complexity. On the other side, to reduce memory consumption, allocate the PRP memory respecting maximum transfer size reported in the controller identify data. Many of NVMe devices support much smaller values, starting from 128KB. To do that we have to change the initialization sequence to pull the data earlier, before setting up the I/O queue pairs. The admin queue pair is still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal, since there is only one such queue with only 16 trackers. Reviewed by: imp MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-11-29 00:20:31 +00:00
kib	ca7c13e470	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get private MAXPHYS-like constant; their conversion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, were either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
mmel	f34454d248	Ensure that the buffer is in nvme_single_map() mapped to single segment. Not a functional change. MFC after: 1 week	2020-11-23 14:30:22 +00:00
mav	ba953d1e7d	Add PMRCAP printing and fix earlier CAP_HI. MFC after: 3 days	2020-11-14 01:45:34 +00:00
mav	56b2971c7c	Fix panic if NVMe is detached before the intrhook call. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-11-12 20:20:43 +00:00
mjg	1280af460f	nvme: change namei_request_zone into a malloc type Both the size (128 bytes) and ephemeral nature of allocations make it a great fit for malloc. A dedicated zone unnecessarily avoids sharing buckets with 128-byte objects. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D27103	2020-11-05 21:44:58 +00:00
mav	1cd599537c	Fix unintentional constant rename in r367109. MFC after: 1 week	2020-10-28 18:22:25 +00:00
mav	e157fb4acd	Print NVMe controller capabilities in verbose dmesg. Those values are not reported in controller identification, while sometimes interesting for development and debugging. MFC after: 1 week	2020-10-28 15:43:29 +00:00
imp	3f7e39ba85	nvme: Remove compat code for older kernels Remove code that supported pre-2011 kernels. CTLTYPE_S64 was defined in rev 217616. All supported branches have it, so remove its compat definition as OBE.	2020-10-24 01:59:01 +00:00
brooks	7d8530e564	vmapbuf: don't smuggle address or length in buf Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784	2020-10-21 16:00:15 +00:00
mav	56d097d368	Use RTD3 Entry Latency value as shutdown timeout. This field was not in specs when the driver was written, but now there are SSDs with the reported latency of 10s, where hardcoded value of 5s seems to be not enough sometimes, causing shutdown timeout messages. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-10-14 15:50:28 +00:00
dab	b7b729ccb8	Add an ioctl to get an NVMe device's maximum transfer size Reviewed by: imp, chuck Obtained from: Dell EMC Isilon MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26390	2020-09-21 15:41:47 +00:00
mjg	1748821a48	nvme: clean up empty lines in .c and .h files	2020-09-01 22:03:10 +00:00
imp	f7f8993035	Use symbolic names for async events Rather than |= 0x300, define and use async event names for the name space changes and the firmware activations that we're asking for.	2020-08-31 19:38:03 +00:00
mav	0f75f27389	Report cpi->hba_* for nda(4) because why not. MFC after: 1 week	2020-08-12 20:05:43 +00:00
markj	2215e2cd8f	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
mav	e7f2c302f6	Fix few panics on NVMe's timing out initialization requests. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-25 20:29:29 +00:00
mav	ab4eadd6ac	Make polled request timeout less invasive. Instead of panic after one second of polling, make the normal timeout handler to activate, reset the controller and abort the outstanding requests. If all of it won't happen within 10 seconds then something in the driver is likely stuck bad and panic is the only way out. In particular this fixed device hot unplug during execution of those polled commands, allowing clean device detach instead of panic. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-18 19:16:03 +00:00
mav	c7e6bf9e22	Fix admin qpair leak if detached during initial reset. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-17 17:51:40 +00:00
mav	3a1e055ba4	Fix config_intrhook leak on initial reset failure. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-12 14:14:01 +00:00
dab	149ace228d	Fix various Coverity-detected errors in nvme driver This fixes several Coverity-detected errors in the nvme driver. CIDs addressed: 1008344, 1009377, 1009380, 1193740, 1305470, 1403975, 1403980 Reviewed by: imp@, vangyzen@ MFC after: 5 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24532	2020-05-02 20:47:58 +00:00
imp	b150ec9fa9	Add KASSERT to ensure sane nsid. All callers are currently filtering bad nsid to this function, however, we'll have undefined behavior if that's not true. Add the KASSERT to prevent that.	2020-05-01 21:24:19 +00:00
imp	af609559e3	Rename ns notification function... This function is called whenever the namespace is added, deleted or changes. Update the name to reflect that. No functional change.	2020-05-01 21:24:15 +00:00
imp	ad29f78218	Style(9) nit: put function name at start of line.	2020-04-30 20:58:38 +00:00
imp	2da1d539e3	Move / reword a comment. Explain what we're doing with mapping CAM's notion of a LUN to NVMe's notion of a namespace.	2020-04-30 20:58:33 +00:00
imp	b8ce7e8464	Make sure that we get the sbuf resources we need. Since we're calling sbuf_new with NOWAIT, make sure it can allocate a buffer to use. Don't print anything if we can't get it. Noticed by: rpokala	2020-04-30 00:43:11 +00:00
imp	18a1b03238	Return the nvmeX device associated with the ndaX device. Add the nvmeX device to the XPT_PATH_INQ nvme specific information. While one could figure this out by looking up the domain:bus:slot:function, it's a lot easier to have the SIM set it directly since the sim knows this.	2020-04-30 00:43:02 +00:00
imp	07ca42c0db	Generate a devctl event for interesting events When we reset the controller, and when the controller tells us about a critical warning, send an event.	2020-04-30 00:27:19 +00:00
emaste	4742a826c3	remove extraneous double ;s in sys/	2020-03-30 16:04:25 +00:00
kaktus	ad355b0a9d	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718	2020-02-26 14:26:36 +00:00
scottl	b2c4b49613	Ever since the block layer expanded its command syntax beyond just BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in drivers when the driver doesn't support a particular command. Do a sweep and fix that. Reported by: imp	2020-02-07 09:22:08 +00:00
mav	ef8d51daa1	Fix copy-paste bug in HMB free code. MFC after: 2 weeks X-MFC-with: r356474	2020-01-08 18:26:23 +00:00
mav	8f7704790f	Minor adjustments to r356474 and r356480. Reported by: jkim, imp MFC after: 2 weeks X-MFC-with: r356474	2020-01-07 23:29:54 +00:00
mav	0c75e47646	Increase HMB limit from 1% to 5%. SSD capacity in laptops is growing faster than RAM size, so my original guess seems too low on second thought. Hopefully nobody will build large array of those crappy SSDs. MFC after: 2 weeks X-MFC-with: 356474	2020-01-07 23:10:38 +00:00
mav	a91b9f9262	Add Host Memory Buffer support to nvme(4). This allows cheapest DRAM-less NVMe SSDs to use some of host RAM (about 1MB per 1GB on the devices I have) for its metadata cache, significantly improving random I/O performance. Device reports minimal and preferable size of the buffer. The code limits it to 1% of physical RAM by default. If the buffer can not be allocated or below minimal size, the device will just have to work without it. MFC after: 2 weeks Relnotes: yes Sponsored by: iXsystems, Inc.	2020-01-07 21:17:11 +00:00
mmel	61e28a20a8	Properly synchronize completion DMA buffers. Within command completion processing the callback function may access DMAed data buffer. Synchronize it before use, not after. This allows to use NVMe disk on non-DMA coherent arm64 system. MFC after: 3 weeks	2019-12-15 14:28:38 +00:00
imp	a3fcfb05ea	Move to using bool instead of boolean_t While there are subtle semantic differences between bool and boolean_t, none of them matter in these cases. Prefer true/false when dealing with bool type. Preserve a couple of TRUEs since they are passed into int args into CAM. Preserve a couple of FALSEs when used for status.done, an int. Differential Revision: https://reviews.freebsd.org/D20999	2019-12-13 18:35:48 +00:00
imp	00ed045a2c	Move reset to the interrupt processing stage This trims the boot time a bit more for AWS and other platforms that have nvme drives. There's no reason to do this inline. This has been in my tree a while, but IIRC I talked to Jim Harris about this at one of our face to face meetings. MFC After: 2 weeks	2019-12-11 22:51:02 +00:00
imp	ee3e21ead7	trackers always know what qpair they are on Don't needlessly pass around qpair pointers when the tracker knows what qpair it's on. This will simplify code and make it easier to split submission and completion queues in the future. Signed-off-by: John Meneghini <johnm@netapp.com>	2019-12-06 22:12:39 +00:00
mav	37e8a0e005	Make nvme(4) driver some more NUMA aware. - For each queue pair precalculate CPU and domain it is bound to. If queue pairs are not per-CPU, then use the domain of the device. - Allocate most of queue pair memory from the domain it is bound to. - Bind callouts to the same CPUs as queue pair to avoid migrations. - Do not assign queue pairs to each SMT thread. It just wasted resources and increased lock congestions. - Remove fixed multiplier of CPUs per queue pair, spread them even. This allows to use more queue pairs in some hardware configurations. - If queue pair serves multiple CPUs, bind different NVMe devices to different CPUs. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-23 17:53:47 +00:00
imp	362659d41a	Support doorbell strides != 0. The NVMe standard (1.4) states >>> 8.6 Doorbell Stride for Software Emulation >>> The doorbell stride,...is useful in software emulation of an NVM >>> Express controller. ... For hardware implementations of the NVM >>> Express interface, the expected doorbell stride value is 0h. However, hardware in the wild exists with a doorbell stride of 1 (meaning 8 byte separation). This change supports that hardware, as well as software emulators as envisioned in Section 8.6. Since this is the fast path, care has been taken to make this computation efficient. The bit of math to compute an offset for each is replaced by a memory load from cache of a pre-computed value. MFC After: 3 days Reviewed by: scottl@ Differential Revision: https://reviews.freebsd.org/D21514	2019-09-04 20:08:36 +00:00
imp	854a74e65c	Implement nvme suspend / resume for pci attachment When we suspend, we need to properly shutdown the NVME controller. The controller may go into D3 state (or may have the power removed), and to properly flush the metadata to non-volatile RAM, we must complete a normal shutdown. This consists of deleting the I/O queues and setting the shutdown bit. We have to do some extra stuff to make sure we reset the software state of the queues as well. On resume, we have to reset the card twice, for reasons described in the attach function. Once we've done that, we can restart the card. If any of this fails, we'll fail the NVMe card, just like we do when a reset fails. Set is_resetting for the duration of the suspend / resume. This keeps the reset taskqueue from running a concurrent reset, and also is needed to prevent any hw completions from queueing more I/O to the card. Pass resetting flag to nvme_ctrlr_start. It doesn't need to get that from the global state of the ctrlr. Wait for any pending reset to finish. All queued I/O will get sent to the hardware as part of nvme_ctrlr_start(), though the upper layers shouldn't send any down. Disabling the qpairs is the other failsafe to ensure all I/O is queued. Rename nvme_ctrlr_destory_qpairs to nvme_ctrlr_delete_qpairs to avoid confusion with all the other destroy functions. It just removes the queues in hardware, while the other _destroy_ functions tear down driver data structures. Split parts of the hardware reset function up so that I can do part of the reset in suspend. Split out the software disabling of the qpairs into nvme_ctrlr_disable_qpairs. Finally, fix a couple of spelling errors in comments related to this. Relnotes: Yes MFC After: 1 week Reviewed by: scottl@ (prior version) Differential Revision: https://reviews.freebsd.org/D21493	2019-09-03 15:26:11 +00:00
imp	2d33613528	In nvme_completion_poll, add a sanity check to make sure that we complete the polling within a second. Panic if we don't. All the commands that use this interface should typically complete within a few tens to hundreds of microseconds. Panic rather than return ETIMEDOUT because if the command somehow does later complete, it will randomly corrupt memory. Also, it helps to get a traceback from where the unexpected failure happens, rather than an infinite loop.	2019-09-02 17:11:32 +00:00
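
Two of the messages above rest on small pieces of arithmetic: the doorbell stride change (362659d41a) and the 2MB transfer limit (cc27bf440d). The following is a minimal sketch of that arithmetic only, not the driver's code. The function names are hypothetical, and the doorbell formula is taken from the NVMe specification rather than from the commits. Note that the commit describes replacing this per-access computation with a precomputed value; the sketch shows only what would be precomputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Doorbell registers begin at offset 1000h in the controller registers. */
#define DOORBELL_BASE	0x1000u

/*
 * Submission queue tail doorbell offset for queue `qid`:
 * 1000h + (2 * qid) * (4 << CAP.DSTRD).
 * Hypothetical helper name, for illustration only.
 */
static uint32_t
sq_tail_doorbell(uint32_t qid, uint32_t dstrd)
{
	assert(dstrd <= 15);	/* CAP.DSTRD is a 4-bit field */
	return (DOORBELL_BASE + (2 * qid) * (4u << dstrd));
}

/*
 * Completion queue head doorbell offset for queue `qid`:
 * 1000h + (2 * qid + 1) * (4 << CAP.DSTRD).
 */
static uint32_t
cq_head_doorbell(uint32_t qid, uint32_t dstrd)
{
	assert(dstrd <= 15);
	return (DOORBELL_BASE + (2 * qid + 1) * (4u << dstrd));
}

/*
 * Bytes addressable through a single page of PRP entries: each entry is
 * an 8-byte pointer to one memory page, so one page holds
 * page_size / 8 entries.
 */
static uint64_t
max_xfer_one_prp_page(uint64_t page_size)
{
	assert(page_size >= sizeof(uint64_t));
	return ((page_size / sizeof(uint64_t)) * page_size);
}
```

With a stride of 0 the doorbells are packed 4 bytes apart; with a stride of 1 they are 8 bytes apart, which matches the "8 byte separation" mentioned in the commit. For a 4096-byte page, one PRP page holds 512 entries, giving 512 * 4096 bytes = 2MB.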


297 Commits