32 Commits

Author SHA1 Message Date
Wojciech Macek
0d787e9b35 NVMe: Add big-endian support
Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           imp, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916
2018-02-22 13:32:31 +00:00
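The shift-and-mask approach described in the commit above can be sketched as follows. This is a minimal illustration, not the driver's actual code: the `EX_` macro names and the `ex_le16_load` helper are hypothetical, and the field layout is only loosely modeled on a 16-bit NVMe completion status word. The little-endian value is assembled byte by byte, so the result is the same on little-endian and big-endian hosts, which C bitfields do not guarantee.

```c
#include <stdint.h>

/* Hypothetical field layout for a 16-bit little-endian status word. */
#define EX_STATUS_P_SHIFT   0
#define EX_STATUS_P_MASK    0x1
#define EX_STATUS_SC_SHIFT  1
#define EX_STATUS_SC_MASK   0xFF
#define EX_STATUS_SCT_SHIFT 9
#define EX_STATUS_SCT_MASK  0x7

/* Extract a named field using its shift and mask. */
#define EX_STATUS_GET(val, name) \
	(((val) >> EX_STATUS_##name##_SHIFT) & EX_STATUS_##name##_MASK)

/* Load a little-endian 16-bit value independent of host byte order. */
static uint16_t
ex_le16_load(const uint8_t *p)
{
	return ((uint16_t)(p[0] | (p[1] << 8)));
}
```

For the raw bytes `0x05 0x02`, `ex_le16_load` yields `0x0205` on any host, and the fields are then read with `EX_STATUS_GET(value, SC)` and similar.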
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
26c1d774b5 dev: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.
2018-01-13 22:30:30 +00:00
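The point of replacing a multiplication inside malloc(9) with an array allocator is that the `nmemb * size` product gets checked for overflow. A userspace sketch of that check is shown below; `ex_mul_overflows` and `ex_alloc_array` are hypothetical names, and the kernel's mallocarray(9) has a different signature and its own failure behavior.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return nonzero if nmemb * size would not fit in a size_t. */
static int
ex_mul_overflows(size_t nmemb, size_t size)
{
	return (size != 0 && nmemb > SIZE_MAX / size);
}

/* Allocate an array, refusing requests whose total size overflows. */
static void *
ex_alloc_array(size_t nmemb, size_t size)
{
	if (ex_mul_overflows(nmemb, size))
		return (NULL);
	return (malloc(nmemb * size));
}
```

Callers that already handle a NULL return, as the M_NOWAIT call sites mentioned above do, need no further changes to cope with a rejected request.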
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
Warner Losh
696c950297 NVME Namespace ID is 32-bits, so widen interface to reflect that.
Sponsored by: Netflix
2017-08-25 21:38:38 +00:00
Warner Losh
a8a18dd590 Make multi-namespace nvme drives more robust.
Fix assumptions about name spaces in NVME driver. First, it assumes
cdata.nn is the number of configured devices. However, it is the
number of supported name spaces. Second, it assumes that there will
never be more than 16 name spaces supported, but a certain drive I'm
testing reports 1024. It assumes that name spaces are a tightly packed
namespace, but the standard seems to indicate otherwise. Finally, it
assumes that an error would be generated when querying an
unconfigured namespace. Instead, it succeeds but the identify data is
all zeros.

Fix these by limiting the number of name spaces we probe to 16. Remove
aborting when we find one in error. When the size of the name space is
zero, ignore it.

This is admittedly a band-aid. The long term fix will be to
participate in the enumeration and name space change protocols
defined in the NVMe standard.

Sponsored by: Netflix
2017-03-07 21:47:54 +00:00
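The probing policy described in the commit above (cap the probe count at 16, do not abort on a bad namespace, ignore namespaces whose size is zero) can be sketched like this. The structure and function names are hypothetical; the real driver issues identify commands instead of reading a prepared array.

```c
#include <stdint.h>

#define EX_MAX_NAMESPACES 16	/* probe limit named in the commit message */

/* Hypothetical stand-in for per-namespace identify data. */
struct ex_ns_data {
	uint64_t nsze;		/* namespace size; zero means unconfigured */
};

/*
 * Count usable namespaces. 'nn' is the number of supported namespaces
 * reported by the controller, which may far exceed the probe limit.
 */
static uint32_t
ex_count_active(const struct ex_ns_data *ns, uint32_t nn)
{
	uint32_t limit = nn < EX_MAX_NAMESPACES ? nn : EX_MAX_NAMESPACES;
	uint32_t active = 0;

	for (uint32_t i = 0; i < limit; i++) {
		if (ns[i].nsze == 0)
			continue;	/* all-zero identify data: skip, do not abort */
		active++;
	}
	return (active);
}
```

With a controller that reports 1024 supported namespaces, only the first 16 slots are examined, and gaps between configured namespaces are skipped.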
Alexander Motin
ee7f4d8187 Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K
I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made an
unrelated driver lie about its reported parameters to work around that.

An alternative solution for this problem from the ZFS side was committed
at r296615.

Discussed with:	smh
2016-03-10 17:13:10 +00:00
Steven Hartland
fdf16a68ab Limit stripesize reported from nvd(4) to 4K
Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.

This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.

This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize.

MFC after:	1 week
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4446
2015-12-11 02:06:03 +00:00
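The 128KB stripe boundary mentioned in the commit above determines when an I/O has to be split. A minimal sketch of that arithmetic follows, assuming byte offsets and a fixed 128KB stripe; the `ex_` names are hypothetical and the driver's own splitting code is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_STRIPE_SIZE (128u * 1024u)	/* 128KB stripe, per the commit message */

/* True if [offset, offset + length) crosses a stripe boundary. */
static bool
ex_spans_stripe(uint64_t offset, uint64_t length)
{
	if (length == 0)
		return (false);
	return (offset / EX_STRIPE_SIZE !=
	    (offset + length - 1) / EX_STRIPE_SIZE);
}

/* Length of the first child I/O, ending at the next boundary at most. */
static uint64_t
ex_first_child_len(uint64_t offset, uint64_t length)
{
	uint64_t to_boundary = EX_STRIPE_SIZE - (offset % EX_STRIPE_SIZE);

	return (length < to_boundary ? length : to_boundary);
}
```

An 8KB I/O starting at offset 124KB crosses the boundary even though it is far smaller than 128KB, so it would be split into two 4KB children.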
Jim Harris
fdbd3d8068 nvd, nvme: report stripesize through GEOM disk layer
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:35:18 +00:00
Jim Harris
e7e7bad3d7 nvme: fix race condition in split bio completion path
Fixes race condition observed under following circumstances:

1) I/O split on 128KB boundary with Intel NVMe controller.
   Current Intel controllers produce better latency when
   I/Os do not span a 128KB boundary - even if the I/O size
   itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
   simultaneously.
5) ithread for child I/O A increments bio_inbed, then
   immediately is preempted (rendezvous IPI, higher priority
   interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
   parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
   or gpart bio (including setting bio_children to 1 and
   clearing bio_driver1).
8) ithread for child I/O A resumes processing.  bio_children
   for what it thinks is the parent bio is set to 1, so it
   thinks it needs to complete the parent bio.

Result is either calling a NULL callback function, or double freeing
the bio to its uma zone.

PR:		203746
Reported by:	Drew Gallatin <gallatin@netflix.com>,
		Marc Goroff <mgoroff@quorum.net>
Tested by:	Drew Gallatin <gallatin@netflix.com>
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:06:34 +00:00
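The sequence above fails because a child's completion handler reads the parent after incrementing its counter, by which time the parent may have been freed and reused. One common way to avoid this class of bug, shown here as a sketch with C11 atomics and not necessarily what the commit does, is to decide who completes the parent from the value returned by a single atomic increment, using a child count captured before any completion can run. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical parent request shared by all of its children. */
struct ex_parent {
	atomic_uint completed;	/* number of children finished so far */
};

/* Hypothetical child request; 'total' is copied at split time. */
struct ex_child {
	struct ex_parent *parent;
	unsigned int total;
};

/*
 * Returns true for exactly one caller: the child whose increment
 * brings the count up to 'total'. The parent is not read again after
 * the increment, so a non-final child never touches a recycled parent.
 */
static bool
ex_child_done(struct ex_child *c)
{
	unsigned int prev = atomic_fetch_add(&c->parent->completed, 1);

	return (prev + 1 == c->total);
}
```

Only the caller that gets `true` may complete and free the parent; every other child must treat the parent pointer as dead once the call returns.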
Jim Harris
36b0e4ee1f nvme: remove CHATHAM related code
Chatham was an internal NVMe prototype board used for
early driver development.

MFC after:	1 week
Sponsored by:	Intel
2015-04-08 21:52:06 +00:00
Jim Harris
d603c3d73b Create a unique unit number for each controller and namespace cdev.
Sponsored by:	Intel
MFC after:	3 days
2013-11-01 23:30:54 +00:00
Jim Harris
8a959ae073 Fix the LINT build.
Approved by:	re (implicit)
MFC after:	1 week
2013-10-08 23:23:04 +00:00
Jim Harris
a40e72a695 Add driver-assisted striping for upcoming Intel NVMe controllers that can
benefit from it.

Sponsored by:	Intel
Reviewed by:	kib (earlier version), carl
Approved by:	re (hrs)
MFC after:	1 week
2013-10-08 15:44:04 +00:00
Kenneth D. Merry
ce625ec719 Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw
for the driver.  The problem with this is that in many cases (e.g.
sa(4)) there may be some instances of the driver that can handle
unmapped I/O and some that can't.  The isp(4) driver can handle
unmapped I/O, but the esp(4) driver currently cannot.  The cdevsw
is shared among all driver instances.

So instead of setting a flag on the cdevsw, set a flag on the cdev.
This allows drivers to indicate support for unmapped I/O on a
per-instance basis.

sys/conf.h:	Remove the D_UNMAPPED_IO cdevsw flag and replace it
		with an SI_UNMAPPED cdev flag.

kern_physio.c:	Look at the cdev SI_UNMAPPED flag to determine
		whether or not a particular driver can handle
		unmapped I/O.

geom_dev.c:	Set the SI_UNMAPPED flag for all GEOM cdevs.
		Since GEOM will create a temporary mapping when
		needed, setting SI_UNMAPPED unconditionally will
		work.

		Remove the D_UNMAPPED_IO flag.

nvme_ns.c:	Set the SI_UNMAPPED flag on cdevs created here
		if NVME_UNMAPPED_BIO_SUPPORT is enabled.

vfs_aio.c:	In aio_qphysio(), check the SI_UNMAPPED flag on a
		cdev instead of the D_UNMAPPED_IO flag on the cdevsw.

sys/param.h:	Bump __FreeBSD_version to 1000045 for the switch from
		setting the D_UNMAPPED_IO flag in the cdevsw to setting
		SI_UNMAPPED in the cdev.

Reviewed by:	kib, jimharris
MFC after:	1 week
Sponsored by:	Spectra Logic
2013-08-15 22:52:39 +00:00
Jim Harris
2fb37e8f1a Fix nvme(4) and nvd(4) to support non 512-byte sector sizes.
Recent testing with QEMU that has variable sector size support for
NVMe uncovered some of these issues.  Chatham prototype boards supported
only 512 byte sectors.

Sponsored by:	Intel
Reviewed by:	carl
MFC after:	3 days
2013-07-19 21:33:24 +00:00
Jim Harris
e9efbc134f Update copyright dates.
MFC after:	3 days
2013-07-09 21:22:17 +00:00
Jim Harris
5076698e19 Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by:	Intel
2013-04-12 17:56:47 +00:00
Jim Harris
7c3f19d7bb Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by:	Intel
2013-04-12 17:52:17 +00:00
Jim Harris
611060cab5 Remove the NVMe-specific physio and associated routines.
These were added early on for benchmarking purposes to avoid the mapped I/O
penalties incurred in kern_physio.  Now that FreeBSD (including kern_physio)
supports unmapped I/O, the need for these NVMe-specific routines no longer exists.

Sponsored by:	Intel
2013-04-12 17:44:55 +00:00
Jim Harris
97fafe2580 Add a mutex to each namespace, for general locking operations on the namespace.
Sponsored by:	Intel
2013-04-12 17:41:24 +00:00
Jim Harris
5fdf9c3c8e Add unmapped bio support to nvme(4) and nvd(4).
Sponsored by:	Intel
2013-04-01 16:23:34 +00:00
Jim Harris
64432b473b Remove obsolete comment. This code has now been tested with the QEMU
NVMe device emulator.
2013-03-28 16:57:48 +00:00
Jim Harris
547d523eb8 Clean up debug prints.
1) Consistently use device_printf.
2) Make dump_completion and dump_command into something more
    human-readable.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:17:10 +00:00
Jim Harris
237d2019e5 Change a number of malloc(9) calls to use M_WAITOK instead of
M_NOWAIT.

Sponsored by:	Intel
Suggested by:	carl
Reviewed by:	carl
2013-03-26 22:11:34 +00:00
Jim Harris
955910a916 Replace usages of mtx_pool_find used for admin commands with a polling
mechanism.

Now that all requests are timed, we are guaranteed to get a completion
notification, even if it is an abort status due to a timed out admin
command.

This has the effect of simplifying the controller and namespace setup
code, so that it reads straight through rather than broken up into
a bunch of different callback functions.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 22:09:51 +00:00
Jim Harris
cf81529ce3 Create struct nvme_status.
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 21:00:18 +00:00
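The refactoring described in the commit above can be pictured as follows: one status structure embedded in both the completion and the error log entry, plus a single macro that callers use instead of inspecting the sc/sct fields directly. This is a simplified sketch with hypothetical `ex_`/`EX_` names and plain byte-wide fields; the real structures carry more fields and a different layout.

```c
#include <stdint.h>

/* Shared status, embedded wherever a command status is reported. */
struct ex_status {
	uint8_t sc;	/* status code */
	uint8_t sct;	/* status code type */
};

struct ex_completion {
	struct ex_status status;
};

struct ex_error_log_entry {
	struct ex_status status;
};

/* A completion is an error if either status field is nonzero. */
#define EX_COMPLETION_IS_ERROR(cpl) \
	((cpl)->status.sc != 0 || (cpl)->status.sct != 0)
```

Keeping the test in one macro means a later change to how status is stored only has to be made in one place.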
Jim Harris
dbba74428b Add API for nvme consumers to access controller and namespace identify data.
Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:52:57 +00:00
Jim Harris
b846efd7ec Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/Os are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer.  Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by:	Intel
Reviewed by:	carl
2013-03-26 19:50:46 +00:00
Jim Harris
51a9feb9b0 Do not look at the namespace's thin provisioning field to determine if DSM
command is supported.  The two are not related.

Sponsored by:	Intel
2013-03-26 18:01:24 +00:00
Jim Harris
9eb93f2976 Add return codes to all functions used for submitting commands to I/O
queues.

Sponsored by:	Intel
2012-10-18 00:32:07 +00:00
Jim Harris
bb0ec6b359 This is the first of several commits which will add NVM Express (NVMe)
support to FreeBSD.  A full description of the overall functionality
being added is below.  nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set for PCI Express (PCIe)-based
Solid-State Drives (SSDs)."

This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.

Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.

There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback.  Questions and feedback are welcome.

nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces.  The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.

The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
     dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts, see below)
- compilation switches for support back to stable-7

nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling

nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
      human-readable or hex format
- perftest: quick and dirty performance test to measure raw
      performance of NVMe device without userspace/physio/GEOM
      overhead

The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval

This has been primarily tested on amd64, with light testing on i386.  I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself.  Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.

The nvme(4) driver currently has references to Chatham, which is an
Intel-developed prototype board which is not fully spec compliant.
These references will all be removed over time.

Sponsored by:        Intel
Contributions from:  Joe Golio/EMC <joseph dot golio at emc dot com>
2012-09-17 19:23:01 +00:00