doc: flatten Markdown docs into chapter-per-file

Doxygen interprets each Markdown input file as a separate section (chapter).
Concatenate all of the .md files in each directory into a single file per
section to get a correctly nested table of contents. In particular, this
matters for the navigation in the PDF output.

Change-Id: I778849d89da9a308136e43ac6cb630c4c2bbb3a5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>

parent 388ca48336
commit 1a787169b2
doc/Doxyfile (30 lines changed)
@@ -784,27 +784,15 @@ INPUT = ../include/spdk \
index.md \
directory_structure.md \
porting.md \
-bdev/index.md \
-bdev/getting_started.md \
-blob/index.md \
-blobfs/index.md \
-blobfs/getting_started.md \
-event/index.md \
-ioat/index.md \
-iscsi/index.md \
-iscsi/getting_started.md \
-iscsi/hotplug.md \
-nvme/index.md \
-nvme/async_completion.md \
-nvme/fabrics.md \
-nvme/initialization.md \
-nvme/hotplug.md \
-nvme/io_submission.md \
-nvme/multi_process.md \
-nvmf/index.md \
-nvmf/getting_started.md \
-vhost/index.md \
-vhost/getting_started.md
+blob.md \
+blobfs.md \
+bdev.md \
+event.md \
+ioat.md \
+iscsi.md \
+nvme.md \
+nvmf.md \
+vhost.md

# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -1,3 +1,5 @@
+# Block Device Abstraction Layer {#bdev}
+
# SPDK bdev Getting Started Guide {#bdev_getting_started}

Block storage in SPDK applications is provided by the SPDK bdev layer. SPDK bdev consists of:
@@ -1,3 +0,0 @@
-# Block Device Abstraction Layer {#bdev}
-
-- @ref bdev_getting_started
@@ -1,3 +1,5 @@
+# BlobFS (Blobstore Filesystem) {#blobfs}
+
# BlobFS Getting Started Guide {#blobfs_getting_started}

# RocksDB Integration {#blobfs_rocksdb}
@@ -1,3 +0,0 @@
-# BlobFS (Blobstore Filesystem) {#blobfs}
-
-- @ref blobfs_getting_started
@@ -1,9 +1,11 @@
+# iSCSI Target {#iscsi}
+
# Getting Started Guide {#iscsi_getting_started}

The Intel(R) Storage Performance Development Kit iSCSI target application is named `iscsi_tgt`.
The following section describes how to run the iSCSI target from your cloned package.

-# Prerequisites {#iscsi_prereqs}
+## Prerequisites {#iscsi_prereqs}

This guide starts by assuming that you can already build the standard SPDK distribution on your
platform. The SPDK iSCSI target has been known to work on several Linux distributions, namely
@@ -11,7 +13,7 @@ Ubuntu 14.04, 15.04, and 15.10, Fedora 21, 22, and 23, and CentOS 7.

Once built, the binary will be in `app/iscsi_tgt`.

-# Configuring iSCSI Target {#iscsi_config}
+## Configuring iSCSI Target {#iscsi_config}

An `iscsi_tgt`-specific configuration file is used to configure the iSCSI target. A fully documented
example configuration file is located at `etc/spdk/iscsi.conf.in`.
@@ -43,7 +45,7 @@ the target requires elevated privileges (root) to run.
app/iscsi_tgt/iscsi_tgt -c /path/to/iscsi.conf
~~~

-# Configuring iSCSI Initiator {#iscsi_initiator}
+## Configuring iSCSI Initiator {#iscsi_initiator}

The Linux initiator is open-iscsi.

@@ -58,7 +60,7 @@ Ubuntu:
apt-get install -y open-iscsi
~~~

-## Setup
+### Setup

Edit /etc/iscsi/iscsid.conf
~~~
@@ -146,3 +148,26 @@ Increase requests for block queue
~~~
echo "1024" > /sys/block/sdc/queue/nr_requests
~~~
+
+
+# iSCSI Hotplug {#iscsi_hotplug}
+
+At the iSCSI level, we provide the following support for Hotplug:
+
+1. bdev/nvme:
+At the bdev/nvme level, we start one hotplug monitor which will call
+spdk_nvme_probe() periodically to get the hotplug events. We provide the
+private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
+we will create the block device based on the NVMe device attached, and for the
+remove_cb, we will unregister the block device, which will also notify the
+upper level stack (for the iSCSI target, the upper level stack is scsi/lun) to
+handle the hot-remove event.
+
+2. scsi/lun:
+When the LUN receives the hot-remove notification from the block device layer,
+the LUN will be marked as removed, and all I/Os after this point will
+return with a check condition status. Then the LUN starts one poller which will
+wait for all the commands that have already been submitted to the block device to
+complete; after all those commands have completed, the LUN will be deleted.
+
+@sa spdk_nvme_probe
@@ -1,21 +0,0 @@
-# iSCSI Hotplug {#iscsi_hotplug}
-
-At the iSCSI level, we provide the following support for Hotplug:
-
-1. bdev/nvme:
-At the bdev/nvme level, we start one hotplug monitor which will call
-spdk_nvme_probe() periodically to get the hotplug events. We provide the
-private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
-we will create the block device based on the NVMe device attached, and for the
-remove_cb, we will unregister the block device, which will also notify the
-upper level stack (for the iSCSI target, the upper level stack is scsi/lun) to
-handle the hot-remove event.
-
-2. scsi/lun:
-When the LUN receives the hot-remove notification from the block device layer,
-the LUN will be marked as removed, and all I/Os after this point will
-return with a check condition status. Then the LUN starts one poller which will
-wait for all the commands that have already been submitted to the block device to
-complete; after all those commands have completed, the LUN will be deleted.
-
-@sa spdk_nvme_probe
@@ -1,4 +0,0 @@
-# iSCSI Target {#iscsi}
-
-- @ref iscsi_getting_started
-- @ref iscsi_hotplug
doc/nvme.md (new file, 191 lines)
@@ -0,0 +1,191 @@
+# NVMe Driver {#nvme}
+
+# Public Interface {#nvme_interface}
+
+- spdk/nvme.h
+
+# Key Functions {#nvme_key_functions}
+
+Function | Description
+------------------------------------------- | -----------
+spdk_nvme_probe() | @copybrief spdk_nvme_probe()
+spdk_nvme_ns_cmd_read() | @copybrief spdk_nvme_ns_cmd_read()
+spdk_nvme_ns_cmd_write() | @copybrief spdk_nvme_ns_cmd_write()
+spdk_nvme_ns_cmd_dataset_management() | @copybrief spdk_nvme_ns_cmd_dataset_management()
+spdk_nvme_ns_cmd_flush() | @copybrief spdk_nvme_ns_cmd_flush()
+spdk_nvme_qpair_process_completions() | @copybrief spdk_nvme_qpair_process_completions()
+spdk_nvme_ctrlr_cmd_admin_raw() | @copybrief spdk_nvme_ctrlr_cmd_admin_raw()
+spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process_admin_completions()
+
+
+# NVMe Initialization {#nvme_initialization}
+
+\msc
+
+app [label="Application"], nvme [label="NVMe Driver"];
+app=>nvme [label="nvme_probe()"];
+app<<nvme [label="probe_cb(pci_dev)"];
+nvme=>nvme [label="nvme_attach(devhandle)"];
+nvme=>nvme [label="nvme_ctrlr_start(nvme_controller ptr)"];
+nvme=>nvme [label="identify controller"];
+nvme=>nvme [label="create queue pairs"];
+nvme=>nvme [label="identify namespace(s)"];
+app<<nvme [label="attach_cb(pci_dev, nvme_controller)"];
+app=>app [label="create block devices based on controller's namespaces"];
+
+\endmsc
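
To make the sequence above concrete, the following is an illustrative sketch (not code from this commit or the SPDK tree) of the application side of enumeration: probe_cb decides whether to attach each reported controller, and attach_cb receives the initialized controller handle. SPDK environment initialization and error handling are assumed to have happened elsewhere.

~~~{.c}
#include <stdbool.h>
#include <stdio.h>
#include "spdk/nvme.h"

/* Return true to tell the driver to attach the reported controller. */
static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
         struct spdk_nvme_ctrlr_opts *opts)
{
	printf("Probing %s\n", trid->traddr);
	return true;
}

/* Invoked once the driver has finished bringing the controller up. */
static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
          struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	/* Typically: enumerate namespaces and create block devices on top of them. */
	printf("Attached %s\n", trid->traddr);
}

static int
enumerate_local_controllers(void)
{
	/* A NULL transport ID enumerates local PCIe-attached NVMe devices. */
	return spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL);
}
~~~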
+
+
+# NVMe I/O Submission {#nvme_io_submission}
+
+I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions
+defined in nvme_ns_cmd.c. The NVMe driver submits the I/O request
+as an NVMe submission queue entry on the queue pair specified in the command.
+The application must poll for I/O completion on each queue pair with outstanding I/O
+to receive completion callbacks.
+
+@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
+spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
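
As an illustration of that submit-then-poll pattern (a sketch, not code from this commit), the snippet below issues a single read and polls the same queue pair until the completion callback fires. It assumes a namespace and an I/O queue pair were already obtained after attach (for example via spdk_nvme_ctrlr_get_ns() and spdk_nvme_ctrlr_alloc_io_qpair()), and uses spdk_dma_zmalloc() from spdk/env.h for a DMA-safe buffer.

~~~{.c}
#include <stdbool.h>
#include "spdk/nvme.h"
#include "spdk/env.h"

struct read_ctx {
	bool done;
};

/* Completion callback, invoked from spdk_nvme_qpair_process_completions(). */
static void
read_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	((struct read_ctx *)arg)->done = true;
}

/* Submit one read and poll the same queue pair until it completes. */
static int
read_first_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair)
{
	struct read_ctx ctx = { .done = false };
	void *buf = spdk_dma_zmalloc(spdk_nvme_ns_get_sector_size(ns), 0x1000, NULL);

	if (buf == NULL) {
		return -1;
	}
	if (spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* starting LBA */, 1 /* LBA count */,
				  read_complete, &ctx, 0 /* io_flags */) != 0) {
		spdk_dma_free(buf);
		return -1;
	}
	while (!ctx.done) {
		/* 0 = no limit on the number of completions processed per call. */
		spdk_nvme_qpair_process_completions(qpair, 0);
	}
	spdk_dma_free(buf);
	return 0;
}
~~~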
+
+
+# NVMe Asynchronous Completion {#nvme_async_completion}
+
+The userspace NVMe driver follows an asynchronous polled model for
+I/O completion.
+
+## I/O commands {#nvme_async_io}
+
+The application may submit I/O from one or more threads on one or more queue pairs
+and must call spdk_nvme_qpair_process_completions()
+for each queue pair that submitted I/O.
+
+When the application calls spdk_nvme_qpair_process_completions(),
+if the NVMe driver detects completed I/Os that were submitted on that queue,
+it will invoke the registered callback function
+for each I/O within the context of spdk_nvme_qpair_process_completions().
+
+## Admin commands {#nvme_async_admin}
+
+The application may submit admin commands from one or more threads
+and must call spdk_nvme_ctrlr_process_admin_completions()
+from at least one thread to receive admin command completions.
+The thread that processes admin completions need not be the same thread that submitted the
+admin commands.
+
+When the application calls spdk_nvme_ctrlr_process_admin_completions(),
+if the NVMe driver detects completed admin commands submitted from any thread,
+it will invoke the registered callback function
+for each command within the context of spdk_nvme_ctrlr_process_admin_completions().
+
+It is the application's responsibility to manage the order of submitted admin commands.
+If certain admin commands must be submitted while no other commands are outstanding,
+it is the application's responsibility to enforce this rule
+using its own synchronization method.
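
For illustration only (a sketch, not from the SPDK tree), here is how one raw admin command might be submitted with spdk_nvme_ctrlr_cmd_admin_raw() and completed via polling. The Identify Controller opcode, the CNS value of 1, and the 4096-byte buffer size come from the NVMe specification; error handling is minimal.

~~~{.c}
#include <stdbool.h>
#include <string.h>
#include "spdk/nvme.h"
#include "spdk/env.h"

static bool g_admin_done;

static void
admin_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	g_admin_done = true;
}

/* Send Identify Controller as a raw admin command and poll for completion. */
static int
identify_ctrlr_raw(struct spdk_nvme_ctrlr *ctrlr)
{
	struct spdk_nvme_cmd cmd;
	void *buf = spdk_dma_zmalloc(4096, 0x1000, NULL);

	if (buf == NULL) {
		return -1;
	}
	memset(&cmd, 0, sizeof(cmd));
	cmd.opc = SPDK_NVME_OPC_IDENTIFY;
	cmd.cdw10 = 1;	/* CNS = 1: Identify Controller data structure */

	if (spdk_nvme_ctrlr_cmd_admin_raw(ctrlr, &cmd, buf, 4096,
					  admin_complete, NULL) != 0) {
		spdk_dma_free(buf);
		return -1;
	}
	while (!g_admin_done) {
		spdk_nvme_ctrlr_process_admin_completions(ctrlr);
	}
	spdk_dma_free(buf);
	return 0;
}
~~~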
+
+
+# NVMe over Fabrics Host Support {#nvme_fabrics_host}
+
+The NVMe driver supports connecting to remote NVMe-oF targets and
+interacting with them in the same manner as local NVMe controllers.
+
+## Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
+
+The method for connecting to a remote NVMe-oF target is very similar
+to the normal enumeration process for local PCIe-attached NVMe devices.
+To connect to a remote NVMe over Fabrics subsystem, the user may call
+spdk_nvme_probe() with the `trid` parameter specifying the address of
+the NVMe-oF target.
+The caller may fill out the spdk_nvme_transport_id structure manually
+or use the spdk_nvme_transport_id_parse() function to convert a
+human-readable string representation into the required structure.
+
+The spdk_nvme_transport_id may contain the address of a discovery service
+or a single NVM subsystem. If a discovery service address is specified,
+the NVMe library will call the spdk_nvme_probe() `probe_cb` for each
+discovered NVM subsystem, which allows the user to select the desired
+subsystems to be attached. Alternatively, if the address specifies a
+single NVM subsystem directly, the NVMe library will call `probe_cb`
+for just that subsystem; this allows the user to skip the discovery step
+and connect directly to a subsystem with a known address.
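
A brief sketch of that flow follows (illustrative, not part of this commit): the transport string is a made-up example address, and probe_cb/attach_cb are the callbacks from the earlier initialization sketch.

~~~{.c}
#include <string.h>
#include "spdk/nvme.h"

/* Probe a remote NVMe-oF subsystem instead of enumerating local PCIe devices. */
static int
probe_remote_target(void)
{
	struct spdk_nvme_transport_id trid;

	memset(&trid, 0, sizeof(trid));
	/* Example address only; substitute the discovery service or subsystem
	 * address of the actual target. */
	if (spdk_nvme_transport_id_parse(&trid,
			"trtype:RDMA adrfam:IPv4 traddr:192.168.100.8 trsvcid:4420") != 0) {
		return -1;
	}

	/* probe_cb/attach_cb behave exactly as in the local PCIe case. */
	return spdk_nvme_probe(&trid, NULL, probe_cb, attach_cb, NULL);
}
~~~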
+
+
+# NVMe Multi Process {#nvme_multi_process}
+
+This capability enables the SPDK NVMe driver to support multiple processes accessing the
+same NVMe device. The NVMe driver allocates critical structures from shared memory, so
+that each process can map that memory and create its own queue pairs or share the admin
+queue. There is a limited number of I/O queue pairs per NVMe controller.
+
+The primary motivation for this feature is to support management tools that can attach
+to long running applications, perform some maintenance work or gather information, and
+then detach.
+
+## Configuration {#nvme_multi_process_configuration}
+
+DPDK EAL allows different types of processes to be spawned, each with different permissions
+on the hugepage memory used by the applications.
+
+There are two types of processes:
+1. a primary process which initializes the shared memory and has full privileges, and
+2. a secondary process which can attach to the primary process by mapping its shared memory
+regions and perform NVMe operations including creating queue pairs.
+
+This feature is enabled by default and is controlled by selecting a value for the shared
+memory group ID. This ID is a positive integer and two applications with the same shared
+memory group ID will share memory. The first application with a given shared memory group
+ID will be considered the primary and all others secondary.
+
+Example: identical shm_id and non-overlapping core masks
+~~~{.sh}
+./perf options [AIO device(s)]...
+	[-c core mask for I/O submission/completion]
+	[-i shared memory group ID]
+
+./perf -q 1 -s 4096 -w randread -c 0x1 -t 60 -i 1
+./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
+~~~
+
+## Scalability and Performance {#nvme_multi_process_scalability_performance}
+
+To maximize the I/O bandwidth of an NVMe device, ensure that each application has its own
+queue pairs.
+
+The optimal threading model for SPDK is one thread per core, regardless of which process
+that thread belongs to in the case of a multi-process environment. To achieve maximum
+performance, each thread should also have its own I/O queue pair. Applications that share
+memory should be given core masks that do not overlap.
+
+However, admin commands may have some performance impact as there is only one admin queue
+pair per NVMe SSD. The NVMe driver will automatically take a cross-process capable lock
+to enable the sharing of the admin queue pair. Further, when each process polls the admin
+queue for completions, it will only see completions for commands that it originated.
+
+## Limitations {#nvme_multi_process_limitations}
+
+1. Two processes sharing memory may not share any cores in their core mask.
+2. If a primary process exits while secondary processes are still running, those processes
+will continue to run. However, a new primary process cannot be created.
+3. Applications are responsible for coordinating access to logical blocks.
+
+@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
+
+
+# NVMe Hotplug {#nvme_hotplug}
+
+At the NVMe driver level, we provide the following support for Hotplug:
+
+1. Hotplug events detection:
+The user of the NVMe library can call spdk_nvme_probe() periodically to detect
+hotplug events. The probe_cb, followed by the attach_cb, will be called for each
+new device detected. The user may optionally also provide a remove_cb that will be
+called if a previously attached NVMe device is no longer present on the system.
+All subsequent I/O to the removed device will return an error.
+
+2. Hot remove NVMe with IO loads:
+When a device is hot removed while I/O is occurring, all access to the PCI BAR will
+result in a SIGBUS error. The NVMe driver automatically handles this case by installing
+a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
+This means I/O in flight during a hot remove will complete with an appropriate error
+code and will not crash the application.
+
+@sa spdk_nvme_probe
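
To illustrate the first point, an application might simply re-run the probe from a periodic poller and supply a remove_cb. This is a sketch rather than code from this commit; probe_cb and attach_cb are the callbacks from the earlier initialization sketch.

~~~{.c}
#include "spdk/nvme.h"

/* Called when a previously attached controller disappears from the system. */
static void
remove_cb(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr)
{
	/* Tear down queue pairs and any block devices built on this controller. */
}

/* Call this from a poller (for example once per second) to pick up
 * newly inserted devices and to notice removed ones. */
static void
hotplug_poll(void)
{
	spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb);
}
~~~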
@@ -1,33 +0,0 @@
-# NVMe Asynchronous Completion {#nvme_async_completion}
-
-The userspace NVMe driver follows an asynchronous polled model for
-I/O completion.
-
-# I/O commands {#nvme_async_io}
-
-The application may submit I/O from one or more threads on one or more queue pairs
-and must call spdk_nvme_qpair_process_completions()
-for each queue pair that submitted I/O.
-
-When the application calls spdk_nvme_qpair_process_completions(),
-if the NVMe driver detects completed I/Os that were submitted on that queue,
-it will invoke the registered callback function
-for each I/O within the context of spdk_nvme_qpair_process_completions().
-
-# Admin commands {#nvme_async_admin}
-
-The application may submit admin commands from one or more threads
-and must call spdk_nvme_ctrlr_process_admin_completions()
-from at least one thread to receive admin command completions.
-The thread that processes admin completions need not be the same thread that submitted the
-admin commands.
-
-When the application calls spdk_nvme_ctrlr_process_admin_completions(),
-if the NVMe driver detects completed admin commands submitted from any thread,
-it will invoke the registered callback function
-for each command within the context of spdk_nvme_ctrlr_process_admin_completions().
-
-It is the application's responsibility to manage the order of submitted admin commands.
-If certain admin commands must be submitted while no other commands are outstanding,
-it is the application's responsibility to enforce this rule
-using its own synchronization method.
@@ -1,24 +0,0 @@
-# NVMe over Fabrics Host Support {#nvme_fabrics_host}
-
-The NVMe driver supports connecting to remote NVMe-oF targets and
-interacting with them in the same manner as local NVMe controllers.
-
-# Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
-
-The method for connecting to a remote NVMe-oF target is very similar
-to the normal enumeration process for local PCIe-attached NVMe devices.
-To connect to a remote NVMe over Fabrics subsystem, the user may call
-spdk_nvme_probe() with the `trid` parameter specifying the address of
-the NVMe-oF target.
-The caller may fill out the spdk_nvme_transport_id structure manually
-or use the spdk_nvme_transport_id_parse() function to convert a
-human-readable string representation into the required structure.
-
-The spdk_nvme_transport_id may contain the address of a discovery service
-or a single NVM subsystem. If a discovery service address is specified,
-the NVMe library will call the spdk_nvme_probe() `probe_cb` for each
-discovered NVM subsystem, which allows the user to select the desired
-subsystems to be attached. Alternatively, if the address specifies a
-single NVM subsystem directly, the NVMe library will call `probe_cb`
-for just that subsystem; this allows the user to skip the discovery step
-and connect directly to a subsystem with a known address.
@@ -1,19 +0,0 @@
-# NVMe Hotplug {#nvme_hotplug}
-
-At the NVMe driver level, we provide the following support for Hotplug:
-
-1. Hotplug events detection:
-The user of the NVMe library can call spdk_nvme_probe() periodically to detect
-hotplug events. The probe_cb, followed by the attach_cb, will be called for each
-new device detected. The user may optionally also provide a remove_cb that will be
-called if a previously attached NVMe device is no longer present on the system.
-All subsequent I/O to the removed device will return an error.
-
-2. Hot remove NVMe with IO loads:
-When a device is hot removed while I/O is occurring, all access to the PCI BAR will
-result in a SIGBUS error. The NVMe driver automatically handles this case by installing
-a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
-This means I/O in flight during a hot remove will complete with an appropriate error
-code and will not crash the application.
-
-@sa spdk_nvme_probe
@@ -1,27 +0,0 @@
-# NVMe Driver {#nvme}
-
-# Public Interface {#nvme_interface}
-
-- spdk/nvme.h
-
-# Key Functions {#nvme_key_functions}
-
-Function | Description
-------------------------------------------- | -----------
-spdk_nvme_probe() | @copybrief spdk_nvme_probe()
-spdk_nvme_ns_cmd_read() | @copybrief spdk_nvme_ns_cmd_read()
-spdk_nvme_ns_cmd_write() | @copybrief spdk_nvme_ns_cmd_write()
-spdk_nvme_ns_cmd_dataset_management() | @copybrief spdk_nvme_ns_cmd_dataset_management()
-spdk_nvme_ns_cmd_flush() | @copybrief spdk_nvme_ns_cmd_flush()
-spdk_nvme_qpair_process_completions() | @copybrief spdk_nvme_qpair_process_completions()
-spdk_nvme_ctrlr_cmd_admin_raw() | @copybrief spdk_nvme_ctrlr_cmd_admin_raw()
-spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process_admin_completions()
-
-# Key Concepts {#nvme_key_concepts}
-
-- @ref nvme_initialization
-- @ref nvme_io_submission
-- @ref nvme_async_completion
-- @ref nvme_fabrics_host
-- @ref nvme_multi_process
-- @ref nvme_hotplug
|
||||
# NVMe Initialization {#nvme_initialization}
|
||||
|
||||
\msc
|
||||
|
||||
app [label="Application"], nvme [label="NVMe Driver"];
|
||||
app=>nvme [label="nvme_probe()"];
|
||||
app<<nvme [label="probe_cb(pci_dev)"];
|
||||
nvme=>nvme [label="nvme_attach(devhandle)"];
|
||||
nvme=>nvme [label="nvme_ctrlr_start(nvme_controller ptr)"];
|
||||
nvme=>nvme [label="identify controller"];
|
||||
nvme=>nvme [label="create queue pairs"];
|
||||
nvme=>nvme [label="identify namespace(s)"];
|
||||
app<<nvme [label="attach_cb(pci_dev, nvme_controller)"];
|
||||
app=>app [label="create block devices based on controller's namespaces"];
|
||||
|
||||
\endmsc
|
@@ -1,10 +0,0 @@
-# NVMe I/O Submission {#nvme_io_submission}
-
-I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions
-defined in nvme_ns_cmd.c. The NVMe driver submits the I/O request
-as an NVMe submission queue entry on the queue pair specified in the command.
-The application must poll for I/O completion on each queue pair with outstanding I/O
-to receive completion callbacks.
-
-@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
-spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
@@ -1,59 +0,0 @@
-# NVMe Multi Process {#nvme_multi_process}
-
-This capability enables the SPDK NVMe driver to support multiple processes accessing the
-same NVMe device. The NVMe driver allocates critical structures from shared memory, so
-that each process can map that memory and create its own queue pairs or share the admin
-queue. There is a limited number of I/O queue pairs per NVMe controller.
-
-The primary motivation for this feature is to support management tools that can attach
-to long running applications, perform some maintenance work or gather information, and
-then detach.
-
-# Configuration {#nvme_multi_process_configuration}
-
-DPDK EAL allows different types of processes to be spawned, each with different permissions
-on the hugepage memory used by the applications.
-
-There are two types of processes:
-1. a primary process which initializes the shared memory and has full privileges, and
-2. a secondary process which can attach to the primary process by mapping its shared memory
-regions and perform NVMe operations including creating queue pairs.
-
-This feature is enabled by default and is controlled by selecting a value for the shared
-memory group ID. This ID is a positive integer and two applications with the same shared
-memory group ID will share memory. The first application with a given shared memory group
-ID will be considered the primary and all others secondary.
-
-Example: identical shm_id and non-overlapping core masks
-~~~{.sh}
-./perf options [AIO device(s)]...
-	[-c core mask for I/O submission/completion]
-	[-i shared memory group ID]
-
-./perf -q 1 -s 4096 -w randread -c 0x1 -t 60 -i 1
-./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
-~~~
-
-# Scalability and Performance {#nvme_multi_process_scalability_performance}
-
-To maximize the I/O bandwidth of an NVMe device, ensure that each application has its own
-queue pairs.
-
-The optimal threading model for SPDK is one thread per core, regardless of which process
-that thread belongs to in the case of a multi-process environment. To achieve maximum
-performance, each thread should also have its own I/O queue pair. Applications that share
-memory should be given core masks that do not overlap.
-
-However, admin commands may have some performance impact as there is only one admin queue
-pair per NVMe SSD. The NVMe driver will automatically take a cross-process capable lock
-to enable the sharing of the admin queue pair. Further, when each process polls the admin
-queue for completions, it will only see completions for commands that it originated.
-
-# Limitations {#nvme_multi_process_limitations}
-
-1. Two processes sharing memory may not share any cores in their core mask.
-2. If a primary process exits while secondary processes are still running, those processes
-will continue to run. However, a new primary process cannot be created.
-3. Applications are responsible for coordinating access to logical blocks.
-
-@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
@@ -1,3 +1,8 @@
+# NVMe over Fabrics Target {#nvmf}
+
+@sa @ref nvme_fabrics_host
+
+
# Getting Started Guide {#nvmf_getting_started}

The NVMe over Fabrics target is a user space application that presents block devices over the
@@ -18,7 +23,7 @@ machine, the kernel will need to be a release candidate until the code is actual
system running the SPDK target, however, you can run any modern flavor of Linux as required by your
NIC vendor's OFED distribution.

-# Prerequisites {#nvmf_prereqs}
+## Prerequisites {#nvmf_prereqs}

This guide starts by assuming that you can already build the standard SPDK distribution on your
platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some
@@ -43,7 +48,7 @@ make CONFIG_RDMA=y <other config parameters>

Once built, the binary will be in `app/nvmf_tgt`.

-# Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}
+## Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}

Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow
userspace processes to use InfiniBand/RDMA verbs directly.
@@ -59,12 +64,10 @@ modprobe rdma_cm
modprobe rdma_ucm
~~~

-# Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}
+## Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}

Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.

-## Detecting Mellanox RDMA NICs
-
### Mellanox ConnectX-3 RDMA NICs

~~~{.sh}
@@ -80,7 +83,7 @@ modprobe mlx5_core
modprobe mlx5_ib
~~~

-## Assigning IP addresses to RDMA NICs
+### Assigning IP addresses to RDMA NICs

~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
@@ -88,7 +91,7 @@ ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~


-# Configuring NVMe over Fabrics Target {#nvmf_config}
+## Configuring NVMe over Fabrics Target {#nvmf_config}

An `nvmf_tgt`-specific configuration file is used to configure the NVMe over Fabrics target. This
file's primary purpose is to define subsystems. A fully documented example configuration file is
@@ -102,7 +105,7 @@ the target requires elevated privileges (root) to run.
app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf
~~~

-# Configuring NVMe over Fabrics Host {#nvmf_host}
+## Configuring NVMe over Fabrics Host {#nvmf_host}

Both the Linux kernel and SPDK implement an NVMe over Fabrics host. Users who want to test
`nvmf_tgt` with a kernel-based host should upgrade to Linux kernel 4.8 or later, or can use
@@ -125,7 +128,7 @@ Disconnect:
nvme disconnect -n "nqn.2016-06.io.spdk.cnode1"
~~~

-# Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}
+## Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}

SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
@@ -166,7 +169,7 @@ on different threads. SPDK gives the user maximum control to determine how many
to execute subsystems. Configuring different subsystems to execute on different CPU cores prevents
the subsystem data from being evicted from limited CPU cache space.

-# Emulating an NVMe controller {#nvmf_config_virtual_controller}
+## Emulating an NVMe controller {#nvmf_config_virtual_controller}

The SPDK NVMe-oF target provides the capability to emulate an NVMe controller using a virtual
controller. Using virtual controllers allows storage software developers to run the NVMe-oF target
@@ -1,5 +0,0 @@
-# NVMe over Fabrics {#nvmf}
-
-- @ref nvmf_getting_started
-
-@sa @ref nvme_fabrics_host
@@ -1,3 +1,5 @@
+# vhost {#vhost}
+
# vhost Getting Started Guide {#vhost_getting_started}

The Storage Performance Development Kit vhost application is named "vhost".
@@ -1,3 +0,0 @@
-# vhost {#vhost}
-
-- @ref vhost_getting_started