# NVMe over Fabrics Target {#nvmf}

@sa @ref nvme_fabrics_host

# Getting Started Guide {#nvmf_getting_started}
The NVMe over Fabrics target is a user space application that presents block devices over the
network using RDMA. It requires an RDMA-capable NIC with its corresponding OFED software package
installed to run. The target should work on all flavors of RDMA, but it is currently tested against
Mellanox NICs (RoCEv2) and Chelsio NICs (iWARP).

The NVMe over Fabrics specification defines subsystems that can be exported over the network. SPDK
has chosen to call the software that exports these subsystems a "target", which is the term used
for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many
people will also refer to the host as an "initiator", which is the equivalent term in iSCSI
parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.

There will be both a target and a host implemented in the Linux kernel, and these are available
today as a set of patches against the kernel 4.8 release candidate. All of the testing of the SPDK
target has been done against the proposed Linux kernel host. This means that, at least for the host
machine, the kernel will need to be a release candidate until the code is actually merged. For the
system running the SPDK target, however, you can run any modern flavor of Linux as required by your
NIC vendor's OFED distribution.
## Prerequisites {#nvmf_prereqs}

This guide starts by assuming that you can already build the standard SPDK distribution on your
platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some
additional dependencies.

Fedora:

~~~{.sh}
dnf install libibverbs-devel librdmacm-devel
~~~

Ubuntu:

~~~{.sh}
apt-get install libibverbs-dev librdmacm-dev
~~~
Then build SPDK with RDMA enabled, either by editing CONFIG to enable CONFIG_RDMA or
enabling it on the `make` command line:

~~~{.sh}
make CONFIG_RDMA=y <other config parameters>
~~~
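If you prefer to edit the CONFIG file instead, a minimal sketch is below; it assumes the top-level CONFIG file carries a `CONFIG_RDMA?=n` line, as in contemporary SPDK releases:

~~~{.sh}
# flip CONFIG_RDMA from n to y in the top-level CONFIG file, then build as usual
sed -i 's/^CONFIG_RDMA?=n/CONFIG_RDMA?=y/' CONFIG
make <other config parameters>
~~~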
Once built, the binary will be in `app/nvmf_tgt`.
## Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}

Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow
userspace processes to use InfiniBand/RDMA verbs directly.
~~~{.sh}
modprobe ib_cm
modprobe ib_core
modprobe ib_ucm
modprobe ib_umad
modprobe ib_uverbs
modprobe iw_cm
modprobe rdma_cm
modprobe rdma_ucm
~~~
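To confirm that the modules are present before moving on, one quick check (assuming the standard `lsmod` and `grep` tools) is:

~~~{.sh}
# the verbs and connection-manager modules should appear in the module list
lsmod | grep -E 'ib_uverbs|rdma_ucm'
~~~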
## Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}

Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.
### Mellanox ConnectX-3 RDMA NICs

~~~{.sh}
modprobe mlx4_core
modprobe mlx4_ib
modprobe mlx4_en
~~~
### Mellanox ConnectX-4 RDMA NICs

~~~{.sh}
modprobe mlx5_core
modprobe mlx5_ib
~~~
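Whichever NIC family you have, once its driver modules are loaded you can confirm that the RDMA devices are visible. This check assumes the libibverbs utilities (which provide `ibv_devinfo`) are installed:

~~~{.sh}
# lists each RDMA device and its port state
ibv_devinfo
~~~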
### Assigning IP addresses to RDMA NICs

~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~
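If you prefer the iproute2 tooling over `ifconfig`, an equivalent sketch (assuming the same interface names and addresses) is:

~~~{.sh}
ip addr add 192.168.100.8/24 dev eth1
ip link set eth1 up
ip addr add 192.168.100.9/24 dev eth2
ip link set eth2 up
~~~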
## Configuring NVMe over Fabrics Target {#nvmf_config}

An `nvmf_tgt`-specific configuration file is used to configure the NVMe over Fabrics target. This
file's primary purpose is to define subsystems. A fully documented example configuration file is
located at `etc/spdk/nvmf.conf.in`.

You should make a copy of the example configuration file, modify it to suit your environment, and
then run the nvmf_tgt application, passing it the configuration file using the `-c` option. Right now,
the target requires elevated privileges (root) to run.
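For example, assuming you are working from the top of the SPDK source tree, preparing a working copy of the configuration might look like:

~~~{.sh}
cp etc/spdk/nvmf.conf.in /path/to/nvmf.conf
# edit /path/to/nvmf.conf to define your subsystems, then start the target as shown below
~~~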
~~~{.sh}
app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf
~~~
## Configuring NVMe over Fabrics Host {#nvmf_host}

Both the Linux kernel and SPDK implement an NVMe over Fabrics host. Users who want to test
`nvmf_tgt` with a kernel-based host should upgrade to Linux kernel 4.8 or later, or use a Linux
distribution such as Fedora or Ubuntu that ships a 4.8 or later kernel. The nvme-cli client tool
is recommended for connecting to and disconnecting from NVMe over Fabrics target subsystems. Before
connecting to remote subsystems, verify that the nvme-rdma driver is loaded.
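One way to do that, assuming the module was built for the running kernel, is:

~~~{.sh}
modprobe nvme-rdma
lsmod | grep nvme_rdma
~~~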
Discovery:

~~~{.sh}
nvme discover -t rdma -a 192.168.100.8 -s 4420
~~~
Connect:

~~~{.sh}
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
~~~
Disconnect:

~~~{.sh}
nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
~~~
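After a successful connect, the remote namespaces appear as local NVMe block devices on the host; one way to confirm this (assuming nvme-cli and util-linux are installed) is:

~~~{.sh}
# the remote namespaces show up alongside any local NVMe devices
nvme list
lsblk
~~~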
## Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}

SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.

To ensure the SPDK NVMe-oF target has the best performance, configure the RNICs, NVMe
devices, and the core assigned to the NVMe-oF subsystem to all be located on the same NUMA node.
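One way to check NUMA locality is through sysfs; the sketch below assumes standard Linux sysfs paths and reuses the interface name and PCI address from the examples in this guide:

~~~{.sh}
# NUMA node of an RDMA NIC and of an NVMe device, plus the cores on each node
cat /sys/class/net/eth1/device/numa_node
cat /sys/bus/pci/devices/0000:02:00.0/numa_node
lscpu | grep 'NUMA node'
~~~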
The following parameters are used to control which CPU cores SPDK executes on:

**ReactorMask:** A hexadecimal bit mask of the CPU cores that SPDK is allowed to execute work
items on. The ReactorMask is located in the [Global] section of the configuration file. For example,
to assign lcores 24, 25, 26 and 27 to NVMe-oF work items, set the ReactorMask to:

~~~{.sh}
ReactorMask 0xF000000
~~~
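The mask is just the bitwise OR of `1 << lcore` for each selected core; a quick way to double-check the value for lcores 24-27 in a shell is:

~~~{.sh}
printf '0x%X\n' $(( (1 << 24) | (1 << 25) | (1 << 26) | (1 << 27) ))   # prints 0xF000000
~~~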
**Assign each Subsystem to a CPU core:** This is accomplished by specifying the "Core" value in
the [Subsystem] section of the configuration file. For example,
to assign the Subsystems to lcores 25 and 26:

~~~{.sh}
[Nvme]
TransportID "trtype:PCIe traddr:0000:02:00.0" Nvme0
TransportID "trtype:PCIe traddr:0000:82:00.0" Nvme1

[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode1
Core 25
Listen RDMA 192.168.100.8:4420
Host nqn.2016-06.io.spdk:init
SN SPDK00000000000001
Namespace Nvme0n1

[Subsystem2]
NQN nqn.2016-06.io.spdk:cnode2
Core 26
Listen RDMA 192.168.100.9:4420
SN SPDK00000000000002
Namespace Nvme1n1
~~~
SPDK executes all code for an NVMe-oF subsystem on a single thread. Different subsystems may execute
on different threads. SPDK gives the user maximum control to determine how many CPU cores are used
to execute subsystems. Configuring different subsystems to execute on different CPU cores prevents
the subsystem data from being evicted from limited CPU cache space.
## Emulating an NVMe controller {#nvmf_config_virtual_controller}

The SPDK NVMe-oF target provides the capability to emulate an NVMe controller using a virtual
controller. Using virtual controllers allows storage software developers to run the NVMe-oF target
on a system that does not have NVMe devices. You can configure a virtual controller in the
configuration file as follows:

**Create malloc LUNs:** See @ref bdev_getting_started for details on creating Malloc block devices.
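As a minimal sketch (the authoritative options are described in the guide referenced above), a [Malloc] section that creates two 64 MB LUNs, which then appear as the bdevs Malloc0 and Malloc1, might look like:

~~~{.sh}
# assumed option names for this SPDK release; see the bdev getting started guide
[Malloc]
NumberOfLuns 2
LunSizeInMB 64
~~~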
**Create a virtual controller:** Any bdev may be presented as a namespace. For example, to create a
virtual controller with two namespaces backed by the malloc LUNs named Malloc0 and Malloc1:

~~~{.sh}
# Virtual controller
[Subsystem2]
NQN nqn.2016-06.io.spdk:cnode2
Core 0
Listen RDMA 192.168.2.21:4420
Host nqn.2016-06.io.spdk:init
SN SPDK00000000000001
Namespace Malloc0
Namespace Malloc1
~~~