numam-spdk
Seth Howell ceb32abbd8 nvmf: don't set qpair->group to NULL.
The typical rdma qpair disconnect function goes through the function
_nvmf_rdma_disconnect_retry. When this function was introduced, it was
discovered that we could receive a qpair disconnect event for a given
qpair before that qpair had been assigned to a poll group. In order to
ensure that the disconnect procedure completed properly, we waited on
the current thread in _nvmf_rdma_disconnect_retry for the qpair to be
assigned a poll group before we finally disconnected. See rdma.c:2250.
Since _nvmf_rdma_disconnect_retry was not necessarily called from the
poll group's thread, we relied upon the assumption that the group
variable would never be set back to NULL. See the comment on rdma.c:
2243.

However, in _spdk_nvmf_qpair_destroy we were setting the group back to
NULL. This operation can result in the following set of operations
across multiple threads that prevent a qpair from ever being fully
destroyed.
1. thread 1: receive a disconnect event - call nvmf_rdma_disconnect
2. thread 1: from nvmf_rdma_disconnect call
spdk_nvmf_rdma_qpair_inc_refcnt - setting rqpair->refcnt to 1.
3. thread 2: call spdk_nvmf_rdma_poller_poll.
4. thread 2: in spdk_nvmf_rdma_poller_poll reap a completion with an
error status which causes us to call spdk_nvmf_qpair_disconnect -
rdma:2846
5. thread 2: spdk_nvmf_qpair_disconnect calls _spdk_nvmf_qpair_destroy which sets
qpair->group = NULL
6. thread 1: from nvmf_rdma_disconnect we call
_nvmf_rdma_disconnect_retry which checks if qpair->group == NULL. If
that is the case, we assume that the qpair has not been assigned a group
yet and send ourselves a message to call _nvmf_rdma_disconnect_retry again. See rdma.c:2253
7. thread 2: from _spdk_nvmf_qpair_destroy we call
spdk_nvmf_transport_qpair_fini which results in a call to
spdk_nvmf_rdma_close_qpair, which sends dummy send and recv requests to the
qpair.
8. thread 2: we call poller_poll and get completions for both the send
and recv dummy requests. This results in a call to
spdk_nvmf_rdma_qpair_destroy.
9. thread 2: spdk_nvmf_rdma_qpair_destroy checks rqpair->refcnt and when
it sees that it is not 0 (see step 2 above), it returns without
freeing the resources. See rdma.c:629
10. thread 1: we keep churning in _nvmf_rdma_disconnect_retry sending
ourselves messages because rqpair->group is going to be null. Thread 1
never reaches line 2257 where it sends a message to call
_nvmf_rdma_qpair_disconnect. _nvmf_rdma_qpair_disconnect is the function
that decreases the rqpair->refcnt and allows us to make forward progress
on destroying the qpair.

I encountered this issue while trying to disconnect from our target
using the kernel initiator with an x722 NIC. I think the timing of this
bug shows up with that specific configuration because some of the calls
in the disconnect path on thread 1 fail, causing it to take longer and
giving the second thread a chance to delete the qpair.

There are really two issues at play here. We don't have a single point
of entry for disconnecting RDMA qpairs, and we rely on the qpair->group
variable never being set back to NULL. This patch addresses the second
issue, and the next patch in the series addresses the first.

Change-Id: I65395d0bbb67edfa7bad2ddc70906606c3d83781
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-02-11 19:25:51 +00:00
.githooks test: use SKIP_DPDK_BUILD in pre-push githook 2018-07-14 02:20:30 +00:00
app lib/trace: add trace_record tool 2019-01-30 06:36:25 +00:00
build/lib build: consolidate library outputs in build/lib 2016-11-17 13:15:09 -07:00
doc nvme: Update p2p DMA documentation to indicate how to check for support 2019-02-06 16:01:56 +00:00
dpdk@754c3dbc34 dpdk-sub: update the submodule to DPDK 18.11 2019-01-18 17:59:37 +00:00
dpdkbuild bdev/compress: Add configure option and build dependencies 2019-02-11 19:23:17 +00:00
etc/spdk conf: update RDMA and TCP transport NVMe bdev parameter 2019-02-11 12:27:14 +00:00
examples fio_plugin: Use the new DIF library in FIO plugin 2019-02-11 12:05:13 +00:00
go go: empty Go package 2018-06-28 18:15:51 +00:00
include lib/bdev: Expose enabled DIF check types of bdev. 2019-02-08 23:37:13 +00:00
intel-ipsec-mb@134c90c912 ipsec: update submodule commit 2018-07-26 22:29:25 +00:00
ipsecbuild ipsecbuild: force CC=cc 2019-01-28 02:33:50 +00:00
isa-l@09e787231b spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
isalbuild spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
lib nvmf: don't set qpair->group to NULL. 2019-02-11 19:25:51 +00:00
mk spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
pkg version: 19.04 pre 2019-02-01 09:29:12 +00:00
scripts scripts/common.sh: use PCI blacklist and whitelist 2019-02-11 13:29:38 +00:00
shared_lib ut_mock: rename library from spdk_mock to ut_mock 2018-11-20 14:57:57 +00:00
test vm_setup.sh: add iptables dependency 2019-02-11 19:23:55 +00:00
.astylerc astyle: change "add-braces" to "j" for compatibility 2017-12-13 21:23:27 -05:00
.gitignore configure: use mk/config.mk instead of CONFIG.local 2018-10-16 12:40:43 +00:00
.gitmodules spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
.travis.yml .travis.yml: tweak IRC notification 2018-03-16 18:52:11 -04:00
autobuild.sh spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
autopackage.sh spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
autorun_post.py Check file permissions in the check_format script 2018-10-04 23:08:12 +00:00
autorun.sh autorun: passthrough WITH_DPDK_DIR to autotest.sh 2018-10-12 23:46:14 +00:00
autotest.sh autotest: introduce SPDK_RUN_FUNCTIONAL_TEST 2019-02-04 19:19:36 +00:00
CHANGELOG.md version: 19.04 pre 2019-02-01 09:29:12 +00:00
CONFIG bdev/compress: Add configure option and build dependencies 2019-02-11 19:23:17 +00:00
configure bdev/compress: Add configure option and build dependencies 2019-02-11 19:23:17 +00:00
CONTRIBUTING.md Add CONTRIBUTING.md 2017-09-05 13:25:45 -04:00
ISSUE_TEMPLATE.md github: Add issue tracker template 2018-04-19 13:50:08 -04:00
LICENSE Remove year from copyright headers. 2016-01-28 08:54:18 -07:00
Makefile spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
README.md doc: update doc with instructions for building shared lib 2018-10-26 20:41:24 +00:00

Storage Performance Development Kit

The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance by moving all of the necessary drivers into userspace and operating in a polled mode instead of relying on interrupts, which avoids kernel context switches and eliminates interrupt handling overhead.

The development kit currently includes:

Documentation

Doxygen API documentation is available, as well as a Porting Guide for porting SPDK to different frameworks and operating systems.

Source Code

git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init

Prerequisites

The dependencies can be installed automatically by scripts/pkgdep.sh.

./scripts/pkgdep.sh

Build

Linux:

./configure
make

FreeBSD: Note: make sure you have the matching kernel source in /usr/src/. The CONFIG_COVERAGE option is not currently available for FreeBSD builds.

./configure
gmake

Unit Tests

./test/unit/unittest.sh

You will see several error messages when running the unit tests, but they are part of the test suite. The final message at the end of the script indicates success or failure.

Vagrant

A Vagrant setup is also provided to create a Linux VM with a virtual NVMe controller to get up and running quickly. Currently this has only been tested on MacOS and Ubuntu 16.04.2 LTS with the VirtualBox provider. The VirtualBox Extension Pack must also be installed in order to get the required NVMe support.

Details on the Vagrant setup can be found in the SPDK Vagrant documentation.

Advanced Build Options

Optional components and other build-time configuration are controlled by settings in the Makefile configuration file in the root of the repository. CONFIG contains the base settings for the configure script. This script generates a new file, mk/config.mk, that contains the final build settings. For advanced configuration, the configure script accepts a number of additional options, or mk/config.mk can simply be created and edited by hand. A description of all possible options is located in CONFIG.

Boolean (on/off) options are configured with a 'y' (yes) or 'n' (no). For example, this line of CONFIG controls whether the optional RDMA (libibverbs) support is enabled:

CONFIG_RDMA?=n

To enable RDMA, this line may be added to mk/config.mk with a 'y' instead of 'n'. For the majority of options this can be done using the configure script. For example:

./configure --with-rdma

Additionally, CONFIG options may also be overridden on the make command line:

make CONFIG_RDMA=y

Users may wish to use a version of DPDK different from the submodule included in the SPDK repository. Note, this includes the ability to build not only from DPDK sources, but also just with the includes and libraries installed via the dpdk and dpdk-devel packages. To specify an alternate DPDK installation, run configure with the --with-dpdk option. For example:

Linux:

./configure --with-dpdk=/path/to/dpdk/x86_64-native-linuxapp-gcc
make

FreeBSD:

./configure --with-dpdk=/path/to/dpdk/x86_64-native-bsdapp-clang
gmake

The options specified on the make command line take precedence over the values in mk/config.mk. This can be useful if, for example, you generate mk/config.mk using the configure script and then have one or two options (e.g. debug builds) that you wish to turn on and off frequently.

Shared libraries

By default, the SPDK build yields static libraries against which the SPDK applications and examples are linked. The configure option --with-shared produces SPDK shared libraries in addition to the default static ones. Using this flag also links the SPDK executables against the shared versions of the libraries. By default, SPDK shared libraries are located in ./build/lib. This includes the single SPDK shared lib encompassing all of the SPDK static libs (libspdk.so) as well as individual SPDK shared libs corresponding to each of the SPDK static ones.

To start an SPDK app linked with SPDK shared libraries, make sure to do the following steps:

  • run ldconfig specifying the directory containing SPDK shared libraries
  • provide proper LD_LIBRARY_PATH

Linux:

./configure --with-shared
make
ldconfig -v -n ./build/lib
LD_LIBRARY_PATH=./build/lib/ ./app/spdk_tgt/spdk_tgt

Hugepages and Device Binding

Before running an SPDK application, some hugepages must be allocated and any NVMe and I/OAT devices must be unbound from the native kernel drivers. SPDK includes a script to automate this process on both Linux and FreeBSD. This script should be run as root.

sudo scripts/setup.sh

Users may wish to configure a specific memory size. Below is an example of configuring 8192 MB of memory.

sudo HUGEMEM=8192 scripts/setup.sh

Example Code

Example code is located in the examples directory. The examples are compiled automatically as part of the build process. Simply call any of the examples with no arguments to see the help output. You'll likely need to run the examples as a privileged user (root) unless you've done additional configuration to grant your user permission to allocate huge pages and map devices through vfio.

Contributing

For additional details on how to get more involved in the community, including contributing code and participating in discussions and other activities, please refer to spdk.io.