Compare commits

..

8 Commits

Author SHA1 Message Date
Ben Walker
4608e917de Update 18.07.1 Changelog
Change-Id: I527b30a852031a79a01a8ad73e63682a3076296a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/424892
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2018-09-10 22:34:58 +00:00
Jim Harris
f55ffa8b57 Update DPDK submodule to 18.05.1 + SPDK patches.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I6caea461d2b13239dee42f6f96c5b9bdde14c160

Reviewed-on: https://review.gerrithub.io/423157
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2018-09-10 21:16:10 +00:00
Jim Harris
f3cedcc7fe bdev: set iovs on correct bdev_io in spdk_bdev_io_put_buf
spdk_bdev_io_put_buf() is responsible for reclaiming
bdev-allocated buffers from a bdev_io.  If there are
bdev_ios waiting for one of these buffers, it calls
spdk_bdev_io_set_buf() on the next bdev_io in the queue.
This will set the iov_base and iov_len on the bdev_io
to point to the bdev-allocated buffer.

But spdk_bdev_io_put_buf() was calling spdk_bdev_io_set_buf()
on the just completed bdev_io, not the next bdev_io in the
queue.  So fix that.

Fixes: 844aedf8 ("bdev: Simplify get/set/put buf functions")
Reported-by: Alan Tu
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibbcad6e35a3db6991bd7deb3516229572f021638
Reviewed-on: https://review.gerrithub.io/424881
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2018-09-10 21:16:10 +00:00
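For illustration, here is the fix in reduced form, using hypothetical, simplified structures rather than the actual SPDK types; only the final call differs between the buggy and fixed versions.

~~~{.c}
/* Minimal sketch of the spdk_bdev_io_put_buf() fix. All names here are
 * illustrative stand-ins for the SPDK internals, not the real code. */
#include <stddef.h>

struct fake_bdev_io {
	void *buf;                 /* bdev-allocated data buffer */
	struct fake_bdev_io *next; /* wait queue of I/Os needing a buffer */
};

static struct fake_bdev_io *g_need_buf_queue; /* hypothetical wait queue */

static void
fake_set_buf(struct fake_bdev_io *io, void *buf)
{
	io->buf = buf; /* the real code also sets iov_base and iov_len */
}

static void
fake_put_buf(struct fake_bdev_io *done_io)
{
	void *buf = done_io->buf;
	struct fake_bdev_io *waiter = g_need_buf_queue;

	done_io->buf = NULL;
	if (waiter != NULL) {
		g_need_buf_queue = waiter->next;
		/* The bug passed done_io here, reassigning the buffer to the
		 * just-completed I/O; the fix hands it to the next waiter. */
		fake_set_buf(waiter, buf);
	}
}
~~~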
Jim Harris
44a43939e8 nvme: add quirk for Intel SSDs without vendor-specific log pages
QEMU emulated NVMe SSDs report themselves with an Intel vendor ID,
but don't support the Intel vendor-specific log pages.  So add
a quirk to avoid confusing error messages.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic41476801ede94d43acb9972217ea7420ca53679
Reviewed-on: https://review.gerrithub.io/423422
Reviewed-on: https://review.gerrithub.io/423928
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2018-09-05 20:45:06 +00:00
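The quirk mechanism is a lookup keyed on the PCI vendor and device ID. Below is a sketch of that pattern with assumed names and IDs (QEMU's emulated controller commonly reports 0x8086:0x5845, but treat everything here as illustrative rather than the actual SPDK table in lib/nvme):

~~~{.c}
/* Illustrative quirk lookup; not the real SPDK definitions. */
#include <stddef.h>
#include <stdint.h>

#define QUIRK_NO_VENDOR_LOG_PAGES (1u << 0)

struct quirk_entry {
	uint16_t vendor_id;
	uint16_t device_id;
	uint32_t flags;
};

static const struct quirk_entry g_quirks[] = {
	/* Emulated device with an Intel vendor ID but without the Intel
	 * vendor-specific log pages (IDs assumed for illustration). */
	{ 0x8086, 0x5845, QUIRK_NO_VENDOR_LOG_PAGES },
};

static uint32_t
get_quirks(uint16_t vid, uint16_t did)
{
	size_t i;

	for (i = 0; i < sizeof(g_quirks) / sizeof(g_quirks[0]); i++) {
		if (g_quirks[i].vendor_id == vid && g_quirks[i].device_id == did) {
			return g_quirks[i].flags;
		}
	}
	return 0;
}
~~~

With such a flag set, the driver can simply skip requesting the vendor-specific log pages instead of logging errors when they fail.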
Karol Latecki
9fca71f514 scripts/vagrant: change create_vbox.sh shebang
pushd and popd are not in the default path for /bin/bash

Change-Id: I83e0bd1f87005e1c8542ac3db44b26f83eedf96c
Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-on: https://review.gerrithub.io/421903
Reviewed-on: https://review.gerrithub.io/423925
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2018-09-05 20:45:06 +00:00
Seth Howell
bbb2989c26 bdev: increment io_time if queue depth > 0
This value is used to calculate the disk utilization of a given bdev.

Change-Id: I4bf101c524b92bdd21573941e17f61db59c5c6b8
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/423017
Reviewed-on: https://review.gerrithub.io/423927
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: Ben Walker <benjamin.walker@intel.com>
2018-09-05 20:45:06 +00:00
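The accounting idea fits in a few lines. In this sketch (assumed field names, not the SPDK code), a periodic measurement pass credits the whole polling interval to io_time whenever at least one I/O is outstanding, and utilization over a window is the io_time delta divided by the window length:

~~~{.c}
/* Sketch of queue-depth-based utilization accounting (assumed names). */
#include <stdint.h>

struct fake_bdev_stats {
	uint64_t queue_depth; /* I/Os currently outstanding */
	uint64_t io_time;     /* accumulated ticks while queue_depth > 0 */
};

static void
measurement_pass(struct fake_bdev_stats *s, uint64_t interval_ticks)
{
	if (s->queue_depth > 0) {
		s->io_time += interval_ticks;
	}
}

/* Utilization over a window of elapsed_ticks, in the range [0.0, 1.0]. */
static double
utilization(uint64_t io_time_delta, uint64_t elapsed_ticks)
{
	return elapsed_ticks ? (double)io_time_delta / (double)elapsed_ticks : 0.0;
}
~~~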
Dariusz Stojaczyk
4baae265ca dpdk/pci: support DPDK 18.08 write combined PCI resources
We used to support it by default in our DPDK forks,
but starting with DPDK 18.08, a new PCI driver flag
RTE_PCI_DRV_WC_ACTIVATE is required.

We now enable it for NVMe and Virtio, but not for I/OAT,
as our I/OAT driver currently assumes strong memory
ordering, which prefetchable resources do not provide.

Change-Id: I1a13356e28535981153b3d3e52bfe9d66b6172ae
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/422588
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: <wenqianx.zong@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
2018-08-21 14:31:09 +00:00
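At the DPDK level, opting in amounts to one extra flag on the PCI driver registration. A sketch of that shape follows (probe/remove callbacks and the id_table omitted; the actual wiring lives in SPDK's env_dpdk layer):

~~~{.c}
/* Sketch only: shows where the DPDK 18.08 flag fits. */
#include <rte_bus_pci.h>

static struct rte_pci_driver g_example_nvme_driver = {
	/* RTE_PCI_DRV_WC_ACTIVATE requests write-combined mapping of
	 * prefetchable BARs; without it, DPDK 18.08+ maps them uncached.
	 * It is left off for I/OAT, which assumes strong ordering. */
	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_WC_ACTIVATE,
};
~~~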
Dariusz Stojaczyk
ec611eb485 env/dpdk: link with rte_kvargs by default
Starting with DPDK 18.08, rte_kvargs is a dependency
of rte_eal.

Change-Id: I0cde78f632fc313cec745d41ee519fb8b37de81a
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/422587
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
2018-08-21 14:31:09 +00:00
1557 changed files with 80253 additions and 297075 deletions


@@ -20,17 +20,16 @@ SYSTEM=`uname -s`
exec 1>&2
if [ "$SYSTEM" = "FreeBSD" ]; then
MAKE="gmake MAKE=gmake -j $(sysctl -a | grep -E -i 'hw.ncpu' | awk '{print $2}')"
MAKE="gmake MAKE=gmake -j ${nproc}"
COMP="clang"
else
MAKE="make -j $(nproc)"
MAKE="make -j ${nproc}"
COMP="gcc"
fi
echo "Running make with $COMP ..."
echo "${MAKE} clean " > make.log
$MAKE clean >> make.log 2>&1
echo "${MAKE} CONFIG_DEBUG=n CONFIG_WERROR=y " >> make.log
$MAKE CONFIG_DEBUG=n CONFIG_WERROR=y >> make.log 2>&1
rc=$?
@@ -76,6 +75,64 @@ fi
echo "$MAKE clean " >> make.log
$MAKE clean >> make.log 2>&1
if [ "$SYSTEM" = "FreeBSD" ]; then
echo
echo "Pushing to $1 $2"
exit $rc
fi
if ! hash clang 2>/dev/null; then
echo "clang not found; skipping the clang tests"
echo
echo "Pushing to $1 $2"
exit $rc
fi
echo "Running make with clang ..."
echo "make CONFIG_DEBUG=n CONFIG_WERROR=y CC=clang CXX=clang++ " >> make.log
$MAKE CONFIG_DEBUG=n CONFIG_WERROR=y CC=clang CXX=clang++ >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
tail -20 make.log
echo ""
echo "ERROR make CC=clang CXX=clang++ returned errors!"
echo "ERROR Fix the problem and use 'git commit' to update your changes."
echo "ERROR See `pwd`/make.log for more information."
echo ""
exit $rc
fi
echo "make clean CC=clang CXX=clang++ SKIP_DPDK_BUILD=1 " >> make.log
$MAKE clean CC=clang CXX=clang++ SKIP_DPDK_BUILD=1 >> make.log 2>&1
echo "make CONFIG_DEBUG=y CONFIG_WERROR=y CC=clang CXX=clang++ SKIP_DPDK_BUILD=1 " >> make.log
$MAKE CONFIG_DEBUG=y CONFIG_WERROR=y CC=clang CXX=clang++ SKIP_DPDK_BUILD=1 >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
tail -20 make.log
echo ""
echo "ERROR make CC=clang CXX=clang++ returned errors!"
echo "ERROR Fix the problem and use 'git commit' to update your changes."
echo "ERROR See `pwd`/make.log for more information."
echo ""
exit $rc
fi
echo "Running unittest.sh ..."
echo "./test/unit/unittest.sh" >> make.log
"./test/unit/unittest.sh" >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
tail -20 make.log
echo ""
echo "ERROR unittest returned errors!"
echo "ERROR Fix the problem and use 'git commit' to update your changes."
echo "ERROR See `pwd`/make.log for more information."
echo ""
exit $rc
fi
${MAKE} clean CC=clang CXX=clang++ 2> /dev/null
echo "Pushing to $1 $2"
exit $rc


@@ -1,8 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: SPDK Community
url: https://spdk.io/community/
about: Please ask and answer questions here.
- name: SPDK Common Vulnerabilities and Exposures (CVE) Process
url: https://spdk.io/cve_threat/
about: Please follow CVE process to responsibly disclose security vulnerabilities.


@@ -1,23 +0,0 @@
---
name: CI Intermittent Failure
about: Create a report with CI failure unrelated to the patch tested.
title: '[test_name] Failure description'
labels: 'Intermittent Failure'
assignees: ''
---
<!--- Provide a [test_name] where the issue occurred and brief description in the Title above. -->
<!--- Name of the test can be found by last occurrence of: -->
<!--- ************************************ -->
<!--- START TEST [test_name] -->
<!--- ************************************ -->
## Link to the failed CI build
<!--- Please provide a link to the failed CI build -->
## Execution failed at
<!--- Please provide the first failure in the test. Pointed to by the first occurrence of: -->
<!--- ========== Backtrace start: ========== -->


@@ -1,10 +0,0 @@
filters:
- true
commentBody: |
Thanks for your contribution! Unfortunately, we don't use GitHub pull
requests to manage code contributions to this repository. Instead, please
see https://spdk.io/development which provides instructions on how to
submit patches to the SPDK Gerrit instance.
addLabel: false

.gitignore

@@ -2,40 +2,27 @@
*.a
*.cmd
*.d
*.dll
*.exe
*.gcda
*.gcno
*.kdev4
*.ko
*.lib
*.log
*.o
*.obj
*.pdb
*.pyc
*.so
*.so.*
*.swp
*.DS_Store
build/
ut_coverage/
tags
cscope.out
dpdk-*
CUnit-Memory-Dump.xml
include/spdk/config.h
config.h
CONFIG.local
*VC.db
.vscode
.project
.cproject
.settings
.gitreview
mk/cc.mk
mk/config.mk
mk/cc.flags.mk
PYTHON_COMMAND
test_completions.txt
timing.txt
test/common/build_config.sh

.gitmodules

@@ -1,15 +1,6 @@
[submodule "dpdk"]
path = dpdk
url = https://git.quacker.org/d/numam-dpdk.git
url = https://github.com/spdk/dpdk.git
[submodule "intel-ipsec-mb"]
path = intel-ipsec-mb
url = https://github.com/spdk/intel-ipsec-mb.git
[submodule "isa-l"]
path = isa-l
url = https://github.com/spdk/isa-l.git
[submodule "ocf"]
path = ocf
url = https://github.com/Open-CAS/ocf.git
[submodule "libvfio-user"]
path = libvfio-user
url = https://github.com/nutanix/libvfio-user.git

.travis.yml

@@ -0,0 +1,37 @@
language: c
compiler:
- gcc
- clang
dist: trusty
sudo: false
addons:
apt:
packages:
- libcunit1-dev
- libaio-dev
- libssl-dev
- uuid-dev
- libnuma-dev
before_script:
- git submodule update --init
- export MAKEFLAGS="-j$(nproc)"
script:
- ./scripts/check_format.sh
- ./configure --enable-werror
- make
- ./test/unit/unittest.sh
notifications:
irc:
channels:
- "chat.freenode.net#spdk"
template:
- "(%{repository_name}/%{branch}) %{commit_subject} (%{author})"
- "Diff URL: %{compare_url}"
on_success: always
on_failure: always

File diff suppressed because it is too large

CONFIG

@@ -32,143 +32,78 @@
#
# Installation prefix
CONFIG_PREFIX="/usr/local"
# Target architecture
CONFIG_ARCH=native
# Prefix for cross compilation
CONFIG_CROSS_PREFIX=
CONFIG_PREFIX?=/usr/local
# Build with debug logging. Turn off for performance testing and normal usage
CONFIG_DEBUG=n
CONFIG_DEBUG?=n
# Show backtrace when logging message at level <= lvl (ERROR, WARN, NOTICE, DEBUG)
#CONFIG_LOG_BACKTRACE?=lvl
# Treat warnings as errors (fail the build on any warning).
CONFIG_WERROR=n
CONFIG_WERROR?=n
# Build with link-time optimization.
CONFIG_LTO=n
# Generate profile guided optimization data.
CONFIG_PGO_CAPTURE=n
# Use profile guided optimization data.
CONFIG_PGO_USE=n
CONFIG_LTO?=n
# Build with code coverage instrumentation.
CONFIG_COVERAGE=n
CONFIG_COVERAGE?=n
# Build with Address Sanitizer enabled
CONFIG_ASAN=n
CONFIG_ASAN?=n
# Build with Undefined Behavior Sanitizer enabled
CONFIG_UBSAN=n
CONFIG_UBSAN?=n
# Build with Thread Sanitizer enabled
CONFIG_TSAN=n
CONFIG_TSAN?=n
# Build functional tests
CONFIG_TESTS=y
# Build unit tests
CONFIG_UNIT_TESTS=y
# Build examples
CONFIG_EXAMPLES=y
# Build with Control-flow Enforcement Technology (CET)
CONFIG_CET=n
# Build tests
CONFIG_TESTS?=y
# Directory that contains the desired SPDK environment library.
# By default, this is implemented using DPDK.
CONFIG_ENV=
CONFIG_ENV?=$(SPDK_ROOT_DIR)/lib/env_dpdk
# This directory should contain 'include' and 'lib' directories for your DPDK
# installation.
CONFIG_DPDK_DIR=
# installation. Alternatively you can specify this on the command line
# with 'make DPDK_DIR=/path/to/dpdk'. This is only a valid entry
# when using the default SPDK environment library.
CONFIG_DPDK_DIR?=$(SPDK_ROOT_DIR)/dpdk/build
# This directory should contain 'include' and 'lib' directories for WPDK.
CONFIG_WPDK_DIR=
# Build SPDK FIO plugin. Requires CONFIG_FIO_SOURCE_DIR set to a valid
# Build SPDK FIO plugin. Requires FIO_SOURCE_DIR set to a valid
# fio source code directory.
CONFIG_FIO_PLUGIN=n
CONFIG_FIO_PLUGIN?=n
# This directory should contain the source code directory for fio
# which is required for building the SPDK FIO plugin.
CONFIG_FIO_SOURCE_DIR=/usr/src/fio
FIO_SOURCE_DIR?=/usr/src/fio
# Enable RDMA support for the NVMf target.
# Requires ibverbs development libraries.
CONFIG_RDMA=n
CONFIG_RDMA_SEND_WITH_INVAL=n
CONFIG_RDMA_SET_ACK_TIMEOUT=n
CONFIG_RDMA_PROV=verbs
# Enable NVMe Character Devices.
CONFIG_NVME_CUSE=n
# Enable FC support for the NVMf target.
# Requires FC low level driver (from FC vendor)
CONFIG_FC=n
CONFIG_FC_PATH=
CONFIG_RDMA?=n
# Build Ceph RBD support in bdev modules
# Requires librbd development libraries
CONFIG_RBD=n
CONFIG_RBD?=n
# Build vhost library.
CONFIG_VHOST=y
CONFIG_VHOST?=y
# Build vhost initiator (Virtio) driver.
CONFIG_VIRTIO=y
# Build custom vfio-user transport for NVMf target and NVMe initiator.
CONFIG_VFIO_USER=n
CONFIG_VFIO_USER_DIR=
CONFIG_VIRTIO?=y
# Build with PMDK backends
CONFIG_PMDK=n
CONFIG_PMDK_DIR=
CONFIG_PMDK?=n
# Enable the dependencies for building the compress vbdev
CONFIG_REDUCE=n
# Build with VPP
CONFIG_VPP?=n
# Requires libiscsi development libraries.
CONFIG_ISCSI_INITIATOR=n
CONFIG_ISCSI_INITIATOR?=n
# Build with raid
CONFIG_RAID?=n
# Enable the dependencies for building the crypto vbdev
CONFIG_CRYPTO=n
# Build spdk shared libraries in addition to the static ones.
CONFIG_SHARED=n
# Build with VTune suport.
CONFIG_VTUNE=n
CONFIG_VTUNE_DIR=
# Build Intel IPSEC_MB library
CONFIG_IPSEC_MB=n
# Enable OCF module
CONFIG_OCF=n
CONFIG_OCF_PATH=
CONFIG_CUSTOMOCF=n
# Build ISA-L library
CONFIG_ISAL=y
# Build with IO_URING support
CONFIG_URING=n
# Path to custom built IO_URING library
CONFIG_URING_PATH=
# Build with FUSE support
CONFIG_FUSE=n
# Build with RAID5 support
CONFIG_RAID5=n
# Build with IDXD support
CONFIG_IDXD=n
CONFIG_CRYPTO?=n


@@ -1,28 +1,19 @@
---
name: Bug report
about: Create a report to help us improve. Please use the issue tracker only for reporting suspected issues.
title: ''
labels: 'Sighting'
assignees: ''
Please use the issue tracker only for reporting suspected issues.
---
See [The SPDK Community Page](http://www.spdk.io/community/) for other SPDK communications channels.
<!--- Provide a general summary of the issue in the Title above -->
## Expected Behavior
<!--- Tell us what should happen -->
## Current Behavior
<!--- Tell us what happens instead of the expected behavior -->
## Possible Solution
<!--- Not obligatory, but suggest a fix/reason for the bug, -->
## Steps to Reproduce
<!--- Provide a link to a live example, or an unambiguous set of steps to -->
<!--- reproduce this bug. Include code to reproduce, if relevant -->
1.
@@ -31,5 +22,4 @@ assignees: ''
4.
## Context (Environment including OS version, SPDK version, etc.)
<!--- Providing context helps us come up with a solution that is most useful in the real world -->

LICENSE

@@ -1,16 +1,3 @@
The SPDK repo contains multiple git submodules each with its own
license info. Unless otherwise noted all other code in this repo
is BSD as stated below.
Submodule license info:
dpdk: see dpdk/license
intel-ipsec-mb: see intel-ipsec-mb/LICENSE
isa-l: see isa-l/LICENSE
libvfio-user: see libvfio-user/LICENSE
ocf: see ocf/LICENSE
The rest of the SPDK repo:
BSD LICENSE
Copyright (c) Intel Corporation.

Makefile

@@ -2,7 +2,6 @@
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# Copyright (c) 2020, Mellanox Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -37,22 +36,10 @@ S :=
SPDK_ROOT_DIR := $(CURDIR)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += lib
DIRS-y += module
DIRS-$(CONFIG_SHARED) += shared_lib
DIRS-y += app include
DIRS-$(CONFIG_EXAMPLES) += examples
DIRS-y += test
DIRS-$(CONFIG_IPSEC_MB) += ipsecbuild
DIRS-$(CONFIG_ISAL) += isalbuild
DIRS-$(CONFIG_VFIO_USER) += vfiouserbuild
DIRS-y += lib shared_lib examples app include
DIRS-$(CONFIG_TESTS) += test
.PHONY: all clean $(DIRS-y) include/spdk/config.h mk/config.mk \
cc_version cxx_version .libs_only_other .ldflags ldflags install \
uninstall
# Workaround for ninja. See dpdkbuild/Makefile
export MAKE_PID := $(shell echo $$PPID)
.PHONY: all clean $(DIRS-y) config.h CONFIG.local mk/cc.mk cc_version cxx_version
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
ifeq ($(CURDIR)/dpdk/build,$(CONFIG_DPDK_DIR))
@@ -63,82 +50,33 @@ endif
endif
endif
ifeq ($(OS),Windows)
ifeq ($(CURDIR)/wpdk/build,$(CONFIG_WPDK_DIR))
WPDK = wpdk
DIRS-y += wpdk
endif
endif
ifeq ($(CONFIG_SHARED),y)
LIB = shared_lib
else
LIB = module
endif
ifeq ($(CONFIG_IPSEC_MB),y)
LIB += ipsecbuild
DPDK_DEPS += ipsecbuild
endif
ifeq ($(CONFIG_ISAL),y)
LIB += isalbuild
DPDK_DEPS += isalbuild
endif
ifeq ($(CONFIG_VFIO_USER),y)
VFIOUSERBUILD = vfiouserbuild
LIB += vfiouserbuild
endif
all: mk/cc.mk $(DIRS-y)
all: $(DIRS-y)
clean: $(DIRS-y)
$(Q)rm -f include/spdk/config.h
$(Q)rm -rf build/bin
$(Q)rm -rf build/fio
$(Q)rm -rf build/examples
$(Q)rm -rf build/include
$(Q)rm -rf build/lib/pkgconfig
$(Q)find build/lib ! -name .gitignore -type f -delete
$(Q)rm -f mk/cc.mk
$(Q)rm -f config.h
install: all
$(Q)echo "Installed to $(DESTDIR)$(CONFIG_PREFIX)"
uninstall: $(DIRS-y)
$(Q)echo "Uninstalled spdk"
ifneq ($(SKIP_DPDK_BUILD),1)
dpdkdeps $(DPDK_DEPS): $(WPDK)
dpdkbuild: $(WPDK) $(DPDK_DEPS)
endif
lib: $(WPDK) $(DPDKBUILD) $(VFIOUSERBUILD)
module: lib
shared_lib: module
app: $(LIB)
test: $(LIB)
examples: $(LIB)
shared_lib: lib
lib: $(DPDKBUILD)
app: lib
test: lib
examples: lib
pkgdep:
sh ./scripts/pkgdep.sh
$(DIRS-y): mk/cc.mk build_dir include/spdk/config.h
$(DIRS-y): mk/cc.mk config.h
mk/cc.mk:
$(Q)echo "Please run configure prior to make"
false
$(Q)scripts/detect_cc.sh --cc=$(CC) --cxx=$(CXX) --lto=$(CONFIG_LTO) > $@.tmp; \
cmp -s $@.tmp $@ || mv $@.tmp $@ ; \
rm -f $@.tmp
build_dir: mk/cc.mk
$(Q)mkdir -p build/lib/pkgconfig/tmp
$(Q)mkdir -p build/bin
$(Q)mkdir -p build/fio
$(Q)mkdir -p build/examples
$(Q)mkdir -p build/include/spdk
include/spdk/config.h: mk/config.mk scripts/genconfig.py
$(Q)echo "#ifndef SPDK_CONFIG_H" > $@.tmp; \
echo "#define SPDK_CONFIG_H" >> $@.tmp; \
scripts/genconfig.py $(MAKEFLAGS) >> $@.tmp; \
echo "#endif /* SPDK_CONFIG_H */" >> $@.tmp; \
config.h: CONFIG CONFIG.local scripts/genconfig.py
$(Q)PYCMD=$$(cat PYTHON_COMMAND 2>/dev/null) ; \
test -z "$$PYCMD" && PYCMD=python ; \
$$PYCMD scripts/genconfig.py $(MAKEFLAGS) > $@.tmp; \
cmp -s $@.tmp $@ || mv $@.tmp $@ ; \
rm -f $@.tmp
@@ -148,16 +86,4 @@ cc_version: mk/cc.mk
cxx_version: mk/cc.mk
$(Q)echo "SPDK using CXX=$(CXX)"; $(CXX) -v
.libs_only_other:
$(Q)echo -n '$(SYS_LIBS) '
$(Q)if [ "$(CONFIG_SHARED)" = "y" ]; then \
echo -n '-lspdk '; \
fi
.ldflags:
$(Q)echo -n '$(LDFLAGS) '
ldflags: .ldflags .libs_only_other
$(Q)echo ''
include $(SPDK_ROOT_DIR)/mk/spdk.subdirs.mk


@@ -10,7 +10,6 @@ interrupts, which avoids kernel context switches and eliminates interrupt
handling overhead.
The development kit currently includes:
* [NVMe driver](http://www.spdk.io/doc/nvme.html)
* [I/OAT (DMA engine) driver](http://www.spdk.io/doc/ioat.html)
* [NVMe over Fabrics target](http://www.spdk.io/doc/nvmf.html)
@@ -18,7 +17,7 @@ The development kit currently includes:
* [vhost target](http://www.spdk.io/doc/vhost.html)
* [Virtio-SCSI driver](http://www.spdk.io/doc/virtio.html)
# In this readme
# In this readme:
* [Documentation](#documentation)
* [Prerequisites](#prerequisites)
@@ -26,9 +25,7 @@ The development kit currently includes:
* [Build](#libraries)
* [Unit Tests](#tests)
* [Vagrant](#vagrant)
* [AWS](#aws)
* [Advanced Build Options](#advanced)
* [Shared libraries](#shared)
* [Hugepages and Device Binding](#huge)
* [Example Code](#examples)
* [Contributing](#contributing)
@@ -53,9 +50,6 @@ git submodule update --init
## Prerequisites
The dependencies can be installed automatically by `scripts/pkgdep.sh`.
The `scripts/pkgdep.sh` script will automatically install the bare minimum
dependencies required to build SPDK.
Use `--help` to see information on installing dependencies for optional components
~~~{.sh}
./scripts/pkgdep.sh
@@ -97,33 +91,24 @@ success or failure.
A [Vagrant](https://www.vagrantup.com/downloads.html) setup is also provided
to create a Linux VM with a virtual NVMe controller to get up and running
quickly. Currently this has been tested on MacOS, Ubuntu 16.04.2 LTS and
Ubuntu 18.04.3 LTS with the VirtualBox and Libvirt provider.
The [VirtualBox Extension Pack](https://www.virtualbox.org/wiki/Downloads)
or [Vagrant Libvirt] (https://github.com/vagrant-libvirt/vagrant-libvirt) must
quickly. Currently this has only been tested on MacOS and Ubuntu 16.04.2 LTS
with the [VirtualBox](https://www.virtualbox.org/wiki/Downloads) provider. The
[VirtualBox Extension Pack](https://www.virtualbox.org/wiki/Downloads) must
also be installed in order to get the required NVMe support.
Details on the Vagrant setup can be found in the
[SPDK Vagrant documentation](http://spdk.io/doc/vagrant.html).
<a id="aws"></a>
## AWS
The following setup is known to work on AWS:
Image: Ubuntu 18.04
Before running `setup.sh`, run `modprobe vfio-pci`
then: `DRIVER_OVERRIDE=vfio-pci ./setup.sh`
<a id="advanced"></a>
## Advanced Build Options
Optional components and other build-time configuration are controlled by
settings in the Makefile configuration file in the root of the repository. `CONFIG`
contains the base settings for the `configure` script. This script generates a new
file, `mk/config.mk`, that contains final build settings. For advanced configuration,
there are a number of additional options to `configure` that may be used, or
`mk/config.mk` can simply be created and edited by hand. A description of all
possible options is located in `CONFIG`.
settings in two Makefile fragments in the root of the repository. `CONFIG`
contains the base settings. Running the `configure` script generates a new
file, `CONFIG.local`, that contains overrides to the base `CONFIG` file. For
advanced configuration, there are a number of additional options to `configure`
that may be used, or `CONFIG.local` can simply be created and edited by hand. A
description of all possible options is located in `CONFIG`.
Boolean (on/off) options are configured with a 'y' (yes) or 'n' (no). For
example, this line of `CONFIG` controls whether the optional RDMA (libibverbs)
@@ -131,7 +116,7 @@ support is enabled:
CONFIG_RDMA?=n
To enable RDMA, this line may be added to `mk/config.mk` with a 'y' instead of
To enable RDMA, this line may be added to `CONFIG.local` with a 'y' instead of
'n'. For the majority of options this can be done using the `configure` script.
For example:
@@ -139,7 +124,7 @@ For example:
./configure --with-rdma
~~~
Additionally, `CONFIG` options may also be overridden on the `make` command
Additionally, `CONFIG` options may also be overrriden on the `make` command
line:
~~~{.sh}
@@ -147,10 +132,8 @@ make CONFIG_RDMA=y
~~~
Users may wish to use a version of DPDK different from the submodule included
in the SPDK repository. Note, this includes the ability to build not only
from DPDK sources, but also just with the includes and libraries
installed via the dpdk and dpdk-devel packages. To specify an alternate DPDK
installation, run configure with the --with-dpdk option. For example:
in the SPDK repository. To specify an alternate DPDK installation, run
configure with the --with-dpdk option. For example:
Linux:
@@ -167,40 +150,10 @@ gmake
~~~
The options specified on the `make` command line take precedence over the
values in `mk/config.mk`. This can be useful if you, for example, generate
a `mk/config.mk` using the `configure` script and then have one or two
options (i.e. debug builds) that you wish to turn on and off frequently.
<a id="shared"></a>
## Shared libraries
By default, the build of the SPDK yields static libraries against which
the SPDK applications and examples are linked.
Configure option `--with-shared` provides the ability to produce SPDK shared
libraries, in addition to the default static ones. Use of this flag also
results in the SPDK executables linked to the shared versions of libraries.
SPDK shared libraries by default, are located in `./build/lib`. This includes
the single SPDK shared lib encompassing all of the SPDK static libs
(`libspdk.so`) as well as individual SPDK shared libs corresponding to each
of the SPDK static ones.
In order to start a SPDK app linked with SPDK shared libraries, make sure
to do the following steps:
- run ldconfig specifying the directory containing SPDK shared libraries
- provide proper `LD_LIBRARY_PATH`
If DPDK shared libraries are used, you may also need to add DPDK shared
libraries to `LD_LIBRARY_PATH`
Linux:
~~~{.sh}
./configure --with-shared
make
ldconfig -v -n ./build/lib
LD_LIBRARY_PATH=./build/lib/:./dpdk/build/lib/ ./build/bin/spdk_tgt
~~~
default values in `CONFIG` and `CONFIG.local`. This can be useful if you, for
example, generate a `CONFIG.local` using the `configure` script and then have
one or two options (i.e. debug builds) that you wish to turn on and off
frequently.
<a id="huge"></a>
## Hugepages and Device Binding
@@ -235,5 +188,5 @@ vfio.
## Contributing
For additional details on how to get more involved in the community, including
[contributing code](http://www.spdk.io/development) and participating in discussions and other activities, please
[contributing code](http://www.spdk.io/development) and participating in discussions and other activiites, please
refer to [spdk.io](http://www.spdk.io/community)


@@ -35,19 +35,12 @@ SPDK_ROOT_DIR := $(abspath $(CURDIR)/..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += trace
DIRS-y += trace_record
DIRS-y += nvmf_tgt
DIRS-y += iscsi_top
DIRS-y += iscsi_tgt
DIRS-y += spdk_tgt
DIRS-y += spdk_lspci
ifneq ($(OS),Windows)
# TODO - currently disabled on Windows due to lack of support for curses
DIRS-y += spdk_top
endif
ifeq ($(OS),Linux)
DIRS-$(CONFIG_VHOST) += vhost
DIRS-y += spdk_dd
endif
.PHONY: all clean $(DIRS-y)


@@ -33,6 +33,7 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = iscsi_tgt
@@ -43,14 +44,27 @@ CFLAGS += -I$(SPDK_ROOT_DIR)/lib
C_SRCS := iscsi_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event_iscsi event_net
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
SPDK_LIB_LIST = event_bdev event_copy event_iscsi event_net event_scsi
SPDK_LIB_LIST += jsonrpc json rpc bdev_rpc bdev iscsi scsi copy trace conf
SPDK_LIB_LIST += thread util log log_rpc event app_rpc net
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS) \
$(SOCK_MODULES_LINKER_ARGS)
LIBS += $(SPDK_LIB_LINKER_ARGS)
LIBS += $(ENV_LINKER_ARGS)
all : $(APP)
@:
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(ENV_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES) $(SOCK_MODULES_FILES)
$(LINK_C)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -42,13 +42,28 @@
static int g_daemon_mode = 0;
static void
iscsi_usage(void)
spdk_sigusr1(int signo __attribute__((__unused__)))
{
printf(" -b run iscsi target background, the default is foreground\n");
char *config_str = NULL;
if (spdk_app_get_running_config(&config_str, "iscsi.conf") < 0) {
fprintf(stderr, "Error getting config\n");
} else {
fprintf(stdout, "============================\n");
fprintf(stdout, " iSCSI target running config\n");
fprintf(stdout, "=============================\n");
fprintf(stdout, "%s", config_str);
}
free(config_str);
}
static void
spdk_startup(void *arg1)
iscsi_usage(void)
{
printf(" -b run iscsi target background, the default is foreground\n");
}
static void
spdk_startup(void *arg1, void *arg2)
{
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
@@ -56,7 +71,7 @@ spdk_startup(void *arg1)
}
}
static int
static void
iscsi_parse_arg(int ch, char *arg)
{
switch (ch) {
@@ -64,9 +79,9 @@ iscsi_parse_arg(int ch, char *arg)
g_daemon_mode = 1;
break;
default:
return -EINVAL;
assert(false);
break;
}
return 0;
}
int
@@ -75,9 +90,10 @@ main(int argc, char **argv)
int rc;
struct spdk_app_opts opts = {};
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.config_file = SPDK_ISCSI_DEFAULT_CONFIG;
opts.name = "iscsi";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "b", NULL,
if ((rc = spdk_app_parse_args(argc, argv, &opts, "b",
iscsi_parse_arg, iscsi_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
exit(rc);
@@ -85,15 +101,16 @@
if (g_daemon_mode) {
if (daemon(1, 0) < 0) {
SPDK_ERRLOG("Start iscsi target daemon failed.\n");
SPDK_ERRLOG("Start iscsi target daemon faild.\n");
exit(EXIT_FAILURE);
}
}
opts.shutdown_cb = NULL;
opts.usr1_handler = spdk_sigusr1;
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, spdk_startup, NULL);
rc = spdk_app_start(&opts, spdk_startup, NULL, NULL);
if (rc) {
SPDK_ERRLOG("Start iscsi target daemon: spdk_app_start() retn non-zero\n");
}


@@ -33,14 +33,21 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
CXXFLAGS += $(ENV_CXXFLAGS)
CXXFLAGS += -I$(SPDK_ROOT_DIR)/lib
CXX_SRCS = iscsi_top.cpp
APP = iscsi_top
SPDK_LIB_LIST = rpc
all: $(APP)
@:
CFLAGS += -I$(SPDK_ROOT_DIR)/lib
$(APP) : $(OBJS)
$(LINK_CXX)
C_SRCS := iscsi_top.c
clean:
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -33,106 +33,92 @@
#include "spdk/stdinc.h"
#include "spdk/event.h"
#include "spdk/jsonrpc.h"
#include "spdk/rpc.h"
#include "spdk/string.h"
#include "spdk/trace.h"
#include "spdk/util.h"
#include <algorithm>
#include <map>
#include <vector>
extern "C" {
#include "spdk/trace.h"
#include "iscsi/conn.h"
}
static char *exe_name;
static int g_shm_id = 0;
struct spdk_jsonrpc_client *g_rpc_client;
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option>\n", exe_name);
fprintf(stderr, " option = '-i' to specify the shared memory ID,"
" (required)\n");
fprintf(stderr, " -r <path> RPC listen address (default: /var/tmp/spdk.sock\n");
}
struct rpc_conn_info {
uint32_t id;
uint32_t cid;
uint32_t tsih;
uint32_t lcore_id;
char *initiator_addr;
char *target_addr;
char *target_node_name;
};
static struct rpc_conn_info g_conn_info[1024];
static const struct spdk_json_object_decoder rpc_conn_info_decoders[] = {
{"id", offsetof(struct rpc_conn_info, id), spdk_json_decode_uint32},
{"cid", offsetof(struct rpc_conn_info, cid), spdk_json_decode_uint32},
{"tsih", offsetof(struct rpc_conn_info, tsih), spdk_json_decode_uint32},
{"lcore_id", offsetof(struct rpc_conn_info, lcore_id), spdk_json_decode_uint32},
{"initiator_addr", offsetof(struct rpc_conn_info, initiator_addr), spdk_json_decode_string},
{"target_addr", offsetof(struct rpc_conn_info, target_addr), spdk_json_decode_string},
{"target_node_name", offsetof(struct rpc_conn_info, target_node_name), spdk_json_decode_string},
};
static int
rpc_decode_conn_object(const struct spdk_json_val *val, void *out)
static bool
conns_compare(struct spdk_iscsi_conn *first, struct spdk_iscsi_conn *second)
{
struct rpc_conn_info *info = (struct rpc_conn_info *)out;
if (first->lcore < second->lcore) {
return true;
}
return spdk_json_decode_object(val, rpc_conn_info_decoders,
SPDK_COUNTOF(rpc_conn_info_decoders), info);
if (first->lcore > second->lcore) {
return false;
}
if (first->id < second->id) {
return true;
}
return false;
}
static void
print_connections(void)
{
struct spdk_jsonrpc_client_response *json_resp = NULL;
struct spdk_json_write_ctx *w;
struct spdk_jsonrpc_client_request *request;
int rc;
size_t conn_count, i;
struct rpc_conn_info *conn;
std::vector<struct spdk_iscsi_conn *> v;
std::vector<struct spdk_iscsi_conn *>::iterator iter;
size_t conns_size;
struct spdk_iscsi_conn *conns, *conn;
void *conns_ptr;
int fd, i;
char shm_name[64];
request = spdk_jsonrpc_client_create_request();
if (request == NULL) {
return;
snprintf(shm_name, sizeof(shm_name), "/spdk_iscsi_conns.%d", g_shm_id);
fd = shm_open(shm_name, O_RDONLY, 0600);
if (fd < 0) {
fprintf(stderr, "Cannot open shared memory: %s\n", shm_name);
usage();
exit(1);
}
w = spdk_jsonrpc_begin_request(request, 1, "iscsi_get_connections");
spdk_jsonrpc_end_request(request, w);
spdk_jsonrpc_client_send_request(g_rpc_client, request);
conns_size = sizeof(*conns) * MAX_ISCSI_CONNECTIONS;
do {
rc = spdk_jsonrpc_client_poll(g_rpc_client, 1);
} while (rc == 0 || rc == -ENOTCONN);
if (rc <= 0) {
goto end;
conns_ptr = mmap(NULL, conns_size, PROT_READ, MAP_SHARED, fd, 0);
if (conns_ptr == MAP_FAILED) {
fprintf(stderr, "Cannot mmap shared memory (%d)\n", errno);
exit(1);
}
json_resp = spdk_jsonrpc_client_get_response(g_rpc_client);
if (json_resp == NULL) {
goto end;
conns = (struct spdk_iscsi_conn *)conns_ptr;
for (i = 0; i < MAX_ISCSI_CONNECTIONS; i++) {
if (!conns[i].is_valid) {
continue;
}
v.push_back(&conns[i]);
}
if (spdk_json_decode_array(json_resp->result, rpc_decode_conn_object, g_conn_info,
SPDK_COUNTOF(g_conn_info), &conn_count, sizeof(struct rpc_conn_info))) {
goto end;
stable_sort(v.begin(), v.end(), conns_compare);
for (iter = v.begin(); iter != v.end(); iter++) {
conn = *iter;
printf("lcore %2d conn %3d T:%-8s I:%s (%s)\n",
conn->lcore, conn->id,
conn->target_short_name, conn->initiator_name,
conn->initiator_addr);
}
for (i = 0; i < conn_count; i++) {
conn = &g_conn_info[i];
printf("Connection: %u CID: %u TSIH: %u Initiator Address: %s Target Address: %s Target Node Name: %s\n",
conn->id, conn->cid, conn->tsih, conn->initiator_addr, conn->target_addr, conn->target_node_name);
}
end:
spdk_jsonrpc_client_free_request(request);
printf("\n");
munmap(conns, conns_size);
close(fd);
}
int main(int argc, char **argv)
@@ -140,7 +126,6 @@ int main(int argc, char **argv)
void *history_ptr;
struct spdk_trace_histories *histories;
struct spdk_trace_history *history;
const char *rpc_socket_path = SPDK_DEFAULT_RPC_ADDR;
uint64_t tasks_done, last_tasks_done[SPDK_TRACE_MAX_LCORE];
int delay, old_delay, history_fd, i, quit, rc;
@@ -154,13 +139,10 @@
int op;
exe_name = argv[0];
while ((op = getopt(argc, argv, "i:r:")) != -1) {
while ((op = getopt(argc, argv, "i:")) != -1) {
switch (op) {
case 'i':
g_shm_id = spdk_strtol(optarg, 10);
break;
case 'r':
rpc_socket_path = optarg;
g_shm_id = atoi(optarg);
break;
default:
usage();
@@ -168,12 +150,6 @@
}
}
g_rpc_client = spdk_jsonrpc_client_connect(rpc_socket_path, AF_UNIX);
if (!g_rpc_client) {
fprintf(stderr, "spdk_jsonrpc_client_connect() failed: %d\n", errno);
return 1;
}
snprintf(spdk_trace_shm_name, sizeof(spdk_trace_shm_name), "/iscsi_trace.%d", g_shm_id);
history_fd = shm_open(spdk_trace_shm_name, O_RDONLY, 0600);
if (history_fd < 0) {
@@ -193,7 +169,7 @@
memset(last_tasks_done, 0, sizeof(last_tasks_done));
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(histories, i);
history = &histories->per_lcore_history[i];
last_tasks_done[i] = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
}
@@ -251,7 +227,7 @@
printf("=============\n");
total_tasks_done_per_sec = 0;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(histories, i);
history = &histories->per_lcore_history[i];
tasks_done = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
tasks_done_delta = tasks_done - last_tasks_done[i];
if (tasks_done_delta == 0) {
@@ -271,7 +247,5 @@ cleanup:
munmap(history_ptr, sizeof(*histories));
close(history_fd);
spdk_jsonrpc_client_close(g_rpc_client);
return (0);
}


@@ -33,20 +33,33 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = nvmf_tgt
C_SRCS := nvmf_main.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event_nvmf
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
SPDK_LIB_LIST = event_bdev event_copy event_nvmf
SPDK_LIB_LIST += nvmf event log trace conf thread util bdev copy rpc jsonrpc json
SPDK_LIB_LIST += app_rpc log_rpc bdev_rpc
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS) \
$(SOCK_MODULES_LINKER_ARGS) \
$(SPDK_LIB_LINKER_ARGS) $(ENV_LINKER_ARGS)
all : $(APP)
@:
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(SPDK_WHOLE_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES) $(SOCK_MODULES_FILES) $(LINKER_MODULES) $(ENV_LIBS)
$(LINK_C)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -36,19 +36,21 @@
#include "spdk/env.h"
#include "spdk/event.h"
#define SPDK_NVMF_BUILD_ETC "/usr/local/etc/nvmf"
#define SPDK_NVMF_DEFAULT_CONFIG SPDK_NVMF_BUILD_ETC "/nvmf.conf"
static void
nvmf_usage(void)
{
}
static int
static void
nvmf_parse_arg(int ch, char *arg)
{
return 0;
}
static void
nvmf_tgt_started(void *arg1)
nvmf_tgt_started(void *arg1, void *arg2)
{
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
@@ -63,16 +65,18 @@ main(int argc, char **argv)
struct spdk_app_opts opts = {};
/* default value in opts */
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "nvmf";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "", NULL,
opts.config_file = SPDK_NVMF_DEFAULT_CONFIG;
opts.max_delay_us = 1000; /* 1 ms */
if ((rc = spdk_app_parse_args(argc, argv, &opts, "",
nvmf_parse_arg, nvmf_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
exit(rc);
}
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, nvmf_tgt_started, NULL);
rc = spdk_app_start(&opts, nvmf_tgt_started, NULL, NULL);
spdk_app_fini();
return rc;
}


@@ -1 +0,0 @@
spdk_dd


@@ -1,44 +0,0 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_dd
C_SRCS := spdk_dd.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event_bdev
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk

File diff suppressed because it is too large


@@ -1 +0,0 @@
spdk_lspci


@@ -1,44 +0,0 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_lspci
C_SRCS := spdk_lspci.c
SPDK_LIB_LIST = $(SOCK_MODULES_LIST) nvme vmd
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk


@@ -1,123 +0,0 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/vmd.h"
static void
usage(void)
{
printf("Usage: spdk_lspci\n");
printf("Print available SPDK PCI devices supported by NVMe driver.\n");
}
static int
pci_enum_cb(void *ctx, struct spdk_pci_device *dev)
{
return 0;
}
static void
print_pci_dev(struct spdk_pci_device *dev)
{
struct spdk_pci_addr pci_addr = spdk_pci_device_get_addr(dev);
char addr[32] = { 0 };
spdk_pci_addr_fmt(addr, sizeof(addr), &pci_addr);
printf("%s (%x %x)", addr,
spdk_pci_device_get_vendor_id(dev),
spdk_pci_device_get_device_id(dev));
if (strcmp(spdk_pci_device_get_type(dev), "vmd") == 0) {
printf(" (NVMe disk behind VMD) ");
}
if (dev->internal.driver == spdk_pci_vmd_get_driver()) {
printf(" (VMD) ");
}
printf("\n");
}
int
main(int argc, char **argv)
{
int op;
struct spdk_env_opts opts;
struct spdk_pci_device *dev;
while ((op = getopt(argc, argv, "h")) != -1) {
switch (op) {
case 'h':
usage();
return 0;
default:
usage();
return 1;
}
}
spdk_env_opts_init(&opts);
opts.name = "spdk_lspci";
if (spdk_env_init(&opts) < 0) {
printf("Unable to initialize SPDK env\n");
return 1;
}
if (spdk_vmd_init()) {
printf("Failed to initialize VMD. Some NVMe devices can be unavailable.\n");
}
if (spdk_pci_enumerate(spdk_pci_nvme_get_driver(), pci_enum_cb, NULL)) {
printf("Unable to enumerate PCI nvme driver\n");
return 1;
}
dev = spdk_pci_get_first_device();
if (!dev) {
printf("\nLack of PCI devices available for SPDK!\n");
}
printf("\nList of available PCI devices:\n");
while (dev) {
print_pci_dev(dev);
dev = spdk_pci_get_next_device(dev);
}
spdk_vmd_fini();
return 0;
}


@@ -33,31 +33,41 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_tgt
C_SRCS := spdk_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += event_iscsi event_nvmf
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
SPDK_LIB_LIST = event_bdev event_copy event_iscsi event_net event_scsi event_nvmf
SPDK_LIB_LIST += nvmf event log trace conf thread util bdev iscsi scsi copy rpc jsonrpc json
SPDK_LIB_LIST += app_rpc log_rpc bdev_rpc net
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
ifeq ($(CONFIG_VHOST),y)
SPDK_LIB_LIST += event_vhost
endif
SPDK_LIB_LIST += vhost rte_vhost event_vhost
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
endif
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS) \
$(SOCK_MODULES_LINKER_ARGS) \
$(SPDK_LIB_LINKER_ARGS) $(ENV_LINKER_ARGS)
all: $(APP)
@:
$(APP): $(OBJS) $(SPDK_LIB_FILES) $(SPDK_WHOLE_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES) $(SOCK_MODULES_FILES) $(LINKER_MODULES) $(ENV_LIBS)
$(LINK_C)
clean:
$(CLEAN_C) $(APP)
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -33,11 +33,15 @@
#include "spdk/stdinc.h"
#include "spdk/config.h"
#include "spdk/env.h"
#include "spdk/event.h"
#include "spdk/vhost.h"
/* TODO: this should be handled by configure */
#if defined(SPDK_CONFIG_VHOST) && !defined(__linux__)
#undef SPDK_CONFIG_VHOST
#endif
#ifdef SPDK_CONFIG_VHOST
#define SPDK_VHOST_OPTS "S:"
#else
@@ -50,9 +54,9 @@ static const char g_spdk_tgt_get_opts_string[] = "f:" SPDK_VHOST_OPTS;
static void
spdk_tgt_usage(void)
{
printf(" -f <file> pidfile save pid to file under given path\n");
printf(" -f pidfile save pid to file under given path\n");
#ifdef SPDK_CONFIG_VHOST
printf(" -S <path> directory where to create vhost sockets (default: pwd)\n");
printf(" -S dir directory where to create vhost sockets (default: pwd)\n");
#endif
}
@@ -72,7 +76,7 @@ spdk_tgt_save_pid(const char *pid_path)
}
static int
static void
spdk_tgt_parse_arg(int ch, char *arg)
{
switch (ch) {
@@ -84,14 +88,11 @@ spdk_tgt_parse_arg(int ch, char *arg)
spdk_vhost_set_socket_path(arg);
break;
#endif
default:
return -EINVAL;
}
return 0;
}
static void
spdk_tgt_started(void *arg1)
spdk_tgt_started(void *arg1, void *arg2)
{
if (g_pid_path) {
spdk_tgt_save_pid(g_pid_path);
@@ -109,15 +110,15 @@ main(int argc, char **argv)
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "spdk_tgt";
if ((rc = spdk_app_parse_args(argc, argv, &opts, g_spdk_tgt_get_opts_string,
NULL, spdk_tgt_parse_arg, spdk_tgt_usage)) !=
spdk_tgt_parse_arg, spdk_tgt_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
return rc;
}
rc = spdk_app_start(&opts, spdk_tgt_started, NULL);
rc = spdk_app_start(&opts, spdk_tgt_started, NULL, NULL);
spdk_app_fini();
return rc;


@@ -1 +0,0 @@
spdk_top


@@ -1,44 +0,0 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
APP = spdk_top
C_SRCS := spdk_top.c
SPDK_LIB_LIST = rpc
LIBS=-lncurses -lpanel -lmenu
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk


@@ -1,74 +0,0 @@
Contents
========
- Overview
- Installation
- Usage
Overview
========
This application provides SPDK live statistics regarding usage of cores,
threads, pollers, execution times, and relations between those. All data
is being gathered from SPDK by calling appropriate RPC calls. Application
consists of three selectable tabs providing statistics related to three
main topics:
- Threads
- Pollers
- Cores
Installation
============
spdk_top requires Ncurses library (can by installed by running
spdk/scripts/pkgdep.sh) and is compiled by default when SPDK compiles.
Usage
=====
To run spdk_top:
sudo spdk_top [options]
options:
-r <path> RPC listen address (optional, default: /var/tmp/spdk.sock)
-h show help message
Application consists of:
- Tabs list (on top)
- Statistics window (main windows in the middle)
- Options window (below statistics window)
- Page indicator / Error status
Tabs list shows available tabs and highlights currently selected tab.
Statistics window displays current statistics. Available statistics
depend on which tab is currently selected. All time and run counter
related statistics are relative - show elapsed time / number of runs
since previous data refresh. Options windows provide hotkeys list
to change application settings. Available options are:
- [q] Quit - quit the application
- [1-3] TAB selection - select tab to be displayed
- [PgUp] Previous page - go to previous page
- [PgDown] Next page - go to next page
- [c] Columns - select which columns should be visible / hidden:
Use arrow up / down and space / enter keys to select which columns
should be visible. Select 'CLOSE' to confirm changes and close
the window.
- [s] Sorting - change data sorting:
Use arrow up / down to select based on which column data should be
sorted. Use enter key to confirm or esc key to exit without
changing current sorting scheme.
- [r] Refresh rate - change data refresh rate:
Enter new data refresh rate value. Refresh rate accepts value
between 0 and 255 seconds. Use enter key to apply or escape key
to cancel.
Page indicator show current data page. Error status can be displayed
on bottom right side of the screen when the application encountered
an error.

File diff suppressed because it is too large


@@ -33,11 +33,19 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_trace
SPDK_NO_LINK_ENV = 1
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
CXX_SRCS := trace.cpp
include $(SPDK_ROOT_DIR)/mk/spdk.app_cxx.mk
APP = spdk_trace
all: $(APP)
@:
$(APP): $(OBJS) $(SPDK_LIBS)
$(LINK_CXX)
clean:
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -37,11 +37,9 @@
extern "C" {
#include "spdk/trace.h"
#include "spdk/util.h"
}
static struct spdk_trace_histories *g_histories;
static bool g_print_tsc = false;
static void usage(void);
@@ -81,11 +79,13 @@ struct object_stats {
struct object_stats g_stats[SPDK_TRACE_MAX_OBJECT];
static char *g_exe_name;
static int g_verbose = 1;
static char *exe_name;
static int verbose = 1;
static int g_fudge_factor = 20;
static uint64_t g_tsc_rate;
static uint64_t g_first_tsc = 0x0;
static uint64_t tsc_rate;
static uint64_t first_tsc = 0x0;
static uint64_t last_tsc = -1ULL;
static float
get_us_from_tsc(uint64_t tsc, uint64_t tsc_rate)
@@ -110,13 +110,6 @@ print_uint64(const char *arg_string, uint64_t arg)
printf("%-7.7s%-16jd ", arg_string, arg);
}
static void
print_string(const char *arg_string, uint64_t arg)
{
char *str = (char *)&arg;
printf("%-7.7s%.8s ", arg_string, str);
}
static void
print_size(uint32_t size)
{
@@ -140,23 +133,16 @@ print_float(const char *arg_string, float arg)
}
static void
print_arg(uint8_t arg_type, const char *arg_string, uint64_t arg)
print_arg(bool arg_is_ptr, const char *arg_string, uint64_t arg)
{
if (arg_string[0] == 0) {
printf("%24s", "");
return;
}
switch (arg_type) {
case SPDK_TRACE_ARG_TYPE_PTR:
if (arg_is_ptr) {
print_ptr(arg_string, arg);
break;
case SPDK_TRACE_ARG_TYPE_INT:
} else {
print_uint64(arg_string, arg);
break;
case SPDK_TRACE_ARG_TYPE_STR:
print_string(arg_string, arg);
break;
}
}
@@ -178,12 +164,15 @@ print_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
stats->size[e->object_id] = e->size;
}
if (d->arg1_is_alias) {
stats->index[e->arg1] = stats->index[e->object_id];
stats->start[e->arg1] = stats->start[e->object_id];
stats->size[e->arg1] = stats->size[e->object_id];
}
us = get_us_from_tsc(e->tsc - tsc_offset, tsc_rate);
printf("%2d: %10.3f ", lcore, us);
if (g_print_tsc) {
printf("(%9ju) ", e->tsc - tsc_offset);
}
printf("%2d: %10.3f (%9ju) ", lcore, us, e->tsc - tsc_offset);
if (g_histories->flags.owner[d->owner_type].id_prefix) {
printf("%c%02d ", g_histories->flags.owner[d->owner_type].id_prefix, e->poller_id);
} else {
@@ -193,20 +182,26 @@
printf("%-*s ", (int)sizeof(d->name), d->name);
print_size(e->size);
print_arg(d->arg1_type, d->arg1_name, e->arg1);
if (d->new_object) {
print_arg(d->arg1_is_ptr, d->arg1_name, e->arg1);
print_object_id(d->object_type, stats->index[e->object_id]);
} else if (d->object_type != OBJECT_NONE) {
if (stats->start.find(e->object_id) != stats->start.end()) {
struct spdk_trace_tpoint *start_description;
us = get_us_from_tsc(e->tsc - stats->start[e->object_id],
tsc_rate);
print_object_id(d->object_type, stats->index[e->object_id]);
print_float("time:", us);
start_description = &g_histories->flags.tpoint[stats->tpoint_id[e->object_id]];
if (start_description->short_name[0] != 0) {
printf(" (%.4s)", start_description->short_name);
}
} else {
printf("id: N/A");
}
} else if (e->object_id != 0) {
print_arg(SPDK_TRACE_ARG_TYPE_PTR, "object: ", e->object_id);
} else {
print_arg(d->arg1_is_ptr, d->arg1_name, e->arg1);
}
printf("\n");
}
@@ -215,20 +210,24 @@
process_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
uint64_t tsc_offset, uint16_t lcore)
{
if (g_verbose) {
if (verbose) {
print_event(e, tsc_rate, tsc_offset, lcore);
}
}
static int
populate_events(struct spdk_trace_history *history, int num_entries)
populate_events(struct spdk_trace_history *history)
{
int i, num_entries_filled;
int i, entry_size, history_size, num_entries, num_entries_filled;
struct spdk_trace_entry *e;
int first, last, lcore;
lcore = history->lcore;
entry_size = sizeof(history->entries[0]);
history_size = sizeof(history->entries);
num_entries = history_size / entry_size;
e = history->entries;
num_entries_filled = num_entries;
@@ -246,19 +245,33 @@ populate_events(struct spdk_trace_history *history, int num_entries)
last = i;
}
}
first += g_fudge_factor;
if (first >= num_entries) {
first -= num_entries;
}
last -= g_fudge_factor;
if (last < 0) {
last += num_entries;
}
} else {
first = 0;
last = num_entries_filled - 1;
}
/*
* We keep track of the highest first TSC out of all reactors.
* We will ignore any events that occurred before this TSC on any
* other reactors. This will ensure we only print data for the
* subset of time where we have data across all reactors.
* We keep track of the highest first TSC out of all reactors and
* the lowest last TSC out of all reactors. We will ignore any
* events outside the range of these two TSC values. This will
* ensure we only print data for the subset of time where we have
* data across all reactors.
*/
if (e[first].tsc > g_first_tsc) {
g_first_tsc = e[first].tsc;
if (e[first].tsc > first_tsc) {
first_tsc = e[first].tsc;
}
if (e[last].tsc < last_tsc) {
last_tsc = e[last].tsc;
}
i = first;
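The effect of the first_tsc/last_tsc bookkeeping above is to intersect the per-reactor time ranges, so only the window covered by every reactor is printed. With hypothetical values:

/* reactor 0 recorded entries with TSCs in [100, 900]
 * reactor 1 recorded entries with TSCs in [250, 700]
 *
 * After populate_events() runs for both, first_tsc == 250 and
 * last_tsc == 700; the main loop then skips any entry whose TSC
 * falls outside [250, 700] on either reactor.
 */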
@@ -279,37 +292,31 @@ populate_events(struct spdk_trace_history *history, int num_entries)
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option> <lcore#>\n", g_exe_name);
fprintf(stderr, " %s <option> <lcore#>\n", exe_name);
fprintf(stderr, " option = '-q' to disable verbose mode\n");
fprintf(stderr, " '-s' to specify spdk_trace shm name\n");
fprintf(stderr, " '-c' to display single lcore history\n");
fprintf(stderr, " '-t' to display TSC offset for each event\n");
fprintf(stderr, " '-s' to specify spdk_trace shm name for a\n");
fprintf(stderr, " currently running process\n");
fprintf(stderr, " '-f' to specify number of events to ignore at\n");
fprintf(stderr, " beginning and end of trace (default: 20)\n");
fprintf(stderr, " '-i' to specify the shared memory ID\n");
fprintf(stderr, " '-p' to specify the trace PID\n");
fprintf(stderr, " (If -s is specified, then one of\n");
fprintf(stderr, " -i or -p must be specified)\n");
fprintf(stderr, " '-f' to specify a tracepoint file name\n");
fprintf(stderr, " (-s and -f are mutually exclusive)\n");
fprintf(stderr, " (One of -i or -p must be specified)\n");
}
int main(int argc, char **argv)
{
void *history_ptr;
struct spdk_trace_history *history;
int fd, i, rc;
struct spdk_trace_history *history_entries, *history;
int fd, i;
int lcore = SPDK_TRACE_MAX_LCORE;
uint64_t tsc_offset;
const char *app_name = NULL;
const char *file_name = NULL;
const char *app_name = "spdk";
int op;
char shm_name[64];
int shm_id = -1, shm_pid = -1;
uint64_t trace_histories_size;
struct stat _stat;
g_exe_name = argv[0];
while ((op = getopt(argc, argv, "c:f:i:p:qs:t")) != -1) {
exe_name = argv[0];
while ((op = getopt(argc, argv, "c:f:i:p:qs:")) != -1) {
switch (op) {
case 'c':
lcore = atoi(optarg);
@@ -320,6 +327,9 @@ int main(int argc, char **argv)
exit(1);
}
break;
case 'f':
g_fudge_factor = atoi(optarg);
break;
case 'i':
shm_id = atoi(optarg);
break;
@@ -327,135 +337,84 @@ int main(int argc, char **argv)
shm_pid = atoi(optarg);
break;
case 'q':
g_verbose = 0;
verbose = 0;
break;
case 's':
app_name = optarg;
break;
case 'f':
file_name = optarg;
break;
case 't':
g_print_tsc = true;
break;
default:
usage();
exit(1);
}
}
if (file_name != NULL && app_name != NULL) {
fprintf(stderr, "-f and -s are mutually exclusive\n");
usage();
exit(1);
}
if (file_name == NULL && app_name == NULL) {
fprintf(stderr, "One of -f and -s must be specified\n");
usage();
exit(1);
}
if (file_name) {
fd = open(file_name, O_RDONLY);
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
fd = shm_open(shm_name, O_RDONLY, 0600);
file_name = shm_name;
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
fd = shm_open(shm_name, O_RDONLY, 0600);
if (fd < 0) {
fprintf(stderr, "Could not open %s.\n", file_name);
fprintf(stderr, "Could not open shm %s.\n", shm_name);
usage();
exit(-1);
}
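With the default app_name of "spdk", the two snprintf() calls above produce shared-memory names like the following (the ID and PID values are illustrative):

/* shm_id  = 0    -> shm_open("/spdk_trace.0", O_RDONLY, 0600)       */
/* shm_pid = 1234 -> shm_open("/spdk_trace.pid1234", O_RDONLY, 0600) */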
rc = fstat(fd, &_stat);
if (rc < 0) {
fprintf(stderr, "Could not get size of %s.\n", file_name);
usage();
exit(-1);
}
if ((size_t)_stat.st_size < sizeof(*g_histories)) {
fprintf(stderr, "%s is not a valid trace file\n", file_name);
usage();
exit(-1);
}
/* Map the header of trace file */
history_ptr = mmap(NULL, sizeof(*g_histories), PROT_READ, MAP_SHARED, fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap %s.\n", file_name);
fprintf(stderr, "Could not mmap shm %s.\n", shm_name);
usage();
exit(-1);
}
g_histories = (struct spdk_trace_histories *)history_ptr;
g_tsc_rate = g_histories->flags.tsc_rate;
if (g_tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", g_tsc_rate);
tsc_rate = g_histories->flags.tsc_rate;
if (tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", tsc_rate);
usage();
exit(-1);
}
if (g_verbose) {
printf("TSC Rate: %ju\n", g_tsc_rate);
if (verbose) {
printf("TSC Rate: %ju\n", tsc_rate);
}
/* Remap the entire trace file */
trace_histories_size = spdk_get_trace_histories_size(g_histories);
munmap(history_ptr, sizeof(*g_histories));
if ((size_t)_stat.st_size < trace_histories_size) {
fprintf(stderr, "%s is not a valid trace file\n", file_name);
usage();
exit(-1);
history_entries = (struct spdk_trace_history *)malloc(sizeof(g_histories->per_lcore_history));
if (history_entries == NULL) {
goto cleanup;
}
history_ptr = mmap(NULL, trace_histories_size, PROT_READ, MAP_SHARED, fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap %s.\n", file_name);
usage();
exit(-1);
}
g_histories = (struct spdk_trace_histories *)history_ptr;
memcpy(history_entries, g_histories->per_lcore_history,
sizeof(g_histories->per_lcore_history));
if (lcore == SPDK_TRACE_MAX_LCORE) {
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(g_histories, i);
if (history->num_entries == 0 || history->entries[0].tsc == 0) {
history = &history_entries[i];
if (history->entries[0].tsc == 0) {
continue;
}
if (g_verbose && history->num_entries) {
printf("Trace Size of lcore (%d): %ju\n", i, history->num_entries);
}
populate_events(history, history->num_entries);
populate_events(history);
}
} else {
history = spdk_get_per_lcore_history(g_histories, lcore);
if (history->num_entries > 0 && history->entries[0].tsc != 0) {
if (g_verbose && history->num_entries) {
printf("Trace Size of lcore (%d): %ju\n", lcore, history->num_entries);
}
populate_events(history, history->num_entries);
history = &history_entries[lcore];
if (history->entries[0].tsc != 0) {
populate_events(history);
}
}
tsc_offset = g_first_tsc;
tsc_offset = first_tsc;
for (entry_map::iterator it = g_entry_map.begin(); it != g_entry_map.end(); it++) {
if (it->first.tsc < g_first_tsc) {
if (it->first.tsc < first_tsc || it->first.tsc > last_tsc) {
continue;
}
process_event(it->second, g_tsc_rate, tsc_offset, it->first.lcore);
process_event(it->second, tsc_rate, tsc_offset, it->first.lcore);
}
munmap(history_ptr, trace_histories_size);
free(history_entries);
cleanup:
munmap(history_ptr, sizeof(*g_histories));
close(fd);
return (0);


@@ -1 +0,0 @@
spdk_trace_record


@@ -1,43 +0,0 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
SPDK_LIB_LIST = util log
APP = spdk_trace_record
C_SRCS := trace_record.c
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk


@@ -1,706 +0,0 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/string.h"
#include "spdk/trace.h"
#include "spdk/util.h"
#include "spdk/barrier.h"
#define TRACE_FILE_COPY_SIZE (32 * 1024)
#define TRACE_PATH_MAX 2048
static char *g_exe_name;
static int g_verbose = 1;
static uint64_t g_tsc_rate;
static uint64_t g_utsc_rate;
static bool g_shutdown = false;
static uint64_t g_histories_size;
struct lcore_trace_record_ctx {
char lcore_file[TRACE_PATH_MAX];
int fd;
struct spdk_trace_history *in_history;
struct spdk_trace_history *out_history;
/* Next entry index that has already been recorded */
uint64_t rec_next_entry;
/* Recorded TSC values for the summary report */
uint64_t first_entry_tsc;
uint64_t last_entry_tsc;
/* Total number of entries in lcore trace file */
uint64_t num_entries;
};
struct aggr_trace_record_ctx {
const char *out_file;
int out_fd;
int shm_fd;
struct lcore_trace_record_ctx lcore_ports[SPDK_TRACE_MAX_LCORE];
struct spdk_trace_histories *trace_histories;
};
static int
input_trace_file_mmap(struct aggr_trace_record_ctx *ctx, const char *shm_name)
{
void *history_ptr;
int i;
ctx->shm_fd = shm_open(shm_name, O_RDONLY, 0);
if (ctx->shm_fd < 0) {
fprintf(stderr, "Could not open %s.\n", shm_name);
return -1;
}
/* Map the header of trace file */
history_ptr = mmap(NULL, sizeof(struct spdk_trace_histories), PROT_READ, MAP_SHARED, ctx->shm_fd,
0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap shm %s.\n", shm_name);
close(ctx->shm_fd);
return -1;
}
ctx->trace_histories = (struct spdk_trace_histories *)history_ptr;
g_tsc_rate = ctx->trace_histories->flags.tsc_rate;
g_utsc_rate = g_tsc_rate / 1000;
if (g_tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", g_tsc_rate);
munmap(history_ptr, sizeof(struct spdk_trace_histories));
close(ctx->shm_fd);
return -1;
}
if (g_verbose) {
printf("TSC Rate: %ju\n", g_tsc_rate);
}
/* Remap the entire trace file */
g_histories_size = spdk_get_trace_histories_size(ctx->trace_histories);
munmap(history_ptr, sizeof(struct spdk_trace_histories));
history_ptr = mmap(NULL, g_histories_size, PROT_READ, MAP_SHARED, ctx->shm_fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not remap shm %s.\n", shm_name);
close(ctx->shm_fd);
return -1;
}
ctx->trace_histories = (struct spdk_trace_histories *)history_ptr;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
ctx->lcore_ports[i].in_history = spdk_get_per_lcore_history(ctx->trace_histories, i);
if (g_verbose) {
printf("Number of trace entries for lcore (%d): %ju\n", i,
ctx->lcore_ports[i].in_history->num_entries);
}
}
return 0;
}
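input_trace_file_mmap() uses a two-stage mapping pattern: map just the fixed-size header, read the total size out of it, then unmap and remap the full region. A self-contained sketch of that pattern, using a hypothetical header struct rather than the SPDK types:

#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical header: its first field records the region's full length. */
struct region_hdr {
	uint64_t total_size;
};

static void *
map_whole_region(int fd)
{
	struct region_hdr *hdr;
	void *full;
	uint64_t size;

	/* Stage 1: map only the header to learn the full size. */
	hdr = mmap(NULL, sizeof(*hdr), PROT_READ, MAP_SHARED, fd, 0);
	if (hdr == MAP_FAILED) {
		return NULL;
	}
	size = hdr->total_size;
	munmap(hdr, sizeof(*hdr));

	/* Stage 2: remap the entire region now that the size is known. */
	full = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	return full == MAP_FAILED ? NULL : full;
}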
static int
output_trace_files_prepare(struct aggr_trace_record_ctx *ctx, const char *aggr_path)
{
int flags = O_CREAT | O_EXCL | O_RDWR;
struct lcore_trace_record_ctx *port_ctx;
int name_len;
int i, rc;
/* Assign file names for related trace files */
ctx->out_file = aggr_path;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
/* Get the length of trace file name for each lcore with format "%s-%d" */
name_len = snprintf(port_ctx->lcore_file, TRACE_PATH_MAX, "%s-%d", ctx->out_file, i);
if (name_len >= TRACE_PATH_MAX) {
fprintf(stderr, "Length of file path (%s) exceeds limitation for lcore file.\n",
aggr_path);
goto err;
}
}
/* If output trace file already exists, try to unlink it together with its temporary files */
if (access(ctx->out_file, F_OK) == 0) {
rc = unlink(ctx->out_file);
if (rc) {
goto err;
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
if (access(port_ctx->lcore_file, F_OK) == 0) {
rc = unlink(port_ctx->lcore_file);
if (rc) {
goto err;
}
}
}
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
port_ctx->fd = open(port_ctx->lcore_file, flags, 0600);
if (port_ctx->fd < 0) {
fprintf(stderr, "Could not open lcore file %s.\n", port_ctx->lcore_file);
goto err;
}
if (g_verbose) {
printf("Create tmp lcore trace file %s for lcore %d\n", port_ctx->lcore_file, i);
}
port_ctx->out_history = calloc(1, sizeof(struct spdk_trace_history));
if (port_ctx->out_history == NULL) {
fprintf(stderr, "Failed to allocate memory for out_history.\n");
goto err;
}
}
return 0;
err:
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
free(port_ctx->out_history);
if (port_ctx->fd > 0) {
close(port_ctx->fd);
}
}
return -1;
}
static void
output_trace_files_finish(struct aggr_trace_record_ctx *ctx)
{
struct lcore_trace_record_ctx *port_ctx;
int i;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
free(port_ctx->out_history);
close(port_ctx->fd);
unlink(port_ctx->lcore_file);
if (g_verbose) {
printf("Remove tmp lcore trace file %s for lcore %d\n", port_ctx->lcore_file, i);
}
}
}
static int
cont_write(int fildes, const void *buf, size_t nbyte)
{
int rc;
int _nbyte = nbyte;
while (_nbyte) {
rc = write(fildes, buf, _nbyte);
if (rc < 0) {
if (errno != EINTR) {
return -1;
}
continue;
}
_nbyte -= rc;
}
return nbyte;
}
static int
cont_read(int fildes, void *buf, size_t nbyte)
{
int rc;
int _nbyte = nbyte;
while (_nbyte) {
rc = read(fildes, buf, _nbyte);
if (rc == 0) {
return nbyte - _nbyte;
} else if (rc < 0) {
if (errno != EINTR) {
return -1;
}
continue;
}
_nbyte -= rc;
}
return nbyte;
}
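cont_write() and cont_read() exist because write() and read() may move fewer bytes than requested, or fail with EINTR when a signal (here, SIGINT/SIGTERM from the shutdown handler) arrives mid-call; the loops retry until the full count has moved or a real error occurs. Note that the loops above do not advance the buffer pointer after a short transfer; a fully defensive variant would. A sketch of such a variant (cont_write_all is a hypothetical name, not part of SPDK):

#include <errno.h>
#include <unistd.h>

static ssize_t
cont_write_all(int fd, const void *buf, size_t nbyte)
{
	const char *p = buf;
	size_t left = nbyte;
	ssize_t rc;

	while (left > 0) {
		rc = write(fd, p, left);
		if (rc < 0) {
			if (errno == EINTR) {
				continue;	/* interrupted by a signal: retry */
			}
			return -1;		/* real I/O error */
		}
		p += rc;			/* advance past the bytes written */
		left -= rc;
	}
	return (ssize_t)nbyte;
}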
static int
lcore_trace_last_entry_idx(struct spdk_trace_history *in_history, int cir_next_idx)
{
int last_idx;
if (cir_next_idx == 0) {
last_idx = in_history->num_entries - 1;
} else {
last_idx = cir_next_idx - 1;
}
return last_idx;
}
static int
circular_buffer_padding_backward(int fd, struct spdk_trace_history *in_history,
int cir_start, int cir_end)
{
int rc;
if (cir_end <= cir_start) {
fprintf(stderr, "Incorrect use of circular_buffer_padding_backward\n");
return -1;
}
rc = cont_write(fd, &in_history->entries[cir_start],
sizeof(struct spdk_trace_entry) * (cir_end - cir_start));
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file\n");
return rc;
}
return 0;
}
static int
circular_buffer_padding_across(int fd, struct spdk_trace_history *in_history,
int cir_start, int cir_end)
{
int rc;
int num_entries = in_history->num_entries;
if (cir_end > cir_start) {
fprintf(stderr, "Incorrect use of circular_buffer_padding_across\n");
return -1;
}
rc = cont_write(fd, &in_history->entries[cir_start],
sizeof(struct spdk_trace_entry) * (num_entries - cir_start));
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file backward\n");
return rc;
}
if (cir_end == 0) {
return 0;
}
rc = cont_write(fd, &in_history->entries[0], sizeof(struct spdk_trace_entry) * cir_end);
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file forward\n");
return rc;
}
return 0;
}
static int
circular_buffer_padding_all(int fd, struct spdk_trace_history *in_history,
int cir_end)
{
return circular_buffer_padding_across(fd, in_history, cir_end, cir_end);
}
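These helpers rely on the index arithmetic used in lcore_trace_record() below: next-entry counters grow monotonically and are reduced to buffer slots by masking, which assumes num_entries is a power of two. A worked example with hypothetical numbers:

/* num_cir_entries = 8, so the mask is 7.
 * rec_next_entry  = 6  -> rec_cir_next = 6 & 7 = 6
 * shm_next_entry  = 11 -> shm_cir_next = 11 & 7 = 3
 *
 * shm_cir_next <= rec_cir_next, so the update wraps around:
 * circular_buffer_padding_across() first writes slots 6..7 (the
 * tail of the buffer), then slots 0..2, five entries in total,
 * matching shm_next_entry - rec_next_entry = 5.
 */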
static int
lcore_trace_record(struct lcore_trace_record_ctx *lcore_port)
{
struct spdk_trace_history *in_history = lcore_port->in_history;
uint64_t rec_next_entry = lcore_port->rec_next_entry;
uint64_t rec_num_entries = lcore_port->num_entries;
int fd = lcore_port->fd;
uint64_t shm_next_entry;
uint64_t num_cir_entries;
uint64_t shm_cir_next;
uint64_t rec_cir_next;
int rc;
int last_idx;
shm_next_entry = in_history->next_entry;
/* Ensure all entries of spdk_trace_history up to next_entry are visible before reading them */
spdk_smp_rmb();
if (shm_next_entry == rec_next_entry) {
/* There is no update */
return 0;
} else if (shm_next_entry < rec_next_entry) {
/* Error branch */
fprintf(stderr, "Trace porting error in lcore %d, trace rollback occurs.\n", in_history->lcore);
fprintf(stderr, "shm_next_entry is %ju, record_next_entry is %ju.\n", shm_next_entry,
rec_next_entry);
return -1;
}
num_cir_entries = in_history->num_entries;
shm_cir_next = shm_next_entry & (num_cir_entries - 1);
/* Record the first entry's TSC and the corresponding entries when recording for the first time. */
if (lcore_port->first_entry_tsc == 0) {
if (shm_next_entry < num_cir_entries) {
/* Updates haven't wrapped around the circular buffer yet.
* The first entry in shared memory is the oldest one.
*/
lcore_port->first_entry_tsc = in_history->entries[0].tsc;
lcore_port->num_entries += shm_cir_next;
rc = circular_buffer_padding_backward(fd, in_history, 0, shm_cir_next);
} else {
/* Updates have already wrapped around the circular buffer.
* The oldest entry in shared memory is pointed to by shm_cir_next.
*/
lcore_port->first_entry_tsc = in_history->entries[shm_cir_next].tsc;
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
}
goto out;
}
if (shm_next_entry - rec_next_entry > num_cir_entries) {
/* Some updates must have been missed */
fprintf(stderr, "Trace-record missed %ju trace entries\n",
shm_next_entry - rec_next_entry - num_cir_entries);
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
} else if (shm_next_entry - rec_next_entry == num_cir_entries) {
/* The entire circular buffer has been updated */
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
} else {
/* Part of the circular buffer has been updated */
rec_cir_next = rec_next_entry & (num_cir_entries - 1);
if (shm_cir_next > rec_cir_next) {
/* Updates do not wrap around the circular buffer */
lcore_port->num_entries += shm_cir_next - rec_cir_next;
rc = circular_buffer_padding_backward(fd, in_history, rec_cir_next, shm_cir_next);
} else {
/* Updates wrap around the circular buffer */
lcore_port->num_entries += num_cir_entries - rec_cir_next + shm_cir_next;
rc = circular_buffer_padding_across(fd, in_history, rec_cir_next, shm_cir_next);
}
}
out:
if (rc) {
return rc;
}
if (g_verbose) {
printf("Append %ju trace_entry for lcore %d\n", lcore_port->num_entries - rec_num_entries,
in_history->lcore);
}
/* Update tpoint_count info */
memcpy(lcore_port->out_history, lcore_port->in_history, sizeof(struct spdk_trace_history));
/* Update last_entry_tsc to align with appended entries */
last_idx = lcore_trace_last_entry_idx(in_history, shm_cir_next);
lcore_port->last_entry_tsc = in_history->entries[last_idx].tsc;
lcore_port->rec_next_entry = shm_next_entry;
return rc;
}
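The three cases distinguished above depend on how far the producer has advanced since the last poll. With a hypothetical num_cir_entries of 8:

/* shm_next_entry - rec_next_entry = 12 -> 4 entries were overwritten
 *   before they could be recorded ("missed updates"); dump all 8 live slots.
 * shm_next_entry - rec_next_entry = 8  -> the buffer turned over exactly
 *   once; dump all 8 slots.
 * shm_next_entry - rec_next_entry = 3  -> only 3 new entries; copy just
 *   the slice between the two masked indexes.
 */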
static int
trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
{
int flags = O_CREAT | O_EXCL | O_RDWR;
struct lcore_trace_record_ctx *lcore_port;
char copy_buff[TRACE_FILE_COPY_SIZE];
uint64_t lcore_offsets[SPDK_TRACE_MAX_LCORE + 1];
int rc, i;
ssize_t len = 0;
uint64_t len_sum;
ctx->out_fd = open(ctx->out_file, flags, 0600);
if (ctx->out_fd < 0) {
fprintf(stderr, "Could not open aggregation file %s.\n", ctx->out_file);
return -1;
}
if (g_verbose) {
printf("Create trace file %s for output\n", ctx->out_file);
}
/* Write flags of histories into the head of the converged trace file, except num_entries */
rc = cont_write(ctx->out_fd, ctx->trace_histories,
sizeof(struct spdk_trace_histories) - sizeof(lcore_offsets));
if (rc < 0) {
fprintf(stderr, "Failed to write trace header into trace file\n");
goto out;
}
/* Update and append lcore offsets to the converged trace file */
lcore_offsets[0] = sizeof(struct spdk_trace_flags);
for (i = 1; i < (int)SPDK_COUNTOF(lcore_offsets); i++) {
lcore_offsets[i] = spdk_get_trace_history_size(ctx->lcore_ports[i - 1].num_entries) +
lcore_offsets[i - 1];
}
rc = cont_write(ctx->out_fd, lcore_offsets, sizeof(lcore_offsets));
if (rc < 0) {
fprintf(stderr, "Failed to write lcore offsets into trace file\n");
goto out;
}
/* Append each lcore trace file into converged trace file */
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx->lcore_ports[i];
lcore_port->out_history->num_entries = lcore_port->num_entries;
rc = cont_write(ctx->out_fd, lcore_port->out_history, sizeof(struct spdk_trace_history));
if (rc < 0) {
fprintf(stderr, "Failed to write lcore trace header into trace file\n");
goto out;
}
/* Move file offset to the start of trace_entries */
rc = lseek(lcore_port->fd, 0, SEEK_SET);
if (rc != 0) {
fprintf(stderr, "Failed to lseek lcore trace file\n");
goto out;
}
len_sum = 0;
while ((len = cont_read(lcore_port->fd, copy_buff, TRACE_FILE_COPY_SIZE)) > 0) {
len_sum += len;
rc = cont_write(ctx->out_fd, copy_buff, len);
if (rc != len) {
fprintf(stderr, "Failed to write lcore trace entries into trace file\n");
goto out;
}
}
if (len_sum != lcore_port->num_entries * sizeof(struct spdk_trace_entry)) {
fprintf(stderr, "Length of lcore trace file doesn't match number of entries for lcore\n");
}
}
printf("All lcores trace entries are aggregated into trace file %s\n", ctx->out_file);
out:
close(ctx->out_fd);
return rc;
}
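The aggregated file is laid out as the flags block, then the lcore offset table, then one history per lcore; each offset is a running sum of the preceding history sizes. A worked sketch with hypothetical sizes (the real struct sizes differ):

/* Assume sizeof(struct spdk_trace_flags) = 4096, a 64-byte history
 * header, 32-byte entries, 100 entries on lcore 0 and 50 on lcore 1:
 *
 *   lcore_offsets[0] = 4096
 *   lcore_offsets[1] = 4096 + 64 + 100 * 32 = 7360
 *   lcore_offsets[2] = 7360 + 64 +  50 * 32 = 9024
 *   ...
 *
 * so lcore_offsets[i] is the byte offset of lcore i's history header.
 */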
static void
__shutdown_signal(int signo)
{
g_shutdown = true;
}
static int
setup_exit_signal_handler(void)
{
struct sigaction sigact;
int rc;
memset(&sigact, 0, sizeof(sigact));
sigemptyset(&sigact.sa_mask);
/* Install the same handler for SIGINT and SIGTERM */
sigact.sa_handler = __shutdown_signal;
rc = sigaction(SIGINT, &sigact, NULL);
if (rc < 0) {
fprintf(stderr, "sigaction(SIGINT) failed\n");
return rc;
}
rc = sigaction(SIGTERM, &sigact, NULL);
if (rc < 0) {
fprintf(stderr, "sigaction(SIGTERM) failed\n");
}
return rc;
}
static void usage(void)
{
printf("\n%s is used to record all SPDK generated trace entries\n", g_exe_name);
printf("from SPDK trace shared-memory to specified file.\n\n");
printf("usage:\n");
printf(" %s <option>\n", g_exe_name);
printf(" option = '-q' to disable verbose mode\n");
printf(" '-s' to specify spdk_trace shm name for a\n");
printf(" currently running process\n");
printf(" '-i' to specify the shared memory ID\n");
printf(" '-p' to specify the trace PID\n");
printf(" (one of -i or -p must be specified)\n");
printf(" '-f' to specify output trace file name\n");
printf(" '-h' to print usage information\n");
}
int main(int argc, char **argv)
{
const char *app_name = NULL;
const char *file_name = NULL;
int op;
char shm_name[64];
int shm_id = -1, shm_pid = -1;
int rc = 0;
int i;
struct aggr_trace_record_ctx ctx = {};
struct lcore_trace_record_ctx *lcore_port;
g_exe_name = argv[0];
while ((op = getopt(argc, argv, "f:i:p:qs:h")) != -1) {
switch (op) {
case 'i':
shm_id = spdk_strtol(optarg, 10);
break;
case 'p':
shm_pid = spdk_strtol(optarg, 10);
break;
case 'q':
g_verbose = 0;
break;
case 's':
app_name = optarg;
break;
case 'f':
file_name = optarg;
break;
case 'h':
usage();
exit(EXIT_SUCCESS);
default:
usage();
exit(1);
}
}
if (file_name == NULL) {
fprintf(stderr, "-f must be specified\n");
usage();
exit(1);
}
if (app_name == NULL) {
fprintf(stderr, "-s must be specified\n");
usage();
exit(1);
}
if (shm_id == -1 && shm_pid == -1) {
fprintf(stderr, "-i or -p must be specified\n");
usage();
exit(1);
}
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
rc = setup_exit_signal_handler();
if (rc) {
exit(1);
}
rc = input_trace_file_mmap(&ctx, shm_name);
if (rc) {
exit(1);
}
rc = output_trace_files_prepare(&ctx, file_name);
if (rc) {
exit(1);
}
printf("Start to poll trace shm file %s\n", shm_name);
while (!g_shutdown && rc == 0) {
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx.lcore_ports[i];
rc = lcore_trace_record(lcore_port);
if (rc) {
break;
}
}
}
if (rc) {
exit(1);
}
printf("Start to aggregate lcore trace files\n");
rc = trace_files_aggregate(&ctx);
if (rc) {
exit(1);
}
/* Summary report */
printf("TSC Rate: %ju\n", g_tsc_rate);
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx.lcore_ports[i];
if (lcore_port->num_entries == 0) {
continue;
}
printf("Port %ju trace entries for lcore (%d) in %ju usec\n",
lcore_port->num_entries, i,
(lcore_port->last_entry_tsc - lcore_port->first_entry_tsc) / g_utsc_rate);
}
munmap(ctx.trace_histories, g_histories_size);
close(ctx.shm_fd);
output_trace_files_finish(&ctx);
return 0;
}


@@ -33,16 +33,31 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = vhost
C_SRCS := vhost.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event_vhost event_nbd
SPDK_LIB_LIST = event_bdev event_copy event_net event_scsi event_vhost
SPDK_LIB_LIST += jsonrpc json rpc bdev_rpc bdev scsi copy trace conf
SPDK_LIB_LIST += thread util log log_rpc event app_rpc
SPDK_LIB_LIST += vhost rte_vhost event_nbd nbd net
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS) \
$(SOCK_MODULES_LINKER_ARGS)
LIBS += $(SPDK_LIB_LINKER_ARGS)
LIBS += $(ENV_LINKER_ARGS)
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
all : $(APP)
@:
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(ENV_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES) $(SOCK_MODULES_FILES)
$(LINK_C)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -33,17 +33,31 @@
#include "spdk/stdinc.h"
#include "spdk/conf.h"
#include "spdk/event.h"
#include "spdk/vhost.h"
#define SPDK_VHOST_DEFAULT_CONFIG "/usr/local/etc/spdk/vhost.conf"
#define SPDK_VHOST_DEFAULT_MEM_SIZE 1024
static const char *g_pid_path = NULL;
static void
vhost_app_opts_init(struct spdk_app_opts *opts)
{
spdk_app_opts_init(opts);
opts->name = "vhost";
opts->config_file = SPDK_VHOST_DEFAULT_CONFIG;
opts->mem_size = SPDK_VHOST_DEFAULT_MEM_SIZE;
}
static void
vhost_usage(void)
{
printf(" -f <path> save pid to file under given path\n");
printf(" -S <path> directory where to create vhost sockets (default: pwd)\n");
printf(" -f pidfile save pid to file under given path\n");
printf(" -S dir directory where to create vhost sockets (default: pwd)\n");
}
static void
@@ -61,7 +75,7 @@ save_pid(const char *pid_path)
fclose(pid_file);
}
static int
static void
vhost_parse_arg(int ch, char *arg)
{
switch (ch) {
@@ -71,14 +85,11 @@ vhost_parse_arg(int ch, char *arg)
case 'S':
spdk_vhost_set_socket_path(arg);
break;
default:
return -EINVAL;
}
return 0;
}
static void
vhost_started(void *arg1)
vhost_started(void *arg1, void *arg2)
{
}
@@ -88,10 +99,9 @@ main(int argc, char *argv[])
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
opts.name = "vhost";
vhost_app_opts_init(&opts);
if ((rc = spdk_app_parse_args(argc, argv, &opts, "f:S:", NULL,
if ((rc = spdk_app_parse_args(argc, argv, &opts, "f:S:",
vhost_parse_arg, vhost_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
exit(rc);
@@ -102,7 +112,7 @@ }
}
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, vhost_started, NULL);
rc = spdk_app_start(&opts, vhost_started, NULL, NULL);
spdk_app_fini();


@@ -2,241 +2,118 @@
set -e
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
rootdir=$(readlink -f $(dirname $0))
source "$1"
source "$rootdir/test/common/autotest_common.sh"
source "$rootdir/scripts/common.sh"
out=$output_dir
if [ -n "$SPDK_TEST_NATIVE_DPDK" ]; then
scanbuild_exclude=" --exclude $(dirname $SPDK_RUN_EXTERNAL_DPDK)"
else
scanbuild_exclude="--exclude $rootdir/dpdk/"
fi
scanbuild="scan-build -o $output_dir/scan-build-tmp $scanbuild_exclude --status-bugs"
config_params=$(get_config_params)
trap '[[ -d $SPDK_WORKSPACE ]] && rm -rf "$SPDK_WORKSPACE"' 0
SPDK_WORKSPACE=$(mktemp -dt "spdk_$(date +%s).XXXXXX")
export SPDK_WORKSPACE
out=$PWD
umask 022
cd $rootdir
# Print some test system info out for the log
date -u
git describe --tags
function ocf_precompile() {
# We compile OCF sources ourselves
# They don't need to be checked with scan-build, and code coverage is not applicable,
# so we precompile OCF now for further use as a standalone static library
./configure $(echo $config_params | sed 's/--enable-coverage//g')
$MAKE $MAKEFLAGS include/spdk/config.h
CC=gcc CCAR=ar $MAKE $MAKEFLAGS -C lib/env_ocf exportlib O=$rootdir/build/ocf.a
# Set config to use precompiled library
config_params="$config_params --with-ocf=/$rootdir/build/ocf.a"
# Need to reconfigure to avoid clearing OCF-related files on a future make clean.
./configure $config_params
}
# Print some test system info out for the log
echo "** START ** Info for Hostname: $HOSTNAME"
uname -a
$MAKE cc_version
$MAKE cxx_version
echo "** END ** Info for Hostname: $HOSTNAME"
function build_native_dpdk() {
local external_dpdk_dir
local external_dpdk_base_dir
timing_enter autobuild
external_dpdk_dir="$SPDK_RUN_EXTERNAL_DPDK"
external_dpdk_base_dir="$(dirname $external_dpdk_dir)"
./configure $config_params
if [[ ! -d "$external_dpdk_base_dir" ]]; then
sudo mkdir -p "$external_dpdk_base_dir"
sudo chown -R $(whoami) "$external_dpdk_base_dir"/..
fi
orgdir=$PWD
timing_enter check_format
if [ $SPDK_RUN_CHECK_FORMAT -eq 1 ]; then
./scripts/check_format.sh
fi
timing_exit check_format
rm -rf "$external_dpdk_base_dir"
git clone --branch $SPDK_TEST_NATIVE_DPDK --depth 1 http://dpdk.org/git/dpdk "$external_dpdk_base_dir"
git -C "$external_dpdk_base_dir" log --oneline -n 5
scanbuild=''
make_timing_label='make'
if [ $SPDK_RUN_SCANBUILD -eq 1 ] && hash scan-build; then
scanbuild="scan-build -o $out/scan-build-tmp --status-bugs"
make_timing_label='scanbuild_make'
report_test_completion "scanbuild"
dpdk_cflags="-fPIC -g -Werror -fcommon"
dpdk_ldflags=""
fi
# the drivers we use
# net/i40e driver is not really needed by us, but it's built as a workaround
# for DPDK issue: https://bugs.dpdk.org/show_bug.cgi?id=576
DPDK_DRIVERS=("bus" "bus/pci" "bus/vdev" "mempool/ring" "net/i40e" "net/i40e/base")
# all possible DPDK drivers
DPDK_ALL_DRIVERS=($(find "$external_dpdk_base_dir/drivers" -mindepth 1 -type d | sed -n "s#^$external_dpdk_base_dir/drivers/##p"))
if [ $SPDK_RUN_VALGRIND -eq 1 ]; then
report_test_completion "valgrind"
fi
if [[ "$SPDK_TEST_CRYPTO" -eq 1 ]]; then
git clone --branch v0.54 --depth 1 https://github.com/intel/intel-ipsec-mb.git "$external_dpdk_base_dir/intel-ipsec-mb"
cd "$external_dpdk_base_dir/intel-ipsec-mb"
$MAKE $MAKEFLAGS all SHARED=y EXTRA_CFLAGS=-fPIC
DPDK_DRIVERS+=("crypto")
DPDK_DRIVERS+=("crypto/aesni_mb")
DPDK_DRIVERS+=("crypto/qat")
DPDK_DRIVERS+=("compress/qat")
DPDK_DRIVERS+=("common/qat")
dpdk_cflags+=" -I$external_dpdk_base_dir/intel-ipsec-mb"
dpdk_ldflags+=" -L$external_dpdk_base_dir/intel-ipsec-mb"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$external_dpdk_base_dir/intel-ipsec-mb
fi
if [ $SPDK_RUN_ASAN -eq 1 ]; then
report_test_completion "asan"
fi
if [[ "$SPDK_TEST_REDUCE" -eq 1 ]]; then
isal_dir="$external_dpdk_base_dir/isa-l"
git clone --branch v2.29.0 --depth 1 https://github.com/intel/isa-l.git "$isal_dir"
if [ $SPDK_RUN_UBSAN -eq 1 ]; then
report_test_completion "ubsan"
fi
cd $isal_dir
./autogen.sh
./configure CFLAGS="-fPIC -g -O2" --enable-shared=yes --prefix="$isal_dir/build"
ln -s $PWD/include $PWD/isa-l
$MAKE $MAKEFLAGS all
$MAKE install
DPDK_DRIVERS+=("compress")
DPDK_DRIVERS+=("compress/isal")
DPDK_DRIVERS+=("compress/qat")
DPDK_DRIVERS+=("common/qat")
export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$isal_dir/build/lib/pkgconfig"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$isal_dir/build/lib"
fi
echo $scanbuild
$MAKE $MAKEFLAGS clean
# Use difference between DPDK_ALL_DRIVERS and DPDK_DRIVERS as a set of DPDK drivers we don't want or
# don't need to build.
DPDK_DISABLED_DRIVERS=($(sort <(printf "%s\n" "${DPDK_DRIVERS[@]}") <(printf "%s\n" "${DPDK_ALL_DRIVERS[@]}") | uniq -u))
cd $external_dpdk_base_dir
if [ "$(uname -s)" = "Linux" ]; then
dpdk_cflags+=" -Wno-stringop-overflow"
# Fix for freeing a device when no kernel driver is configured.
# TODO: Remove once this is merged in upstream DPDK
if grep "20.08.0" $external_dpdk_base_dir/VERSION; then
wget https://github.com/spdk/dpdk/commit/64f1ced13f974e8b3d46b87c361a09eca68126f9.patch -O dpdk-pci.patch
wget https://github.com/spdk/dpdk/commit/c2c273d5c8fbf673623b427f8f4ab5af5ddf0e08.patch -O dpdk-qat.patch
elif grep "20.11\|21.02" $external_dpdk_base_dir/VERSION; then
wget https://github.com/karlatec/dpdk/commit/3219c0cfc38803aec10c809dde16e013b370bda9.patch -O dpdk-pci.patch
wget https://github.com/karlatec/dpdk/commit/adf8f7638de29bc4bf9ba3faf12bbdae73acda0c.patch -O dpdk-qat.patch
else
wget https://github.com/karlatec/dpdk/commit/f95e331be3a1f856b816948990dd2afc67ea4020.patch -O dpdk-pci.patch
wget https://github.com/karlatec/dpdk/commit/6fd2fa906ffdcee04e6ce5da40e61cb841be9827.patch -O dpdk-qat.patch
fi
git config --local user.name "spdk"
git config --local user.email "nomail@all.com"
git am dpdk-pci.patch
git am dpdk-qat.patch
fi
meson build-tmp --prefix="$external_dpdk_dir" --libdir lib \
-Denable_docs=false -Denable_kmods=false -Dtests=false \
-Dc_link_args="$dpdk_ldflags" -Dc_args="$dpdk_cflags" \
-Dmachine=native -Ddisable_drivers=$(printf "%s," "${DPDK_DISABLED_DRIVERS[@]}")
ninja -C "$external_dpdk_base_dir/build-tmp" $MAKEFLAGS
ninja -C "$external_dpdk_base_dir/build-tmp" $MAKEFLAGS install
# Save this path. If tests are run using autorun.sh, then the autotest.sh
# script will be unaware of LD_LIBRARY_PATH and tests will fail.
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" > /tmp/spdk-ld-path
cd "$orgdir"
}
function make_fail_cleanup() {
timing_enter "$make_timing_label"
fail=0
time $scanbuild $MAKE $MAKEFLAGS || fail=1
if [ $fail -eq 1 ]; then
if [ -d $out/scan-build-tmp ]; then
scanoutput=$(ls -1 $out/scan-build-tmp/)
mv $out/scan-build-tmp/$scanoutput $out/scan-build
rm -rf $out/scan-build-tmp
chmod -R a+rX $out/scan-build
fi
false
}
exit 1
else
rm -rf $out/scan-build-tmp
fi
timing_exit "$make_timing_label"
function scanbuild_make() {
pass=true
$scanbuild $MAKE $MAKEFLAGS > $out/build_output.txt && rm -rf $out/scan-build-tmp || make_fail_cleanup
xtrace_disable
rm -f $out/*files.txt
for ent in $(find app examples lib module test -type f | grep -vF ".h"); do
if [[ $ent == lib/env_ocf* ]]; then continue; fi
if file -bi $ent | grep -q 'text/x-c'; then
echo $ent | sed 's/\.cp\{0,2\}$//g' >> $out/all_c_files.txt
fi
done
xtrace_restore
grep -E "CC|CXX" $out/build_output.txt | sed 's/\s\s\(CC\|CXX\)\s//g' | sed 's/\.o//g' > $out/built_c_files.txt
cat $rootdir/test/common/skipped_build_files.txt >> $out/built_c_files.txt
sort -o $out/all_c_files.txt $out/all_c_files.txt
sort -o $out/built_c_files.txt $out/built_c_files.txt
# from comm manual:
# -2 suppress column 2 (lines unique to FILE2)
# -3 suppress column 3 (lines that appear in both files)
# comm may exit 1 if no lines were printed (undocumented, unreliable)
comm -2 -3 $out/all_c_files.txt $out/built_c_files.txt > $out/unbuilt_c_files.txt || true
if [ $(wc -l < $out/unbuilt_c_files.txt) -ge 1 ]; then
echo "missing files"
cat $out/unbuilt_c_files.txt
pass=false
fi
$pass
}
function porcelain_check() {
if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
echo "Generated files missing from .gitignore:"
git status --porcelain --ignore-submodules
exit 1
fi
}
# Check for generated files that are not listed in .gitignore
timing_enter generated_files_check
if [ `git status --porcelain --ignore-submodules | wc -l` -ne 0 ]; then
echo "Generated files missing from .gitignore:"
git status --porcelain
exit 1
fi
timing_exit generated_files_check
# Check that header file dependencies are working correctly by
# capturing a binary's stat data before and after touching a
# header file and re-making.
function header_dependency_check() {
STAT1=$(stat $SPDK_BIN_DIR/spdk_tgt)
sleep 1
touch lib/nvme/nvme_internal.h
$MAKE $MAKEFLAGS
STAT2=$(stat $SPDK_BIN_DIR/spdk_tgt)
timing_enter dependency_check
STAT1=`stat examples/nvme/identify/identify`
sleep 1
touch lib/nvme/nvme_internal.h
$MAKE $MAKEFLAGS
STAT2=`stat examples/nvme/identify/identify`
if [ "$STAT1" == "$STAT2" ]; then
echo "Header dependency check failed"
false
fi
}
if [ "$STAT1" == "$STAT2" ]; then
echo "Header dependency check failed"
exit 1
fi
timing_exit dependency_check
function test_make_uninstall() {
# Create an empty file to check that it is not deleted by the uninstall target
touch "$SPDK_WORKSPACE/usr/lib/sample_xyz.a"
$MAKE $MAKEFLAGS uninstall DESTDIR="$SPDK_WORKSPACE" prefix=/usr
if [[ $(find "$SPDK_WORKSPACE/usr" -maxdepth 1 -mindepth 1 | wc -l) -ne 2 ]] || [[ $(find "$SPDK_WORKSPACE/usr/lib/" -maxdepth 1 -mindepth 1 | wc -l) -ne 1 ]]; then
ls -lR "$SPDK_WORKSPACE"
echo "Make uninstall failed"
exit 1
fi
}
function build_doc() {
local doxygenv
doxygenv=$(doxygen --version)
# Test 'make install'
timing_enter make_install
rm -rf /tmp/spdk
mkdir /tmp/spdk
$MAKE $MAKEFLAGS install DESTDIR=/tmp/spdk prefix=/usr
ls -lR /tmp/spdk
rm -rf /tmp/spdk
timing_exit make_install
timing_enter doxygen
if [ $SPDK_BUILD_DOC -eq 1 ] && hash doxygen; then
$MAKE -C "$rootdir"/doc --no-print-directory $MAKEFLAGS &> "$out"/doxygen.log
if [ -s "$out"/doxygen.log ]; then
cat "$out"/doxygen.log
echo "Doxygen errors found!"
eq "$doxygenv" 1.8.20 || exit 1
echo "Doxygen $doxygenv detected, all warnings are potentially false positives, continuing the test"
exit 1
fi
if hash pdflatex 2> /dev/null; then
if hash pdflatex 2>/dev/null; then
$MAKE -C "$rootdir"/doc/output/latex --no-print-directory $MAKEFLAGS &>> "$out"/doxygen.log
fi
mkdir -p "$out"/doc
@@ -246,58 +123,10 @@ function build_doc() {
fi
$MAKE -C "$rootdir"/doc --no-print-directory $MAKEFLAGS clean &>> "$out"/doxygen.log
if [ -s "$out"/doxygen.log ]; then
# Save the log as an artifact in case we are working with a potentially broken version
eq "$doxygenv" 1.8.20 || rm "$out"/doxygen.log
rm "$out"/doxygen.log
fi
rm -rf "$rootdir"/doc/output
}
function autobuild_test_suite() {
run_test "autobuild_check_format" ./scripts/check_format.sh
run_test "autobuild_external_code" sudo -E --preserve-env=PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH $rootdir/test/external_code/test_make.sh $rootdir
if [ "$SPDK_TEST_OCF" -eq 1 ]; then
run_test "autobuild_ocf_precompile" ocf_precompile
fi
run_test "autobuild_check_so_deps" $rootdir/test/make/check_so_deps.sh $1
./configure $config_params --without-shared
run_test "scanbuild_make" scanbuild_make
run_test "autobuild_generated_files_check" porcelain_check
run_test "autobuild_header_dependency_check" header_dependency_check
run_test "autobuild_make_install" $MAKE $MAKEFLAGS install DESTDIR="$SPDK_WORKSPACE" prefix=/usr
run_test "autobuild_make_uninstall" test_make_uninstall
run_test "autobuild_build_doc" build_doc
}
if [ $SPDK_RUN_VALGRIND -eq 1 ]; then
run_test "valgrind" echo "using valgrind"
fi
timing_exit doxygen
if [ $SPDK_RUN_ASAN -eq 1 ]; then
run_test "asan" echo "using asan"
fi
if [ $SPDK_RUN_UBSAN -eq 1 ]; then
run_test "ubsan" echo "using ubsan"
fi
if [ -n "$SPDK_TEST_NATIVE_DPDK" ]; then
run_test "build_native_dpdk" build_native_dpdk
fi
./configure $config_params
echo "** START ** Info for Hostname: $HOSTNAME"
uname -a
$MAKE cc_version
$MAKE cxx_version
echo "** END ** Info for Hostname: $HOSTNAME"
if [ "$SPDK_TEST_AUTOBUILD" -eq 1 ]; then
run_test "autobuild" autobuild_test_suite $1
else
if [ "$SPDK_TEST_OCF" -eq 1 ]; then
run_test "autobuild_ocf_precompile" ocf_precompile
fi
# If we aren't running the unit tests, build with shared objects.
./configure $config_params --with-shared
run_test "make" $MAKE $MAKEFLAGS
fi
timing_exit autobuild


@@ -2,85 +2,53 @@
set -xe
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
source "$1"
rootdir=$(readlink -f $(dirname $0))
source "$rootdir/test/common/autotest_common.sh"
function build_rpms() (
local version rpms
# Make sure the linker will not attempt to look under DPDK's repo dir to get the libs
unset -v LD_LIBRARY_PATH
install_uninstall_rpms() {
rpms=("$HOME/rpmbuild/RPMS/x86_64/"spdk{,-devel,{,-dpdk}-libs}-$version-1.x86_64.rpm)
sudo rpm -i "${rpms[@]}"
rpms=("${rpms[@]##*/}") rpms=("${rpms[@]%.rpm}")
# Check if we can find one of the apps in the PATH now and verify that it
# doesn't miss any libs.
LIST_LIBS=yes "$rootdir/rpmbuild/rpm-deps.sh" "${SPDK_APP[@]##*/}"
sudo rpm -e "${rpms[@]}"
}
build_rpm() {
MAKEFLAGS="$MAKEFLAGS" SPDK_VERSION="$version" DEPS=no "$rootdir/rpmbuild/rpm.sh" "$@"
install_uninstall_rpms
}
version="test_shared"
run_test "build_shared_rpm" build_rpm --with-shared
if [[ -n $SPDK_TEST_NATIVE_DPDK ]]; then
version="test_shared_native_dpdk"
run_test "build_shared_native_dpdk_rpm" build_rpm --with-shared --with-dpdk="$SPDK_RUN_EXTERNAL_DPDK"
fi
)
out=$PWD
MAKEFLAGS=${MAKEFLAGS:--j16}
cd $rootdir
timing_enter porcelain_check
timing_enter autopackage
$MAKE clean
if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
if [ `git status --porcelain --ignore-submodules | wc -l` -ne 0 ]; then
echo make clean left the following files:
git status --porcelain --ignore-submodules
git status --porcelain
exit 1
fi
timing_exit porcelain_check
if [[ $SPDK_TEST_RELEASE_BUILD -eq 1 ]]; then
run_test "build_rpms" build_rpms
$MAKE clean
spdk_pv=spdk-$(date +%Y_%m_%d)
spdk_tarball=${spdk_pv}.tar
dpdk_pv=dpdk-$(date +%Y_%m_%d)
dpdk_tarball=${dpdk_pv}.tar
find . -iname "spdk-*.tar* dpdk-*.tar*" -delete
git archive HEAD^{tree} --prefix=${spdk_pv}/ -o ${spdk_tarball}
# Build from packaged source
tmpdir=$(mktemp -d)
echo "tmpdir=$tmpdir"
tar -C "$tmpdir" -xf $spdk_tarball
if [ -z "$WITH_DPDK_DIR" ]; then
cd dpdk
git archive HEAD^{tree} --prefix=dpdk/ -o ../${dpdk_tarball}
cd ..
tar -C "$tmpdir/${spdk_pv}" -xf $dpdk_tarball
fi
if [[ $RUN_NIGHTLY -eq 0 ]]; then
timing_finish
exit 0
fi
(
cd "$tmpdir"/spdk-*
# use $config_params to get the right dependency options, but disable coverage and ubsan
# explicitly since they are not needed for this build
./configure $config_params --disable-debug --enable-werror --disable-coverage --disable-ubsan
time $MAKE ${MAKEFLAGS}
)
rm -rf "$tmpdir"
timing_enter build_release
config_params="$(get_config_params | sed 's/--enable-debug//g')"
if [ $(uname -s) = Linux ]; then
./configure $config_params --enable-lto
else
# LTO needs a special compiler to work on BSD.
./configure $config_params
fi
$MAKE ${MAKEFLAGS}
$MAKE ${MAKEFLAGS} clean
timing_exit build_release
timing_exit autopackage
timing_finish


@@ -4,19 +4,9 @@ set -e
rootdir=$(readlink -f $(dirname $0))
default_conf=~/autorun-spdk.conf
conf=${1:-${default_conf}}
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $conf ]]; then
echo "ERROR: $conf doesn't exist"
exit 1
fi
echo "Test configuration:"
cat "$conf"
conf=~/autorun-spdk.conf
# Runs agent scripts
$rootdir/autobuild.sh "$conf"
sudo -E $rootdir/autotest.sh "$conf"
sudo $rootdir/autotest.sh "$conf"
$rootdir/autopackage.sh "$conf"


@@ -1,4 +1,4 @@
#!/usr/bin/python3
#! /usr/bin/python3
import shutil
import subprocess
@@ -6,74 +6,50 @@ import argparse
import os
import glob
import re
import pandas as pd
def highest_value(inp):
    # Return True if any element of inp is truthy, otherwise False.
    for x in inp:
        if x:
            return True
    return False
def generateTestCompletionTables(output_dir, completion_table):
data_table = pd.DataFrame(completion_table, columns=["Agent", "Domain", "Test", "With Asan", "With UBsan"])
data_table.to_html(os.path.join(output_dir, 'completions_table.html'))
os.makedirs(os.path.join(output_dir, "post_process"), exist_ok=True)
pivot_by_agent = pd.pivot_table(data_table, index=["Agent", "Domain", "Test"])
pivot_by_agent.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_agent.html'))
pivot_by_test = pd.pivot_table(data_table, index=["Domain", "Test", "Agent"])
pivot_by_test.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_test.html'))
pivot_by_asan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With Asan"], aggfunc=highest_value)
pivot_by_asan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_asan.html'))
pivot_by_ubsan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With UBsan"], aggfunc=highest_value)
pivot_by_ubsan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_ubsan.html'))
def generateCoverageReport(output_dir, repo_dir):
coveragePath = os.path.join(output_dir, '**', 'cov_total.info')
covfiles = [os.path.abspath(p) for p in glob.glob(coveragePath, recursive=True)]
for f in covfiles:
print(f)
if len(covfiles) == 0:
return
lcov_opts = [
'--rc lcov_branch_coverage=1',
'--rc lcov_function_coverage=1',
'--rc genhtml_branch_coverage=1',
'--rc genhtml_function_coverage=1',
'--rc genhtml_legend=1',
'--rc geninfo_all_blocks=1',
]
cov_total = os.path.abspath(os.path.join(output_dir, 'cov_total.info'))
coverage = os.path.join(output_dir, 'coverage')
lcov = 'lcov' + ' ' + ' '.join(lcov_opts) + ' -q -a ' + ' -a '.join(covfiles) + ' -o ' + cov_total
genhtml = 'genhtml' + ' ' + ' '.join(lcov_opts) + ' -q ' + cov_total + ' --legend' + ' -t "Combined" --show-details -o ' + coverage
try:
subprocess.check_call([lcov], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
print("lcov failed")
print(e)
return
cov_total_file = open(cov_total, 'r')
replacement = "SF:" + repo_dir
file_contents = cov_total_file.readlines()
cov_total_file.close()
os.remove(cov_total)
with open(cov_total, 'w+') as file:
for Line in file_contents:
Line = re.sub("^SF:.*/repo", replacement, Line)
file.write(Line + '\n')
try:
subprocess.check_call([genhtml], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
print("genhtml failed")
print(e)
for f in covfiles:
os.remove(f)
with open(os.path.join(output_dir, 'coverage.log'), 'w+') as log_file:
coveragePath = os.path.join(output_dir, '**', 'cov_total.info')
covfiles = [os.path.abspath(p) for p in glob.glob(coveragePath, recursive=True)]
for f in covfiles:
print(f, file=log_file)
if len(covfiles) == 0:
return
lcov_opts = [
'--rc lcov_branch_coverage=1',
'--rc lcov_function_coverage=1',
'--rc genhtml_branch_coverage=1',
'--rc genhtml_function_coverage=1',
'--rc genhtml_legend=1',
'--rc geninfo_all_blocks=1',
]
cov_total = os.path.abspath(os.path.join(output_dir, 'cov_total.info'))
coverage = os.path.join(output_dir, 'coverage')
lcov = 'lcov' + ' ' + ' '.join(lcov_opts) + ' -q -a ' + ' -a '.join(covfiles) + ' -o ' + cov_total
genhtml = 'genhtml' + ' ' + ' '.join(lcov_opts) + ' -q ' + cov_total + ' --legend' + ' -t "Combined" --show-details -o ' + coverage
try:
subprocess.check_call([lcov], shell=True, stdout=log_file, stderr=log_file)
except subprocess.CalledProcessError as e:
print("lcov failed", file=log_file)
print(e, file=log_file)
return
cov_total_file = open(cov_total, 'r')
replacement = "SF:" + repo_dir
file_contents = cov_total_file.readlines()
cov_total_file.close()
os.remove(cov_total)
with open(cov_total, 'w+') as file:
for Line in file_contents:
Line = re.sub("^SF:.*/repo", replacement, Line)
file.write(Line + '\n')
try:
subprocess.check_call([genhtml], shell=True, stdout=log_file, stderr=log_file)
except subprocess.CalledProcessError as e:
print("genhtml failed", file=log_file)
print(e, file=log_file)
for f in covfiles:
os.remove(f)
def collectOne(output_dir, dir_name):
@@ -91,96 +67,80 @@ def collectOne(output_dir, dir_name):
shutil.rmtree(d)
def getCompletions(completionFile, test_list, test_completion_table):
agent_name = os.path.basename(os.path.dirname(completionFile))
with open(completionFile, 'r') as completionList:
completions = completionList.read()
asan_enabled = "asan" in completions
ubsan_enabled = "ubsan" in completions
for line in completions.splitlines():
try:
domain, test_name = line.strip().split()
test_list[test_name] = (True, asan_enabled | test_list[test_name][1], ubsan_enabled | test_list[test_name][2])
test_completion_table.append([agent_name, domain, test_name, asan_enabled, ubsan_enabled])
try:
test_completion_table.remove(["None", "None", test_name, False, False])
except ValueError:
continue
except KeyError:
continue
def printList(header, test_list, index, condition):
print("\n\n-----%s------" % header)
executed_tests = [x for x in sorted(test_list) if test_list[x][index] is condition]
print(*executed_tests, sep="\n")
def printListInformation(table_type, test_list):
printList("%s Executed in Build" % table_type, test_list, 0, True)
printList("%s Missing From Build" % table_type, test_list, 0, False)
printList("%s Missing ASAN" % table_type, test_list, 1, False)
printList("%s Missing UBSAN" % table_type, test_list, 2, False)
def getSkippedTests(repo_dir):
skipped_test_file = os.path.join(repo_dir, "test", "common", "skipped_tests.txt")
if not os.path.exists(skipped_test_file):
return []
else:
with open(skipped_test_file, "r") as skipped_test_data:
return [x.strip() for x in skipped_test_data.readlines() if "#" not in x and x.strip() != '']
def confirmPerPatchTests(test_list, skiplist):
missing_tests = [x for x in sorted(test_list) if test_list[x][0] is False
and x not in skiplist]
if len(missing_tests) > 0:
print("Not all tests were run. Failing the build.")
print(missing_tests)
exit(1)
def aggregateCompletedTests(output_dir, repo_dir, skip_confirm=False):
def aggregateCompletedTests(output_dir, repo_dir):
test_list = {}
test_completion_table = []
testFiles = glob.glob(os.path.join(output_dir, '**', 'all_tests.txt'), recursive=True)
completionFiles = glob.glob(os.path.join(output_dir, '**', 'test_completions.txt'), recursive=True)
test_with_asan = {}
test_with_ubsan = {}
asan_enabled = False
ubsan_enabled = False
test_unit_with_valgrind = False
testFilePath = os.path.join(output_dir, '**', 'all_tests.txt')
completionFilePath = os.path.join(output_dir, '**', 'test_completions.txt')
testFiles = glob.glob(testFilePath, recursive=True)
completionFiles = glob.glob(completionFilePath, recursive=True)
testSummary = os.path.join(output_dir, "test_execution.log")
if len(testFiles) == 0:
print("Unable to perform test completion aggregator. No input files.")
return 0
for item in testFiles:
with open(item, 'r') as raw_test_list:
for line in raw_test_list:
test_list[line.strip()] = (False, False, False)
for item in completionFiles:
with open(item, 'r') as completion_list:
completions = completion_list.read()
with open(testFiles[0], 'r') as raw_test_list:
for line in raw_test_list:
try:
test_name = line.strip()
except Exception:
print("Failed to parse a test type.")
return 1
if "asan" not in completions:
asan_enabled = False
else:
asan_enabled = True
test_list[test_name] = (False, False, False)
test_completion_table.append(["None", "None", test_name, False, False])
if "ubsan" not in completions:
ubsan_enabled = False
else:
ubsan_enabled = True
for completionFile in completionFiles:
getCompletions(completionFile, test_list, test_completion_table)
if "valgrind" in completions and "unittest" in completions:
test_unit_with_valgrind = True
for line in completions.split('\n'):
try:
test_list[line.strip()] = (True, asan_enabled | test_list[line.strip()][1], ubsan_enabled | test_list[line.strip()][1])
except KeyError:
continue
printListInformation("Tests", test_list)
generateTestCompletionTables(output_dir, test_completion_table)
skipped_tests = getSkippedTests(repo_dir)
if not skip_confirm:
confirmPerPatchTests(test_list, skipped_tests)
with open(testSummary, 'w') as fh:
fh.write("\n\n-----Tests Executed in Build------\n")
for item in sorted(test_list):
if test_list[item][0]:
fh.write(item + "\n")
fh.write("\n\n-----Tests Missing From Build------\n")
if not test_unit_with_valgrind:
fh.write("UNITTEST_WITH_VALGRIND\n")
for item in sorted(test_list):
if test_list[item][0] is False:
fh.write(item + "\n")
fh.write("\n\n-----Tests Missing ASAN------\n")
for item in sorted(test_list):
if test_list[item][1] is False:
fh.write(item + "\n")
fh.write("\n\n-----Tests Missing UBSAN------\n")
for item in sorted(test_list):
if test_list[item][2] is False:
fh.write(item + "\n")
with open(testSummary, 'r') as fh:
print(fh.read())
def main(output_dir, repo_dir, skip_confirm=False):
print("-----Begin Post Process Script------")
def main(output_dir, repo_dir):
generateCoverageReport(output_dir, repo_dir)
collectOne(output_dir, 'doc')
collectOne(output_dir, 'ut_coverage')
aggregateCompletedTests(output_dir, repo_dir, skip_confirm)
aggregateCompletedTests(output_dir, repo_dir)
if __name__ == "__main__":
@@ -189,7 +149,5 @@ if __name__ == "__main__":
help="The location of your build's output directory")
parser.add_argument("-r", "--repo_directory", type=str, required=True,
help="The location of your spdk repository")
parser.add_argument("-s", "--skip_confirm", required=False, action="store_true",
help="Do not check if all autotest.sh tests were executed.")
args = parser.parse_args()
main(args.directory_location, args.repo_directory, args.skip_confirm)
main(args.directory_location, args.repo_directory)


@@ -1,53 +1,19 @@
#!/usr/bin/env bash
rootdir=$(readlink -f $(dirname $0))
# In autotest_common.sh all tests are disabled by default.
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
# always test with SPDK shared objects.
export SPDK_LIB_DIR="$rootdir/build/lib"
# Autotest.sh, as part of autorun.sh, runs in a different
# shell process than autobuild.sh. Use a helper file to pass
# over the env variable containing library paths.
if [[ -e /tmp/spdk-ld-path ]]; then
source /tmp/spdk-ld-path
fi
source "$1"
source "$rootdir/test/common/autotest_common.sh"
source "$rootdir/test/nvmf/common.sh"
set -xe
if [ $EUID -ne 0 ]; then
echo "$0 must be run as root"
exit 1
fi
if [ $(uname -s) = Linux ]; then
old_core_pattern=$(< /proc/sys/kernel/core_pattern)
mkdir -p "$output_dir/coredumps"
# set core_pattern to a known value to avoid ABRT, systemd-coredump, etc.
echo "|$rootdir/scripts/core-collector.sh %P %s %t $output_dir/coredumps" > /proc/sys/kernel/core_pattern
echo 2 > /proc/sys/kernel/core_pipe_limit
# Make sure that the hugepage state for our VM is fresh so we don't fail
# hugepage allocation. Allow time for this action to complete.
echo 1 > /proc/sys/vm/drop_caches
sleep 3
# make sure nbd (network block device) driver is loaded if it is available
# this ensures that when tests need to use nbd, it will be fully initialized
modprobe nbd || true
if udevadm=$(type -P udevadm); then
"$udevadm" monitor --property &> "$output_dir/udev.log" &
udevadm_pid=$!
fi
echo "core" > /proc/sys/kernel/core_pattern
fi
trap "process_core; autotest_cleanup; exit 1" SIGINT SIGTERM EXIT
@@ -57,16 +23,14 @@ timing_enter autotest
create_test_list
src=$(readlink -f $(dirname $0))
out=$output_dir
out=$PWD
cd $src
./scripts/setup.sh status
freebsd_update_contigmem_mod
# lcov takes considerable time to process clang coverage.
# Disabling lcov for clang builds lets us avoid this.
# More information: https://github.com/spdk/spdk/issues/1693
CC_TYPE=$(grep CC_TYPE mk/cc.mk)
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
if hash lcov; then
# setup output dir for unittest.sh
export UT_COVERAGE=$out/ut_coverage
export LCOV_OPTS="
@ -81,69 +45,29 @@ if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
# Print lcov version to log
$LCOV -v
# zero out coverage data
$LCOV -q -c -i -t "Baseline" -d $src -o $out/cov_base.info
$LCOV -q -c -i -t "Baseline" -d $src -o cov_base.info
fi
# Make sure the disks are clean (no leftover partition tables)
timing_enter cleanup
# Remove old domain socket pathname just in case
rm -f /var/tmp/spdk*.sock
# Load the kernel driver
./scripts/setup.sh reset
if [ $(uname -s) = Linux ]; then
# OCSSD device drivers don't support I/O issued by the kernel, so
# detect OCSSD devices and block them (unbind them from any driver).
# If a test script wants to use such a device, it needs to do so explicitly.
#
# If an OCSSD device is bound to a driver other than nvme, we won't be able to
# discover whether it is OCSSD or not, so load the kernel driver first.
# Load the kernel driver
./scripts/setup.sh reset
while IFS= read -r -d '' dev; do
# Send the Open Channel 2.0 Geometry opcode "0xe2" - standard NVMe devices don't support it.
if nvme admin-passthru $dev --namespace-id=1 --data-len=4096 --opcode=0xe2 --read > /dev/null; then
bdf="$(basename $(readlink -e /sys/class/nvme/${dev#/dev/}/device))"
echo "INFO: blocking OCSSD device: $dev ($bdf)"
PCI_BLOCKED+=" $bdf"
OCSSD_PCI_DEVICES+=" $bdf"
fi
done < <(find /dev -maxdepth 1 -regex '/dev/nvme[0-9]+' -print0)
# Let the kernel discover any filesystems or partitions
sleep 10
export OCSSD_PCI_DEVICES
# Delete all partitions on NVMe devices
devs=`lsblk -l -o NAME | grep nvme | grep -v p` || true
for dev in $devs; do
parted -s /dev/$dev mklabel msdos
done
# Now bind the blocked devices to the pci-stub module. This prevents them
# from being grabbed automatically when we add their device/vendor IDs to
# the proper driver.
if [[ -n "$PCI_BLOCKED" ]]; then
# shellcheck disable=SC2097,SC2098
PCI_ALLOWED="$PCI_BLOCKED" \
PCI_BLOCKED="" \
DRIVER_OVERRIDE="pci-stub" \
./scripts/setup.sh
# Export our blocked list so it will take effect during next setup.sh
export PCI_BLOCKED
fi
run_test "setup.sh" "$rootdir/test/setup/test-setup.sh"
# Load RAM disk driver if available
modprobe brd || true
fi
./scripts/setup.sh status
if [[ $(uname -s) == Linux ]]; then
# Revert NVMe namespaces to default state
nvme_namespace_revert
fi
# Delete all leftover lvols and gpt partitions
# Matches both /dev/nvmeXnY on Linux and /dev/nvmeXnsY on BSD
# Filter out nvme with partitions - the "p*" suffix
for dev in $(ls /dev/nvme*n* | grep -v p || true); do
dd if=/dev/zero of="$dev" bs=1M count=1
done
sync
timing_exit cleanup
# set up huge pages
@ -151,186 +75,160 @@ timing_enter afterboot
./scripts/setup.sh
timing_exit afterboot
if [[ $SPDK_TEST_CRYPTO -eq 1 || $SPDK_TEST_REDUCE -eq 1 ]]; then
# Make sure that memory is distributed across all NUMA nodes - by default, all goes to
# node0, but if QAT devices are attached to a different node, all of their VFs will end
# up under that node too and memory needs to be available there for the tests.
CLEAR_HUGE=yes HUGE_EVEN_ALLOC=yes ./scripts/setup.sh
./scripts/setup.sh status
if [[ $SPDK_TEST_USE_IGB_UIO -eq 1 ]]; then
./scripts/qat_setup.sh igb_uio
else
./scripts/qat_setup.sh
fi
fi
# Revert existing OPAL to factory settings that may have been left from earlier failed tests.
# This ensures we won't hit any unexpected failures due to NVMe SSDs being locked.
opal_revert_cleanup
timing_enter nvmf_setup
rdma_device_init
timing_exit nvmf_setup
#####################
# Unit Tests
#####################
if [ $SPDK_TEST_UNITTEST -eq 1 ]; then
run_test "unittest" ./test/unit/unittest.sh
run_test "env" test/env/env.sh
timing_enter unittest
run_test ./test/unit/unittest.sh
report_test_completion "unittest"
timing_exit unittest
fi
if [ $SPDK_RUN_FUNCTIONAL_TEST -eq 1 ]; then
timing_enter lib
timing_enter lib
run_test "rpc" test/rpc/rpc.sh
run_test "rpc_client" test/rpc_client/rpc_client.sh
run_test "json_config" ./test/json_config/json_config.sh
run_test "alias_rpc" test/json_config/alias_rpc/alias_rpc.sh
run_test "spdkcli_tcp" test/spdkcli/tcp.sh
run_test "dpdk_mem_utility" test/dpdk_memory_utility/test_dpdk_mem_info.sh
run_test "event" test/event/event.sh
run_test "accel_engine" test/accel_engine/accel_engine.sh
if [ $SPDK_TEST_BLOCKDEV -eq 1 ]; then
run_test "blockdev_general" test/bdev/blockdev.sh
run_test "bdev_raid" test/bdev/bdev_raid.sh
run_test "bdevperf_config" test/bdev/bdevperf/test_config.sh
if [[ $(uname -s) == Linux ]]; then
run_test "spdk_dd" test/dd/dd.sh
run_test "reactor_set_interrupt" test/interrupt/reactor_set_interrupt.sh
if [ $SPDK_TEST_BLOCKDEV -eq 1 ]; then
run_test test/bdev/blockdev.sh
if [ $(uname -s) = Linux ]; then
run_test test/bdev/bdevjson/json_config.sh
if modprobe -n nbd; then
run_test test/bdev/nbdjson/json_config.sh
fi
fi
fi
if [ $SPDK_TEST_JSON -eq 1 ]; then
run_test "test_converter" test/config_converter/test_converter.sh
if [ $SPDK_TEST_EVENT -eq 1 ]; then
run_test test/event/event.sh
fi
if [ $SPDK_TEST_NVME -eq 1 ]; then
run_test test/nvme/nvme.sh
if [ $SPDK_TEST_NVME_CLI -eq 1 ]; then
run_test test/nvme/spdk_nvme_cli.sh
fi
# Only test hotplug with ASAN disabled: when enabled, ASAN
# catches the SEGV earlier than our handler, which
# breaks the hotplug logic
if [ $SPDK_RUN_ASAN -eq 0 ]; then
run_test test/nvme/hotplug.sh intel
fi
fi
run_test test/env/env.sh
if [ $SPDK_TEST_IOAT -eq 1 ]; then
run_test test/ioat/ioat.sh
fi
timing_exit lib
if [ $SPDK_TEST_ISCSI -eq 1 ]; then
run_test ./test/iscsi_tgt/iscsi_tgt.sh posix
run_test ./test/iscsi_tgt/iscsijson/json_config.sh
fi
if [ $SPDK_TEST_BLOBFS -eq 1 ]; then
run_test ./test/blobfs/rocksdb/rocksdb.sh
run_test ./test/blobstore/blobstore.sh
fi
if [ $SPDK_TEST_NVMF -eq 1 ]; then
run_test ./test/nvmf/nvmf.sh
run_test ./test/nvmf/nvmfjson/json_config.sh
fi
if [ $SPDK_TEST_VHOST -eq 1 ]; then
timing_enter vhost
timing_enter negative
run_test ./test/vhost/spdk_vhost.sh --negative
timing_exit negative
timing_enter vhost_json_config
run_test ./test/vhost/json_config/json_config.sh
timing_exit vhost_json_config
if [ $RUN_NIGHTLY -eq 1 ]; then
timing_enter integrity_blk
run_test ./test/vhost/spdk_vhost.sh --integrity-blk
timing_exit integrity_blk
timing_enter integrity
run_test ./test/vhost/spdk_vhost.sh --integrity
timing_exit integrity
timing_enter fs_integrity_scsi
run_test ./test/vhost/spdk_vhost.sh --fs-integrity-scsi
timing_exit fs_integrity_scsi
timing_enter fs_integrity_blk
run_test ./test/vhost/spdk_vhost.sh --fs-integrity-blk
timing_exit fs_integrity_blk
timing_enter integrity_lvol_scsi_nightly
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-scsi-nightly
timing_exit integrity_lvol_scsi_nightly
timing_enter integrity_lvol_blk_nightly
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-blk-nightly
timing_exit integrity_lvol_blk_nightly
timing_enter vhost_migration
run_test ./test/vhost/spdk_vhost.sh --migration
timing_exit vhost_migration
# timing_enter readonly
# run_test ./test/vhost/spdk_vhost.sh --readonly
# timing_exit readonly
fi
if [ $SPDK_TEST_NVME -eq 1 ]; then
run_test "blockdev_nvme" test/bdev/blockdev.sh "nvme"
run_test "blockdev_nvme_gpt" test/bdev/blockdev.sh "gpt"
run_test "nvme" test/nvme/nvme.sh
if [[ $SPDK_TEST_NVME_PMR -eq 1 ]]; then
run_test "nvme_pmr" test/nvme/nvme_pmr.sh
fi
if [[ $SPDK_TEST_NVME_CUSE -eq 1 ]]; then
run_test "nvme_cuse" test/nvme/cuse/nvme_cuse.sh
fi
run_test "nvme_rpc" test/nvme/nvme_rpc.sh
# Only test hotplug with ASAN disabled: when enabled, ASAN
# catches the SEGV earlier than our handler, which
# breaks the hotplug logic.
if [ $SPDK_RUN_ASAN -eq 0 ]; then
run_test "nvme_hotplug" test/nvme/hotplug.sh root
fi
fi
timing_enter integrity_lvol_scsi
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-scsi
timing_exit integrity_lvol_scsi
if [ $SPDK_TEST_IOAT -eq 1 ]; then
run_test "ioat" test/ioat/ioat.sh
fi
timing_enter integrity_lvol_blk
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-blk
timing_exit integrity_lvol_blk
timing_exit lib
timing_enter spdk_cli
run_test ./test/spdkcli/vhost.sh
timing_exit spdk_cli
if [ $SPDK_TEST_ISCSI -eq 1 ]; then
run_test "iscsi_tgt" ./test/iscsi_tgt/iscsi_tgt.sh
run_test "spdkcli_iscsi" ./test/spdkcli/iscsi.sh
timing_exit vhost
fi
# Run raid spdkcli test under iSCSI since blockdev tests run on systems that can't run spdkcli yet
run_test "spdkcli_raid" test/spdkcli/raid.sh
fi
if [ $SPDK_TEST_LVOL -eq 1 ]; then
timing_enter lvol
test_cases="1,50,51,52,53,100,101,102,150,200,201,250,251,252,253,254,255,"
test_cases+="300,301,450,451,452,550,551,552,553,"
test_cases+="600,601,650,651,652,654,655,"
test_cases+="700,701,750,751,752,753,754,755,756,757,758,759,"
test_cases+="800,801,802,803,804,10000"
run_test ./test/lvol/lvol.sh --test-cases=$test_cases
report_test_completion "lvol"
timing_exit lvol
fi
if [ $SPDK_TEST_BLOBFS -eq 1 ]; then
run_test "rocksdb" ./test/blobfs/rocksdb/rocksdb.sh
run_test "blobstore" ./test/blobstore/blobstore.sh
run_test "blobfs" ./test/blobfs/blobfs.sh
run_test "hello_blob" $SPDK_EXAMPLE_DIR/hello_blob \
examples/blob/hello_world/hello_blob.json
fi
if [ $SPDK_TEST_VHOST_INIT -eq 1 ]; then
run_test ./test/vhost/initiator/blockdev.sh
run_test ./test/vhost/initiator/json_config.sh
run_test ./test/spdkcli/virtio.sh
report_test_completion "vhost_initiator"
fi
if [ $SPDK_TEST_NVMF -eq 1 ]; then
# The NVMe-oF run test cases are split out like this so that the parser that compiles the
# list of all tests can properly differentiate them. Please do not merge them into one line.
if [ "$SPDK_TEST_NVMF_TRANSPORT" = "rdma" ]; then
timing_enter rdma_setup
rdma_device_init
timing_exit rdma_setup
run_test "nvmf_rdma" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_rdma" ./test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "tcp" ]; then
timing_enter tcp_setup
tcp_device_init
timing_exit tcp_setup
run_test "nvmf_tcp" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_tcp" ./test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "nvmf_identify_passthru" test/nvmf/target/identify_passthru.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "nvmf_dif" test/nvmf/target/dif.sh
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "fc" ]; then
run_test "nvmf_fc" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_fc" ./test/spdkcli/nvmf.sh
else
echo "unknown NVMe transport, please specify rdma, tcp, or fc."
exit 1
fi
fi
if [ $SPDK_TEST_PMDK -eq 1 ]; then
run_test ./test/pmem/pmem.sh -x
run_test ./test/pmem/json_config/json_config.sh
run_test ./test/spdkcli/pmem.sh
fi
if [ $SPDK_TEST_VHOST -eq 1 ]; then
run_test "vhost" ./test/vhost/vhost.sh
fi
if [ $SPDK_TEST_LVOL -eq 1 ]; then
run_test "lvol" ./test/lvol/lvol.sh
run_test "blob_io_wait" ./test/blobstore/blob_io_wait/blob_io_wait.sh
fi
if [ $SPDK_TEST_VHOST_INIT -eq 1 ]; then
timing_enter vhost_initiator
run_test "vhost_blockdev" ./test/vhost/initiator/blockdev.sh
run_test "spdkcli_virtio" ./test/spdkcli/virtio.sh
run_test "vhost_shared" ./test/vhost/shared/shared.sh
run_test "vhost_fuzz" ./test/vhost/fuzz/fuzz.sh
timing_exit vhost_initiator
fi
if [ $SPDK_TEST_PMDK -eq 1 ]; then
run_test "blockdev_pmem" ./test/bdev/blockdev.sh "pmem"
run_test "pmem" ./test/pmem/pmem.sh -x
run_test "spdkcli_pmem" ./test/spdkcli/pmem.sh
fi
if [ $SPDK_TEST_RBD -eq 1 ]; then
run_test "blockdev_rbd" ./test/bdev/blockdev.sh "rbd"
run_test "spdkcli_rbd" ./test/spdkcli/rbd.sh
fi
if [ $SPDK_TEST_OCF -eq 1 ]; then
run_test "ocf" ./test/ocf/ocf.sh
fi
if [ $SPDK_TEST_FTL -eq 1 ]; then
run_test "ftl" ./test/ftl/ftl.sh
fi
if [ $SPDK_TEST_VMD -eq 1 ]; then
run_test "vmd" ./test/vmd/vmd.sh
fi
if [ $SPDK_TEST_REDUCE -eq 1 ]; then
run_test "compress_qat" ./test/compress/compress.sh "qat"
run_test "compress_isal" ./test/compress/compress.sh "isal"
fi
if [ $SPDK_TEST_OPAL -eq 1 ]; then
run_test "nvme_opal" ./test/nvme/nvme_opal.sh
fi
if [ $SPDK_TEST_CRYPTO -eq 1 ]; then
run_test "blockdev_crypto_aesni" ./test/bdev/blockdev.sh "crypto_aesni"
# Proceed with the test only if QAT devices are in place
if [[ $(lspci -d:37c8) ]]; then
run_test "blockdev_crypto_qat" ./test/bdev/blockdev.sh "crypto_qat"
fi
fi
if [[ $SPDK_TEST_SCHEDULER -eq 1 ]]; then
run_test "scheduler" ./test/scheduler/scheduler.sh
fi
if [ $SPDK_TEST_RBD -eq 1 ]; then
run_test ./test/bdev/bdevjson/rbd_json_config.sh
run_test ./test/spdkcli/rbd.sh
fi
timing_enter cleanup
@ -345,10 +243,10 @@ trap - SIGINT SIGTERM EXIT
# catch any stray core files
process_core
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
if hash lcov; then
# generate coverage data and combine with baseline
$LCOV -q -c -d $src -t "$(hostname)" -o $out/cov_test.info
$LCOV -q -a $out/cov_base.info -a $out/cov_test.info -o $out/cov_total.info
$LCOV -q -c -d $src -t "$(hostname)" -o cov_test.info
$LCOV -q -a cov_base.info -a cov_test.info -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/dpdk/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '/usr/*' -o $out/cov_total.info
git clean -f "*.gcda"

configure (vendored, 881 changed lines)

File diff suppressed because it is too large


@ -1,42 +0,0 @@
# ABI and API Deprecation {#deprecation}
This document details the policy for maintaining stability of SPDK ABI and API.
Major ABI version can change at most once for each quarterly SPDK release.
ABI versions are managed separately for each library and follow [Semantic Versioning](https://semver.org/).
API and ABI deprecation notices shall be posted in the next section.
Each entry must describe what will be removed and may suggest a future use or alternative.
The specific future SPDK release for the removal must be provided.
ABI cannot be removed without a deprecation notice covering at least one SPDK release.
# Deprecation Notices {#deprecation-notices}
## net
The net library is deprecated and will be removed in the 21.07 release.
## nvmf
The following APIs have been deprecated and will be removed in SPDK 21.07:
- `spdk_nvmf_poll_group_get_stat` (function in `nvmf.h`),
- `spdk_nvmf_transport_poll_group_get_stat` (function in `nvmf.h`),
- `spdk_nvmf_transport_poll_group_free_stat` (function in `nvmf.h`),
- `spdk_nvmf_rdma_device_stat` (struct in `nvmf.h`),
- `spdk_nvmf_transport_poll_group_stat` (struct in `nvmf.h`),
- `poll_group_get_stat` (transport op in `nvmf_transport.h`),
- `poll_group_free_stat` (transport op in `nvmf_transport.h`).
Please use `spdk_nvmf_poll_group_dump_stat` and `poll_group_dump_stat` instead.
## rpc
Parameter `enable-zerocopy-send` of RPC `sock_impl_set_options` is deprecated and will be removed in SPDK 21.07,
use `enable-zerocopy-send-server` or `enable-zerocopy-send-client` instead.
Parameter `disable-zerocopy-send` of RPC `sock_impl_set_options` is deprecated and will be removed in SPDK 21.07,
use `disable-zerocopy-send-server` or `disable-zerocopy-send-client` instead.
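As a rough illustration, migrating a script off the deprecated parameter might look like the following sketch. The `posix` implementation name is an assumption for the example, and the flag spellings follow the parameter names above; check them against your rpc.py version.
~~~
# Deprecated form, removed in SPDK 21.07 (assumed flag spelling):
scripts/rpc.py sock_impl_set_options -i posix --enable-zerocopy-send

# Replacement, scoped explicitly to the server side:
scripts/rpc.py sock_impl_set_options -i posix --enable-zerocopy-send-server
~~~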
## rpm
`pkg/spdk.spec` is considered to be deprecated and scheduled for removal in SPDK 21.07.
Please use `rpmbuild/spdk.spec` instead and see
[RPM documentation](https://spdk.io/doc/rpm.html) for more details.


@ -234,7 +234,7 @@ ALIASES =
# A mapping has the form "name=value". For example adding "class=itcl::class"
# will allow you to use the command class in the itcl::class meaning.
# TCL_SUBST =
TCL_SUBST =
# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
# only. Doxygen will then generate output that is more tailored for C. For
@ -782,73 +782,48 @@ WARN_LOGFILE =
INPUT = ../include/spdk \
index.md \
# This list contains the top level pages listed in index.md. This list should
# remain in the same order as the contents of index.md. The order here also
# determines the order of these sections in the left-side navigation bar.
INPUT += \
\
intro.md \
concepts.md \
user_guides.md \
prog_guides.md \
general.md \
misc.md \
driver_modules.md \
modules.md \
tools.md \
ci_tools.md \
experimental_tools.md \
performance_reports.md \
# All remaining pages are listed here in alphabetical order by filename.
INPUT += \
\
about.md \
accel_fw.md \
applications.md \
changelog.md \
concurrency.md \
directory_structure.md \
getting_started.md \
memory.md \
porting.md \
bdev.md \
bdevperf.md \
bdev_module.md \
bdev_pg.md \
blob.md \
blobfs.md \
changelog.md \
compression.md \
concurrency.md \
containers.md \
../deprecation.md \
event.md \
ftl.md \
gdb_macros.md \
getting_started.md \
idxd.md \
ioat.md \
iscsi.md \
jsonrpc.md \
jsonrpc_proxy.md \
libraries.md \
lvol.md \
memory.md \
notify.md \
nvme.md \
nvme_spec.md \
nvme-cli.md \
nvmf.md \
nvmf_tgt_pg.md \
nvmf_tracing.md \
overview.md \
peer_2_peer.md \
pkgconfig.md \
porting.md \
rpm.md \
scheduler.md \
shfmt.md \
spdkcli.md \
spdk_top.md \
ssd_internals.md \
system_configuration.md \
user_guides_common.md \
userspace.md \
vagrant.md \
vhost.md \
vhost_processing.md \
virtio.md \
vmd.md
virtio.md
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@ -1105,7 +1080,7 @@ ALPHABETICAL_INDEX = YES
# Minimum value: 1, maximum value: 20, default value: 5.
# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
# COLS_IN_ALPHA_INDEX = 5
COLS_IN_ALPHA_INDEX = 5
# In case all classes in a project start with a common prefix, all classes will
# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
@ -1666,7 +1641,7 @@ EXTRA_SEARCH_MAPPINGS =
# If the GENERATE_LATEX tag is set to YES, doxygen will generate LaTeX output.
# The default value is: YES.
GENERATE_LATEX = NO
GENERATE_LATEX = YES
# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. If a
# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
@ -2170,7 +2145,7 @@ EXTERNAL_PAGES = YES
# interpreter (i.e. the result of 'which perl').
# The default file (with absolute path) is: /usr/bin/perl.
# PERL_PATH = /usr/bin/perl
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
@ -2192,7 +2167,7 @@ CLASS_DIAGRAMS = YES
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
# MSCGEN_PATH =
MSCGEN_PATH =
# You can include diagrams made with dia in doxygen documentation. Doxygen will
# then run dia to produce the diagram and insert it in the documentation. The


@ -1,4 +1,4 @@
# What is SPDK {#about}
# What is SPDK? {#about}
The Storage Performance Development Kit (SPDK) provides a set of tools and
libraries for writing high performance, scalable, user-mode storage


@ -1,107 +0,0 @@
# Acceleration Framework {#accel_fw}
SPDK provides a framework for abstracting general acceleration capabilities
that can be implemented through plug-in modules and low-level libraries. These
plug-in modules include support for hardware acceleration engines such as
the Intel(R) I/O Acceleration Technology (IOAT) engine and the Intel(R) Data
Streaming Accelerator (DSA) engine. Additionally, a software plug-in module
exists to enable use of the framework in environments without hardware
acceleration capabilities. ISA-L is used for optimized CRC32C calculation within
the software module.
The framework includes an API for getting the current capabilities of the
selected module. See [`spdk_accel_get_capabilities`](https://spdk.io/doc/accel__engine_8h.html) for more details. For the software module, all capabilities will be reported as supported. For the hardware modules, only functions accelerated by hardware will be reported; any function can still be called, but it will be backed by software if it is not reported as a supported capability.
# Acceleration Framework Functions {#accel_functions}
Functions implemented via the framework can be found in the Doxygen documentation of the
framework's public header file, [accel_engine.h](https://spdk.io/doc/accel__engine_8h.html).
# Acceleration Framework Design Considerations {#accel_dc}
The general interface is defined by `/include/accel_engine.h` and implemented
in `/lib/accel`. These functions may be called by an SPDK application and in
most cases, except where otherwise documented, are asynchronous and follow the
standard SPDK model for callbacks with a callback argument.
If the acceleration framework is started without initializing a hardware module,
optimized software implementations of the functions will back the public API.
Additionally, if any hardware module does not support a specific function and that
hardware module is initialized, the specific function will fall back to a software
optimized implementation. For example, IOAT does not support the dualcast function
in hardware but if the IOAT module has been initialized and the public dualcast API
is called, it will actually be done via software behind the scenes.
# Acceleration Low Level Libraries {#accel_libs}
Low level libraries provide only the most basic functions that are specific to
the hardware. Low level libraries are located in the '/lib' directory with the
exception of the software implementation which is implemented as part of the
framework itself. The software low level library does not expose a public API.
Applications may choose to interact directly with a low level library if there are
specific needs/considerations not met via accessing the library through the
framework/module. Note that when using the low level libraries directly, the
framework abstracted interface is bypassed as the application will call the public
functions exposed by the individual low level libraries. Thus, code written this
way needs to be certain that the underlying hardware exists everywhere that it runs.
The low level library for IOAT is located in `/lib/ioat`. The low level library
for DSA is in `/lib/idxd` (IDXD stands for Intel(R) Data Acceleration Driver).
# Acceleration Plug-In Modules {#accel_modules}
Plug-in modules depend on low level libraries to interact with the hardware and
add additional functionality such as queueing during busy conditions or flow
control in some cases. The framework in turn depends on the modules to provide
the complete implementation of the acceleration component. A module must be
selected via startup RPC when the application is started. Otherwise, if no startup
RPC is provided, the framework is available and will use the software plug-in module.
## IOAT Module {#accel_ioat}
To use the IOAT engine, use the RPC [`ioat_scan_accel_engine`](https://spdk.io/doc/jsonrpc.html) before starting the application.
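For example, a minimal sketch of selecting the IOAT module at startup. The binary path is illustrative, and the flow assumes engine selection is done as a startup RPC before framework initialization:
~~~
# Start paused, select the IOAT engine, then finish initialization.
./build/bin/spdk_tgt --wait-for-rpc &
scripts/rpc.py ioat_scan_accel_engine
scripts/rpc.py rpc_framework_start_init
~~~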
## IDXD Module {#accel_idxd}
To use the DSA engine, use the RPC [`idxd_scan_accel_engine`](https://spdk.io/doc/jsonrpc.html) with an optional parameter of `-c` and provide a configuration number of either 0 or 1. These pre-defined configurations determine how the DSA engine will be set up in terms
of work queues and engines. The DSA engine is very flexible, allowing for various configurations of these elements to either account for different quality of service requirements or to isolate hardware paths where the back end media is of varying latency (e.g. persistent memory vs. DRAM). The pre-defined configurations are as follows:
0: A single work queue backed with four DSA engines. This is a generic configuration
that enables the hardware to best determine which engine to use as it pulls in new
operations.
1: Two separate work queues each backed with two DSA engines. This is another
generic configuration that is documented in the specification and allows the
application to partition submissions across two work queues. This would be useful
when different priorities might be desired per group.
There are several other configurations that are possible that include quality
of service parameters on the work queues that are not currently utilized by
the module. Specialized use of DSA may require different configurations that
can be added to the module as needed.
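For example, a sketch of selecting pre-defined configuration 0, using the same startup-RPC flow as the IOAT example above (the binary path is illustrative):
~~~
# Select DSA with config 0: a single work queue backed by four engines.
./build/bin/spdk_tgt --wait-for-rpc &
scripts/rpc.py idxd_scan_accel_engine -c 0
scripts/rpc.py rpc_framework_start_init
~~~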
## Software Module {#accel_sw}
The software module is enabled by default. If no hardware engine is explicitly
enabled via startup RPC as discussed earlier, the software module will use ISA-L
if available for functions such as CRC32C. Otherwise, standard glibc calls are
used to back the framework API.
## Batching {#batching}
Batching is exposed by the acceleration framework and provides an interface to
batch up sets of commands and then submit them with a single command. The public
API is consistent with the implementation; however, each plug-in module behaves
differently depending on its capabilities.
The DSA engine has complete support for batching all supported commands together
into one submission. This is advantageous as it reduces the overhead incurred in
the submission process to the hardware.
The software engine supports batching only to be consistent with the framework API.
In software there are no savings from batching sets of commands versus submitting them
individually.
The IOAT engine supports batching, but it is only beneficial for `memmove` and `memfill`
as these are supported by the hardware. All other commands can still be batched, and the
framework will manage them via software.


@ -1,160 +0,0 @@
# An Overview of SPDK Applications {#app_overview}
SPDK is primarily a development kit that delivers libraries and header files for
use in other applications. However, SPDK also contains a number of applications.
These applications are primarily used to test the libraries, but many are full
featured and high quality. The major applications in SPDK are:
- @ref iscsi
- @ref nvmf
- @ref vhost
- SPDK Target (a unified application combining the above three)
There are also a number of tools and examples in the `examples` directory.
The SPDK targets are all based on a common framework so they have much in
common. The framework defines a concept called a `subsystem` and all
functionality is implemented in various subsystems. Subsystems have a unified
initialization and teardown path.
# Configuring SPDK Applications {#app_config}
## Command Line Parameters {#app_cmd_line_args}
The SPDK application framework defines a set of base command line flags for all
applications that use it. Specific applications may implement additional flags.
Param | Long Param | Type | Default | Description
-------- | ---------------------- | -------- | ---------------------- | -----------
-c | --config | string | | @ref cmd_arg_config_file
-d | --limit-coredump | flag | false | @ref cmd_arg_limit_coredump
-e | --tpoint-group-mask | integer | 0x0 | @ref cmd_arg_limit_tpoint_group_mask
-g | --single-file-segments | flag | | @ref cmd_arg_single_file_segments
-h | --help | flag | | show all available parameters and exit
-i | --shm-id | integer | | @ref cmd_arg_multi_process
-m | --cpumask | CPU mask | 0x1 | application @ref cpu_mask
-n | --mem-channels | integer | all channels | number of memory channels used for DPDK
-p | --main-core | integer | first core in CPU mask | main (primary) core for DPDK
-r | --rpc-socket | string | /var/tmp/spdk.sock | RPC listen address
-s | --mem-size | integer | all hugepage memory | @ref cmd_arg_memory_size
| | --silence-noticelog | flag | | disable notice level logging to `stderr`
-u | --no-pci | flag | | @ref cmd_arg_disable_pci_access.
| | --wait-for-rpc | flag | | @ref cmd_arg_deferred_initialization
-B | --pci-blocked | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-A | --pci-allowed | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-R | --huge-unlink | flag | | @ref cmd_arg_huge_unlink
| | --huge-dir | string | the first discovered | allocate hugepages from a specific mount
-L | --logflag | string | | @ref cmd_arg_log_flags
### Configuration file {#cmd_arg_config_file}
SPDK applications are configured using a JSON RPC configuration file.
See @ref jsonrpc for details.
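A common workflow, sketched below, is to configure a running application over RPC and then capture the live state as a JSON configuration file for later runs (the binary and file paths are illustrative):
~~~
# Dump the live configuration of a running SPDK application...
scripts/rpc.py save_config > /tmp/spdk_config.json

# ...and start a fresh instance from that file later.
./build/bin/spdk_tgt -c /tmp/spdk_config.json
~~~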
### Limit coredump {#cmd_arg_limit_coredump}
By default, an SPDK application will set resource limits for core file sizes
to RLIM_INFINITY. Specifying `--limit-coredump` will not set the resource limits.
### Tracepoint group mask {#cmd_arg_limit_tpoint_group_mask}
SPDK has an experimental low overhead tracing framework. Tracepoints in this
framework are organized into tracepoint groups. By default, all tracepoint
groups are disabled. `--tpoint-group-mask` can be used to enable a specific
subset of tracepoint groups in the application.
Note: Additional documentation on the tracepoint framework is in progress.
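As a hypothetical invocation, enabling a single tracepoint group could look like this; the mask value 0x8 is purely illustrative, since group assignments vary by application and version:
~~~
# Enable one tracepoint group via the mask.
./build/bin/spdk_tgt -e 0x8
~~~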
### Deferred initialization {#cmd_arg_deferred_initialization}
SPDK applications progress through a set of states beginning with `STARTUP` and
ending with `RUNTIME`.
If the `--wait-for-rpc` parameter is provided SPDK will pause just before starting
framework initialization. This state is called `STARTUP`. The JSON RPC server is
ready but only a small subset of commands are available to set up initialization
parameters. Those parameters can't be changed after the SPDK application enters
`RUNTIME` state. When the client finishes configuring the SPDK subsystems it
needs to issue the @ref rpc_framework_start_init RPC command to begin the
initialization process. After `rpc_framework_start_init` returns `true` SPDK
will enter the `RUNTIME` state and the list of available commands becomes much
larger.
To see which RPC methods are available in the current state, issue the
`rpc_get_methods` RPC with the parameter `current` set to `true`.
For more details see @ref jsonrpc documentation.
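A minimal sketch of this flow, assuming `spdk_tgt` as the application and the default RPC socket (the `--current` flag corresponds to the `current` parameter described above):
~~~
# Pause in the STARTUP state before framework initialization.
./build/bin/spdk_tgt --wait-for-rpc &

# Only a small subset of RPCs is available here.
scripts/rpc.py rpc_get_methods --current

# ...issue pre-initialization RPCs as needed, then begin initialization.
scripts/rpc.py rpc_framework_start_init
~~~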
### Create just one hugetlbfs file {#cmd_arg_single_file_segments}
Instead of creating one hugetlbfs file per page, this option makes SPDK create
one file per socket. This is needed for @ref virtio to be used
with more than 8 hugepages. See @ref virtio_2mb.
### Multi process mode {#cmd_arg_multi_process}
When `--shm-id` is specified, the application is started in multi-process mode.
Applications using the same shm-id share their memory and
[NVMe devices](@ref nvme_multi_process). The first app to start with a given id
becomes a primary process, with the rest, called secondary processes, only
attaching to it. When the primary process exits, the secondary ones continue to
operate, but no new processes can be attached at this point. All processes within
the same shm-id group must use the same
[--single-file-segments setting](@ref cmd_arg_single_file_segments).
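A sketch of starting a two-process group (the binary and socket paths are illustrative):
~~~
# Primary process: the first to start with a given shm-id.
./build/bin/spdk_tgt --shm-id 1 &

# Secondary process: attaches to the primary's memory. It needs its own
# RPC socket so the two listeners don't collide.
./build/bin/spdk_tgt --shm-id 1 -r /var/tmp/spdk_secondary.sock &
~~~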
### Memory size {#cmd_arg_memory_size}
Total size of the hugepage memory to reserve. If DPDK env layer is used, it will
reserve memory from all available hugetlbfs mounts, starting with the one with
the highest page size. This option accepts a number with an optional binary
suffix, e.g. 1024, 1024M, 1G. The default unit is megabytes.
Starting with DPDK 18.05.1, it's possible to reserve hugepages at runtime, meaning
that SPDK application can be started with 0 pre-reserved memory. Unlike hugepages
pre-reserved at the application startup, the hugepages reserved at runtime will be
released to the system as soon as they're no longer used.
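For example (illustrative binary path):
~~~
# Reserve 1 GiB of hugepage memory at startup...
./build/bin/spdk_tgt -s 1G

# ...or, with DPDK >= 18.05.1, start with none pre-reserved.
./build/bin/spdk_tgt -s 0
~~~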
### Disable PCI access {#cmd_arg_disable_pci_access}
If SPDK is run with PCI access disabled it won't detect any PCI devices. This
includes primarily NVMe and IOAT devices. Also, the VFIO and UIO kernel modules
are not required in this mode.
### PCI address blocked and allowed lists {#cmd_arg_pci_blocked_allowed}
If a blocked list is used, all devices with the provided PCI addresses will be
ignored. If an allowed list is used, only the allowed devices will be probed.
`-B` or `-A` can be used more than once, but cannot be mixed together. That is,
`-B` and `-A` cannot be used at the same time.
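For example (the B:D:F addresses are illustrative):
~~~
# Ignore two devices...
./build/bin/spdk_tgt -B 0000:01:00.0 -B 0000:02:00.0

# ...or probe only an explicit allowed list. -A and -B cannot be mixed.
./build/bin/spdk_tgt -A 0000:03:00.0
~~~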
### Unlink hugepage files after initialization {#cmd_arg_huge_unlink}
By default, each DPDK-based application tries to remove any orphaned hugetlbfs
files during its initialization. This option removes hugetlbfs files of the current
process as soon as they're created, but is not compatible with `--shm-id`.
### Log flag {#cmd_arg_log_flags}
Enable a specific log type. This option can be used more than once. A list of
all available types is provided in the `--help` output, with `--logflag all`
enabling all of them. This option additionally enables the debug print level in debug builds of SPDK.
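For example (the flag names `bdev` and `nvme` are illustrative; consult `--help` for the list available in your build):
~~~
# Enable a couple of specific log flags...
./build/bin/spdk_tgt -L bdev -L nvme

# ...or enable every available flag.
./build/bin/spdk_tgt --logflag all
~~~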
## CPU mask {#cpu_mask}
Whenever the `CPU mask` is mentioned it is a string in one of the following formats:
- Case insensitive hexadecimal string with or without "0x" prefix.
- Comma separated list of CPUs or list of CPU ranges. Use '-' to define range.
### Example
The following CPU masks are equal and correspond to CPUs 0, 1, 2, 8, 9, 10, 11 and 12:
~~~
0x1f07
0x1F07
1f07
[0,1,2,8-12]
[0, 1, 2, 8, 9, 10, 11, 12]
~~~


@ -1,9 +1,5 @@
# Block Device User Guide {#bdev}
# Target Audience {#bdev_ug_targetaudience}
This user guide is intended for software developers who have knowledge of block storage, storage drivers, issuing JSON-RPC commands, and storage services such as RAID, compression, crypto, and others.
# Introduction {#bdev_ug_introduction}
The SPDK block device layer, often simply called *bdev*, is a C library
@ -35,166 +31,132 @@ chapters is done by using JSON-RPC commands. SPDK provides a python-based
command line tool for sending RPC commands located at `scripts/rpc.py`. Users
can list the available commands by running this script with the `-h` or `--help` flag.
Additionally, users can retrieve the currently supported set of RPC commands
directly from SPDK application by running `scripts/rpc.py rpc_get_methods`.
directly from SPDK application by running `scripts/rpc.py get_rpc_methods`.
Detailed help for each command can be displayed by adding `-h` flag as a
command parameter.
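For instance (`get_bdevs` is just one example command):
~~~
# Top-level help, then detailed help for a single RPC.
scripts/rpc.py --help
scripts/rpc.py get_bdevs -h
~~~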
# Configuring Block Device Modules {#bdev_ug_general_rpcs}
# General Purpose RPCs {#bdev_ug_general_rpcs}
Block devices can be configured using JSON RPCs. A complete list of available RPC commands
with detailed information can be found on the @ref jsonrpc_components_bdev page.
## get_bdevs {#bdev_ug_get_bdevs}
# Common Block Device Configuration Examples
A list of the currently available block devices, including detailed information about
them, can be obtained with the `get_bdevs` RPC command. The user can add the optional
parameter `name` to get details about the bdev specified by that name.
Example response
~~~
{
"num_blocks": 32768,
"supported_io_types": {
"reset": true,
"nvme_admin": false,
"unmap": true,
"read": true,
"write_zeroes": true,
"write": true,
"flush": true,
"nvme_io": false
},
"driver_specific": {},
"claimed": false,
"block_size": 4096,
"product_name": "Malloc disk",
"name": "Malloc0"
}
~~~
## delete_bdev {#bdev_ug_delete_bdev}
To remove a previously created bdev the user can use the `delete_bdev` RPC command.
A bdev can be deleted at any time and this will be fully handled by any upper
layers. As an argument the user should provide the bdev name. This RPC command
should be used only for debugging purposes. To remove a particular bdev please
use the delete command specific to its bdev module.
# Malloc bdev {#bdev_config_malloc}
Malloc bdevs are ramdisks. Because of their nature, they are volatile. They are created from hugepage memory given to the SPDK
application.
# NVMe bdev {#bdev_config_nvme}
There are two ways to create a block device based on an NVMe device in SPDK. The first
is to attach a local PCIe drive; the second is to connect to an NVMe-oF device.
In both cases the user should use the `construct_nvme_bdev` RPC command to achieve that.
Example commands
`rpc.py construct_nvme_bdev -b NVMe1 -t PCIe -a 0000:01:00.0`
This command will create an NVMe bdev for a physical device in the system.
`rpc.py construct_nvme_bdev -b Nvme0 -t RDMA -a 192.168.100.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1`
This command will create an NVMe bdev for an NVMe-oF resource.
To remove an NVMe controller use the delete_nvme_controller command.
`rpc.py delete_nvme_controller Nvme0`
This command will remove the NVMe controller named Nvme0.
# Null {#bdev_config_null}
The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
device overhead and for testing configurations that can't easily be created with the Malloc bdev.
To create Null bdev RPC command `construct_null_bdev` should be used.
Example command
`rpc.py construct_null_bdev Null0 8589934592 4096`
This command will create an 8 petabyte `Null0` device with block size 4096.
To delete a null bdev use the delete_null_bdev command.
`rpc.py delete_null_bdev Null0`
# Linux AIO bdev {#bdev_config_aio}
The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices or a file on a Linux filesystem via Linux AIO. Note that O_DIRECT is
used and thus bypasses the Linux page cache. This mode is probably as close to
a typical kernel based target as a user space target can get without using a
user-space driver. To create AIO bdev RPC command `construct_aio_bdev` should be
used.
Example commands
`rpc.py construct_aio_bdev /dev/sda aio0`
This command will create `aio0` device from /dev/sda.
`rpc.py construct_aio_bdev /tmp/file file 8192`
This command will create `file` device with block size 8192 from /tmp/file.
To delete an aio bdev use the delete_aio_bdev command.
`rpc.py delete_aio_bdev aio0`
# Ceph RBD {#bdev_config_rbd}
The SPDK RBD bdev driver provides SPDK block layer access to Ceph RADOS block
devices (RBD). Ceph RBD devices are accessed via librbd and librados libraries
to access the RADOS block device exported by Ceph. To create Ceph bdev RPC
command `bdev_rbd_create` should be used.
command `construct_rbd_bdev` should be used.
Example command
`rpc.py bdev_rbd_create rbd foo 512`
`rpc.py construct_rbd_bdev rbd foo 512`
This command will create a bdev that represents the 'foo' image from a pool called 'rbd'.
To remove a block device representation use the bdev_rbd_delete command.
To remove a block device representation use the delete_rbd_bdev command.
`rpc.py bdev_rbd_delete Rbd0`
To resize a bdev use the bdev_rbd_resize command.
`rpc.py bdev_rbd_resize Rbd0 4096`
This command will resize the Rbd0 bdev to 4096 MiB.
# Compression Virtual Bdev Module {#bdev_config_compress}
The compression bdev module can be configured to provide compression/decompression
services for an underlying thinly provisioned logical volume. Although the underlying
module can be anything (e.g. an NVMe bdev), the overall compression benefits will not be realized
unless the data stored on disk is placed appropriately. The compression vbdev module
relies on an internal SPDK library called `reduce` to accomplish this; see @ref reduce
for detailed information.
The vbdev module relies on the DPDK CompressDev Framework to provide all compression
functionality. The framework provides support for many different software only
compression modules as well as hardware assisted support for Intel QAT. At this
time the vbdev module supports the DPDK drivers for ISAL and QAT.
Persistent memory is used to store metadata associated with the layout of the data on the
backing device. SPDK relies on [PMDK](http://pmem.io/pmdk/) to interface with persistent memory, so any hardware
supported by PMDK should work. If the directory for PMEM supplied upon vbdev creation does
not point to persistent memory (e.g. a regular filesystem), performance will be severely
impacted. The vbdev module and reduce libraries were designed to use persistent memory for
any production use.
Example command
`rpc.py bdev_compress_create -p /pmem_files -b myLvol`
In this example, a compression vbdev is created using persistent memory that is mapped to
the directory `pmem_files` on top of the existing thinly provisioned logical volume `myLvol`.
The resulting compression bdev will be named `COMP_LVS/myLvol` where LVS is the name of the
logical volume store that `myLvol` resides on.
The logical volume is referred to as the backing device and once the compression vbdev is
created it cannot be separated from the persistent memory file that will be created in
the specified directory. If the persistent memory file is not available, the compression
vbdev will also not be available.
By default the vbdev module will choose the QAT driver if the hardware and drivers are
available and loaded. If not, it will revert to the software-only ISAL driver. By using
the following command, the driver may be specified; however, this setting is not persistent, so it
must be applied either upon creation or before the underlying logical volume is loaded in order to
be honored. In the example below, `0` tells the vbdev module to use QAT if available and
otherwise use ISAL; this is the default, and if it is sufficient the command is not required. Passing
a value of `1` tells the driver to use QAT, and if it is not available, creating or loading
the vbdev will fail. A value of `2`, as shown below, tells the module
to use ISAL, and if for some reason it is not available, the vbdev will fail to create or load.
`rpc.py compress_set_pmd -p 2`
To remove a compression vbdev, use the following command which will also delete the PMEM
file. If the logical volume is deleted the PMEM file will not be removed and the
compression vbdev will not be available.
`rpc.py bdev_compress_delete COMP_LVS/myLvol`
To list compression volumes that are only available for deletion because their PMEM file
was missing, use the following command. The name parameter is optional; if not included, the command
will list all such volumes. If a name is given, it will return the name or an error that the device does not exist.
`rpc.py bdev_compress_get_orphans --name COMP_Nvme0n1`
# Crypto Virtual Bdev Module {#bdev_config_crypto}
The crypto virtual bdev module can be configured to provide at rest data encryption
for any underlying bdev. The module relies on the DPDK CryptoDev Framework to provide
all cryptographic functionality. The framework provides support for many different software
only cryptographic modules as well as hardware-assisted support for the Intel QAT board. The
framework also provides support for cipher, hash, authentication and AEAD functions. At this
time the SPDK virtual bdev module supports cipher only as follows:
- AESN-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
(Note: QAT is functional however is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)
In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block
size of the underlying bdev. For example, a 4K I/O to a bdev with a 512B block size
would result in 8 cryptographic operations.
For reads, the buffer provided to the crypto module will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer which
may cause problems in some use cases.
Example command
`rpc.py bdev_crypto_create NVMe1n1 CryNvmeA crypto_aesni_mb 0123456789123456`
This command will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use the DPDK software driver 'crypto_aesni_mb' and the key
'0123456789123456'.
To remove the vbdev use the bdev_crypto_delete command.
`rpc.py bdev_crypto_delete CryNvmeA`
# Delay Bdev Module {#bdev_config_delay}
The delay vbdev module is intended to apply a predetermined additional latency on top of a lower
level bdev. This enables the simulation of the latency characteristics of a device during the functional
or scalability testing of an SPDK application. For example, to simulate the effect of drive latency when
processing I/Os, one could configure a NULL bdev with a delay bdev on top of it.
The delay bdev module is not intended to provide a high fidelity replication of a specific NVMe drive's latency;
instead, its main purpose is to provide a "big picture" understanding of how a generic latency affects a given
application.
A delay bdev is created using the `bdev_delay_create` RPC. This rpc takes 6 arguments, one for the name
of the delay bdev and one for the name of the base bdev. The remaining four arguments represent the following
latency values: average read latency, average write latency, p99 read latency, and p99 write latency.
Within the context of the delay bdev, p99 latency means that one percent of the I/O will be delayed by at
least the value of the p99 latency before being completed to the upper level protocol. All of the latency values
are measured in microseconds.
Example command:
`rpc.py bdev_delay_create -b Null0 -d delay0 -r 10 --nine-nine-read-latency 50 -w 30 --nine-nine-write-latency 90`
This command will create a delay bdev with average read and write latencies of 10 and 30 microseconds and p99 read
and write latencies of 50 and 90 microseconds respectively.
A delay bdev can be deleted using the `bdev_delay_delete` RPC.
Example command:
`rpc.py bdev_delay_delete delay0`
`rpc.py delete_rbd_bdev Rbd0`
# GPT (GUID Partition Table) {#bdev_config_gpt}
@ -203,36 +165,35 @@ It will automatically detect @ref bdev_ug_gpt on any attached bdev and will crea
possibly multiple virtual bdevs.
## SPDK GPT partition table {#bdev_ug_gpt}
The SPDK partition type GUID is `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`. Existing SPDK bdevs
can be exposed as Linux block devices via NBD and then can be partitioned with
can be exposed as Linux block devices via NBD and then ca be partitioned with
standard partitioning tools. After partitioning, the bdevs will need to be deleted and
attached again for the GPT bdev module to see any changes. NBD kernel module must be
loaded first. To create NBD bdev user should use `nbd_start_disk` RPC command.
attached again fot the GPT bdev module to see any changes. NBD kernel module must be
loaded first. To create NBD bdev user should use `start_nbd_disk` RPC command.
Example command
`rpc.py nbd_start_disk Malloc0 /dev/nbd0`
`rpc.py start_nbd_disk Malloc0 /dev/nbd0`
This will expose an SPDK bdev `Malloc0` under the `/dev/nbd0` block device.
To remove NBD device user should use `nbd_stop_disk` RPC command.
To remove NBD device user should use `stop_nbd_disk` RPC command.
Example command
`rpc.py nbd_stop_disk /dev/nbd0`
`rpc.py stop_nbd_disk /dev/nbd0`
To display the full or a specified nbd device list the user should use the `nbd_get_disks` RPC command.
To display the full or a specified nbd device list the user should use the `get_nbd_disks` RPC command.
Example command
`rpc.py nbd_get_disks -n /dev/nbd0`
`rpc.py get_nbd_disks -n /dev/nbd0`
## Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}
~~~
# Expose bdev Nvme0n1 as kernel block device /dev/nbd0 by JSON-RPC
rpc.py nbd_start_disk Nvme0n1 /dev/nbd0
rpc.py start_nbd_disk Nvme0n1 /dev/nbd0
# Create GPT partition table.
parted -s /dev/nbd0 mklabel gpt
@ -245,152 +206,13 @@ parted -s /dev/nbd0 mkpart MyPartition '0%' '50%'
sgdisk -t 1:7c5222bd-8f5d-4087-9c00-bf9843c7b58c /dev/nbd0
# Stop the NBD device (stop exporting /dev/nbd0).
rpc.py nbd_stop_disk /dev/nbd0
rpc.py stop_nbd_disk /dev/nbd0
# Now Nvme0n1 is configured with a GPT partition table, and
# the first partition will be automatically exposed as
# Nvme0n1p1 in SPDK applications.
~~~
# iSCSI bdev {#bdev_config_iscsi}
The SPDK iSCSI bdev driver depends on libiscsi and hence is not enabled by default.
In order to use it, build SPDK with an extra `--with-iscsi-initiator` configure option.
The following command creates an `iSCSI0` bdev from a single LUN exposed at given iSCSI URL
with `iqn.2016-06.io.spdk:init` as the reported initiator IQN.
`rpc.py bdev_iscsi_create -b iSCSI0 -i iqn.2016-06.io.spdk:init --url iscsi://127.0.0.1/iqn.2016-06.io.spdk:disk1/0`
The URL is in the following format:
`iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>`
# Linux AIO bdev {#bdev_config_aio}
The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices or a file on a Linux filesystem via Linux AIO. Note that O_DIRECT is
used and thus bypasses the Linux page cache. This mode is probably as close to
a typical kernel based target as a user space target can get without using a
user-space driver. To create AIO bdev RPC command `bdev_aio_create` should be
used.
Example commands
`rpc.py bdev_aio_create /dev/sda aio0`
This command will create `aio0` device from /dev/sda.
`rpc.py bdev_aio_create /tmp/file file 4096`
This command will create `file` device with block size 4096 from /tmp/file.
To delete an aio bdev use the bdev_aio_delete command.
`rpc.py bdev_aio_delete aio0`
# OCF Virtual bdev {#bdev_config_cas}
OCF virtual bdev module is based on [Open CAS Framework](https://github.com/Open-CAS/ocf) - a
high performance block storage caching meta-library.
To enable the module, configure SPDK using `--with-ocf` flag.
OCF bdev can be used to enable caching for any underlying bdev.
Below is an example command for creating OCF bdev:
`rpc.py bdev_ocf_create Cache1 wt Malloc0 Nvme0n1`
This command will create a new OCF bdev `Cache1` with bdev `Malloc0` as the caching device,
`Nvme0n1` as the core device, and an initial cache mode of `Write-Through`.
`Malloc0` will be used as a cache for `Nvme0n1`, so data written to `Cache1` will eventually
be present on `Nvme0n1`.
By default, OCF will be configured with a cache line size equal to 4KiB,
and non-volatile metadata will be disabled.
To remove `Cache1`:
`rpc.py bdev_ocf_delete Cache1`
During removal OCF-cache will be stopped and all cached data will be written to the core device.
Note that OCF has a per-device RAM requirement. More details can be found in the
[OCF documentation](https://open-cas.github.io/guide_system_requirements.html).
# Malloc bdev {#bdev_config_malloc}
Malloc bdevs are ramdisks. Because of their nature, they are volatile. They are created from hugepage memory given to the SPDK
application.
Example command for creating malloc bdev:
`rpc.py bdev_malloc_create -b Malloc0 64 512`
Example command for removing malloc bdev:
`rpc.py bdev_malloc_delete Malloc0`
# Null {#bdev_config_null}
The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
device overhead and for testing configurations that can't easily be created with the Malloc bdev.
To create Null bdev RPC command `bdev_null_create` should be used.
Example command
`rpc.py bdev_null_create Null0 8589934592 4096`
This command will create an 8 petabyte `Null0` device with block size 4096.
To delete a null bdev use the bdev_null_delete command.
`rpc.py bdev_null_delete Null0`
# NVMe bdev {#bdev_config_nvme}
There are two ways to create a block device based on an NVMe device in SPDK. The first
is to attach a local PCIe drive; the second is to connect to an NVMe-oF device.
In both cases the user should use the `bdev_nvme_attach_controller` RPC command to achieve that.
Example commands
`rpc.py bdev_nvme_attach_controller -b NVMe1 -t PCIe -a 0000:01:00.0`
This command will create an NVMe bdev for a physical device in the system.
`rpc.py bdev_nvme_attach_controller -b Nvme0 -t RDMA -a 192.168.100.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1`
This command will create an NVMe bdev for an NVMe-oF resource.
To remove an NVMe controller use the bdev_nvme_detach_controller command.
`rpc.py bdev_nvme_detach_controller Nvme0`
This command will remove the NVMe bdev named Nvme0.
## NVMe bdev character device {#bdev_config_nvme_cuse}
This feature is considered experimental. You must configure SPDK with the --with-nvme-cuse
option to enable this RPC.
Example commands
`rpc.py bdev_nvme_cuse_register -n Nvme3`
This command will register a character device under /dev/spdk associated with Nvme3
controller. If there are namespaces created on Nvme3 controller, a namespace
character device is also created for each namespace.
For example, the first controller registered will have a character device path of
/dev/spdk/nvmeX, where X is replaced with a unique integer to differentiate it from
other controllers. Note that this 'nvmeX' name here has no correlation to the name
associated with the controller in SPDK. Namespace character devices will have a path
of /dev/spdk/nvmeXnY, where Y is the namespace ID.
Cuse devices are removed from the system when the NVMe controller is detached or unregistered
with the command:
`rpc.py bdev_nvme_cuse_unregister -n Nvme0`
# Logical volumes {#bdev_ug_logical_volumes}
The Logical Volumes library is a flexible storage space management system. It allows
@ -402,21 +224,22 @@ please refer to @ref lvol.
Before creating any logical volumes (lvols), an lvol store has to be created first on the
selected block device. The lvol store is the lvols' vessel, responsible for managing underlying
bdev space assignment to lvol bdevs and storing metadata. To create an lvol store the user
should use the `bdev_lvol_create_lvstore` RPC command.
bdev space assigment to lvol bdevs and storing metadata. To create lvol store user
should use the `construct_lvol_store` RPC command.
Example command
`rpc.py bdev_lvol_create_lvstore Malloc2 lvs -c 4096`
`rpc.py construct_lvol_store Malloc2 lvs -c 4096`
This will create an lvol store named `lvs` with cluster size 4096, built on top of the
`Malloc2` bdev. In response the user will be provided with a uuid, which is the unique lvol store
identifier.
The user can get a list of the available lvol stores using the `bdev_lvol_get_lvstores` RPC command (no
The user can get a list of the available lvol stores using the `get_lvol_stores` RPC command (no
parameters available).
Example response
~~~
{
"uuid": "330a6ab2-f468-11e7-983e-001e67edf35d",
@ -429,36 +252,24 @@ Example response
}
~~~
To delete lvol store user should use `bdev_lvol_delete_lvstore` RPC command.
To delete lvol store user should use `destroy_lvol_store` RPC command.
Example commands
`rpc.py bdev_lvol_delete_lvstore -u 330a6ab2-f468-11e7-983e-001e67edf35d`
`rpc.py destroy_lvol_store -u 330a6ab2-f468-11e7-983e-001e67edf35d`
`rpc.py bdev_lvol_delete_lvstore -l lvs`
`rpc.py destroy_lvol_store -l lvs`
## Lvols {#bdev_ug_lvols}
To create lvols on an existing lvol store the user should use the `bdev_lvol_create` RPC command.
To create lvols on an existing lvol store the user should use the `construct_lvol_bdev` RPC command.
Each created lvol will be represented by a new bdev.
Example commands
`rpc.py bdev_lvol_create lvol1 25 -l lvs`
`rpc.py construct_lvol_bdev lvol1 25 -l lvs`
`rpc.py bdev_lvol_create lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`
# Passthru {#bdev_config_passthru}
The SPDK Passthru virtual block device module serves as an example of how to write a
virtual block device module. It implements the required functionality of a vbdev module
and demonstrates some other basic features such as the use of per I/O context.
Example commands
`rpc.py bdev_passthru_create -b aio -p pt`
`rpc.py bdev_passthru_delete pt`
`rpc.py construct_lvol_bdev lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`
# Pmem {#bdev_config_pmem}
@ -468,130 +279,71 @@ First, user needs to configure SPDK to include PMDK support:
`configure --with-pmdk`
To create a pmemblk pool for use with SPDK the user should use the `bdev_pmem_create_pool` RPC command.
To create a pmemblk pool for use with SPDK the user should use the `create_pmem_pool` RPC command.
Example command
`rpc.py bdev_pmem_create_pool /path/to/pmem_pool 25 4096`
`rpc.py create_pmem_pool /path/to/pmem_pool 25 4096`
To get information on the created pmem pool file the user can use the `bdev_pmem_get_pool_info` RPC command.
To get information on the created pmem pool file the user can use the `pmem_pool_info` RPC command.
Example command
`rpc.py bdev_pmem_get_pool_info /path/to/pmem_pool`
`rpc.py pmem_pool_info /path/to/pmem_pool`
To remove the pmem pool file the user can use the `bdev_pmem_delete_pool` RPC command.
To remove the pmem pool file the user can use the `delete_pmem_pool` RPC command.
Example command
`rpc.py bdev_pmem_delete_pool /path/to/pmem_pool`
`rpc.py delete_pmem_pool /path/to/pmem_pool`
To create a bdev based on the pmemblk pool file the user should use the `bdev_pmem_create` RPC
To create a bdev based on the pmemblk pool file the user should use the `construct_pmem_bdev` RPC
command.
Example command
`rpc.py bdev_pmem_create /path/to/pmem_pool -n pmem`
`rpc.py construct_pmem_bdev /path/to/pmem_pool -n pmem`
To remove a block device representation use the bdev_pmem_delete command.
To remove a block device representation use the delete_pmem_bdev command.
`rpc.py bdev_pmem_delete pmem`
# RAID {#bdev_ug_raid}
The RAID virtual bdev module provides functionality to combine any SPDK bdevs into
one RAID bdev. Currently SPDK supports only RAID 0. RAID functionality does not
store on-disk metadata on the member disks, so the user must recreate the RAID
volume when restarting the application. The user may specify member disks to create a RAID
volume even if they do not exist yet - as the member disks are registered at
a later time, the RAID module will claim them and will surface the RAID volume
after all of the member disks are available. It is allowed to use disks of
different sizes - the smallest disk size will be the amount of space used on
each member disk.
Example commands
`rpc.py bdev_raid_create -n Raid0 -z 64 -r 0 -b "lvol0 lvol1 lvol2 lvol3"`
`rpc.py bdev_raid_get_bdevs`
`rpc.py bdev_raid_delete Raid0`
# Split {#bdev_ug_split}
The split block device module takes an underlying block device and splits it into
several smaller equal-sized virtual block devices. This serves as an example to create
more vbdevs on a given base bdev for user testing.
Example commands
To create four split bdevs with base bdev_b0 use the `bdev_split_create` command.
Each split bdev will be one fourth the size of the base bdev.
`rpc.py bdev_split_create bdev_b0 4`
The `split_size_mb` (`-s`) parameter restricts the size of each split bdev.
The total size of all split bdevs must not exceed the base bdev size.
`rpc.py bdev_split_create bdev_b0 4 -s 128`
To remove the split bdevs, use the `bdev_split_delete` command with the base bdev name.
`rpc.py bdev_split_delete bdev_b0`
# Uring {#bdev_ug_uring}
The uring bdev module issues I/O to kernel block devices using the io_uring Linux kernel API. This module requires liburing.
For more information on io_uring refer to the kernel [io_uring documentation](https://kernel.dk/io_uring.pdf).
The user needs to configure SPDK to include io_uring support:
`configure --with-uring`
To create a uring bdev with given filename, bdev name and block size use the `bdev_uring_create` RPC.
`rpc.py bdev_uring_create /path/to/device bdev_u0 512`
To remove a uring bdev use the `bdev_uring_delete` RPC.
`rpc.py bdev_uring_delete bdev_u0`
# Virtio Block {#bdev_config_virtio_blk}
The Virtio-Block driver allows creating SPDK bdevs from Virtio-Block devices.
The following command creates a Virtio-Block device named `VirtioBlk0` from a vhost-user
socket `/tmp/vhost.0` exposed directly by SPDK @ref vhost. Optional `vq-count` and
`vq-size` params specify number of request queues and queue depth to be used.
`rpc.py bdev_virtio_attach_controller --dev-type blk --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioBlk0`
The driver can be also used inside QEMU-based VMs. The following command creates a Virtio
Block device named `VirtioBlk1` from a Virtio PCI device at address `0000:01:00.0`.
The entire configuration will be read automatically from PCI Configuration Space. It will
reflect all parameters passed to QEMU's vhost-user-blk-pci device.
`rpc.py bdev_virtio_attach_controller --dev-type blk --trtype pci --traddr 0000:01:00.0 VirtioBlk1`
Virtio-Block devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioBlk0`
# Virtio SCSI {#bdev_config_virtio_scsi}
The Virtio-SCSI driver allows creating SPDK block devices from Virtio-SCSI LUNs.
Virtio-SCSI bdevs are created the same way as Virtio-Block ones.
The following command creates a Virtio-SCSI device named `VirtioScsi0` from a vhost-user
socket `/tmp/vhost.0` exposed directly by SPDK @ref vhost. Optional `vq-count` and
`vq-size` params specify number of request queues and queue depth to be used.
`rpc.py bdev_virtio_attach_controller --dev-type scsi --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioScsi0`
The driver can be also used inside QEMU-based VMs. The following command creates a Virtio
SCSI device named `VirtioScsi0` from a Virtio PCI device at address `0000:01:00.0`.
The entire configuration will be read automatically from PCI Configuration Space. It will
reflect all parameters passed to QEMU's vhost-user-scsi-pci device.
`rpc.py bdev_virtio_attach_controller --dev-type scsi --trtype pci --traddr 0000:01:00.0 VirtioScsi0`
Each Virtio-SCSI device may export up to 64 block devices named VirtioScsi0t0 ~ VirtioScsi0t63,
one LUN (LUN0) per SCSI device. The above two commands will output the names of all exposed bdevs.
Virtio-SCSI devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioScsi0`
Removing a Virtio-SCSI device will destroy all its bdevs.

## Creating A New Module
Block device modules are located in subdirectories under module/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.
The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. In that header a macro is defined that registers
a new bdev module - SPDK_BDEV_MODULE_REGISTER. This macro takes as an argument a
pointer to an spdk_bdev_module structure that is used to register the new bdev module.
initialization (`module_init`) and teardown (`module_fini`) functions,
the function that returns context size (`get_ctx_size`) - scratch space that
will be allocated in each I/O request for use by this module, and a callback
that will be called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`). Please check the documentation of
struct spdk_bdev_module for more details.
## Creating Bdevs
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.
Integrating a new bdev module into the build system requires updates to various
files in the /mk directory.
## Creating Bdevs in an External Repository
A user can build their own bdev module and application on top of existing SPDK libraries. The example in
test/external_code serves as a template for creating, building and linking an external
bdev module. Refer to test/external_code/README.md and @ref so_linking for further information.
## Creating Virtual Bdevs
Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take one additional step. The module can look up the underlying
bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string
name is provided by the user via an RPC. The module
may then proceed as normal by opening the bdev to obtain a descriptor, and
creating I/O channels for the bdev (probably in response to the
`get_io_channel` callback). The final step is to have the module use its open

## Initializing The Library
The bdev layer depends on the generic message passing infrastructure
abstracted by the header file include/spdk/thread.h. See @ref concurrency for a
full description. Most importantly, calls into the bdev library may only be
made from threads that have been allocated with SPDK by calling
spdk_thread_create().
From an allocated thread, the bdev library may be initialized by calling
spdk_bdev_initialize(), which is an asynchronous operation. Until the completion
callback is called, no other bdev library functions may be invoked. Similarly,
to tear down the bdev library, call spdk_bdev_finish().
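As a brief sketch of the initialization step (the callback and function names here are
illustrative, not part of the library), from an SPDK thread:
~~~{.c}
#include "spdk/bdev.h"

static void
bdev_init_done(void *cb_arg, int rc)
{
	/* Only from here on may other bdev library functions be called. */
}

static void
start_app(void *ctx)
{
	/* Asynchronous; completion is signaled via bdev_init_done(). */
	spdk_bdev_initialize(bdev_init_done, NULL);
}
~~~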
## Discovering Block Devices
## Preparing To Use A Block Device
In order to send I/O requests to a block device, it must first be opened by
calling spdk_bdev_open_ext(). This will return a descriptor. Multiple users may have
a bdev open at the same time, and coordination of reads and writes between
users must be handled by some higher level mechanism outside of the bdev
layer. Opening a bdev with write permission may fail if a virtual bdev module
has claimed the bdev. Virtual bdev modules implement logic like RAID or
logical volume management and forward their I/O to lower level bdevs, so they
mark these lower level bdevs as claimed to prevent outside users from issuing
writes.
When a block device is opened, a callback and context must be provided that
will be called with the appropriate spdk_bdev_event_type enum as an argument when
the bdev triggers an asynchronous event such as bdev removal. For example,
the callback will be called on each open descriptor for a bdev backed by
a physical NVMe SSD when the NVMe SSD is hot-unplugged. In this case
the callback can be thought of as a request to close the open descriptor so
other memory may be freed. A bdev cannot be torn down while open descriptors
exist, so it is required that a callback is provided.
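A minimal open sketch, assuming a global descriptor for brevity (the helper names are
hypothetical; the event callback simply closes the descriptor on removal):
~~~{.c}
#include "spdk/bdev.h"

static struct spdk_bdev_desc *g_desc;

static void
bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev,
	      void *event_ctx)
{
	if (type == SPDK_BDEV_EVENT_REMOVE) {
		/* Treat this as a request to close our descriptor. */
		spdk_bdev_close(g_desc);
		g_desc = NULL;
	}
}

static int
open_my_bdev(const char *name)
{
	/* true requests write access; may fail if the bdev is claimed. */
	return spdk_bdev_open_ext(name, true, bdev_event_cb, NULL, &g_desc);
}
~~~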
When a user is done with a descriptor, they may release it by calling
spdk_bdev_close().
Once a descriptor and a channel have been obtained, I/O may be sent by calling
the various I/O submission functions such as spdk_bdev_read(). These calls each
take a callback as an argument which will be called some time later with a
handle to an spdk_bdev_io object. In response to that completion, the user
must call spdk_bdev_free_io() to release the resources. Within this callback,
the user may also use the functions spdk_bdev_io_get_nvme_status() and
spdk_bdev_io_get_scsi_status() to obtain error information in the format of
their choosing.
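For example, a read and its completion callback might be structured as follows (buffer
management is elided and the function names are illustrative):
~~~{.c}
#include "spdk/bdev.h"

static void
read_complete(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	if (!success) {
		/* spdk_bdev_io_get_nvme_status() etc. may be used here. */
	}
	/* Required: returns the spdk_bdev_io to the bdev layer. */
	spdk_bdev_free_io(bdev_io);
}

static void
submit_read(struct spdk_bdev_desc *desc, struct spdk_io_channel *ch,
	    void *buf)
{
	int rc = spdk_bdev_read(desc, ch, buf, 0, 4096, read_complete, NULL);

	if (rc != 0) {
		/* Submission itself failed, e.g. -ENOMEM; no callback. */
	}
}
~~~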

# Using bdevperf application {#bdevperf}
## Introduction
bdevperf is an SPDK application that is used for performance testing
of block devices (bdevs) exposed by the SPDK bdev layer. It is an
alternative to the SPDK bdev fio plugin for benchmarking SPDK bdevs.
In some cases, bdevperf can provide much lower overhead than the fio
plugin, resulting in much better performance for tests using a limited
number of CPU cores.
bdevperf exposes a command line interface that allows users to specify
SPDK framework options as well as testing options.
Since SPDK 20.07, bdevperf supports a configuration file format similar
to FIO's. It allows the user to create jobs parameterized by
filename, cpumask, blocksize, queuesize, etc.
## Config file
Bdevperf's config file is similar to FIO's config file format.
Below is an example config file that uses all available parameters:
~~~{.ini}
[global]
filename=Malloc0:Malloc1
bs=1024
iosize=256
rw=randrw
rwmixread=90
[A]
cpumask=0xff
[B]
cpumask=[0-128]
filename=Malloc1
[global]
filename=Malloc0
rw=write
[C]
bs=4096
iosize=128
offset=1000000
length=1000000
~~~
Jobs `[A]`, `[B]`, and `[C]` inherit default values from the `[global]`
section residing above them. So in the example, job `[A]` inherits the
`filename` value and uses both `Malloc0` and `Malloc1` bdevs as targets,
job `[B]` overrides its `filename` value and uses `Malloc1`, and
job `[C]` inherits the value `Malloc0` for its `filename`.
Interaction with CLI arguments is not the same as in FIO, however.
If bdevperf receives a CLI argument, it overrides the value
of the corresponding parameter for all `[global]` sections of the config file.
So if the example config is used, specifying the `-q` argument
will make jobs `[A]` and `[B]` use its value.
Below is a full list of supported parameters with descriptions.
Param | Default | Description
--------- | ----------------- | -----------
filename | | Bdevs to use, separated by ":"
cpumask | Maximum available | CPU mask. Format is defined at @ref cpu_mask
bs | | Block size (io size)
iodepth | | Queue depth
rwmixread | `50` | Percentage of a mixed workload that should be reads
offset | `0` | Start I/O at the provided offset on the bdev
length | 100% of bdev size | End I/O at `offset`+`length` on the bdev
rw | | Type of I/O pattern
Available rw types:
- read
- randread
- write
- randwrite
- verify
- reset
- unmap
- write_zeroes
- flush
- rw
- randrw

## Theory of Operation {#blob_pg_theory}
### Abstractions
The Blobstore defines a hierarchy of storage abstractions as follows.
* **Logical Block**: Logical blocks are exposed by the disk itself, which are numbered from 0 to N, where N is the
  number of blocks in the disk. A logical block is typically either 512B or 4KiB.
* **Page**: A page is defined to be a fixed number of logical blocks defined at Blobstore creation time. The logical
  blocks that compose a page are always contiguous. Pages are also numbered from the beginning of the disk such
  that the first page worth of blocks is page 0, the second page is page 1, etc. A page is typically 4KiB in size,
  so this is either 8 or 1 logical blocks in practice. The SSD must be able to perform atomic reads and writes of
  at least the page size.
* **Cluster**: A cluster is a fixed number of pages defined at Blobstore creation time. The pages that compose a cluster
  are always contiguous. Clusters are also numbered from the beginning of the disk, where cluster 0 is the first cluster
  worth of pages, cluster 1 is the second grouping of pages, etc. A cluster is typically 1MiB in size, or 256 pages.
* **Blob**: A blob is an ordered list of clusters. Blobs are manipulated (created, sized, deleted, etc.) by the application
  and persist across power failures and reboots. Applications use a Blobstore provided identifier to access a particular blob.
  Blobs are read and written in units of pages by specifying an offset from the start of the blob. Applications can also
  store metadata in the form of key/value pairs with each blob which we'll refer to as xattrs (extended attributes).
* **Blobstore**: An SSD which has been initialized by a Blobstore-based application is referred to as "a Blobstore." A
  Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of
  blobs as managed by the application.
(An embedded HTML/JS diagram illustrating the logical block / page / cluster / blob hierarchy appears here in the rendered documentation.)
For all Blobstore operations regarding atomicity, there is a dependency on the underlying device to guarantee atomic
operations of at least one page size. Atomicity here can refer to multiple operations:
* **Data Writes**: For the case of data writes, the unit of atomicity is one page. Therefore if a write operation of
  greater than one page is underway and the system suffers a power failure, the data on media will be consistent at a page
  size granularity (if a single page were in the middle of being updated when power was lost, the data at that page location
  will be as it was prior to the start of the write operation following power restoration.)
* **Blob Metadata Updates**: Each blob has its own set of metadata (xattrs, size, etc). For performance reasons, a copy of
  this metadata is kept in RAM and only synchronized with the on-disk version when the application makes an explicit call to
  do so, or when the Blobstore is unloaded. Therefore, setting of an xattr, for example, is not consistent until the call to
  synchronize it (covered later) which is, however, performed atomically.
* **Blobstore Metadata Updates**: Blobstore itself has its own metadata which, like per blob metadata, has a copy in both
  RAM and on-disk. Unlike the per blob metadata, however, the Blobstore metadata region is not made consistent via a blob
  synchronization call, it is only synchronized when the Blobstore is properly unloaded via API. Therefore, if the Blobstore
  metadata is updated (blob creation, deletion, resize, etc.) and not unloaded properly, it will need to perform some extra
  steps the next time it is loaded which will take a bit more time than it would have if shutdown cleanly, but there will be
  no inconsistencies.
### Callbacks
When the Blobstore is initialized, there are multiple configuration options to consider. The
options and their defaults are listed below; a short configuration sketch follows the list.
* **Cluster Size**: By default, this value is 1MB. The cluster size is required to be a multiple of page size and should be
  selected based on the applications usage model in terms of allocation. Recall that blobs are made up of clusters so when
  a blob is allocated/deallocated or changes in size, disk LBAs will be manipulated in groups of cluster size. If the
  application is expecting to deal with mainly very large (always multiple GB) blobs then it may make sense to change the
  cluster size to 1GB for example.
* **Number of Metadata Pages**: By default, Blobstore will assume there can be as many clusters as there are metadata pages
  which is the worst case scenario in terms of metadata usage and can be overridden here however the space efficiency is
  not significant.
* **Maximum Simultaneous Metadata Operations**: Determines how many internally pre-allocated memory structures are set
  aside for performing metadata operations. It is unlikely that changes to this value (default 32) would be desirable.
* **Maximum Simultaneous Operations Per Channel**: Determines how many internally pre-allocated memory structures are set
  aside for channel operations. Changes to this value would be application dependent and best determined by both a knowledge
  of the typical usage model, an understanding of the types of SSDs being used and empirical data. The default is 512.
* **Blobstore Type**: This field is a character array to be used by applications that need to identify whether the
  Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in
  an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there
  is no need to set this value. It can, however, be set to any valid set of characters.
### Sub-page Sized Operations
Blobstore is only capable of doing page sized read/write operations. If the application
requires finer granularity it will have to accommodate that itself.
As mentioned earlier, Blobstore can share a single thread with an application or the application
can define any number of threads, within resource constraints, that makes sense. The basic considerations that must be
followed are:
* Metadata operations (API with MD in the name) should be isolated from each other as there is no internal locking on the
  memory structures affected by these API.
* Metadata operations should be isolated from conflicting IO operations (an example of a conflicting IO would be one that is
  reading/writing to an area of a blob that a metadata operation is deallocating).
* Asynchronous callbacks will always take place on the calling thread.
* No assumptions about IO ordering can be made regardless of how many or which threads were involved in the issuing.
### Error Handling
Asynchronous Blobstore callbacks all include an error number that should be checked; non-zero values
indicate an error. Synchronous calls will typically return an error value if applicable.
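For instance, a blob creation callback would check `bserrno` before using the result (a
sketch; the error handling policy is up to the application):
~~~{.c}
#include "spdk/blob.h"

static void
blob_create_complete(void *cb_arg, spdk_blob_id blobid, int bserrno)
{
	if (bserrno != 0) {
		/* Non-zero means the asynchronous operation failed. */
		return;
	}
	/* blobid is valid only on success. */
}
~~~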
### Asynchronous API
Asynchronous callbacks will return control not immediately, but at the point in execution where no
more forward progress can be made without blocking. Therefore, no assumptions can be made about the progress of
an asynchronous call until the callback has completed.
### Xattrs
relevant in understanding any kind of structure for what is on the Blobstore.
There are multiple examples of Blobstore usage in the [repo](https://github.com/spdk/spdk):
* **Hello World**: Actually named `hello_blob.c` this is a very basic example of a single threaded application that
  does nothing more than demonstrate the very basic API. Although Blobstore is optimized for NVMe, this example uses
  a RAM disk (malloc) back-end so that it can be executed easily in any development environment. The malloc back-end
  is a `bdev` module thus this example uses not only the SPDK Framework but the `bdev` layer as well.
* **Hello NVME Blob**: `hello_nvme_blob.c` is the non-bdev version of `hello_blob.c` and simply shows how an
  application can directly integrate Blobstore with the SPDK NVMe driver without using the `bdev` layer at all.
* **CLI**: The `blobcli.c` example is a command line utility intended to not only serve as example code but as a test
  and development tool for Blobstore itself. It is also a simple single threaded application that relies on both the
  SPDK Framework and the `bdev` layer but offers multiple modes of operation to accomplish some real-world tasks. In
  command mode, it accepts single-shot commands which can be a little time consuming if there are many commands to
  get through as each one will take a few seconds waiting for DPDK initialization. It therefore has a shell mode that
  allows the developer to get to a `blob>` prompt and then very quickly interact with Blobstore with simple commands
  that include the ability to import/export blobs from/to regular files. Lastly there is a scripting mode to automate
  a series of tasks, again, handy for development and/or test type activities.
## Configuration {#blob_pg_config}
Cluster 0 is special and has the following format, where page 0 is the first page of the cluster:
The super block is a single page located at the beginning of the partition. It contains basic information about
the Blobstore. The metadata region is the remainder of cluster 0 and may extend to additional clusters. Refer
to the latest source code for complete structural details of the super block and metadata region.
Each blob is allocated a non-contiguous set of pages inside the metadata region for its metadata. These pages
form a linked list. The first page in the list will be written in place on update, while all other pages will
be written to fresh locations. This requires the backing device to support an atomic write size greater than
or equal to the page size to guarantee that the operation is atomic. See the section on atomicity for details.
### Blob cluster layout {#blob_pg_cluster_layout}
Each blob is an ordered list of clusters, where the starting LBA of a cluster is called an extent. A blob can be
thin provisioned, resulting in no extent for some of the clusters. When the first write operation occurs
to an unallocated cluster, a new extent is chosen. This information is stored in RAM and on-disk.
There are two extent representations on-disk, dependent on the `use_extent_table` (default: true) option used
when creating a blob.
* **use_extent_table=true**: The EXTENT_PAGE descriptor is not part of the linked list of pages. It contains extents
  that are not run-length encoded. Each extent page is referenced by an EXTENT_TABLE descriptor, which is serialized
  as part of the linked list of pages. The extent table run-length encodes all unallocated extent pages.
  Every new cluster allocation updates a single extent page when that extent page was previously allocated;
  otherwise it additionally incurs serializing the whole linked list of pages for the blob.
* **use_extent_table=false**: The EXTENT_RLE descriptor is serialized as part of the linked list of pages.
  Extents pointing to contiguous LBAs are run-length encoded, including unallocated extents represented by 0.
  Every new cluster allocation incurs serializing the whole linked list of pages for the blob.
### Sequences and Batches
Internally Blobstore uses the concepts of sequences and batches to submit IO to the underlying device in either
a serial fashion or in parallel, respectively. Both are defined using the following structure:
~~~{.sh}
struct spdk_bs_request_set;
~~~
These request sets are basically bookkeeping mechanisms to help Blobstore efficiently deal with related groups
of IO. They are an internal construct only and are pre-allocated on a per channel basis (channels were discussed
earlier). They are removed from a channel associated linked list when the set (sequence or batch) is started and
then returned to the list when completed.
the public API is `blob.h`.
~~~{.sh}
struct spdk_blob
~~~
This is an in-memory data structure that contains key elements like the blob identifier, its current state and two
copies of the mutable metadata for the blob; one copy is the current metadata and the other is the last copy written
to disk.
And for the most part the following conventions are followed throughout:
* functions beginning with an underscore are called internally only
* functions or variables with the letters `cpl` are related to set or callback completions

~~~{.sh}
make
~~~
Clone the RocksDB repository from the SPDK GitHub fork into a separate directory.
Make sure you check out the `6.15.fb` branch.
~~~{.sh}
cd ..
git clone -b 6.15.fb https://github.com/spdk/rocksdb.git
~~~
Build RocksDB. Only the `db_bench` benchmarking tool is integrated with BlobFS.
~~~{.sh}
cd rocksdb
make db_bench SPDK_DIR=relative_path/to/spdk
~~~
Or you can also add `DEBUG_LEVEL=0` for a release build (need to turn on `USE_RTTI`).
~~~{.sh}
export USE_RTTI=1 && make db_bench DEBUG_LEVEL=0 SPDK_DIR=relative_path/to/spdk
~~~
Create an NVMe section in the configuration file using SPDK's `gen_nvme.sh` script.
~~~{.sh}
scripts/gen_nvme.sh --json-with-subsystems > /usr/local/etc/spdk/rocksdb.json
~~~
Verify the configuration file has specified the correct NVMe SSD.
~~~{.sh}
HUGEMEM=5120 scripts/setup.sh
~~~
Create an empty SPDK blobfs for testing.
~~~{.sh}
test/blobfs/mkfs/mkfs /usr/local/etc/spdk/rocksdb.json Nvme0n1
~~~
At this point, RocksDB is ready for testing with SPDK. Three `db_bench` parameters are used to configure SPDK:
Default is 4096 (4GB). (Optional)
SPDK has a set of scripts which will run `db_bench` against a variety of workloads and capture performance and profiling
data. The primary script is `test/blobfs/rocksdb/rocksdb.sh`.
# FUSE
BlobFS provides a FUSE plug-in to mount an SPDK BlobFS as a kernel filesystem for inspection.
The FUSE plug-in requires fuse3 and will be built automatically when fuse3 is detected on the system.
~~~{.sh}
test/blobfs/fuse/fuse /usr/local/etc/spdk/rocksdb.json Nvme0n1 /mnt/fuse
~~~
Note that the FUSE plug-in has some limitations - see the list below.

# CI Tools {#ci_tools}
This section describes the tools used by CI to verify the integrity of submitted
patches ([status](https://ci.spdk.io)).
- @subpage shfmt

# SPDK "Reduce" Block Compression Algorithm {#reduce}
## Overview
The SPDK "reduce" block compression scheme is based on using SSDs for storing compressed blocks of
storage and persistent memory for metadata. This metadata includes mappings of logical blocks
requested by a user to the compressed blocks on SSD. The scheme described in this document
is generic and not tied to any specific block device framework such as the SPDK block device (bdev)
framework. This algorithm will be implemented in a library called "libreduce". Higher-level
software modules can be built on top of this library to create and present block devices in a
specific block device framework. For SPDK, a bdev_reduce module will serve as a wrapper around
the libreduce library, to present the compressed block devices as an SPDK bdev.
This scheme only describes how compressed blocks are stored on an SSD and the metadata for tracking
those compressed blocks. It relies on the higher-level software module to perform the compression
algorithm itself. For SPDK, the bdev_reduce module will utilize the DPDK compressdev framework
to perform compression and decompression on behalf of the libreduce library.
(Note that in some cases, blocks of storage may not be compressible, or cannot be compressed enough
to realize savings from the compression. In these cases, the data may be stored uncompressed on
disk. The phrase "compressed blocks of storage" includes these uncompressed blocks.)
A compressed block device is a logical entity built on top of a similarly-sized backing storage
device. The backing storage device must be thin-provisioned to realize any savings from
compression for reasons described later in this document. This algorithm has no direct knowledge
of the implementation of the backing storage device, except that it will always use the
lowest-numbered blocks available on the backing storage device. This will ensure that when this
algorithm is used on a thin-provisioned backing storage device, blocks will not be allocated until
they are actually needed.
The backing storage device must be sized for the worst case scenario, where no data can be
compressed. In this case, the size of the backing storage device would be the same as the
compressed block device. Since this algorithm ensures atomicity by never overwriting data
in place, some additional backing storage is required to temporarily store data for writes in
progress before the associated metadata is updated.
Storage from the backing storage device will be allocated, read, and written to in 4KB units for
best NVMe performance. These 4KB units are called "backing IO units". They are indexed from 0 to N-1
with the indices called "backing IO unit indices". At start, the full set of indices represent the
"free backing IO unit list".
A compressed block device compresses and decompresses data in units of chunks, where a chunk is a
multiple of at least two 4KB backing IO units. The number of backing IO units per chunk determines
the chunk size and is specified when the compressed block device is created. A chunk
consumes a number of 4KB backing IO units between 1 and the number of 4KB units in the chunk. For
example, a 16KB chunk consumes 1, 2, 3 or 4 backing IO units. The number of backing IO units depends on how
much the chunk was able to be compressed. The blocks on disk associated with a chunk are stored in a
"chunk map" in persistent memory. Each chunk map consists of N 64-bit values, where N is the maximum
number of backing IO units in the chunk. Each 64-bit value corresponds to a backing IO unit index. A
special value (for example, 2^64-1) is used for backing IO units not needed due to compression. The
number of chunk maps allocated is equal to the size of the compressed block device divided by its chunk
size, plus some number of extra chunk maps. These extra chunk maps are used to ensure atomicity on
writes and will be explained later in this document. At start, all of the chunk maps represent the
"free chunk map list".
Finally, the logical view of the compressed block device is represented by the "logical map". The
logical map is a mapping of chunk offsets into the compressed block device to the corresponding
chunk map. Each entry in the logical map is a 64-bit value, denoting the associated chunk map.
A special value (UINT64_MAX) is used if there is no associated chunk map. The mapping is
determined by dividing the byte offset by the chunk size to get an index, which is used as an
array index into the array of chunk map entries. At start, all entries in the logical map have no
associated chunk map. Note that while access to the backing storage device is in 4KB units, the
logical view may allow 4KB or 512B unit access and should perform similarly.
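To make the translation concrete, here is a sketch of the logical-map lookup described
above. The structure and names are hypothetical illustrations of the scheme, not the
libreduce API.
```c
#include <stdint.h>
#include <stddef.h>

#define EMPTY_MAP_ENTRY UINT64_MAX /* the special "X" value */

struct vol {
	uint64_t chunk_size;          /* e.g. 16KB */
	uint64_t io_units_per_chunk;  /* e.g. 4 for 16KB chunks */
	uint64_t *logical_map;        /* one 64-bit entry per chunk */
	uint64_t *chunk_maps;         /* io_units_per_chunk entries each */
};

/* Return the chunk map for a byte offset, or NULL if never written. */
static uint64_t *
lookup_chunk_map(struct vol *v, uint64_t byte_offset)
{
	uint64_t chunk_idx = byte_offset / v->chunk_size;
	uint64_t entry = v->logical_map[chunk_idx];

	if (entry == EMPTY_MAP_ENTRY) {
		return NULL; /* reads of this chunk return zeroes */
	}
	return &v->chunk_maps[entry * v->io_units_per_chunk];
}
```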
## Example
To illustrate this algorithm, we will use a real example at a very small scale.
The size of the compressed block device is 64KB, with a chunk size of 16KB. This will
realize the following:
* "Backing storage" will consist of an 80KB thin-provisioned logical volume. This
corresponds to the 64KB size of the compressed block device, plus an extra 16KB to handle
additional write operations under a worst-case compression scenario.
* "Free backing IO unit list" will consist of indices 0 through 19 (inclusive). These represent
the 20 4KB IO units in the backing storage.
* A "chunk map" will be 32 bytes in size. This corresponds to 4 backing IO units per chunk
(16KB / 4KB), and 8B (64b) per backing IO unit index.
* 5 chunk maps will be allocated in 160B of persistent memory. This corresponds to 4 chunk maps
for the 4 chunks in the compressed block device (64KB / 16KB), plus an extra chunk map for use
when overwriting an existing chunk.
* "Free chunk map list" will consist of indices 0 through 4 (inclusive). These represent the
5 allocated chunk maps.
* The "logical map" will be allocated in 32B of persistent memory. This corresponds to
4 entries for the 4 chunks in the compressed block device and 8B (64b) per entry.
In these examples, the value "X" will represent the special value (2^64-1) described above.
### Initial Creation
```
+--------------------+
Backing Device | |
+--------------------+
Free Backing IO Unit List 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | | | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 0, 1, 2, 3, 4
+---+---+---+---+
Logical Map | X | X | X | X |
+---+---+---+---+
```
### Write 16KB at Offset 32KB
* Find the corresponding index into the logical map. Offset 32KB divided by the chunk size
(16KB) is 2.
* Entry 2 in the logical map is "X". This means no part of this 16KB has been written to yet.
* Allocate a 16KB buffer in memory
* Compress the incoming 16KB of data into this allocated buffer
* Assume this data compresses to 6KB. This requires 2 4KB backing IO units.
* Allocate 2 blocks (0 and 1) from the free backing IO unit list. Always use the lowest numbered
entries in the free backing IO unit list - this ensures that unnecessary backing storage
is not allocated in the thin-provisioned logical volume holding the backing storage.
* Write the 6KB of data to backing IO units 0 and 1.
* Allocate a chunk map (0) from the free chunk map list.
* Write (0, 1, X, X) to the chunk map. This represents that only 2 backing IO units were used to
store the 16KB of data.
* Write the chunk map index to entry 2 in the logical map.
```
+--------------------+
Backing Device |01 |
+--------------------+
Free Backing IO Unit List 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 1, 2, 3, 4
+---+---+---+---+
Logical Map | X | X | 0 | X |
+---+---+---+---+
```
### Write 4KB at Offset 8KB
* Find the corresponding index into the logical map. Offset 8KB divided by the chunk size is 0.
* Entry 0 in the logical map is "X". This means no part of this 16KB has been written to yet.
* The write is not for the entire 16KB chunk, so we must allocate a 16KB chunk-sized buffer for
source data.
* Copy the incoming 4KB data to offset 8KB of this 16KB buffer. Zero the rest of the 16KB buffer.
* Allocate a 16KB destination buffer.
* Compress the 16KB source data buffer into the 16KB destination buffer
* Assume this data compresses to 3KB. This requires 1 4KB backing IO unit.
* Allocate 1 block (2) from the free backing IO unit list.
* Write the 3KB of data to block 2.
* Allocate a chunk map (1) from the free chunk map list.
* Write (2, X, X, X) to the chunk map.
* Write the chunk map index to entry 0 in the logical map.
```
+--------------------+
Backing Device |012 |
+--------------------+
Free Backing IO Unit List 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | 2 X X X | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 2, 3, 4
+---+---+---+---+
Logical Map | 1 | X | 0 | X |
+---+---+---+---+
```
### Read 16KB at Offset 16KB
* Offset 16KB maps to index 1 in the logical map.
* Entry 1 in the logical map is "X". This means no part of this 16KB has been written to yet.
* Since no data has been written to this chunk, return all 0's to satisfy the read I/O.
### Write 4KB at Offset 4KB
* Offset 4KB maps to index 0 in the logical map.
* Entry 0 in the logical map is "1". Since we are not overwriting the entire chunk, we must
do a read-modify-write.
* Chunk map 1 only specifies one backing IO unit (2). Allocate a 16KB buffer and read block
2 into it. This will be called the compressed data buffer. Note that 16KB is allocated
instead of 4KB so that we can reuse this buffer to hold the compressed data that will
be written later back to disk.
* Allocate a 16KB buffer for the uncompressed data for this chunk. Decompress the data from
the compressed data buffer into this buffer.
* Copy the incoming 4KB of data to offset 4KB of the uncompressed data buffer.
* Compress the 16KB uncompressed data buffer into the compressed data buffer.
* Assume this data compresses to 5KB. This requires 2 4KB backing IO units.
* Allocate blocks 3 and 4 from the free backing IO unit list.
* Write the 5KB of data to blocks 3 and 4.
* Allocate chunk map 2 from the free chunk map list.
* Write (3, 4, X, X) to chunk map 2. Note that at this point, the chunk map is not referenced
by the logical map. If there was a power fail at this point, the previous data for this chunk
would still be fully valid.
* Write chunk map 2 to entry 0 in the logical map.
* Free chunk map 1 back to the free chunk map list.
* Free backing IO unit 2 back to the free backing IO unit list.
```
+--------------------+
Backing Device |01 34 |
+--------------------+
Free Backing IO Unit List 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | | 3 4 X X | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 1, 3, 4
+---+---+---+---+
Logical Map | 2 | X | 0 | X |
+---+---+---+---+
```
### Operations that span across multiple chunks
Operations that span a chunk boundary are logically split into multiple operations, each of
which is associated with a single chunk.
Example: 20KB write at offset 4KB
In this case, the write operation is split into a 12KB write at offset 4KB (affecting only
chunk 0 in the logical map) and a 8KB write at offset 16KB (affecting only chunk 1 in the
logical map). Each write is processed independently using the algorithm described above.
Completion of the 20KB write does not occur until both operations have completed.
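A sketch of that splitting logic (the submit function is hypothetical; each sub-operation
then follows the single-chunk algorithm described above):
```c
/* Split an I/O spanning chunk boundaries into per-chunk sub-operations. */
static void
split_io(uint64_t chunk_size, uint64_t offset, uint64_t length)
{
	while (length > 0) {
		/* End of the chunk containing 'offset'. */
		uint64_t chunk_end = (offset / chunk_size + 1) * chunk_size;
		uint64_t sub_len = chunk_end - offset;

		if (sub_len > length) {
			sub_len = length;
		}
		/* submit_sub_io(offset, sub_len);  -- hypothetical */
		offset += sub_len;
		length -= sub_len;
	}
}
```
For the 20KB write at offset 4KB above, this yields exactly the two sub-operations
described: 12KB at offset 4KB and 8KB at offset 16KB.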
### Unmap Operations
Unmap operations on an entire chunk are achieved by removing the chunk map entry (if any) from
the logical map. The chunk map is returned to the free chunk map list, and any backing IO units
associated with the chunk map are returned to the free backing IO unit list.
Unmap operations that affect only part of a chunk can be treated as writing zeroes to that
region of the chunk. If the entire chunk is unmapped via several operations, it can be
detected via the uncompressed data equaling all zeroes. When this occurs, the chunk map entry
may be removed from the logical map.
After an entire chunk has been unmapped, subsequent reads to the chunk will return all zeroes.
This is similar to the "Read 16KB at offset 16KB" example above.
### Write Zeroes Operations
Write zeroes operations are handled similarly to unmap operations. If a write zeroes
operation covers an entire chunk, we can remove the chunk's entry in the logical map
completely. Then subsequent reads to that chunk will return all zeroes.
### Restart
An application using libreduce will periodically exit and need to be restarted. When the
application restarts, it will reload compressed volumes so they can be used again from the
same state as when the application exited.
When the compressed volume is reloaded, the free chunk map list and free backing IO unit list
are reconstructed by walking the logical map. The logical map will only point to valid
chunk maps, and the valid chunk maps will only point to valid backing IO units. Any chunk maps
and backing IO units not referenced go into their respective free lists.
This ensures that if a system crashes in the middle of a write operation - i.e. during or
after a chunk map is updated, but before it is written to the logical map - that everything
related to that in-progress write will be ignored after the compressed volume is restarted.
### Overlapping operations on same chunk
Implementations must take care to handle overlapping operations on the same chunk. For example,
operation 1 writes some data to chunk A, and while this is in progress, operation 2 also writes
some data to chunk A. In this case, operation 2 should not start until operation 1 has
completed. Further optimizations are outside the scope of this document.
### Thin provisioned backing storage
Backing storage must be thin provisioned to realize any savings from compression. This algorithm
will always use (and reuse) backing IO units available closest to offset 0 on the backing device.
This ensures that even though backing storage device may have been sized similarly to the size of
the compressed volume, storage for the backing storage device will not actually be allocated
until the backing IO units are actually needed.

- @subpage memory
- @subpage concurrency
- @subpage ssd_internals
- @subpage nvme_spec
- @subpage vhost_processing
- @subpage overview
- @subpage porting

# Theory
One of the primary aims of SPDK is to scale linearly with the addition of
hardware. This can mean many things in practice. For instance, moving from one
SSD to two should double the number of I/O's per second. Or doubling the number
of CPU cores should double the amount of computation possible. Or even doubling
the number of NICs should double the network throughput. To achieve this, the
software's threads of execution must be independent from one another as much as
possible. In practice, that means avoiding software locks and even atomic
instructions.
Traditionally, software achieves concurrency by placing some shared data onto
the heap, protecting it with a lock, and then having all threads of execution
acquire the lock only when accessing the data. This model has many great
properties:
* It's easy to convert single-threaded programs to multi-threaded programs
because you don't have to change the data model from the single-threaded
version. You add a lock around the data.
* You can write your program as a synchronous, imperative list of statements
that you read from top to bottom.
* The scheduler can interrupt threads, allowing for efficient time-sharing
of CPU resources.
Unfortunately, as the number of threads scales up, contention on the lock around
the shared data does too. More granular locking helps, but then also increases
the complexity of the program. Even then, beyond a certain number of contended
locks, threads will spend most of their time attempting to acquire the locks and
the program will not benefit from more CPU cores.
SPDK takes a different approach altogether. Instead of placing shared data in a
global location that all threads access after acquiring a lock, SPDK will often
assign that data to a single thread. When other threads want to access the data,
they pass a message to the owning thread to perform the operation on their
behalf. This strategy, of course, is not at all new. For instance, it is one of
the core design principles of
[Erlang](http://erlang.org/download/armstrong_thesis_2003.pdf) and is the main
concurrency mechanism in [Go](https://tour.golang.org/concurrency/2). A message
in SPDK consists of a function pointer and a pointer to some context. Messages
are passed between threads using a
[lockless ring](http://dpdk.org/doc/guides/prog_guide/ring_lib.html). Message
passing is often much faster than most software developers' intuition leads them
to believe due to caching effects. If a single core is accessing the same data
(on behalf of all of the other cores), then that data is far more likely to be
in a cache closer to that core. It's often most efficient to have each core work
on a small set of data sitting in its local cache and then hand off a small
message to the next core when done.
In more extreme cases where even message passing may be too costly, each thread
may make a local copy of the data. The thread will then only reference its local
copy. To mutate the data, threads will send a message to each other thread
telling them to perform the update on their local copy. This is great when the
data isn't mutated very often, but is read very frequently, and is often
employed in the I/O path. This of course trades memory size for computational
efficiency, so it is used in only the most critical code paths.
# Message Passing Infrastructure
SPDK provides several layers of message passing infrastructure. The most
fundamental libraries in SPDK, for instance, don't do any message passing on
their own and instead enumerate rules about when functions may be called in
their documentation (e.g. @ref nvme). Most libraries, however, depend on SPDK's
[thread](http://www.spdk.io/doc/thread_8h.html)
abstraction, located in `libspdk_thread.a`. The thread abstraction provides a
basic message passing framework and defines a few key primitives.
First, `spdk_thread` is an abstraction for a lightweight, stackless thread of
execution. A lower level framework can execute an `spdk_thread` for a single
timeslice by calling `spdk_thread_poll()`. A lower level framework is allowed to
move an `spdk_thread` between system threads at any time, as long as there is
only a single system thread executing `spdk_thread_poll()` on that
`spdk_thread` at any given time. New lightweight threads may be created at any
time by calling `spdk_thread_create()` and destroyed by calling
`spdk_thread_destroy()`. The lightweight thread is the foundational abstraction for
threading in SPDK.
There are then a few additional abstractions layered on top of the
`spdk_thread`. One is the `spdk_poller`, which is an abstraction for a
function that should be repeatedly called on the given thread. Another is the
`spdk_msg_fn`, a function pointer that, together with a context pointer, can
be sent to a thread for execution via `spdk_thread_send_msg()`.
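As an illustrative sketch (the `hello_*` names are hypothetical; `spdk_poller_register()` and `spdk_thread_send_msg()` are the real entry points, though their exact signatures have shifted slightly across SPDK releases):
~~~{.c}
#include "spdk/thread.h"

static struct spdk_poller *g_poller;

/* Called repeatedly on the thread that registered it. Returning 0
 * signals that no work was done this iteration (exact return value
 * conventions vary slightly between SPDK releases). */
static int
hello_poll(void *arg)
{
	return 0;
}

/* Executes on whichever thread the message was sent to. */
static void
hello_msg(void *ctx)
{
}

void
hello_start(struct spdk_thread *target)
{
	/* Run hello_poll() on the current thread every 1000 microseconds. */
	g_poller = spdk_poller_register(hello_poll, NULL, 1000);

	/* Ask another lightweight thread to run hello_msg(NULL). */
	spdk_thread_send_msg(target, hello_msg, NULL);
}
~~~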
The library also defines two additional abstractions: `spdk_io_device` and
`spdk_io_channel`. In the course of implementing SPDK we noticed the same
pattern emerging in a number of different libraries. In order to implement a
message passing strategy, the code would describe some object with global state
and also some per-thread context associated with that object that was accessed
in the I/O path to avoid locking on the global state. The pattern was clearest
in the lowest layers where I/O was being submitted to block devices. These
devices often expose multiple queues that can be assigned to threads and then
accessed without a lock to submit I/O. To abstract that, we generalized the
device to `spdk_io_device` and the thread-specific queue to `spdk_io_channel`.
Over time, however, the pattern has appeared in a huge number of places that
don't fit quite so nicely with the names we originally chose. In today's code
`spdk_io_device` is any pointer, whose uniqueness is predicated only on its
memory address, and `spdk_io_channel` is the per-thread context associated with
a particular `spdk_io_device`.
The threading abstraction provides functions to send a message to any other
thread, to send a message to all threads one by one, and to send a message to
all threads for which there is an io_channel for a given io_device.
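A minimal sketch of how the pairing is typically used (the `my_dev` structures are hypothetical, and the `spdk_io_device_register()` signature shown matches recent SPDK releases):
~~~{.c}
#include "spdk/thread.h"

struct my_dev {                 /* global state: the io_device */
	int shared_state;
};

struct my_dev_channel {         /* per-thread context: the io_channel */
	int per_thread_queue_depth;
};

static int
my_dev_channel_create(void *io_device, void *ctx_buf)
{
	struct my_dev_channel *ch = ctx_buf;

	ch->per_thread_queue_depth = 0;
	return 0;
}

static void
my_dev_channel_destroy(void *io_device, void *ctx_buf)
{
}

void
my_dev_init(struct my_dev *dev)
{
	/* Register dev's address as an io_device; SPDK allocates one
	 * my_dev_channel per thread on demand. */
	spdk_io_device_register(dev, my_dev_channel_create,
				my_dev_channel_destroy,
				sizeof(struct my_dev_channel), "my_dev");
}

void
my_dev_submit(struct my_dev *dev)
{
	/* In real code the channel would be cached per thread rather
	 * than fetched on every call. */
	struct spdk_io_channel *ch = spdk_get_io_channel(dev);
	struct my_dev_channel *my_ch = spdk_io_channel_get_ctx(ch);

	my_ch->per_thread_queue_depth++;    /* lock-free: thread-local */
	spdk_put_io_channel(ch);
}
~~~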
Most critically, the thread abstraction does not actually spawn any system-level
threads of its own. Instead, it relies on the existence of some lower level
framework that spawns system threads and sets up event loops. Inside those event
loops, the threading abstraction simply requires the lower level framework to
repeatedly call `spdk_thread_poll()` on each `spdk_thread` that exists. This
makes SPDK very portable to a wide variety of asynchronous, event-based
frameworks such as [Seastar](https://www.seastar.io) or [libuv](https://libuv.org/).
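In other words, a lower level framework's event loop reduces to something like the following sketch (the `running` flag is assumed to be owned by the framework; passing 0 for the last two arguments lets SPDK use its default message batch size and fetch the current time itself):
~~~{.c}
#include <stdbool.h>
#include "spdk/thread.h"

void
framework_event_loop(struct spdk_thread *thread, volatile bool *running)
{
	while (*running) {
		/* Execute one timeslice: drain pending messages and run
		 * any pollers that are due. */
		spdk_thread_poll(thread, 0, 0);
	}
}
~~~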
# The event Framework
The SPDK project didn't want to officially pick an asynchronous, event-based
framework for all of the example applications it shipped with, in the interest
of supporting the widest variety of frameworks possible. But the applications do
of course require something that implements an asynchronous event loop in order
to run, so enter the `event` framework located in `lib/event`. This framework
includes things like polling and scheduling the lightweight threads, installing
signal handlers to cleanly shut down, and basic command line option parsing.
Only established applications with their own message passing infrastructure
should consider directly integrating the lower level libraries.
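A minimal application built on the event framework might look like the following sketch (field names and the exact `spdk_app_opts_init()`/`spdk_app_start()` signatures have varied across SPDK releases, so treat this as illustrative):
~~~{.c}
#include <stdio.h>
#include "spdk/event.h"

static void
start_fn(void *ctx)
{
	/* Runs on the first lightweight thread once all threads are up. */
	printf("application started\n");
	spdk_app_stop(0);
}

int
main(int argc, char **argv)
{
	struct spdk_app_opts opts;
	int rc;

	spdk_app_opts_init(&opts);
	opts.name = "hello_app";

	/* Spawns and pins the reactors, then calls start_fn(NULL). */
	rc = spdk_app_start(&opts, start_fn, NULL);
	spdk_app_fini();
	return rc;
}
~~~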
# Limitations of the C Language
@@ -156,7 +153,7 @@ Don't split these functions up - keep them as a nice unit that can be read from
For more complex callback chains, especially ones that have logical branches
or loops, it's best to write out a state machine. It turns out that higher
level languages that support futures and promises are just generating state
machines at compile time, so even though we don't have the ability to generate
them in C we can still write them out by hand. As an example, here's a
callback chain that performs `foo` 5 times and then calls `bar` - effectively an asynchronous `for` loop:
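A sketch of that state machine, assuming a hypothetical asynchronous `foo()` that takes a completion callback and a synchronous `bar()`:
~~~{.c}
#include <stdlib.h>

/* Assumed asynchronous primitive: invokes cb(ctx) upon completion. */
void foo(void (*cb)(void *ctx), void *ctx);
void bar(void);

struct loop_state {
	int count;      /* completed iterations of foo */
};

static void
foo_done(void *ctx)
{
	struct loop_state *state = ctx;

	if (++state->count < 5) {
		/* Not done yet - kick off the next iteration. */
		foo(foo_done, state);
	} else {
		free(state);
		bar();
	}
}

void
run_chain(void)
{
	struct loop_state *state = calloc(1, sizeof(*state));

	if (state != NULL) {
		foo(foo_done, state);
	}
}
~~~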

View File

@@ -1,91 +0,0 @@
# SPDK and Containers {#containers}
This is a living document as there are many ways to use containers with
SPDK. As new usages are identified and tested, they will be documented
here.
# In this document {#containers_toc}
* @ref kata_containers_with_spdk_vhost
* @ref spdk_in_docker
# Using SPDK vhost target to provide volume service to Kata Containers and Docker {#kata_containers_with_spdk_vhost}
[Kata Containers](https://katacontainers.io) can build a secure container
runtime with lightweight virtual machines that feel and perform like
containers, but provide stronger workload isolation using hardware
virtualization technology as a second layer of defense.
As of Kata Containers [1.11.0](https://github.com/kata-containers/runtime/releases/tag/1.11.0),
vhost-user-blk support is enabled in `kata-containers/runtime`, which means the
SPDK vhost target can be used to provide volume service to Kata Containers directly.
In addition, a container manager like Docker can easily be configured to launch
a Kata container with an SPDK vhost-user block device. For operational details, see the
Kata Containers use case [Setup to run SPDK vhost-user devices with Kata Containers and Docker](https://github.com/kata-containers/documentation/blob/master/use-cases/using-SPDK-vhostuser-and-kata.md#host-setup-for-vhost-user-devices)
# Containerizing an SPDK Application for Docker {#spdk_in_docker}
There are no SPDK-specific changes needed to run an SPDK-based application in
a docker container; however, this quick start guide should help you as you
containerize your SPDK-based application.
1. Make sure you have all of your app dependencies identified and included in your Dockerfile
2. Make sure you have compiled your application for the target arch
3. Make sure your host has hugepages enabled
4. Make sure your host has bound your NVMe device to your userspace driver
5. Write your Dockerfile. The following is a simple Dockerfile to containerize the nvme `hello_world`
example:
~~~{.sh}
# start with the latest Fedora
FROM fedora
# if you are behind a proxy, set that up now
ADD dnf.conf /etc/dnf/dnf.conf
# these are the min dependencies for the hello_world app
RUN dnf install libaio-devel -y
RUN dnf install numactl-devel -y
# set our working dir
WORKDIR /app
# add the hello_world binary
ADD hello_world hello_world
# run the app
CMD ./hello_world
~~~
6. Create your image
`sudo docker image build -t hello:1.0 .`
7. Your docker command line will need to include at least the following:
- the `--privileged` flag to enable sharing of hugepages
- use of the `-v` switch to map hugepages
`sudo docker run --privileged -v /dev/hugepages:/dev/hugepages hello:1.0`
Or, depending on the needs of your app, you may need one or more of the following parameters:
- If you are using the SPDK app framework: `-v /dev/shm:/dev/shm`
- If you need to use RPCs from outside of the container: `-v /var/tmp:/var/tmp`
- If you need to use the host network (e.g. an NVMe-oF target application): `--network host`
Your output should look something like this:
~~~{.sh}
$ sudo docker run --privileged -v /dev/hugepages:/dev/hugepages hello:1.0
Starting SPDK v20.01-pre git sha1 80da95481 / DPDK 19.11.0 initialization...
[ DPDK EAL parameters: hello_world -c 0x1 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk0 --proc-type=auto ]
EAL: No available hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to 0000:06:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPEDMD400G4 (CVFT7203005M400LGN ) with 1 namespaces.
Namespace ID: 1 size: 400GB
Initialization complete.
INFO: using host memory buffer for IO
Hello world!
~~~

doc/directory_structure.md Normal file
View File

@@ -0,0 +1,122 @@
# SPDK Directory Structure {#directory_structure}
# Overview {#dir_overview}
SPDK is primarily a collection of C libraries intended to be consumed directly by
applications, but the repository also contains many examples and full-fledged applications.
This document provides a general overview of what is where in the repository.
## Applications {#dir_app}
The `app` top-level directory contains five applications:
- `app/iscsi_tgt`: An iSCSI target
- `app/nvmf_tgt`: An NVMe-oF target
- `app/iscsi_top`: Informational tool (like `top`) that tracks activity in the
iSCSI target.
- `app/trace`: A tool for processing trace points output from the iSCSI and
NVMe-oF targets.
- `app/vhost`: A vhost application that presents virtio controllers to
QEMU-based VMs and processes I/O submitted to those controllers.
The application binaries will be in their respective directories after compiling, and all
can be run with no arguments to print out their command line options. The iSCSI
and NVMe-oF targets both need a configuration file (`-c` option); fully commented
examples of the configuration files live in the `etc/spdk` directory.
## Build Collateral {#dir_build}
The `build` directory contains all of the static libraries constructed during
the build process. The `lib` directory, combined with the `include/spdk`
directory, makes up the official output of an SPDK release, if it were to be packaged.
## Documentation {#dir_doc}
The `doc` top-level directory contains all of SPDK's documentation. API documentation
is created using Doxygen directly from the code, but more general articles and longer
explanations reside in this directory, as well as the Doxygen config file.
To build the documentation, just type `make` within the doc directory.
## Examples {#dir_examples}
The `examples` top-level directory contains a set of examples intended to be used
for reference. These are different from the applications, which perform a "real"
task that could reasonably be deployed. The examples are instead either heavily
contrived to demonstrate some facet of SPDK, or aren't considered complete enough
to warrant tagging them as a full-blown SPDK application.
This is a great place to learn about how SPDK works. In particular, check out
`examples/nvme/hello_world`.
## Include {#dir_include}
The `include` directory is where all of the header files are located. The public API
is all placed in the `spdk` subdirectory of `include` and we highly
recommend that applications set their include path to the top level `include`
directory and include the headers by prefixing `spdk/` like this:
~~~{.c}
#include "spdk/nvme.h"
~~~
Most of the headers here correspond to a library in the `lib` directory and will be
covered in that section. There are a few headers that stand alone, however. They are:
- `assert.h`
- `barrier.h`
- `endian.h`
- `fd.h`
- `mmio.h`
- `queue.h` and `queue_extras.h`
- `string.h`
There is also an `spdk_internal` directory that contains header files widely included
by libraries within SPDK, but that are not part of the public API and would not be
installed on a user's system.
## Libraries {#dir_lib}
The `lib` directory contains the real heart of SPDK. Each component is a C library with
its own directory under `lib`.
### Block Device Abstraction Layer {#dir_bdev}
The `bdev` directory contains a block device abstraction layer that is currently used
within the iSCSI and NVMe-oF targets. The public interface is `include/spdk/bdev.h`.
This library lacks clearly defined responsibilities as of this writing and instead does a
number of things:
- Translates from a common `block` protocol to specific protocols like NVMe or to system
calls like libaio. There are currently four block device backend modules that can be
plugged in - libaio, SPDK NVMe, Ceph RBD, and a RAM-based backend called malloc.
- Provides a mechanism for composing virtual block devices from physical devices (to do
RAID and the like).
- Handles some memory allocation for data buffers.
This layer also could be made to do I/O queueing or splitting in a general way. We're open
to design ideas and discussion here.
### Configuration File Parser {#dir_conf}
The `conf` directory contains a configuration file parser. The public header
is `include/spdk/conf.h`. The configuration file format is INI-like,
except that directives are "Name Value" instead of "Name = Value". This is
the configuration format for both the iSCSI and NVMe-oF targets.
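For example, a short config might look like this (the section and directive names here are illustrative):
~~~
# Comments start with '#'.
[Global]
  ReactorMask 0x3

[Nvme]
  RetryCount 4
~~~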
... Lots more libraries that need to be described ...
## Makefile Fragments {#dir_mk}
The `mk` directory contains a number of shared Makefile fragments used in the build system.
## Scripts {#dir_scripts}
The `scripts` directory contains convenient scripts for a number of operations. The two most
important are `check_format.sh`, which will use astyle and pep8 to check C, C++, and Python
coding style against our defined conventions, and `setup.sh`, which binds and unbinds devices
from kernel drivers.
## Tests {#dir_tests}
The `test` directory contains all of the tests for SPDK's components and the subdirectories mirror
the structure of the entire repository. The tests are a mixture of unit tests and functional tests.

View File

@@ -1,7 +0,0 @@
# Driver Modules {#driver_modules}
- @subpage nvme
- @subpage ioat
- @subpage idxd
- @subpage virtio
- @subpage vmd

View File

@@ -0,0 +1,3 @@
# Experimental Tools {#experimental_tools}
- @subpage spdkcli

View File

@@ -1,289 +0,0 @@
# Flash Translation Layer {#ftl}
The Flash Translation Layer library provides block device access on top of devices
implementing the bdev_zone interface.
It handles the logical-to-physical address mapping, responds to asynchronous
media management events, and manages the defragmentation process.
# Terminology {#ftl_terminology}
## Logical to physical address map
* Shorthand: L2P
Contains the mapping of the logical addresses (LBA) to their on-disk physical location. The LBAs
are contiguous and range from 0 to the number of surfaced blocks (the number of spare blocks
is calculated during device formation and subtracted from the available address space). The
spare blocks account for zones going offline throughout the lifespan of the device, as well as
provide the necessary buffer for data [defragmentation](#ftl_reloc).
## Band {#ftl_band}
A band describes a collection of zones, each belonging to a different parallel unit. All writes to
a band follow the same pattern - a batch of logical blocks is written to one zone, another batch
to the next one and so on. This ensures the parallelism of the write operations, as they can be
executed independently on different zones. Each band keeps track of the LBAs it consists of, as
well as their validity, as some of the data will be invalidated by subsequent writes to the same
logical address. The L2P mapping can be restored from the SSD by reading this information in order
from the oldest band to the youngest.
```
          +--------------+        +--------------+                        +--------------+
band 1    | zone 1       +--------+ zone 1       +---- --- --- --- --- ---+ zone 1       |
          +--------------+        +--------------+                        +--------------+
band 2    | zone 2       +--------+ zone 2       +---- --- --- --- --- ---+ zone 2       |
          +--------------+        +--------------+                        +--------------+
band 3    | zone 3       +--------+ zone 3       +---- --- --- --- --- ---+ zone 3       |
          +--------------+        +--------------+                        +--------------+
          | ...          |        | ...          |                        | ...          |
          +--------------+        +--------------+                        +--------------+
band m    | zone m       +--------+ zone m       +---- --- --- --- --- ---+ zone m       |
          +--------------+        +--------------+                        +--------------+
          | ...          |        | ...          |                        | ...          |
          +--------------+        +--------------+                        +--------------+

          parallel unit 1         pu 2                                    pu n
```
The address map and valid map are, along with several other things (e.g. the UUID of the device it's
part of, the number of surfaced LBAs, the band's sequence number, etc.), part of the band's metadata. The
metadata is split in two parts:
```
 head metadata                band's data                      tail metadata
+-------------------+-------------------------------+------------------------+
|zone 1 |...|zone n |...|...|zone 1 |...|           | ... |zone m-1 |zone m   |
|block 1|   |block 1|   |   |block x|   |           |     |block y  |block y  |
+-------------------+-------------------------------+------------------------+
```
* the head part, containing information already known when opening the band (device's UUID, band's
sequence number, etc.), located at the beginning blocks of the band,
* the tail part, containing the address map and the valid map, located at the end of the band.
Bands are written sequentially (in the way described earlier). Before a band can be written
to, all of its zones need to be erased. During that time, the band is considered to be in the `PREP`
state. After that is done, the band transitions to the `OPENING` state, in which head metadata
is being written. Then the band moves to the `OPEN` state and actual user data can be written to the
band. Once the whole available space is filled, tail metadata is written and the band transitions to
the `CLOSING` state. When that finishes, the band becomes `CLOSED`.
## Ring write buffer {#ftl_rwb}
* Shorthand: RWB
Because the smallest write size the SSD may support can be a multiple of block size, the data
needs to be buffered in order to support writes to a single block. The write buffer is the solution to
this problem. It consists of a number of pre-allocated buffers called batches, each sized to allow
a single transfer to the SSD. A single batch is divided into block-sized buffer entries.
```
            write buffer
+-----------------------------------+
|batch 1                            |
|  +-----------------------------+  |
|  |rwb    |rwb    | ... |rwb    |  |
|  |entry 1|entry 2|     |entry n|  |
|  +-----------------------------+  |
+-----------------------------------+
| ...                               |
+-----------------------------------+
|batch m                            |
|  +-----------------------------+  |
|  |rwb    |rwb    | ... |rwb    |  |
|  |entry 1|entry 2|     |entry n|  |
|  +-----------------------------+  |
+-----------------------------------+
```
When a write is scheduled, it needs to acquire an entry for each of its blocks and copy the data
onto this buffer. Once all blocks are copied, the write can be signalled as completed to the user.
In the meantime, the `rwb` is polled for filled batches and, if one is found, it's sent to the SSD.
After that operation is completed the whole batch can be freed. For the whole time the data is in
the `rwb`, the L2P points at the buffer entry instead of a location on the SSD. This allows for
servicing read requests from the buffer.
## Defragmentation and relocation {#ftl_reloc}
* Shorthand: defrag, reloc
Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
band might contain old data that basically wastes space. As there is no way to overwrite an already
written block, this data will stay there until the whole zone is reset. This might create a
situation in which all of the bands contain some valid data and no band can be erased, so no writes
can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
bands, so that they can be reused.
```
                 band                                           band
+-----------------------------------+          +-----------------------------------+
|  ** *    * ***      *    *** * *  |          |                                   |
|**  *       *    *   *    *     *  |  +---->  |                                   |
|*     ***  *       *               |          |                                   |
+-----------------------------------+          +-----------------------------------+
```
Valid blocks are marked with an asterisk '\*'.
Another reason for data relocation might be an event from the SSD telling us that the data might
become corrupt if it's not relocated. This might happen due to its old age (if it was written a
long time ago) or due to read disturb (a media characteristic that causes corruption of neighbouring
blocks during a read operation).
The module responsible for data relocation is called `reloc`. When a band is chosen for defragmentation
or a media management event is received, the appropriate blocks are marked as
needing to be moved. The `reloc` module takes a band that has some such blocks marked, checks
their validity and, if they're still valid, copies them.
Choosing a band for defragmentation depends on several factors: its valid ratio (1) (proportion of
valid blocks to all user blocks), its age (2) (when it was written), and the write count / wear level
index of its zones (3) (how many times the band was written to). The lower the ratio (1), the
higher the age (2), and the lower the write count (3), the higher the chance the band will be chosen
for defrag.
# Usage {#ftl_usage}
## Prerequisites {#ftl_prereq}
In order to use the FTL module, a device exposing a zoned interface is required, e.g. a `zone_block`
bdev or an OCSSD `nvme` bdev.
## FTL bdev creation {#ftl_create}
Similar to other bdevs, FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments, which are described by the `--help` option of the
`bdev_ftl_create` RPC call:
- bdev's name
- base bdev's name (base bdev must implement bdev_zone API)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
## FTL usage with OCSSD nvme bdev {#ftl_ocssd}
This option requires an Open Channel SSD, which can be emulated using QEMU.
The QEMU with the patches providing Open Channel support can be found on the SPDK's QEMU fork
on [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch.
## Configuring QEMU {#ftl_qemu_config}
To emulate an Open Channel device, QEMU expects parameters describing the characteristics and
geometry of the SSD:
- `serial` - serial number
- `lver` - version of the OCSSD standard (0 - disabled, 1 - "1.2", 2 - "2.0"); libftl only supports 2.0
- `lba_index` - default LBA format; possible values can be found in the table below (libftl only supports lba_index >= 3)
- `lnum_ch` - number of groups
- `lnum_lun` - number of parallel units
- `lnum_pln` - number of planes (logical blocks from all planes constitute a chunk)
- `lpgs_per_blk` - number of pages (smallest programmable unit) per chunk
- `lsecs_per_pg` - number of sectors in a page
- `lblks_per_pln` - number of chunks in a parallel unit
- `laer_thread_sleep` - timeout in ms between asynchronous events requesting the host to relocate the data based on media feedback
- `lmetadata` - metadata file
|lba_index| data| metadata|
|---------|-----|---------|
| 0 | 512B| 0B |
| 1 | 512B| 8B |
| 2 | 512B| 16B |
| 3 |4096B| 0B |
| 4 |4096B| 64B |
| 5 |4096B| 128B |
| 6 |4096B| 16B |
For more detailed description of the available options, consult the `hw/block/nvme.c` file in
the QEMU repository.
Example:
```
$ /path/to/qemu [OTHER PARAMETERS] -drive format=raw,file=/path/to/data/file,if=none,id=myocssd0
-device nvme,drive=myocssd0,serial=deadbeef,lver=2,lba_index=3,lnum_ch=1,lnum_lun=8,lnum_pln=4,
lpgs_per_blk=1536,lsecs_per_pg=4,lblks_per_pln=512,lmetadata=/path/to/md/file
```
In the above example, a device is created with 1 channel, 8 parallel units, 512 chunks per parallel
unit, and 24576 (`lnum_pln` * `lpgs_per_blk` * `lsecs_per_pg`) logical blocks in each chunk, with each
logical block being 4096B. Therefore the data file needs to be at least 384G (8 * 512 * 24576 * 4096B)
in size, and can be created with the following command:
```
fallocate -l 384G /path/to/data/file
```
## Configuring SPDK {#ftl_spdk_config}
To verify that the drive is emulated correctly, one can check the output of the NVMe identify app
(assuming that `scripts/setup.sh` was called before and the driver has been changed for that
device):
```
$ build/examples/identify
=====================================================
NVMe Controller at 0000:00:0a.0 [1d1d:1f1f]
=====================================================
Controller Capabilities/Features
================================
Vendor ID: 1d1d
Subsystem Vendor ID: 1af4
Serial Number: deadbeef
Model Number: QEMU NVMe Ctrl
... other info ...
Namespace OCSSD Geometry
=======================
OC version: maj:2 min:0
... other info ...
Groups (channels): 1
PUs (LUNs) per group: 8
Chunks per LUN: 512
Logical blks per chunk: 24576
... other info ...
```
In order to create FTL on top of an Open Channel SSD, the following steps are required:
1) Attach OCSSD NVMe controller
2) Create OCSSD bdev on the controller attached in step 1 (the user can specify a parallel unit range
and create multiple OCSSD bdevs on a single OCSSD NVMe controller)
3) Create FTL bdev on top of bdev created in step 2
Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:0a.0 -t pcie
$ scripts/rpc.py bdev_ocssd_create -c nvme0 -b nvme0n1
nvme0n1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1
{
"name": "ftl0",
"uuid": "3b469565-1fa5-4bfb-8341-747ec9fca9b9"
}
```
## FTL usage with zone block bdev {#ftl_zone_block}
Zone block bdev is a bdev adapter between regular `bdev` and `bdev_zone`. It emulates a zoned
interface on top of a regular block device.
In order to create FTL on top of a regular bdev:
1) Create a regular bdev, e.g. `bdev_nvme`, `bdev_null`, `bdev_malloc`
2) Create a zone block bdev on top of the regular bdev created in step 1 (the user can specify the
zone capacity and optimal number of open zones)
3) Create FTL bdev on top of bdev created in step 2
3) Create FTL bdev on top of bdev created in step 2
Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:05.0 -t pcie
nvme0n1
$ scripts/rpc.py bdev_zone_block_create -b zone1 -n nvme0n1 -z 4096 -o 32
zone1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d zone1
{
"name": "ftl0",
"uuid": "3b469565-1fa5-4bfb-8341-747ec9f3a9b9"
}
```

View File

@@ -1,221 +0,0 @@
# GDB Macros User Guide {#gdb_macros}
# Introduction
When debugging an SPDK application using gdb, we may need to view data structures
in lists, e.g. information about bdevs or threads.
If, for example, I have several bdevs and wish to get information on the bdev
named 'test_vols3', I will need to manually iterate over the list as follows:
~~~{.sh}
(gdb) p g_bdev_mgr->bdevs->tqh_first->name
$5 = 0x7f7dcc0b21b0 "test_vols1"
(gdb) p g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->name
$6 = 0x7f7dcc0b1a70 "test_vols2"
(gdb) p
g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->internal->link->tqe_next->name
$7 = 0x7f7dcc215a00 "test_vols3"
(gdb) p
g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->internal->link->tqe_next
$8 = (struct spdk_bdev *) 0x7f7dcc2c7c08
~~~
At this stage, we can start looking at the relevant fields of our bdev, which
we now know is at address 0x7f7dcc2c7c08.
This can be somewhat troublesome if there are 100 bdevs, and the one we need is
56th in the list...
Instead, we can use a gdb macro in order to get information about all the
devices.
Examples:
Printing bdevs:
~~~{.sh}
(gdb) spdk_print_bdevs
SPDK object of type struct spdk_bdev at 0x7f7dcc1642a8
((struct spdk_bdev*) 0x7f7dcc1642a8)
name 0x7f7dcc0b21b0 "test_vols1"
---------------
SPDK object of type struct spdk_bdev at 0x7f7dcc216008
((struct spdk_bdev*) 0x7f7dcc216008)
name 0x7f7dcc0b1a70 "test_vols2"
---------------
SPDK object of type struct spdk_bdev at 0x7f7dcc2c7c08
((struct spdk_bdev*) 0x7f7dcc2c7c08)
name 0x7f7dcc215a00 "test_vols3"
---------------
~~~
Finding a bdev by name:
~~~{.sh}
(gdb) spdk_find_bdev test_vols1
test_vols1
SPDK object of type struct spdk_bdev at 0x7f7dcc1642a8
((struct spdk_bdev*) 0x7f7dcc1642a8)
name 0x7f7dcc0b21b0 "test_vols1"
~~~
Printing spdk threads:
~~~{.sh}
(gdb) spdk_print_threads
SPDK object of type struct spdk_thread at 0x7fffd0008b50
((struct spdk_thread*) 0x7fffd0008b50)
name 0x7fffd00008e0 "reactor_1"
IO Channels:
SPDK object of type struct spdk_io_channel at 0x7fffd0052610
((struct spdk_io_channel*) 0x7fffd0052610)
name
ref 1
device 0x7fffd0008c80 (0x7fffd0008ce0 "nvmf_tgt")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd0056cd0
((struct spdk_io_channel*) 0x7fffd0056cd0)
name
ref 2
device 0x7fffd0056bf0 (0x7fffd0008e70 "test_vol1")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd00582e0
((struct spdk_io_channel*) 0x7fffd00582e0)
name
ref 1
device 0x7fffd0056c50 (0x7fffd0056cb0 "bdev_test_vol1")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd00583b0
((struct spdk_io_channel*) 0x7fffd00583b0)
name
ref 1
device 0x7fffd0005630 (0x7fffd0005690 "bdev_mgr")
---------------
~~~
Printing nvmf subsystems:
~~~{.sh}
(gdb) spdk_print_nvmf_subsystems
SPDK object of type struct spdk_nvmf_subsystem at 0x7fffd0008d00
((struct spdk_nvmf_subsystem*) 0x7fffd0008d00)
name "nqn.2014-08.org.nvmexpress.discovery", '\000' <repeats 187 times>
nqn "nqn.2014-08.org.nvmexpress.discovery", '\000' <repeats 187 times>
ID 0
---------------
SPDK object of type struct spdk_nvmf_subsystem at 0x7fffd0055760
((struct spdk_nvmf_subsystem*) 0x7fffd0055760)
name "nqn.2016-06.io.spdk.umgmt:cnode1", '\000' <repeats 191 times>
nqn "nqn.2016-06.io.spdk.umgmt:cnode1", '\000' <repeats 191 times>
ID 1
~~~
# Loading The gdb Macros
Copy the gdb macros to the host where you are about to debug.
It is best to copy the file either to somewhere within the PYTHONPATH, or to add
the destination directory to the PYTHONPATH. This is not mandatory and can be
worked around, but it saves a few steps when loading the module into gdb.
From gdb, with the application core open, invoke python and load the modules.
In the example below, I copied the macros to the /tmp directory which is not in
the PYTHONPATH, so I had to manually add the directory to the path.
~~~{.sh}
(gdb) python
>import sys
>sys.path.append('/tmp')
>import gdb_macros
>end
(gdb) spdk_load_macros
~~~
# Using the gdb Data Directory
On most systems, the data directory is /usr/share/gdb. The python script should
be copied into the python/gdb/function (or python/gdb/command) directory under
the data directory, e.g. /usr/share/gdb/python/gdb/function.
If the python script is in there, then the only thing you need to do when
starting gdb is type "spdk_load_macros".
# Using .gdbinit To Load The Macros
.gdbinit can also be used to automatically run the manual steps above prior to
starting gdb.
Example .gdbinit:
~~~{.sh}
source /opt/km/install/tools/gdb_macros/gdb_macros.py
~~~
When starting gdb you still have to call spdk_load_macros.
# Why Do We Need to Explicitly Call spdk_load_macros
The reason is that the macros need to use globals provided by spdk in order to
iterate the spdk lists and build iterable representations of the list objects.
This will result in errors if these are not available, which is very possible if
gdb is used for reasons other than debugging spdk core dumps.
In the example below, I attempted to load the macros when the globals were not
available, causing gdb to fail to load the gdb_macros:
~~~{.sh}
(gdb) spdk_load_macros
Traceback (most recent call last):
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 257, in invoke
spdk_print_threads()
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 241, in __init__
threads = SpdkThreads()
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 234, in __init__
super(SpdkThreads, self).__init__('g_threads', SpdkThread)
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 25, in __init__
['tailq'])
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 10, in __init__
self.list = gdb.parse_and_eval(self.list_pointer)
RuntimeError: No symbol table is loaded. Use the "file" command.
Error occurred in Python command: No symbol table is loaded. Use the "file"
command.
~~~
# Macros available
- spdk_load_macros: load the macros (use --reload in order to reload them)
- spdk_print_bdevs: information about bdevs
- spdk_find_bdev: find a bdev (substring search)
- spdk_print_io_devices: information about io devices
- spdk_print_nvmf_subsystems: information about nvmf subsystems
- spdk_print_threads: information about threads
# Adding New Macros
The list iteration macros are usually built from 3 layers:
- SpdkPrintCommand: inherits from gdb.Command and invokes the list iteration
- SpdkTailqList: Performs the iteration of a tailq list according to the tailq
member implementation
- SpdkObject: Provides the __str__ function so that the list iteration can print
the object
Other useful objects:
- SpdkNormalTailqList: represents a list which has 'tailq' as the tailq object
- SpdkArr: Iteration over an array (instead of a linked list)

View File

@@ -1,6 +1,6 @@
# General Information {#general}
- @subpage directory_structure
- [Public API header files](files.html)
- @subpage event
- @subpage scheduler
- @subpage logical_volumes
- @subpage accel_fw

View File

@@ -10,20 +10,13 @@ git submodule update --init
# Installing Prerequisites {#getting_started_prerequisites}
The `scripts/pkgdep.sh` script will automatically install the bare minimum
dependencies required to build SPDK.
Use `--help` to see information on installing dependencies for optional components.
~~~{.sh}
sudo scripts/pkgdep.sh
~~~
The `--all` option will install all dependencies needed by SPDK features.
~~~{.sh}
sudo scripts/pkgdep.sh --all
~~~
# Building {#getting_started_building}
Linux:
@@ -110,7 +103,7 @@ with no arguments to see the help output. If your system has its IOMMU
enabled you can run the examples as your regular user. If it doesn't, you'll
need to run as a privileged user (root).
A good example to start with is `build/examples/identify`, which prints
out information about all of the NVMe devices on your system.
Larger, more fully functional applications are available in the `app`

View File

@@ -2,6 +2,8 @@
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<!-- For Mobile Devices -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
<meta name="generator" content="Doxygen $doxygenversion">

View File

@@ -1,28 +0,0 @@
# IDXD Driver {#idxd}
# Public Interface {#idxd_interface}
- spdk/idxd.h
# Key Functions {#idxd_key_functions}
Function | Description
--------------------------------------- | -----------
spdk_idxd_probe() | @copybrief spdk_idxd_probe()
spdk_idxd_batch_get_max() | @copybrief spdk_idxd_batch_get_max()
spdk_idxd_batch_create() | @copybrief spdk_idxd_batch_create()
spdk_idxd_batch_prep_copy() | @copybrief spdk_idxd_batch_prep_copy()
spdk_idxd_batch_submit() | @copybrief spdk_idxd_batch_submit()
spdk_idxd_submit_copy() | @copybrief spdk_idxd_submit_copy()
spdk_idxd_submit_compare() | @copybrief spdk_idxd_submit_compare()
spdk_idxd_submit_crc32c() | @copybrief spdk_idxd_submit_crc32c()
spdk_idxd_submit_dualcast() | @copybrief spdk_idxd_submit_dualcast()
spdk_idxd_submit_fill() | @copybrief spdk_idxd_submit_fill()
# Pre-defined configurations {#idxd_configs}
The RPC `idxd_scan_accel_engine` is used to both enable IDXD and set its
configuration to one of two pre-defined configs:
Config #0: 4 groups, 1 work queue per group, 1 engine per group.
Config #1: 2 groups, 2 work queues per group, 2 engines per group.

View File

@@ -1,827 +0,0 @@
[deleted file doc/iscsi.svg: an Inkscape SVG diagram showing SPDK bdevs mapped through two iSCSI targets (each with LUNs and a portal group) to iSCSI initiators; the raw SVG source is omitted here.]
xml:space="preserve"
id="text90-5-5-7">iSCSI client 0</text>
<text
x="141.15009"
y="48.275509"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-7-5">iSCSI client 1</text>
<path
style="display:inline;fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 105.83333,87.312502 124.35416,1.3229172"
id="path2638"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
style="display:inline;fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 107.15625,88.635419 125.67708,2.6458333"
id="path2640"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<text
x="105.28584"
y="13.99068"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;display:inline;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-9">TCP Network</text>
<path
style="display:inline;fill:none;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2683-6);marker-end:url(#marker2679-9)"
d="m 107.15625,17.197917 h 18.52083"
id="path2669"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<g
id="g4350-40"
transform="matrix(1,0,0,0.61904764,50.020836,28.004467)">
<ellipse
ry="2.6458333"
rx="6.614583"
cy="-11.045678"
cx="104.76043"
id="path4344-1"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6"
d="m 98.145835,-11.045677 v 6.4110574 c 10e-6,3.968751 13.229165,3.968751 13.229165,0 v -6.4110574 c 0,4.2740384 -13.229155,3.9687504 -13.229165,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="-17.456738"
cx="104.76044"
id="path4344-1-7"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-3"
d="m 98.145841,-17.456734 v 6.411057 c 10e-6,3.968751 13.229159,3.968751 13.229159,0 v -6.411057 c 0,4.274038 -13.229149,3.96875 -13.229159,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="-23.867794"
cx="104.76044"
id="path4344-1-9"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-2"
d="m 98.145841,-23.867792 v 6.411058 c 10e-6,3.968751 13.229159,3.968751 13.229159,0 v -6.411058 c 0,4.274039 -13.229149,3.968751 -13.229159,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="72.298073"
cx="106.08334"
id="path4344-1-5"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="65.887009"
cx="106.08335"
id="path4344-1-7-3"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-3-4"
d="m 99.468754,65.887013 v 6.411057 c 10e-6,3.968751 13.229156,3.968751 13.229156,0 v -6.411057 c 0,4.274038 -13.229146,3.96875 -13.229156,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="59.475952"
cx="106.08335"
id="path4344-1-9-1"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-2-9"
d="m 99.468754,59.475955 v 6.411058 c 10e-6,3.968751 13.229156,3.968751 13.229156,0 v -6.411058 c 0,4.274039 -13.229146,3.968751 -13.229156,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
</g>
</svg>

@ -1,540 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="169.33331mm"
height="53.006062mm"
version="1.1"
viewBox="0 0 169.33331 53.00606"
id="svg136"
sodipodi:docname="iscsi_example.svg"
inkscape:version="0.92.3 (2405546, 2018-03-11)">
<sodipodi:namedview
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="0"
inkscape:pageshadow="2"
inkscape:window-width="1742"
inkscape:window-height="910"
id="namedview138"
showgrid="true"
inkscape:zoom="1.2864091"
inkscape:cx="231.4415"
inkscape:cy="205.83148"
inkscape:window-x="1676"
inkscape:window-y="113"
inkscape:window-maximized="0"
inkscape:current-layer="layer1"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0">
<inkscape:grid
type="xygrid"
id="grid2224"
originx="33.072915"
originy="-46.257384" />
</sodipodi:namedview>
<title
id="title2">Thin Provisioning Write</title>
<defs
id="defs22">
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2683-6"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2681-3"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2679-9"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2677-8"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-2-6-1"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-7-8-2"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-8-9-5"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-1-3-2"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-2-0"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-7-6"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-8-8"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-1-5"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2659-1"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2657-7"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-27-1"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-4-0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2667-4"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2665-0"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-9"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-9" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-3"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-5"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-5"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-4"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2663-8"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2661-0"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-97"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-93" />
</marker>
</defs>
<metadata
id="metadata24">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title>Thin Provisioning Write</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:groupmode="layer"
id="layer1"
inkscape:label="Layer 1"
style="display:inline"
transform="translate(-20.09375,9.9883163e-4)">
<rect
style="fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
id="rect2890"
width="169.33331"
height="52.916664"
x="20.09375"
y="0.043701001" />
<rect
x="70.364159"
y="19.887449"
width="22.49"
height="6.6146002"
id="rect104"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.26458332;stroke-miterlimit:4;stroke-dasharray:none"
id="rect132"
ry="1.3229001"
height="30.427082"
width="33.072914"
y="-96.822914"
x="11.949952"
transform="rotate(90)" />
<text
x="76.792732"
y="24.435831"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90">LUN0</text>
<rect
x="70.364159"
y="27.824934"
width="22.49"
height="6.6146002"
id="rect104-6"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="69.098686"
y="16.596354"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5">Target: disk1</text>
<text
x="76.904396"
y="32.273182"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59">LUN1</text>
<text
x="63.376305"
y="6.9721842"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5">iSCSI Target server</text>
<rect
x="28.030828"
y="19.887449"
width="22.49"
height="6.6146002"
id="rect104-64"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="33.225346"
y="24.641508"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-56">Malloc0</text>
<rect
x="28.03083"
y="27.824945"
width="22.49"
height="6.6146002"
id="rect104-6-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="33.337006"
y="32.273182"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-2">Malloc1</text>
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6"
ry="1.3229001"
height="50.270836"
width="47.624996"
y="-111.375"
x="2.6895342"
transform="rotate(90)" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-8"
ry="1.3229001"
height="33.072918"
width="27.781242"
y="-55.812492"
x="11.949948"
transform="rotate(90)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6"
d="m 50.520827,31.793698 19.843748,3e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2667-4);marker-end:url(#marker1826-2-4-7-1-7-5-9)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2"
d="m 50.520827,23.856198 19.843748,2e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2663-8);marker-end:url(#marker1826-2-4-7-1-7-97)" />
<rect
x="103.4371"
y="37.085365"
width="18.521248"
height="6.6145835"
id="rect104-63"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="105.57915"
y="41.386662"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1">portal 1</text>
<text
x="25.394737"
y="15.738133"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-8">SPDK bdevs</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6-4"
d="M 96.822918,41.054113 H 103.4375"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2659-1);marker-end:url(#marker1826-2-4-7-1-7-5-27-1)" />
<rect
x="158.99957"
y="37.08535"
width="22.49"
height="6.6146002"
id="rect104-63-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="161.96524"
y="41.69516"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1-8">initiator 2</text>
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-1"
ry="1.3229001"
height="33.072933"
width="38.364578"
y="-186.78125"
x="11.949951"
transform="rotate(90)" />
<text
x="156.03279"
y="15.81625"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-7">iSCSI client 0</text>
<text
x="101.36903"
y="47.613781"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-7">10.0.0.1:3260</text>
<rect
x="161.64542"
y="19.887432"
width="19.844177"
height="6.6146011"
id="rect104-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="168.07399"
y="24.435814"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-7">sdd</text>
<rect
x="161.64542"
y="27.824913"
width="19.844177"
height="6.6146178"
id="rect104-6-8"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="168.18565"
y="32.273163"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-1">sde</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7-0"
d="m 92.854164,23.8562 68.791666,-1e-6"
style="fill:#999999;fill-opacity:1;stroke:#999999;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1.06044998, 1.06044998;stroke-dashoffset:0;stroke-opacity:1;marker-start:url(#marker2464-2-0);marker-end:url(#marker2468-8-8)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7-0-0"
d="m 92.854164,31.7937 68.791666,-2e-6"
style="fill:#999999;fill-opacity:1;stroke:#999999;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1.06044998, 1.06044998;stroke-dashoffset:0;stroke-opacity:1;marker-start:url(#marker2464-2-6-1);marker-end:url(#marker2468-8-9-5)" />
<text
x="160.41017"
y="47.490952"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-7-2">10.0.0.2/32</text>
<path
style="fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 125.92708,51.63745 144.44792,0.04369787"
id="path2638"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 127.25,52.960366 145.77084,1.3666139"
id="path2640"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7"
d="M 121.95833,41.054117 159,41.054115"
style="fill:#ff0000;stroke:#ff2a2a;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2464-3);marker-end:url(#marker2468-5)" />
<text
x="122.73377"
y="8.7427139"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-9">TCP Network</text>
<path
style="fill:none;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2683-6);marker-end:url(#marker2679-9)"
d="M 124.60417,11.949951 H 143.125"
id="path2669"
inkscape:connector-curvature="0" />
</g>
</svg>

@ -1,124 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg width="193.94mm" height="139.71mm" version="1.1" viewBox="0 0 193.94 139.71" xmlns="http://www.w3.org/2000/svg" xmlns:cc="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<title>NVMe CUSE</title>
<defs>
<marker id="marker9353" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker7156" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4572" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4436" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4324" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2300" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2110" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2028" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker1219" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="Arrow1Lstart" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker1127" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="Arrow1Lend" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
</defs>
<metadata>
<rdf:RDF>
<cc:Work rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
<dc:title>NVMe CUSE</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g transform="translate(-2.1066 -22.189)">
<rect x="11.906" y="134.85" width="72.004" height="20.6" ry="3.7798" fill="none" stroke="#000" stroke-width=".5"/>
<text x="14.363094" y="149.02231" fill="#000000" font-family="sans-serif" font-size="10.583px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px" style="line-height:1.25" xml:space="preserve"><tspan x="14.363094" y="149.02231" font-family="sans-serif" font-size="3.5278px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">/dev/spdk/nvme0</tspan></text>
<text x="47.625" y="149.02231" fill="#000000" font-family="sans-serif" font-size="10.583px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px" style="line-height:1.25" xml:space="preserve"><tspan x="47.625" y="149.02231" font-family="sans-serif" font-size="3.5278px" stroke-width=".26458" writing-mode="lr" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">/dev/spdk/nvme0n1</tspan></text>
<g stroke="#000">
<rect x="12.095" y="35.818" width="71.249" height="88.446" ry="4.3467" fill="none" stroke-width=".5"/>
<rect x="133.43" y="33.929" width="62.366" height="76.351" ry="4.7247" fill="none" stroke-width=".5"/>
<g fill="#fff" stroke-width=".26458">
<rect x="14.174" y="91.57" width="64.256" height="24.568"/>
<g fill-opacity=".9798">
<rect x="46.302" y="100.64" width="26.62" height="11.061"/>
</g>
</g>
<g transform="translate(-.53932 -.16291)">
<path d="m63.878 111.98v32.884" fill="none" marker-end="url(#marker1127)" marker-start="url(#Arrow1Lstart)" stroke-width=".26458px"/>
<g stroke-width=".265">
<path d="m34.585 115.57v28.726" fill="none" marker-end="url(#Arrow1Lend)" marker-start="url(#marker1219)"/>
<rect x="136.26" y="39.031" width="54.996" height="58.586" fill="#fff"/>
<rect x="153.84" y="52.26" width="34.018" height="11.906" ry="5.8544" fill="none"/>
</g>
<path d="m112.45 24.479v137.58" fill="none" stroke-dasharray="1.5874999, 1.5874999" stroke-width=".26458"/>
</g>
<g fill="#fff" stroke-width=".265">
<rect x="89.58" y="54.339" width="38.365" height="8.8824"/>
</g>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="93.54911" y="59.800339" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="93.54911" y="59.800339" stroke-width=".26458">io_msg queue</tspan></text>
<text x="11.906249" y="27.31399" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="11.906249" y="27.31399" stroke-width=".26458">CUSE threads</tspan></text>
<text x="165.36458" y="27.502975" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="165.36458" y="27.502975" stroke-width=".26458">SPDK threads</tspan></text>
</g>
<g stroke="#000">
<rect x="17.009" y="47.914" width="29.482" height="13.04" ry="6.5201" fill="#fff" stroke-width=".265"/>
<rect x="49.921" y="68.161" width="28.915" height="13.04" ry="6.5201" fill="#fff" stroke-width=".265"/>
<g fill="none">
<path d="m32.506 61.143v30.427" marker-start="url(#marker7156)" stroke-width=".26458px"/>
<path d="m63.689 81.176 0.18899 19.277" marker-start="url(#marker4324)" stroke-width=".265"/>
<g stroke-width=".26458px">
<path d="m46.113 54.339h43.467" marker-end="url(#marker2028)"/>
<path d="m64.284 67.972c0.02768-6.3997-1.3229-5.2917 25.135-5.2917" marker-end="url(#marker2110)"/>
<path d="m127.78 56.066h25.135" marker-end="url(#marker2300)"/>
</g>
</g>
</g>
<g stroke-width=".26458">
<g transform="translate(-.25341)" font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" word-spacing="0px">
<text x="138.90625" y="44.889877" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="138.90625" y="44.889877" stroke-width=".26458">NVMe</tspan></text>
<text x="16.063986" y="97.050598" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="16.063986" y="97.050598" stroke-width=".26458">CUSE ctrlr</tspan></text>
<text x="48.380947" y="106.12202" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="48.380947" y="106.12202" stroke-width=".26458">CUSE ns</tspan></text>
<text x="51.420551" y="75.799461" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="51.420551" y="75.799461" stroke-width=".26458">ioctl pthread</tspan></text>
<text x="18.906757" y="55.833015" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="18.906757" y="55.833015" stroke-width=".26458">ioctl pthread</tspan></text>
</g>
<path d="m160.86 85.17c0.38097 13.154-7.1538 11.542-82.052 10.936" fill="none" marker-end="url(#marker4572)" stroke="#000" stroke-dasharray="0.79374995, 0.79374995"/>
<path d="m179.38 85.17c0.37797 22.25-6.5765 20.83-106.08 20.641" fill="none" marker-end="url(#marker4436)" stroke="#000" stroke-dasharray="0.79374995, 0.79374995"/>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="13.229166" y="139.7619" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="13.229166" y="139.7619" stroke-width=".26458">Kernel</tspan></text>
<text x="14.552083" y="41.488094" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="14.552083" y="41.488094" stroke-width=".26458">CUSE</tspan></text>
<text x="161.73709" y="59.415913" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="161.73709" y="59.415913" stroke-width=".26458">io poller</tspan></text>
</g>
<g fill="none" stroke="#000">
<path d="m111.91 127.5h-109.8" stroke-dasharray="1.58749992, 1.58749992" stroke-width=".26458"/>
<rect x="153.3" y="71.941" width="34.018" height="13.229" ry="6.6146" stroke-width=".265"/>
<path d="m170.12 64.003v7.9375" marker-end="url(#marker9353)" stroke-width=".265"/>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="159.72221" y="79.76664" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="159.72221" y="79.76664" stroke-width=".26458">io execute</tspan></text>
<text x="172.34003" y="68.59539" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="172.34003" y="68.59539" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">fn(arg)</tspan></text>
<text x="53.046707" y="52.192699" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="53.046707" y="52.192699" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">nvme_io_msg send()</tspan></text>
<text x="53.102341" y="60.250244" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="53.102341" y="60.250244" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">nvme_io_msg send()</tspan></text>
<text x="120.79763" y="50.70586" font-size="12px" stroke-width="1" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="120.79763" y="50.70586" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">spdk_nvme_io_msg process()</tspan></text>
</g>
</g>
</svg>

@ -1,41 +1,31 @@
# Storage Performance Development Kit {#mainpage}
# Storage Performance Development Kit {#index}
# Introduction
@copydoc intro
# Concepts
@copydoc concepts
# User Guides
@copydoc user_guides
# Programmer Guides
@copydoc prog_guides
# General Information
@copydoc general
# Miscellaneous
@copydoc misc
# Driver Modules
@copydoc driver_modules
# Modules
@copydoc modules
# Tools
@copydoc tools
# CI Tools
@copydoc ci_tools
# Experimental Tools
@copydoc experimental_tools
# Performance Reports
@copydoc performance_reports

@ -4,5 +4,4 @@
- @subpage getting_started
- @subpage vagrant
- @subpage changelog
- @subpage deprecation
- [Source Code (GitHub)](https://github.com/spdk/spdk)

@ -10,71 +10,89 @@ The following section describes how to run iSCSI from your cloned package.
This guide starts by assuming that you can already build the standard SPDK distribution on your
platform.
Once built, the binary will be in `build/bin`.
Once built, the binary will be in `app/iscsi_tgt`.
If you want to kill the application by a signal, make sure to use SIGTERM; the application will then
release all of its shared memory resources before exiting. SIGKILL gives the application no chance to
release the shared memory resources, so you may need to release them manually.
## Introduction
## Configuring iSCSI Target {#iscsi_config}
The following diagram shows the relations between the different parts of the iSCSI structure described in this
document.
An `iscsi_tgt`-specific configuration file is used to configure the iSCSI target. A fully documented
example configuration file is located at `etc/spdk/iscsi.conf.in`.
![iSCSI structure](iscsi.svg)
The configuration file is used to configure the SPDK iSCSI target. This file defines the following:
TCP ports to use as iSCSI portals; general iSCSI parameters; initiator names and addresses to allow
access to iSCSI target nodes; number and types of storage backends to export over iSCSI LUNs; iSCSI
target node mappings between portal groups, initiator groups, and LUNs.
### Assigning CPU Cores to the iSCSI Target {#iscsi_config_lcore}
You should make a copy of the example configuration file, modify it to suit your environment, and
then run the iscsi_tgt application and pass it the configuration file using the -c option. Right now,
the target requires elevated privileges (root) to run.
~~~
app/iscsi_tgt/iscsi_tgt -c /path/to/iscsi.conf
~~~
## Assigning CPU Cores to the iSCSI Target {#iscsi_config_lcore}
SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK iSCSI target has the best performance, place the NICs and the NVMe devices on the
same NUMA node and configure the target to run on CPU cores associated with that node. The following
command line option is used to configure the SPDK iSCSI target:
parameters in the configuration file are used to configure the SPDK iSCSI target:
~~~
-m 0xF000000
**ReactorMask:** A hexadecimal bit mask of the CPU cores that SPDK is allowed to execute work
items on. The ReactorMask is located in the [Global] section of the configuration file. For example,
to assign lcores 24,25,26 and 27 to iSCSI target work items, set the ReactorMask to:
~~~{.sh}
ReactorMask 0xF000000
~~~
This is a hexadecimal bit mask of the CPU cores where the iSCSI target will start polling threads.
In this example, CPU cores 24, 25, 26 and 27 would be used.
## Configuring a LUN in the iSCSI Target {#iscsi_lun}
Each LUN in an iSCSI target node is associated with an SPDK block device. See @ref bdev
for details on configuring SPDK block devices. The block device to LUN mappings are specified in the
configuration file as:
~~~~
[TargetNodeX]
LUN0 Malloc0
LUN1 Nvme0n1
~~~~
This exports a malloc'd LUN: a RAM disk backed by a chunk of memory allocated by the iSCSI target in
user space. If the system has enough DMA channels, it will use an offload engine to perform copies
instead of memcpy.
## Configuring iSCSI Target via RPC method {#iscsi_rpc}
The iSCSI target is configured via JSON-RPC calls. See @ref jsonrpc for details.
In addition to the configuration file, the iSCSI target may also be configured via JSON-RPC calls. See
@ref jsonrpc for details.
### Portal groups
- iscsi_create_portal_group -- Add a portal group.
- iscsi_delete_portal_group -- Delete an existing portal group.
- iscsi_target_node_add_pg_ig_maps -- Add initiator group to portal group mappings to an existing iSCSI target node.
- iscsi_target_node_remove_pg_ig_maps -- Delete initiator group to portal group mappings from an existing iSCSI target node.
- iscsi_get_portal_groups -- Show information about all available portal groups.
### Add the portal group
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
python /path/to/spdk/scripts/rpc.py add_portal_group 1 127.0.0.1:3260
~~~
### Initiator groups
- iscsi_create_initiator_group -- Add an initiator group.
- iscsi_delete_initiator_group -- Delete an existing initiator group.
- iscsi_initiator_group_add_initiators -- Add initiators to an existing initiator group.
- iscsi_get_initiator_groups -- Show information about all available initiator groups.
### Add the initiator group
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
python /path/to/spdk/scripts/rpc.py add_initiator_group 2 ANY 127.0.0.1/32
~~~
### Target nodes
- iscsi_create_target_node -- Add an iSCSI target node.
- iscsi_delete_target_node -- Delete an iSCSI target node.
- iscsi_target_node_add_lun -- Add a LUN to an existing iSCSI target node.
- iscsi_get_target_nodes -- Show information about all available iSCSI target nodes.
### Construct the backend block device
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_target_node Target3 Target3_alias MyBdev:0 1:2 64 -d
python /path/to/spdk/scripts/rpc.py construct_malloc_bdev -b MyBdev 64 512
~~~
### Construct the target node
~~~
python /path/to/spdk/scripts/rpc.py construct_target_node Target3 Target3_alias MyBdev:0 1:2 64 0 0 0 1
~~~
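To confirm that the objects above were created as intended, the corresponding listing RPCs described earlier in this section can be queried (output omitted here):
~~~
/path/to/spdk/scripts/rpc.py iscsi_get_portal_groups
/path/to/spdk/scripts/rpc.py iscsi_get_initiator_groups
/path/to/spdk/scripts/rpc.py iscsi_get_target_nodes
~~~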
## Configuring iSCSI Initiator {#iscsi_initiator}
@ -123,9 +141,9 @@ net.core.netdev_max_backlog = 300000
### Discovery
Assume target is at 10.0.0.1
Assume target is at 192.168.1.5
~~~
iscsiadm -m discovery -t sendtargets -p 10.0.0.1
iscsiadm -m discovery -t sendtargets -p 192.168.1.5
~~~
### Connect to target
@ -181,147 +199,166 @@ Increase requests for block queue
echo "1024" > /sys/block/sdc/queue/nr_requests
~~~
### Example: Configure simple iSCSI Target with one portal and two LUNs
Assuming we have one iSCSI target server with a portal at 10.0.0.1:3260, two LUNs (Malloc0 and Malloc1),
and accepting initiators on 10.0.0.2/32, as in the diagram below:
# Vector Packet Processing {#vpp}
![Sample iSCSI configuration](iscsi_example.svg)
VPP (part of the [Fast Data - Input/Output](https://fd.io/) project) is an extensible
userspace framework providing networking functionality. It is built on the idea of a
packet processing graph (see [What is VPP?](https://wiki.fd.io/view/VPP/What_is_VPP?)).
#### Configure iSCSI Target
Detailed instructions for the **simplified steps 1-3** below can be found in the
VPP [Quick Start Guide](https://wiki.fd.io/view/VPP).
Start the iscsi_tgt application:
```
./build/bin/iscsi_tgt
```
*SPDK supports VPP version 18.01.1.*
Construct two 64MB Malloc block devices with 512B sector size "Malloc0" and "Malloc1":
## 1. Building VPP (optional) {#vpp_build}
```
./scripts/rpc.py bdev_malloc_create -b Malloc0 64 512
./scripts/rpc.py bdev_malloc_create -b Malloc1 64 512
```
Create a new portal group with id 1 and address 10.0.0.1:3260:
```
./scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
```
Create one initiator group with id 2 to accept any connection from 10.0.0.2/32:
```
./scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
```
Finally, construct one target named "disk1" with alias "Data Disk1", using the previously created bdevs as LUN0 (Malloc0)
and LUN1 (Malloc1), portal group 1, and initiator group 2.
```
./scripts/rpc.py iscsi_create_target_node disk1 "Data Disk1" "Malloc0:0 Malloc1:1" 1:2 64 -d
```
#### Configure initiator
Discover the target
*Please skip this step if using already-built packages.*
Clone and checkout VPP
~~~
$ iscsiadm -m discovery -t sendtargets -p 10.0.0.1
10.0.0.1:3260,1 iqn.2016-06.io.spdk:disk1
~~~
~~~
git clone https://gerrit.fd.io/r/vpp && cd vpp
git checkout v18.01.1
~~~
Connect to the target
Install VPP build dependencies
~~~
iscsiadm -m node --login
~~~
~~~
make install-dep
~~~
At this point the iSCSI target's LUNs should show up as SCSI disks.
Check dmesg to see what device names they came up as. In this example the output may look like the following:
Build and create .rpm packages
~~~
...
[630111.860078] scsi host68: iSCSI Initiator over TCP/IP
[630112.124743] scsi 68:0:0:0: Direct-Access INTEL Malloc disk 0001 PQ: 0 ANSI: 5
[630112.125445] sd 68:0:0:0: [sdd] 131072 512-byte logical blocks: (67.1 MB/64.0 MiB)
[630112.125468] sd 68:0:0:0: Attached scsi generic sg3 type 0
[630112.125926] sd 68:0:0:0: [sdd] Write Protect is off
[630112.125934] sd 68:0:0:0: [sdd] Mode Sense: 83 00 00 08
[630112.126049] sd 68:0:0:0: [sdd] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
[630112.126483] scsi 68:0:0:1: Direct-Access INTEL Malloc disk 0001 PQ: 0 ANSI: 5
[630112.127096] sd 68:0:0:1: Attached scsi generic sg4 type 0
[630112.127143] sd 68:0:0:1: [sde] 131072 512-byte logical blocks: (67.1 MB/64.0 MiB)
[630112.127566] sd 68:0:0:1: [sde] Write Protect is off
[630112.127573] sd 68:0:0:1: [sde] Mode Sense: 83 00 00 08
[630112.127728] sd 68:0:0:1: [sde] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
[630112.128246] sd 68:0:0:0: [sdd] Attached SCSI disk
[630112.129789] sd 68:0:0:1: [sde] Attached SCSI disk
...
~~~
~~~
make pkg-rpm
~~~
You may also use a simple bash command to find the /dev/sdX nodes for each iSCSI LUN
in all logged-in iSCSI sessions:
Alternatively, build and create .deb packages
~~~
make pkg-deb
~~~
Packages can be found in the `vpp/build-root/` directory.
For more in-depth instructions please see the Building section in the
[VPP documentation](https://wiki.fd.io/view/VPP/Pulling,_Building,_Running,_Hacking_and_Pushing_VPP_Code#Building)
*Please note: VPP 18.01.1 does not support OpenSSL 1.1. It is suggested to install a compatibility package
at compilation time.*
~~~
$ iscsiadm -m session -P 3 | grep "Attached scsi disk" | awk '{print $4}'
sdd
sde
~~~
~~~
sudo dnf install -y --allowerasing compat-openssl10-devel
~~~
*Then reinstall the latest OpenSSL devel package:*
~~~
sudo dnf install -y --allowerasing openssl-devel
~~~
## 2. Installing VPP {#vpp_install}
Packages can be installed from a distribution repository, or use those built in the previous step.
The minimal set of packages consists of `vpp`, `vpp-lib` and `vpp-devel`.
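For example, on an rpm-based distribution, the packages built in the previous step might be installed like this (the exact file names under `build-root` will vary):
~~~
sudo dnf install -y ./vpp/build-root/*.rpm
~~~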
*Note: Please remove or modify the /etc/sysctl.d/80-vpp.conf file with values appropriate to
the number of hugepages that will be used on the system.*
## 3. Running VPP {#vpp_run}
VPP takes over any network interfaces that were bound to a userspace driver;
for details please see the DPDK guide on
[Binding and Unbinding Network Ports to/from the Kernel Modules](http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html#binding-and-unbinding-network-ports-to-from-the-kernel-modules).
VPP is installed as a service and is disabled by default. To start VPP with the default config:
~~~
sudo systemctl start vpp
~~~
Alternatively, use the `vpp` binary directly:
~~~
sudo vpp unix {cli-listen /run/vpp/cli.sock}
~~~
A useful tool is `vppctl`, which allows you to control a running VPP instance,
either by entering the VPP configuration prompt:
~~~
sudo vppctl
~~~
Or by sending a single command directly. For example, to display the interfaces within VPP:
~~~
sudo vppctl show interface
~~~
### Example: Tap interfaces on a single host
For functional test purposes a virtual tap interface can be created,
so no additional network hardware is required.
This allows network communication between the SPDK iSCSI target, using the VPP end of the tap,
and the kernel iSCSI initiator, using the kernel end of the tap. A single host is used in this scenario.
Create a tap interface via VPP:
~~~
vppctl tap connect tap0
vppctl set interface state tapcli-0 up
vppctl set interface ip address tapcli-0 10.0.0.1/24
vppctl show int addr
~~~
Assign an address to the kernel interface
~~~
sudo ip addr add 10.0.0.2/24 dev tap0
sudo ip link set tap0 up
~~~
To verify connectivity
~~~
ping 10.0.0.1
~~~
## 4. Building SPDK with VPP {#vpp_built_into_spdk}
Support for VPP can be built into SPDK by using a configuration option:
~~~
configure --with-vpp
~~~
Alternatively, a directory with the built libraries can be pointed to,
and it will be used for compilation instead of the installed packages:
~~~
configure --with-vpp=/path/to/vpp/repo/build-root/vpp
~~~
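With either option, SPDK is then rebuilt as usual:
~~~
make
~~~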
## 5. Running SPDK with VPP {#vpp_running_with_spdk}
The VPP application has to be started before the SPDK iSCSI target
in order to enable usage of the network interfaces.
After SPDK iSCSI target initialization finishes,
the interfaces configured within VPP will be available to be configured as portal addresses.
Please refer to @ref iscsi_rpc.
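For example, combining the tap-interface example above with the iSCSI RPCs from this guide, a minimal single-host bring-up might look like this (addresses as in the tap example; only the ordering is essential):
~~~
sudo systemctl start vpp
vppctl tap connect tap0
vppctl set interface state tapcli-0 up
vppctl set interface ip address tapcli-0 10.0.0.1/24
./app/iscsi_tgt/iscsi_tgt &
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
~~~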
# iSCSI Hotplug {#iscsi_hotplug}
At the iSCSI level, we provide the following support for hotplug:
1. bdev/nvme:
At the bdev/nvme level, we start one hotplug monitor which calls
spdk_nvme_probe() periodically to get hotplug events. We provide
private attach_cb and remove_cb callbacks for spdk_nvme_probe(). In the attach_cb,
we create a block device based on the newly attached NVMe device, and in the
remove_cb, we unregister the block device, which also notifies the
upper level stack (for the iSCSI target, the upper level stack is scsi/lun) to
handle the hot-remove event.
2. scsi/lun:
When the LUN receives the hot-remove notification from the block device layer,
the LUN is marked as removed, and all IOs after this point return
with check condition status. The LUN then starts one poller which waits
for all the commands that have already been submitted to the block device to
return; after all those commands return, the LUN is deleted.
## Known bugs and limitations {#iscsi_hotplug_bugs}
For write commands: if you test hotplug with a write command that causes R2T,
for example a 1MB IO, it will crash the iscsi_tgt.
For read commands: if you test hotplug with a large read IO, for example a 1MB
IO, it will probably crash the iscsi_tgt.
@sa spdk_nvme_probe
# iSCSI Login Redirection {#iscsi_login_redirection}
The SPDK iSCSI target application supports the iSCSI login redirection feature.
A portal refers to an IP address and TCP port number pair, and a portal group
contains a set of portals. Users of the SPDK iSCSI target application configure
portals through portal groups.
To support the login redirection feature, we utilize two types of portal groups:
public portal groups and private portal groups.
The SPDK iSCSI target application usually has a discovery portal. An initiator
connects to the discovery portal and, via a discovery session, gets the list of targets
as well as the list of portals on which those targets may be accessed.
Public portal groups have their portals returned by a discovery session. Private
portal groups do not have their portals returned by a discovery session. A public
portal group may optionally have a redirect portal for non-discovery logins for
each associated target. This redirect portal must be from a private portal group.
Initiators configure portals in public portal groups as target portals. When an
initiator logs in to a target through a portal in an associated public portal group,
the target sends a temporary redirection response with a redirect portal. The
initiator then logs in to the target again through the redirect portal.
Users set a portal group to public or private at creation using the
`iscsi_create_portal_group` RPC, associate portal groups with a target using the
`iscsi_create_target_node` RPC or the `iscsi_target_node_add_pg_ig_maps` RPC,
specify an up-to-date redirect portal in a public portal group for a target using
the `iscsi_target_node_set_redirect` RPC, and terminate the corresponding connections
by an asynchronous logout request using the `iscsi_target_node_request_logout` RPC.
Typically, users will use the login redirection feature in a scale-out iSCSI target
system that runs multiple SPDK iSCSI target applications.
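As a rough sketch of that workflow with hypothetical tags and addresses (the RPC names are the ones listed above, but their exact argument syntax is not documented here, so the flags shown are illustrative only):
~~~
# public portal group 1 and private portal group 2 (the --private flag is illustrative)
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 2 10.0.0.1:3261 --private
# redirect non-discovery logins for the target to the portal in the private group
/path/to/spdk/scripts/rpc.py iscsi_target_node_set_redirect iqn.2016-06.io.spdk:disk1 1 10.0.0.1 3261
# ask initiators connected through portal group 1 to log out and re-login
/path/to/spdk/scripts/rpc.py iscsi_target_node_request_logout iqn.2016-06.io.spdk:disk1 1
~~~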

File diff suppressed because it is too large

@ -1,51 +0,0 @@
# JSON-RPC Remote access {#jsonrpc_proxy}
SPDK provides a sample Python script, `rpc_http_proxy.py`, which runs an HTTP server that listens for JSON objects from users. It uses the HTTP POST method to receive JSON objects, including the methods and parameters described in this chapter.
## Parameters
Name | Optional | Type | Description
----------------------- | -------- | ----------- | -----------
server IP | Required | string | IP address that JSON objects shall be received on
server port | Required | number | Port number that JSON objects shall be received on
user name | Required | string | User name that will be used for authentication
password | Required | string | Password that will be used for authentication
RPC listen address | Optional | string | Path to SPDK JSON RPC socket. Default: /var/tmp/spdk.sock
## Example usage
`spdk/scripts/rpc_http_proxy.py 192.168.0.2 8000 user password`
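The proxy can also be exercised with a plain HTTP client such as curl; the address
and credentials below match the example above and are illustrative.
~~~
curl -u user:password -d '{"id": 1, "method": "bdev_get_bdevs"}' http://192.168.0.2:8000/
~~~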
## Returns
Error 401 - missing or incorrect user and/or password.
Error 400 - wrong JSON syntax or incorrect JSON method
Status 200 with resultant JSON object included on success.
## Client side
Below is a sample python script acting as a client side. It sends `bdev_get_bdevs` method with optional `name` parameter and prints JSON object returned from remote_rpc script.
~~~
import json
import requests

if __name__ == '__main__':
    payload = {'id': 1, 'method': 'bdev_get_bdevs', 'params': {'name': 'Malloc0'}}
    url = 'http://192.168.0.2:8000/'
    req = requests.post(url,
                        data=json.dumps(payload),
                        auth=('user', 'password'),
                        verify=False,
                        timeout=30)
    print(req.json())
~~~
Output:
~~~
python client.py
[{u'num_blocks': 2621440, u'name': u'Malloc0', u'uuid': u'fb57e59c-599d-42f1-8b89-3e46dbe12641', u'claimed': True, u'driver_specific': {}, u'supported_io_types': {u'reset': True, u'nvme_admin': False, u'unmap': True, u'read': True, u'nvme_io': False, u'write': True, u'flush': True, u'write_zeroes': True}, u'qos_ios_per_sec': 0, u'block_size': 4096, u'product_name': u'Malloc disk', u'aliases': []}]
~~~


@ -1,213 +0,0 @@
# SPDK Libraries {#libraries}
The SPDK repository is, first and foremost, a collection of high-performance
storage-centric software libraries. With this in mind, much care has been taken
to ensure that these libraries have consistent and robust naming and versioning
conventions. The libraries themselves are also divided across two directories
(`lib` and `module`) inside of the SPDK repository in a deliberate way to prevent
mixing of SPDK event framework dependent code and lower level libraries. This document
is aimed at explaining the structure, naming conventions, versioning scheme, and use cases
of the libraries contained in these two directories.
# Directory Structure {#structure}
The SPDK libraries are divided into two directories. The `lib` directory contains the base libraries that
compose SPDK. Some of these base libraries define plug-in systems. Instances of those plug-ins are called
modules and are located in the `module` directory. For example, the `spdk_sock` library is contained in the
`lib` directory, while the implementations of socket abstractions, `sock_posix` and `sock_uring`,
are contained in the `module` directory.
## lib {#lib}
The libraries in the `lib` directory can be readily divided into four categories:
- Utility Libraries: These libraries contain basic, commonly used functions that make more complex
libraries easier to implement. For example, `spdk_log` contains macro definitions that provide a
consistent logging paradigm and `spdk_json` is a general purpose JSON parsing library.
- Protocol Libraries: These libraries contain the building blocks for a specific service. For example,
`spdk_nvmf` and `spdk_vhost` each define the storage protocols after which they are named.
- Storage Service Libraries: These libraries provide a specific abstraction that can be mapped to somewhere
between the physical drive and the filesystem level of your typical storage stack. For example `spdk_bdev`
provides a general block device abstraction layer, `spdk_lvol` provides a logical volume abstraction,
`spdk_blobfs` provides a filesystem abstraction, and `spdk_ftl` provides a flash translation layer
abstraction.
- System Libraries: These libraries provide system level services such as a JSON based RPC service
(see `spdk_jsonrpc`) and thread abstractions (see `spdk_thread`). The most notable library in this category
is the `spdk_env_dpdk` library which provides a shim for the underlying Data Plane Development Kit (DPDK)
environment and provides services like memory management.
The one library in the `lib` directory that doesn't fit into the above classification is the `spdk_event` library.
This library defines a framework used by the applications contained in the `app` and `example` directories. Much
care has been taken to keep the SPDK libraries independent from this framework. The libraries in `lib` are engineered
to allow plugging directly into independent application frameworks such as Seastar or libuv with minimal effort.
Currently there are two exceptions in the `lib` directory which still rely on `spdk_event`: `spdk_vhost` and `spdk_iscsi`.
There are efforts underway to remove all remaining dependencies these libraries have on the `spdk_event` library.
Much like the `spdk_event` library, the `spdk_env_dpdk` library has been architected in such a way that it
can be readily replaced by an alternate environment shim. More information on replacing the `spdk_env_dpdk`
module and the underlying `dpdk` environment can be found in the [environment](#env_replacement) section.
## module {#module}
The component libraries in the `module` directory represent specific implementations of the base libraries in
the `lib` directory. As with the `lib` directory, much care has been taken to avoid dependencies on the
`spdk_event` framework except for those libraries which directly implement the `spdk_event` module plugin system.
There are seven sub-directories in the `module` directory which each hold a different class of libraries. These
sub-directories can be divided into two types.
- plug-in libraries: These libraries are explicitly tied to one of the libraries in the `lib` directory and
are registered with that library at runtime by way of a specific constructor function. The parent library in
the `lib` directory then manages the module directly. These types of libraries each implement a function table
defined by their parent library. The following table shows these directories and their corresponding parent
libraries:
<center>
| module directory | parent library | dependent on event library |
|------------------|----------------|----------------------------|
| module/accel | spdk_accel | no |
| module/bdev | spdk_bdev | no |
| module/event | spdk_event | yes |
| module/sock | spdk_sock | no |
</center>
- Free libraries: These libraries are highly dependent upon a library in the `lib` directory but are not
explicitly registered to that library via a constructor. The libraries in the `blob`, `blobfs`, and `env_dpdk`
directories fall into this category. None of the libraries in this category depend explicitly on the
`spdk_event` library.
# Library Conventions {#conventions}
The SPDK libraries follow strict conventions for naming functions, logging, versioning, and header files.
## Headers {#headers}
All public SPDK header files exist in the `include` directory of the SPDK repository. These headers
are divided into two sub-directories.
`include/spdk` contains headers intended to be used by consumers of the SPDK libraries. All of the
functions, variables, and types in these headers are intended for public consumption. Multiple headers
in this directory may depend upon the same underlying library and work together to expose different facets
of the library. The `spdk_bdev` library, for example, is exposed in three different headers. `bdev_module.h`
defines the interfaces a bdev module library would need to implement, `bdev.h` contains general block device
functions that would be used by an application consuming block devices exposed by SPDK, and `bdev_zone.h`
exposes zoned bdev specific functions. Many of the other libraries exhibit a similar behavior of splitting
headers between consumers of the library and those wishing to register a module with that library.
`include/spdk_internal`, as its name suggests, contains header files intended to be consumed only by other
libraries inside of the SPDK repository. These headers are typically used for sharing lower level functions
between two libraries that both require similar functions. For example, `spdk_internal/nvme_tcp.h` contains
low-level TCP functions used by both the `spdk_nvme` and `spdk_nvmf` libraries. These headers are *NOT*
intended for general consumption.
Other header files contained directly in the `lib` and `module` directories are intended to be consumed *only*
by source files of their corresponding library. Any symbols intended to be used across libraries need to be
included in a header in the `include/spdk_internal` directory.
## Naming Conventions {#naming}
All public types and functions in SPDK libraries begin with the prefix `spdk_`. They are also typically
further namespaced using the spdk library name. The rest of the function or type name describes its purpose.
There are no internal library functions that begin with the `spdk_` prefix. This naming convention is
enforced by the SPDK continuous integration testing. Functions not intended for use outside of their home
library should be namespaced with the name of the library only.
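For illustration, a public symbol and an internal helper might be declared as
follows; `spdk_bdev_get_by_name` is a real public function, while `bdev_start_reset`
is an invented stand-in for the internal style.
~~~
/* Public API: spdk_ prefix plus the library namespace (include/spdk/bdev.h). */
struct spdk_bdev *spdk_bdev_get_by_name(const char *bdev_name);

/* Internal helper: namespaced by the library name only, no spdk_ prefix. */
void bdev_start_reset(void *ctx);
~~~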
## Map Files {#map}
SPDK libraries can be built as both static and shared object files. To facilitate building libraries as shared
objects, each one has a corresponding map file (e.g. `spdk_nvmf` relies on `spdk_nvmf.map`). SPDK libraries
not exporting any symbols rely on a blank map file located at `mk/spdk_blank.map`.
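As a sketch, a map file is a standard GNU linker version script that exports the
library's public `spdk_` symbols and hides everything else; the symbol names below
are hypothetical.
~~~
{
	global:

	# Public API
	spdk_foo_create;
	spdk_foo_destroy;

	local: *;
};
~~~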
# SPDK Shared Objects {#shared_objects}
## Shared Object Versioning {#versioning}
SPDK shared objects follow a semantic versioning pattern with a major and minor version. Any changes which
break backwards compatibility (symbol removal or change) will cause a shared object major increment and
backwards compatible changes will cause a minor version increment; i.e. an application that relies on
`libspdk_nvmf.so.3.0` will be compatible with `libspdk_nvmf.so.3.1` but not with `libspdk_nvmf.so.4.0`.
Shared object versions are incremented only once between each release cycle. This means that at most, the
major version of each SPDK shared library will increment only once between each SPDK release.
There are currently no guarantees in SPDK of ABI compatibility between two major SPDK releases.
The point releases of an LTS release will be ABI compatible with the corresponding LTS major release.
Shared objects are versioned independently of one another. This means that `libspdk_nvme.so.3.0` and
`libspdk_bdev.so.3.0` do not necessarily belong to the same release. This also means that shared objects
with the same suffix are not necessarily compatible with each other. It is important to source all of your
SPDK libraries from the same repository and version to ensure inter-library compatibility.
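To check which version a built shared object actually advertises, one can inspect
its dynamic section; the path below assumes a default in-tree build.
~~~{.sh}
readelf -d build/lib/libspdk_nvmf.so | grep SONAME
~~~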
## Linking to Shared Objects {#so_linking}
Shared objects in SPDK are created on a per-library basis. There is a top level `libspdk.so` object
which is a linker script. It simply contains references to all of the other spdk shared objects.
There are essentially two ways of linking to SPDK libraries.
1. An application can link to the top level shared object library as follows:
~~~{.sh}
gcc -o my_app ./my_app.c -lspdk -lspdk_env_dpdk -ldpdk
~~~
2. An application can link to only a subset of libraries by linking directly to the ones it relies on:
~~~{.sh}
gcc -o my_app ./my_app.c -lpassthru_external -lspdk_event_bdev -lspdk_bdev -lspdk_bdev_malloc
-lspdk_log -lspdk_thread -lspdk_util -lspdk_event -lspdk_env_dpdk -ldpdk
~~~
In the second instance, please note that applications need only link to the libraries upon which they
directly depend. All SPDK libraries have their dependencies specified at object compile time. This means
that when linking to `spdk_net`, one does not also have to specify `spdk_log`, `spdk_util`, `spdk_json`,
`spdk_jsonrpc`, and `spdk_rpc`. However, this dependency inclusion does not extend to the application
itself; i.e. if an application directly uses symbols from both `spdk_bdev` and `spdk_log`, both libraries
will need to be supplied to the linker when linking the application even though `spdk_log` is a dependency
of `spdk_bdev`.
Please also note that when linking to SPDK libraries, both the spdk_env shim library and the env library
itself need to be supplied to the linker. In the examples above, these are `spdk_env_dpdk` and `dpdk`
respectively. This was intentional and allows one to easily swap out both the environment and the
environment shim.
## Replacing the env abstraction {#env_replacement}
SPDK depends on an environment abstraction that provides crucial pinned memory management and PCIe
bus management operations. The interface for this environment abstraction is defined in the
`include/env.h` header file. The default implementation of this environment is located in `spdk_env_dpdk`.
This abstraction in turn relies upon the DPDK libraries. This two part implementation was deliberate
and allows for easily swapping out the dpdk version upon which the spdk libraries rely without making
modifications to the spdk source directly.
Any environment can replace the `spdk_env_dpdk` environment by implementing the `include/env.h` header
file. The environment can either be implemented wholesale in a single library or as a two-part
shim/implementation library system.
~~~{.sh}
# single library
gcc -o my_app ./my_app.c -lspdk -lcustom_env_implementation
# two libraries
gcc -o my_app ./my_app.c -lspdk -lcustom_env_shim -lcustom_env_implementation
~~~
# SPDK Static Objects {#static_objects}
SPDK static objects are compiled by default even when no parameters are supplied to the build system.
Unlike SPDK shared objects, the filename does not contain any versioning semantics. Linking against
static objects is similar to shared objects but will always require the use of `-Wl,--whole-archive`
as an argument. This is due to the use of constructor functions in SPDK such as those to register
NVMe transports.
Due to the lack of versioning semantics, it is not recommended to install static libraries system wide.
Instead, the path to these static libraries should be added as an argument at compile time using
`-L/path/to/static/libs`. The use of static objects instead of shared objects can also be forced
through `-Wl,-Bstatic`; otherwise, some compilers might prefer to use the shared objects if both
are available.
~~~{.sh}
gcc -o my_app ./my_app.c -L/path/to/static/libs -Wl,--whole-archive -Wl,-Bstatic -lpassthru_external
-lspdk_event_bdev -lspdk_bdev -lspdk_bdev_malloc -lspdk_log -lspdk_thread -lspdk_util -lspdk_event
-lspdk_env_dpdk -Wl,--no-whole-archive -Wl,-Bdynamic -pthread -ldpdk
~~~


@ -10,7 +10,6 @@ The Logical Volumes library is a flexible storage space management system. It pr
* Type name: struct spdk_lvol_store
A logical volume store uses the super blob feature of blobstore to hold a UUID (and, in the future, other metadata). Blobstore types are implemented in blobstore itself and saved on disk. An lvolstore will generate a UUID on creation, so that it can be uniquely identified from other lvolstores.
By default, when creating an lvol store, the data region is unmapped. An optional --clear-method parameter can be passed on creation to change that behavior to writing zeroes or performing no operation.
## Logical volume {#lvol}
@ -28,7 +27,6 @@ Representation of an SPDK block device (spdk_bdev) with an lvol implementation.
A logical volume block device translates generic SPDK block device I/O (spdk_bdev_io) operations into the equivalent SPDK blob operations. The combination of the lvol name and the lvolstore name gives the lvol_bdev alias in the form "lvs_name/lvol_name". The block_size of the created bdev is always 4096, due to the blobstore page size. The cluster_size is configurable by parameter.
The size of the new bdev will be rounded up to the nearest multiple of cluster_size.
By default, lvol bdevs claim a part of the lvol store equal to their set size. When the thin provision option is enabled, no space is taken from the lvol store until data is written to the lvol bdev.
By default, when deleting an lvol bdev or resizing down, allocated clusters are unmapped. An optional --clear-method parameter can be passed on creation to change that behavior to writing zeroes or performing no operation.
## Thin provisioning {#lvol_thin_provisioning}
@ -55,10 +53,7 @@ The write operation is performed as shown in the diagram below:
![Writing cluster to the clone](lvol_clone_snapshot_write.svg)
A user may also create a clone of an existing snapshot; the clone will be thin provisioned and will behave in the same way as the logical volume from which the snapshot was created.
There is no limit on the number of clones and snapshots that may be created as long as there is enough space on the logical volume store. Snapshots are read only. Clones may be created only from snapshots or read-only logical volumes.
A snapshot can be removed only if there is a single clone on top of it. The relation chain will be updated accordingly. The cluster maps of the clone and snapshot will be merged and entries for unallocated clusters in the clone
will be updated with addresses from the snapshot cluster map. The entire operation modifies metadata only - no data is copied during this process.
There is no limit of clones and snapshots that may be created as long as there is enough space on logical volume store. Snapshots are read only. Clones may be created only from snapshots.
## Inflation {#lvol_inflation}
@ -68,8 +63,7 @@ Blobs can be inflated to copy data from backing devices (e.g. snapshots) and all
## Decoupling {#lvol_decoupling}
Blobs can be decoupled from their parent blob by copying data from backing devices (e.g. snapshots) for all allocated clusters. Remaining unallocated clusters are kept thin provisioned.
Note: When decouple is performed, only a single dependency is removed. To remove all dependencies in a chain of blobs depending on each other, multiple calls need to be issued.
Blobs can be decoupled from all dependencies by copying data from backing devices (e.g. snapshots) for all allocated clusters. Remainig unallocated clusters are kept thin provisioned.
# Configuring Logical Volumes
@ -80,30 +74,29 @@ There is no static configuration available for logical volumes. All configuratio
RPC regarding lvolstore:
```
bdev_lvol_create_lvstore [-h] [-c CLUSTER_SZ] bdev_name lvs_name
construct_lvol_store [-h] [-c CLUSTER_SZ] bdev_name lvs_name
Constructs lvolstore on specified bdev with specified name. During
construction bdev is unmapped at initialization and all data is
erased. Then original bdev is claimed by
SPDK, but no additional spdk bdevs are created.
Returns uuid of created lvolstore.
Optional parameters:
Optional paramters:
-h show help
-c CLUSTER_SZ Specifies the size of a cluster. By default it is 4MiB.
--clear-method specify data region clear method "none", "unmap" (default), "write_zeroes"
bdev_lvol_delete_lvstore [-h] [-u UUID] [-l LVS_NAME]
destroy_lvol_store [-h] [-u UUID] [-l LVS_NAME]
Destroy lvolstore on specified bdev. Removes lvolstore along with lvols on
it. User can identify lvol store by UUID or its name. Note that destroying
lvolstore requires using this call, while deleting single lvol requires
using bdev_lvol_delete rpc call.
using destroy_lvol_bdev rpc call.
optional arguments:
-h, --help show help
bdev_lvol_get_lvstores [-h] [-u UUID] [-l LVS_NAME]
get_lvol_stores [-h] [-u UUID] [-l LVS_NAME]
Display current logical volume store list
optional arguments:
-h, --help show help
-u UUID, --uuid UUID show details of specified lvol store
-l LVS_NAME, --lvs_name LVS_NAME show details of specified lvol store
bdev_lvol_rename_lvstore [-h] old_name new_name
rename_lvol_store [-h] old_name new_name
Change logical volume store name
optional arguments:
-h, --help show this help message and exit
@ -112,48 +105,43 @@ bdev_lvol_rename_lvstore [-h] old_name new_name
RPC regarding lvol and spdk bdev:
```
bdev_lvol_create [-h] [-u UUID] [-l LVS_NAME] [-t] [-c CLEAR_METHOD] lvol_name size
construct_lvol_bdev [-h] [-u UUID] [-l LVS_NAME] [-t] lvol_name size
Creates lvol with specified size and name on lvolstore specified by its uuid
or name. Then constructs spdk bdev on top of that lvol and presents it as spdk bdev.
User may use -t switch to create thin provisioned lvol.
Returns the name of new spdk bdev
optional arguments:
-h, --help show help
-c, --clear-method specify data clusters clear method "none", "unmap" (default), "write_zeroes"
bdev_get_bdevs [-h] [-b NAME]
get_bdevs [-h] [-b NAME]
User can view created bdevs using this call including those created on top of lvols.
optional arguments:
-h, --help show help
-b NAME, --name NAME Name of the block device. Example: Nvme0n1
bdev_lvol_delete [-h] bdev_name
Deletes a logical volume previously created by bdev_lvol_create.
destroy_lvol_bdev [-h] bdev_name
Deletes a logical volume previously created by construct_lvol_bdev.
optional arguments:
-h, --help show help
bdev_lvol_snapshot [-h] lvol_name snapshot_name
snapshot_lvol_bdev [-h] lvol_name snapshot_name
Create a snapshot with snapshot_name of a given lvol bdev.
optional arguments:
-h, --help show help
bdev_lvol_clone [-h] snapshot_name clone_name
clone_lvol_bdev [-h] snapshot_name clone_name
Create a clone with clone_name of a given lvol snapshot.
optional arguments:
-h, --help show help
bdev_lvol_rename [-h] old_name new_name
rename_lvol_bdev [-h] old_name new_name
Change lvol bdev name
optional arguments:
-h, --help show help
bdev_lvol_resize [-h] name size
resize_lvol_bdev [-h] name size
Resize existing lvol bdev
optional arguments:
-h, --help show help
bdev_lvol_set_read_only [-h] name
Mark lvol bdev as read only
optional arguments:
-h, --help show help
bdev_lvol_inflate [-h] name
inflate_lvol_bdev [-h] name
Inflate lvol bdev
optional arguments:
-h, --help show help
bdev_lvol_decouple_parent [-h] name
decouple_parent_lvol_bdev [-h] name
Decouple parent of a logical volume
optional arguments:
-h, --help show help
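As a minimal end-to-end sketch, a typical flow chains these RPCs together; the names
and sizes below are illustrative, and option spellings should be verified against
`scripts/rpc.py <method> -h` for your SPDK version.
~~~{.sh}
# Back an lvol store with a 128 MiB malloc bdev (512-byte blocks).
scripts/rpc.py bdev_malloc_create -b Malloc0 128 512
scripts/rpc.py bdev_lvol_create_lvstore Malloc0 lvs0
# Create a thin-provisioned 64 MiB lvol, then snapshot and clone it.
scripts/rpc.py bdev_lvol_create -l lvs0 -t lvol0 64
scripts/rpc.py bdev_lvol_snapshot lvs0/lvol0 snap0
scripts/rpc.py bdev_lvol_clone lvs0/snap0 clone0
~~~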


@ -1,4 +1,4 @@
# Direct Memory Access (DMA) From User Space {#memory}
# Memory Management for User Space Drivers {#memory}
The following is an attempt to explain why all data buffers passed to SPDK must
be allocated using spdk_dma_malloc() or its siblings, and why SPDK relies on
@ -85,7 +85,10 @@ allocating `hugepages` (by default, 2MiB). The Linux kernel treats hugepages
differently than regular 4KiB pages. Specifically, the operating system will
never change their physical location. This is not by intent, and so things
could change in future versions, but it is true today and has been for a number
of years (see the later section on the IOMMU for a future-proof solution).
of years (see the later section on the IOMMU for a future-proof solution). DPDK
goes to great pains to allocate hugepages such that it can string together the
longest runs of physical pages possible, and thereby accommodate physically
contiguous allocations larger than a single page.
With this explanation, hopefully it is now clear why all data buffers passed to
SPDK must be allocated using spdk_dma_malloc() or its siblings. The buffers


@ -1,4 +1,3 @@
# Miscellaneous {#misc}
- @subpage peer_2_peer
- @subpage containers

doc/modules.md Normal file

@ -0,0 +1,5 @@
# Modules {#modules}
- @subpage nvme
- @subpage ioat
- @subpage virtio


@ -1,40 +0,0 @@
# Notify library {#notify}
The notify library implements an event bus, allowing users to register, generate,
and listen for events. For example, the bdev library may register a new event type
for bdev creation. Any time a bdev is created, it "sends" the event. Consumers of
that event may periodically poll for new events to retrieve them.
The event bus is implemented as a circular ring of fixed size. If event consumers
do not poll frequently enough, events may be lost. All events are identified by a
monotonically increasing integer, so missing events may be detected, although
not recovered.
# Register event types {#notify_register}
During initialization, the sender library should register its own event types using
`spdk_notify_type_register(const char *type)`. The parameter 'type' is the name of the
notification type.
# Get info about events {#notify_get_info}
A consumer can get information about the available event types at runtime using
`spdk_notify_foreach_type`, which iterates over registered notification types and
calls a callback on each of them, so that the user can produce detailed information
about each notification type.
# Get new events {#notify_listen}
A consumer can get events by calling the function `spdk_notify_foreach_event`.
The caller should specify the last received event and the maximum number of invocations.
There might be multiple consumers of each event. The event bus is implemented as a
circular buffer, so older events may be overwritten by newer ones.
# Send events {#notify_send}
When an event occurs, a library can invoke `spdk_notify_send` with two strings:
one containing the type of the event, like "spdk_bdev_register", and a second with the
context, for example "Nvme0n1".
# RPC Calls {#rpc_calls}
See [JSON-RPC documentation](jsonrpc.md/#rpc_notify_get_types)

doc/nvme-cli.md Normal file

@ -0,0 +1,82 @@
# nvme-cli {#nvme-cli}
# nvme-cli with SPDK Getting Started Guide
nvme-cli now supports both the kernel driver and the SPDK user-mode driver for most of its available commands and
Intel-specific commands.
1. Clone the nvme-cli repository from the SPDK GitHub fork. Make sure you check out the spdk branch.
~~~{.sh}
git clone -b spdk https://github.com/spdk/nvme-cli.git
~~~
2. Clone the SPDK repository from https://github.com/spdk/spdk under the nvme-cli folder.
3. Refer to the "README.md" under the SPDK folder to properly build SPDK.
4. Refer to the "README.md" under the nvme-cli folder to properly build nvme-cli.
5. Execute "<spdk_folder>/scripts/setup.sh" with the "root" account.
6. Update the "spdk.conf" file under the nvme-cli folder to properly configure SPDK. Notes as follows:
~~~{.sh}
spdk=0
Defaults to 0 (off); change to 1 (on) after switching to SPDK via "<spdk_folder>/scripts/setup.sh".
core_mask=0x100
Defaults to using the 9th core to run nvme-cli.
mem_size=512
Defaults to allocating 512MB of memory.
shm_id=1
Defaults to 1. If another running SPDK application is configured with the same shm_id,
this nvme-cli will access that application's devices.
~~~
7. Run the "./nvme list" command to get the domain:bus:device.function for each found NVMe SSD.
8. Run the other nvme commands with domain:bus:device.function instead of "/dev/nvmeX" for the specified device.
~~~{.sh}
Example: ./nvme smart-log 0000:01:00.0
~~~
9. Run the "./nvme intel" commands for Intel-specific commands against an Intel NVMe SSD.
~~~{.sh}
Example: ./nvme intel internal-log 0000:08:00.0
~~~
10. Execute "<spdk_folder>/scripts/setup.sh reset" with the "root" account and set "spdk=0" in spdk.conf
if you want to switch back to the kernel driver.
## Use scenarios
### Run as the only SPDK application on the system
1. Set spdk=1 in spdk.conf. If the system has fewer cores or less memory, update spdk.conf accordingly.
### Run together with other running SPDK applications on shared NVMe SSDs
1. For the other running SPDK application, start it with a parameter like "-i 1" so it has the same "shm_id".
2. Use the default spdk.conf setting where "shm_id=1" to start nvme-cli.
3. If other SPDK applications run with a different shm_id parameter, update "spdk.conf" accordingly.
### Run with other running SPDK applications on non-shared NVMe SSDs
1. Properly configure the other running SPDK applications.
~~~{.sh}
a. Only access the NVMe SSDs they need.
b. Allocate a fixed amount of memory instead of all available memory.
~~~
2. Properly configure the spdk.conf settings for nvme-cli.
~~~{.sh}
a. Do not access the NVMe SSDs used by other SPDK applications.
b. Change mem_size to a proper size.
~~~
## Note
1. To run the newly built nvme-cli, either explicitly run it as "./nvme" or add it to $PATH to avoid
invoking another already installed version.
2. To run the newly built nvme-cli with SPDK support in an arbitrary directory, copy "spdk.conf" to that
directory from the nvme-cli folder and update the configuration as suggested.


@ -9,7 +9,6 @@
* @ref nvme_fabrics_host
* @ref nvme_multi_process
* @ref nvme_hotplug
* @ref nvme_cuse
# Introduction {#nvme_intro}
@ -59,24 +58,24 @@ demonstrate how to use perf.
Example: Using perf for 4K 100% Random Read workload to a local NVMe SSD for 300 seconds
~~~{.sh}
perf -q 128 -o 4096 -w randread -r 'trtype:PCIe traddr:0000:04:00.0' -t 300
perf -q 128 -s 4096 -w randread -r 'trtype:PCIe traddr:0000:04:00.0' -t 300
~~~
Example: Using perf for 4K 100% Random Read workload to a remote NVMe SSD exported over the network via NVMe-oF
~~~{.sh}
perf -q 128 -o 4096 -w randread -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.8 trsvcid:4420' -t 300
perf -q 128 -s 4096 -w randread -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.8 trsvcid:4420' -t 300
~~~
Example: Using perf for 4K 70/30 Random Read/Write mix workload to all local NVMe SSDs for 300 seconds
~~~{.sh}
perf -q 128 -o 4096 -w randrw -M 70 -t 300
perf -q 128 -s 4096 -w randrw -M 70 -t 300
~~~
Example: Using perf for extended LBA format CRC guard test to a local NVMe SSD,
users must write to the SSD before reading the LBA from SSD
~~~{.sh}
perf -q 1 -o 4096 -w write -r 'trtype:PCIe traddr:0000:04:00.0' -t 300 -e 'PRACT=0,PRCKH=GUARD'
perf -q 1 -o 4096 -w read -r 'trtype:PCIe traddr:0000:04:00.0' -t 200 -e 'PRACT=0,PRCKH=GUARD'
perf -q 1 -s 4096 -w write -r 'trtype:PCIe traddr:0000:04:00.0' -t 300 -e 'PRACT=0,PRCKH=GUARD'
perf -q 1 -s 4096 -w read -r 'trtype:PCIe traddr:0000:04:00.0' -t 200 -e 'PRACT=0,PRCKH=GUARD'
~~~
# Public Interface {#nvme_interface}
@ -117,38 +116,6 @@ spdk_nvme_qpair_process_completions().
@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
### Fused operations {#nvme_fuses}
To "fuse" two commands, the first command should have the SPDK_NVME_IO_FLAGS_FUSE_FIRST
io flag set, and the next one should have the SPDK_NVME_IO_FLAGS_FUSE_SECOND.
In addition, the following rules must be met to execute two commands as an atomic unit:
- The commands shall be inserted next to each other in the same submission queue.
- The LBA range should be the same for the two commands.
For example, to send a fused compare-and-write operation, the user must call spdk_nvme_ns_cmd_compare
followed by spdk_nvme_ns_cmd_write and make sure no other operations are submitted
in between on the same queue, as in the example below:
~~~
rc = spdk_nvme_ns_cmd_compare(ns, qpair, cmp_buf, 0, 1, nvme_fused_first_cpl_cb,
                              NULL, SPDK_NVME_IO_FLAGS_FUSE_FIRST);
if (rc != 0) {
	...
}

rc = spdk_nvme_ns_cmd_write(ns, qpair, write_buf, 0, 1, nvme_fused_second_cpl_cb,
                            NULL, SPDK_NVME_IO_FLAGS_FUSE_SECOND);
if (rc != 0) {
	...
}
~~~
The NVMe specification currently defines compare-and-write as a fused operation.
Support for compare-and-write is reported by the controller flag
SPDK_NVME_CTRLR_COMPARE_AND_WRITE_SUPPORTED.
### Scaling Performance {#nvme_scaling}
NVMe queue pairs (struct spdk_nvme_qpair) provide parallel submission paths for
@ -228,10 +195,6 @@ single NVM subsystem directly, the NVMe library will call `probe_cb`
for just that subsystem; this allows the user to skip the discovery step
and connect directly to a subsystem with a known address.
## RDMA Limitations
Please refer to NVMe-oF target's @ref nvmf_rdma_limitations
# NVMe Multi Process {#nvme_multi_process}
This capability enables the SPDK NVMe driver to support multiple processes accessing the
@ -249,10 +212,9 @@ DPDK EAL allows different types of processes to be spawned, each with different
on the hugepage memory used by the applications.
There are two types of processes:
1. a primary process which initializes the shared memory and has full privileges and
2. a secondary process which can attach to the primary process by mapping its shared memory
regions and perform NVMe operations including creating queue pairs.
This feature is enabled by default and is controlled by selecting a value for the shared
memory group ID. This ID is a positive integer and two applications with the same shared
@ -265,146 +227,36 @@ Example: identical shm_id and non-overlapping core masks
[-c core mask for I/O submission/completion]
[-i shared memory group ID]
./perf -q 1 -o 4096 -w randread -c 0x1 -t 60 -i 1
./perf -q 8 -o 131072 -w write -c 0x10 -t 60 -i 1
./perf -q 1 -s 4096 -w randread -c 0x1 -t 60 -i 1
./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
~~~
## Limitations {#nvme_multi_process_limitations}
1. Two processes sharing memory may not share any cores in their core mask.
2. If a primary process exits while secondary processes are still running, those processes
will continue to run. However, a new primary process cannot be created.
3. Applications are responsible for coordinating access to logical blocks.
4. If a process exits unexpectedly, the allocated memory will be released when the last
process exits.
@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
# NVMe Hotplug {#nvme_hotplug}
At the NVMe driver level, we provide the following support for Hotplug:
1. Hotplug event detection:
The user of the NVMe library can call spdk_nvme_probe() periodically to detect
hotplug events. The probe_cb, followed by the attach_cb, will be called for each
new device detected. The user may optionally also provide a remove_cb that will be
called if a previously attached NVMe device is no longer present on the system.
All subsequent I/O to the removed device will return an error. (A sketch of this
polling flow appears after this list.)
2. Hot removal of an NVMe device under I/O load:
When a device is hot removed while I/O is occurring, all access to the PCI BAR will
result in a SIGBUS error. The NVMe driver automatically handles this case by installing
a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
This means I/O in flight during a hot remove will complete with an appropriate error
code and will not crash the application.
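Below is a minimal sketch of the polling flow from item 1; the callbacks are
illustrative, and production code would track attached controllers rather than
printing.
~~~
#include <stdbool.h>
#include <stdio.h>
#include "spdk/nvme.h"

static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	 struct spdk_nvme_ctrlr_opts *opts)
{
	return true; /* attach to every controller that is discovered */
}

static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	  struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	printf("attached controller at %s\n", trid->traddr);
}

static void
remove_cb(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr)
{
	printf("controller hot removed\n");
	spdk_nvme_detach(ctrlr); /* release the controller once it is gone */
}

void
poll_for_hotplug_events(void)
{
	/* Call periodically; each call re-enumerates and fires the callbacks. */
	spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb);
}
~~~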
@sa spdk_nvme_probe
# NVMe Character Devices {#nvme_cuse}
This feature is considered experimental.
## Design
![NVMe character devices processing diagram](nvme_cuse.svg)
For each controller as well as namespace, character devices are created in the
locations:
~~~{.sh}
/dev/spdk/nvmeX
/dev/spdk/nvmeXnY
...
~~~
Where X is the unique SPDK NVMe controller index and Y is the namespace ID.
Requests from CUSE are handled by pthreads created when the controller and namespaces are created.
They pass the I/O or admin commands via a ring to a thread that processes them using
nvme_io_msg_process().
Ioctls that request information obtained when attaching the NVMe controller receive an
immediate response, without passing through the ring.
This interface reserves one additional qpair per controller for sending down the I/O.
## Usage
### Enabling cuse support for NVMe
CUSE support is disabled by default. To enable support for NVMe-CUSE devices, first
install the required dependencies
~~~{.sh}
sudo scripts/pkgdep.sh --fuse
~~~
Then compile SPDK with "./configure --with-nvme-cuse".
### Creating NVMe-CUSE device
First make sure to prepare the environment (see @ref getting_started).
This includes loading the CUSE kernel module.
Any NVMe controller attached to a running SPDK application can be
exposed via the NVMe-CUSE interface. When the SPDK application is closed,
the NVMe-CUSE devices are unregistered.
~~~{.sh}
$ sudo scripts/setup.sh
$ sudo modprobe cuse
$ sudo build/bin/spdk_tgt
# Continue in another session
$ sudo scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:82:00.0
Nvme0n1
$ sudo scripts/rpc.py bdev_nvme_get_controllers
[
{
"name": "Nvme0",
"trid": {
"trtype": "PCIe",
"traddr": "0000:82:00.0"
}
}
]
$ sudo scripts/rpc.py bdev_nvme_cuse_register -n Nvme0
$ ls /dev/spdk/
nvme0 nvme0n1
~~~
### Example of using nvme-cli
Most nvme-cli commands can point to specific controller or namespace by providing a path to it.
This can be leveraged to issue commands to the SPDK NVMe-CUSE devices.
~~~{.sh}
sudo nvme id-ctrl /dev/spdk/nvme0
sudo nvme smart-log /dev/spdk/nvme0
sudo nvme id-ns /dev/spdk/nvme0n1
~~~
Note: `nvme list` command does not display SPDK NVMe-CUSE devices,
see nvme-cli [PR #773](https://github.com/linux-nvme/nvme-cli/pull/773).
### Examples of using smartctl
The smartctl tool recognizes the device type based on the device path. If none of the expected
patterns match, the SCSI translation layer is used to identify the device.
To use smartctl, the '-d nvme' parameter must be used in addition to the full path to
the NVMe device.
~~~{.sh}
smartctl -d nvme -i /dev/spdk/nvme0
smartctl -d nvme -H /dev/spdk/nvme1
...
~~~
## Limitations
NVMe namespaces are created as character devices and their use may be limited for
tools expecting block devices.
Sysfs is not updated by SPDK.
SPDK NVMe CUSE creates nodes in "/dev/spdk/" directory to explicitly differentiate
from other devices. Tools that only search in the "/dev" directory might not work
with SPDK NVMe CUSE.
The SCSI to NVMe Translation Layer is not implemented. Tools that use this layer to
identify, manage or operate devices might not work properly or their use may be limited.


@ -1,123 +0,0 @@
# Submitting I/O to an NVMe Device {#nvme_spec}
## The NVMe Specification
The NVMe specification describes a hardware interface for interacting with
storage devices. The specification includes network transport definitions for
remote storage as well as a hardware register layout for local PCIe devices.
What follows here is an overview of how an I/O is submitted to a local PCIe
device through SPDK.
NVMe devices allow host software (in our case, the SPDK NVMe driver) to allocate
queue pairs in host memory. The term "host" is used a lot, so to clarify, that's
the system that the NVMe SSD is plugged into. A queue pair consists of two
queues - a submission queue and a completion queue. These queues are more
accurately described as circular rings of fixed size entries. The submission
queue is an array of 64 byte command structures, plus 2 integers (head and tail
indices). The completion queue is similarly an array of 16 byte completion
structures, plus 2 integers (head and tail indices). There are also two 32-bit
registers involved that are called doorbells.
An I/O is submitted to an NVMe device by constructing a 64 byte command, placing
it into the submission queue at the current location of the submission queue
tail index, and then writing the new index of the submission queue tail to the
submission queue tail doorbell register. It's actually valid to copy a whole set
of commands into open slots in the ring and then write the doorbell just one
time to submit the whole batch.
There is a very detailed description of the command submission and completion
process in the NVMe specification, which is conveniently available from the main
page over at [NVM Express](https://nvmexpress.org).
Most importantly, the command itself describes the operation and also, if
necessary, a location in host memory containing a descriptor for host memory
associated with the command. This host memory is the data to be written on a
write command, or the location to place the data on a read command. Data is
transferred to or from this location using a DMA engine on the NVMe device.
The completion queue works similarly, but the device is instead the one writing
entries into the ring. Each entry contains a "phase" bit that toggles between 0
and 1 on each loop through the entire ring. When a queue pair is set up to
generate interrupts, the interrupt contains the index of the completion queue
head. However, SPDK doesn't enable interrupts and instead polls on the phase
bit to detect completions. Interrupts are very heavy operations, so polling this
phase bit is often far more efficient.
## The SPDK NVMe Driver I/O Path
Now that we know how the ring structures work, let's cover how the SPDK NVMe
driver uses them. The user is going to construct a queue pair at some early time
in the life cycle of the program, so that's not part of the "hot" path. Then,
they'll call functions like spdk_nvme_ns_cmd_read() to perform an I/O operation.
The user supplies a data buffer, the target LBA, and the length, as well as
other information like which NVMe namespace the command is targeted at and which
NVMe queue pair to use. Finally, the user provides a callback function and
context pointer that will be called when a completion for the resulting command
is discovered during a later call to spdk_nvme_qpair_process_completions().
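As a minimal sketch of that call pattern (error handling elided; the buffer must be
allocated with spdk_dma_malloc() as described in @ref memory):
~~~
#include <stdbool.h>
#include "spdk/nvme.h"

static void
read_complete(void *ctx, const struct spdk_nvme_cpl *cpl)
{
	/* Runs inside spdk_nvme_qpair_process_completions() on the polling thread. */
	*(bool *)ctx = true;
}

void
read_first_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair, void *buf)
{
	bool done = false;

	/* Submit a one-block read of LBA 0; this returns right after queueing. */
	spdk_nvme_ns_cmd_read(ns, qpair, buf, 0, 1, read_complete, &done, 0);

	/* Poll the completion queue until our callback fires. */
	while (!done) {
		spdk_nvme_qpair_process_completions(qpair, 0);
	}
}
~~~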
The first stage in the driver is allocating a request object to track the operation. The
operations are asynchronous, so it can't simply track the state of the request
on the call stack. Allocating a new request object on the heap would be far too
slow, so SPDK keeps a pre-allocated set of request objects inside of the NVMe
queue pair object - `struct spdk_nvme_qpair`. The number of requests allocated to
the queue pair is larger than the actual queue depth of the NVMe submission
queue because SPDK supports a couple of key convenience features. The first is
software queueing - SPDK will allow the user to submit more requests than the
hardware queue can actually hold and SPDK will automatically queue in software.
The second is splitting. SPDK will split a request for many reasons, some of
which are outlined next. The number of request objects is configurable at queue
pair creation time and if not specified, SPDK will pick a sensible number based
on the hardware queue depth.
The second stage is building the 64 byte NVMe command itself. The command is
built into memory embedded into the request object - not directly into an NVMe
submission queue slot. Once the command has been constructed, SPDK attempts to
obtain an open slot in the NVMe submission queue. For each element in the
submission queue an object called a tracker is allocated. The trackers are
allocated in an array, so they can be quickly looked up by an index. The tracker
itself contains a pointer to the request currently occupying that slot. When a
particular tracker is obtained, the command's CID value is updated with the
index of the tracker. The NVMe specification provides that CID value in the
completion, so the request can be recovered by looking up the tracker via the
CID value and then following the pointer.
Once a tracker (slot) is obtained, the data buffer associated with it is
processed to build a PRP list. That's essentially an NVMe scatter gather list,
although it is a bit more restricted. The user provides SPDK with the virtual
address of the buffer, so SPDK has to go do a page table look up to find the
physical address (pa) or I/O virtual addresses (iova) backing that virtual
memory. A virtually contiguous memory region may not be physically contiguous,
so this may result in a PRP list with multiple elements. Sometimes this may
result in a set of physical addresses that can't actually be expressed as a
single PRP list, so SPDK will automatically split the user operation into two
separate requests transparently. For more information on how memory is managed,
see @ref memory.
The reason the PRP list is not built until a tracker is obtained is because the
PRP list description must be allocated in DMA-able memory and can be quite
large. Since SPDK typically allocates a large number of requests, we didn't want
to allocate enough space to pre-build the worst case scenario PRP list,
especially given that the common case does not require a separate PRP list at
all.
Each NVMe command has two PRP list elements embedded into it, so a separate PRP
list isn't required if the request is 4KiB (or if it is 8KiB and aligned
perfectly). Profiling shows that this section of the code is not a major
contributor to the overall CPU use.
With a tracker filled out, SPDK copies the 64 byte command into the actual NVMe
submission queue slot and then rings the submission queue tail doorbell to tell
the device to go process it. SPDK then returns back to the user, without waiting
for a completion.
The user can periodically call `spdk_nvme_qpair_process_completions()` to tell
SPDK to examine the completion queue. Specifically, it reads the phase bit of
the next expected completion slot and when it flips, looks at the CID value to
find the tracker, which points at the request object. The request object
contains a function pointer that the user provided initially, which is then
called to complete the command.
The `spdk_nvme_qpair_process_completions()` function will keep advancing to the
next completion slot until it runs out of completions, at which point it will
write the completion queue head doorbell to let the device know that it can use
the completion queue slots for new completions and return.


@ -1,15 +1,17 @@
# NVMe over Fabrics Target {#nvmf}
@sa @ref nvme_fabrics_host
@sa @ref nvmf_tgt_tracepoints
# NVMe-oF Target Getting Started Guide {#nvmf_getting_started}
The SPDK NVMe over Fabrics target is a user space application that presents block devices over fabrics
such as Ethernet, InfiniBand or Fibre Channel. SPDK currently supports RDMA and TCP transports.
The NVMe over Fabrics target is a user space application that presents block devices over the
network using RDMA. It requires an RDMA-capable NIC with its corresponding OFED software package
installed to run. The target should work on all flavors of RDMA, but it is currently tested against
Mellanox NICs (RoCEv2) and Chelsio NICs (iWARP).
The NVMe over Fabrics specification defines subsystems that can be exported over different transports.
SPDK has chosen to call the software that exports these subsystems a "target", which is the term used
The NVMe over Fabrics specification defines subsystems that can be exported over the network. SPDK
has chosen to call the software that exports these subsystems a "target", which is the term used
for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many
people will also refer to the host as an "initiator", which is the equivalent thing in iSCSI
parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.
@ -21,19 +23,20 @@ If you want to kill the application using a signal, make sure to use SIGTERM; the
application will then release all of its shared memory resources before exiting. SIGKILL gives the
application no chance to release its shared memory resources, so you may need to release them manually.
## RDMA transport support {#nvmf_rdma_transport}
## Prerequisites {#nvmf_prereqs}
Running the target requires an RDMA-capable NIC with its corresponding OFED (OpenFabrics Enterprise Distribution)
software package installed. Your OS distribution may provide packages, but OFED is also
available [here](https://downloads.openfabrics.org/OFED/).
### Prerequisites {#nvmf_prereqs}
To build nvmf_tgt with the RDMA transport, there are some additional dependencies,
which can be installed using the pkgdep.sh script.
This guide starts by assuming that you can already build the standard SPDK distribution on your
platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some
additional dependencies.
Fedora:
~~~{.sh}
sudo scripts/pkgdep.sh --rdma
dnf install libibverbs-devel librdmacm-devel
~~~
Ubuntu:
~~~{.sh}
apt-get install libibverbs-dev librdmacm-dev
~~~
Then build SPDK with RDMA enabled:
@ -43,18 +46,17 @@ Then build SPDK with RDMA enabled:
make
~~~
Once built, the binary will be in `build/bin`.
Once built, the binary will be in `app/nvmf_tgt`.
### Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}
## Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}
Before starting our NVMe-oF target with the RDMA transport we must load the InfiniBand and RDMA modules
that allow userspace processes to use InfiniBand/RDMA verbs directly.
Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow
userspace processes to use InfiniBand/RDMA verbs directly.
~~~{.sh}
modprobe ib_cm
modprobe ib_core
# Please note that ib_ucm does not exist in newer versions of the kernel and is not required.
modprobe ib_ucm || true
modprobe ib_ucm
modprobe ib_umad
modprobe ib_uverbs
modprobe iw_cm
@ -62,17 +64,11 @@ modprobe rdma_cm
modprobe rdma_ucm
~~~
### Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}
## Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}
Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.
### Finding RDMA NICs and associated network interfaces
~~~{.sh}
ls /sys/class/infiniband/*/device/net
~~~
#### Mellanox ConnectX-3 RDMA NICs
### Mellanox ConnectX-3 RDMA NICs
~~~{.sh}
modprobe mlx4_core
@ -80,100 +76,77 @@ modprobe mlx4_ib
modprobe mlx4_en
~~~
#### Mellanox ConnectX-4 RDMA NICs
### Mellanox ConnectX-4 RDMA NICs
~~~{.sh}
modprobe mlx5_core
modprobe mlx5_ib
~~~
#### Assigning IP addresses to RDMA NICs
### Assigning IP addresses to RDMA NICs
~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~
### RDMA Limitations {#nvmf_rdma_limitations}
As RDMA NICs put a limitation on the number of memory regions registered, the SPDK NVMe-oF
target application may eventually start failing to allocate more DMA-able memory. This is
an imperfection of the DPDK dynamic memory management and is most likely to occur with too
many 2MB hugepages reserved at runtime. One type of memory bottleneck is the number of NIC memory
regions, e.g., some NICs report as many as 2048 for the maximum number of memory regions. This
gives us a 4GB memory limit with 2MB hugepages for the total memory regions. It can be overcome by
using 1GB hugepages or by pre-reserving memory at application startup with `--mem-size` or `-s`
option. All pre-reserved memory will be registered as a single region, but won't be returned to the
system until the SPDK application is terminated.
Another known issue occurs when using the E810 NICs in RoCE mode. Specifically, the NVMe-oF target
sometimes cannot destroy a qpair because its posted work requests don't get flushed. This can leave
the NVMe-oF target application unable to terminate cleanly.
## TCP transport support {#nvmf_tcp_transport}
The TCP transport is built into nvmf_tgt by default, and it does not need any special libraries.
## FC transport support {#nvmf_fc_transport}
To build nvmf_tgt with the FC transport, there is an additional FC LLD (Low Level Driver) code dependency.
Please contact your FC vendor for instructions on obtaining the FC driver module.
### Broadcom FC LLD code
The FC LLD driver for Broadcom FC NVMe-capable adapters can be obtained from
https://github.com/ecdufcdrvr/bcmufctdrvr.
### Fetch the FC LLD module and then build SPDK with FC enabled
After cloning the SPDK repo and initializing submodules, build the FC LLD library, which can then be linked with
the FC transport.
~~~{.sh}
git clone https://github.com/spdk/spdk spdk
git clone https://github.com/ecdufcdrvr/bcmufctdrvr fc
cd spdk
git submodule update --init
cd ../fc
make DPDK_DIR=../spdk/dpdk/build SPDK_DIR=../spdk
cd ../spdk
./configure --with-fc=../fc/build
make
~~~
## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}
An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.
A `nvmf_tgt`-specific configuration file is used to configure the NVMe over Fabrics target. This
file's primary purpose is to define subsystems. A fully documented example configuration file is
located at `etc/spdk/nvmf.conf.in`.
### Using RPCs {#nvmf_config_rpc}
Start the nvmf_tgt application with elevated privileges. Once the target is started,
the nvmf_create_transport rpc can be used to initialize a given transport. Below is an
example where the target is started and configured with two different transports.
The RDMA transport is configured with an I/O unit size of 8192 bytes, 4 max qpairs per controller,
and an in capsule data size of 0 bytes. The TCP transport is configured with an I/O unit size of
16384 bytes, 8 max qpairs per controller, and an in capsule data size of 8192 bytes.
You should make a copy of the example configuration file, modify it to suit your environment, and
then run the nvmf_tgt application and pass it the configuration file using the -c option. Right now,
the target requires elevated privileges (root) to run.
~~~{.sh}
build/bin/nvmf_tgt
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -m 4 -c 0
scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf
~~~
Below is an example of creating a malloc bdev and assigning it to a subsystem. Adjust the bdevs,
NQN, serial number, and IP address with RDMA transport to your own circumstances. If you replace
"rdma" with "TCP", then the subsystem will add a listener with TCP transport.
### Subsystem Configuration {#nvmf_config_subsystem}
The `[Subsystem]` section in the configuration file is used to configure
subsystems for the NVMe-oF target.
This example shows two local PCIe NVMe devices exposed as separate NVMe-oF target subsystems:
~~~{.sh}
scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 192.168.100.8 -s 4420
[Nvme]
TransportID "trtype:PCIe traddr:0000:02:00.0" Nvme0
TransportID "trtype:PCIe traddr:0000:82:00.0" Nvme1
[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode1
Listen RDMA 192.168.100.8:4420
AllowAnyHost No
Host nqn.2016-06.io.spdk:init
SN SPDK00000000000001
Namespace Nvme0n1 1
[Subsystem2]
NQN nqn.2016-06.io.spdk:cnode2
Listen RDMA 192.168.100.9:4420
AllowAnyHost Yes
SN SPDK00000000000002
Namespace Nvme1n1 1
~~~
### NQN Formal Definition
Any bdev may be presented as a namespace.
See @ref bdev for details on setting up bdevs.
For example, to create a virtual controller with two namespaces backed by the malloc bdevs
named Malloc0 and Malloc1 and made available as NSID 1 and 2:
~~~{.sh}
[Subsystem3]
NQN nqn.2016-06.io.spdk:cnode3
Listen RDMA 192.168.2.21:4420
AllowAnyHost Yes
SN SPDK00000000000003
Namespace Malloc0 1
Namespace Malloc1 2
~~~
#### NQN Formal Definition
NVMe qualified names or NQNs are defined in section 7.9 of the
[NVMe specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf). SPDK has attempted to
@ -198,7 +171,6 @@ NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 stri
~~~
Please note that the following types from the definition above are defined elsewhere:
1. utf-8 string: Defined in [rfc 3629](https://tools.ietf.org/html/rfc3629).
2. reverse domain: Equivalent to domain name as defined in [rfc 1034](https://tools.ietf.org/html/rfc1034).
@ -224,31 +196,26 @@ alphabetic hex digits in their NQNs.
SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK NVMe-oF target has the best performance, configure the NICs and NVMe devices to
To ensure the SPDK NVMe-oF target has the best performance, configure the RNICs and NVMe devices to
be located on the same NUMA node.
The `-m` core mask option specifies a bit mask of the CPU cores that
SPDK is allowed to execute work items on.
For example, to allow SPDK to use cores 24, 25, 26 and 27:
~~~{.sh}
build/bin/nvmf_tgt -m 0xF000000
app/nvmf_tgt/nvmf_tgt -m 0xF000000
~~~
## Configuring the Linux NVMe over Fabrics Host {#nvmf_host}
Both the Linux kernel and SPDK implement an NVMe over Fabrics host.
The Linux kernel NVMe-oF host support is provided by the `nvme-rdma` driver
(for the RDMA transport) and the `nvme-tcp` driver (for the TCP transport).
The following commands load the drivers:
~~~{.sh}
modprobe nvme-rdma
modprobe nvme-tcp
~~~
The nvme-cli tool may be used to interface with the Linux kernel NVMe over Fabrics host.
See below for examples of the discover, connect and disconnect commands. In all three instances, the
transport can be changed to TCP by substituting 'tcp' for 'rdma'.
Discovery:
~~~{.sh}
nvme discover -t rdma -a 192.168.100.8 -s 4420
~~~
Connect:
~~~{.sh}
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
~~~
Disconnect:
~~~{.sh}
nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
~~~
## Enabling NVMe-oF target tracepoints for offline analysis and debug {#nvmf_trace}
SPDK has a tracing framework for capturing low-level event information at runtime.
The tracepoints described in @ref nvmf_tgt_tracepoints enable analysis of both performance and application crashes.

# NVMe over Fabrics Target Programming Guide {#nvmf_tgt_pg}
## The Basics
A user of the NVMe-oF target library begins by creating a target using
spdk_nvmf_tgt_create(), setting up a set of addresses on which to accept
connections by calling spdk_nvmf_tgt_listen(), then creating a subsystem
using spdk_nvmf_subsystem_create().
Subsystems begin in an inactive state and must be activated by calling
spdk_nvmf_subsystem_start(). Subsystems may be modified at run time, but only
when inactive or paused. A subsystem may be paused by
calling spdk_nvmf_subsystem_pause() and resumed by calling
spdk_nvmf_subsystem_resume().
Namespaces may be added to the subsystem by calling
spdk_nvmf_subsystem_add_ns() when the subsystem is inactive or paused.
Namespaces are bdevs. See @ref bdev for more information about the SPDK bdev
layer. A bdev may be obtained by calling spdk_bdev_get_by_name().
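Putting the calls above together, target bring-up looks roughly like the sketch
below. This is a sketch, not a definitive implementation: exact signatures
(notably spdk_nvmf_tgt_create(), spdk_nvmf_tgt_listen() and
spdk_nvmf_subsystem_add_ns()) have changed across SPDK releases, and all error
handling is omitted.
~~~{.c}
#include <stdio.h>

#include "spdk/bdev.h"
#include "spdk/nvme.h"
#include "spdk/nvmf.h"

/* Sketch only: consult include/spdk/nvmf.h for your release's signatures. */
static void
start_nvmf_target(void)
{
    struct spdk_nvmf_tgt *tgt;
    struct spdk_nvmf_subsystem *subsystem;
    struct spdk_bdev *bdev;
    struct spdk_nvme_transport_id trid = {};

    tgt = spdk_nvmf_tgt_create(NULL /* default options; varies by release */);

    /* Accept connections on an RDMA address. */
    trid.trtype = SPDK_NVME_TRANSPORT_RDMA;
    snprintf(trid.traddr, sizeof(trid.traddr), "192.168.100.8");
    snprintf(trid.trsvcid, sizeof(trid.trsvcid), "4420");
    spdk_nvmf_tgt_listen(tgt, &trid);

    /* Expose the bdev named Malloc0 as namespace 1 of a new subsystem. */
    subsystem = spdk_nvmf_subsystem_create(tgt, "nqn.2016-06.io.spdk:cnode1",
                                           SPDK_NVMF_SUBTYPE_NVME, 1);
    bdev = spdk_bdev_get_by_name("Malloc0");
    spdk_nvmf_subsystem_add_ns(subsystem, bdev, NULL, 0);

    /* Subsystems begin inactive and must be started. */
    spdk_nvmf_subsystem_start(subsystem, NULL, NULL);
}
~~~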
Once a subsystem exists and the target is listening on an address, new
connections may be accepted by polling spdk_nvmf_tgt_accept().
All I/O to a subsystem is driven by a poll group, which polls for incoming
network I/O. Poll groups may be created by calling
spdk_nvmf_poll_group_create(). They automatically request to begin polling
upon creation on the thread from which they were created. Most importantly, *a
poll group may only be accessed from the thread on which it was created.*
When spdk_nvmf_tgt_accept() detects a new connection, it will construct a new
struct spdk_nvmf_qpair object and call the user provided `new_qpair_fn`
callback for each new qpair. In response to this callback, the user must
assign the qpair to a poll group by calling spdk_nvmf_poll_group_add().
Remember, a poll group may only be accessed from the thread it was created on,
so making a call to spdk_nvmf_poll_group_add() may require passing a message
to the appropriate thread.
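For illustration, a minimal `new_qpair_fn` might look like the sketch below.
The `app_poll_group` table and `choose_poll_group()` helper are hypothetical
application state, not SPDK API; only spdk_nvmf_poll_group_add() and
spdk_thread_send_msg() come from SPDK, and callback signatures vary by release.
~~~{.c}
#include <stdlib.h>

#include "spdk/nvmf.h"
#include "spdk/thread.h"

/* Hypothetical application bookkeeping: one poll group per thread. */
struct app_poll_group {
    struct spdk_nvmf_poll_group *group;
    struct spdk_thread *thread;
};

static struct app_poll_group *choose_poll_group(void); /* see the next sketch */

struct add_ctx {
    struct spdk_nvmf_qpair *qpair;
    struct app_poll_group *pg;
};

static void
add_qpair_msg(void *arg)
{
    struct add_ctx *ctx = arg;

    /* Now running on the poll group's own thread, so this call is safe. */
    spdk_nvmf_poll_group_add(ctx->pg->group, ctx->qpair);
    free(ctx);
}

/* Invoked for each new qpair detected by spdk_nvmf_tgt_accept(). */
static void
new_qpair(struct spdk_nvmf_qpair *qpair)
{
    struct add_ctx *ctx = calloc(1, sizeof(*ctx));

    ctx->qpair = qpair;
    ctx->pg = choose_poll_group();

    /* Hand the qpair to the poll group by messaging its thread. */
    spdk_thread_send_msg(ctx->pg->thread, add_qpair_msg, ctx);
}
~~~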
## Access Control
Listen addresses and hosts may only be added to inactive or paused subsystems.
A discovery subsystem, as defined by the NVMe-oF specification, is
automatically created for each NVMe-oF target constructed. Connections to the
discovery subsystem are handled in the same way as any other subsystem - new
qpairs are created in response to spdk_nvmf_tgt_accept() and they must be
assigned to a poll group.
## Transports
The SPDK NVMe-oF target library does not strictly dictate threading model, but
poll groups do all of their polling and I/O processing on the thread they are
created on. Given that, it almost always makes sense to create one poll group
per thread used in the application. New qpairs created in response to
spdk_nvmf_tgt_accept() can be handed out round-robin to the poll groups. This
is how the SPDK NVMe-oF target application currently functions.
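One possible round-robin implementation of the hypothetical choose_poll_group()
helper from the previous sketch (all names and the fixed-size table are
assumptions, not SPDK API):
~~~{.c}
#include <stdint.h>

#define MAX_POLL_GROUPS 64

/* Hypothetical application state, filled in when poll groups are created. */
static struct app_poll_group g_poll_groups[MAX_POLL_GROUPS];
static uint32_t g_num_poll_groups;
static uint32_t g_next_poll_group;

static struct app_poll_group *
choose_poll_group(void)
{
    struct app_poll_group *pg = &g_poll_groups[g_next_poll_group];

    /* Plain round-robin; a NUMA- or load-aware policy could go here. */
    g_next_poll_group = (g_next_poll_group + 1) % g_num_poll_groups;
    return pg;
}
~~~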
More advanced algorithms for distributing qpairs to poll groups are possible.
For instance, a NUMA-aware algorithm would be an improvement over basic
round-robin, where NUMA-aware means assigning qpairs to poll groups running on
CPU cores that are on the same NUMA node as the network adapter and storage
device. Load-aware algorithms also may have benefits.
## Scaling Across CPU Cores
## Zero Copy Support
For the RDMA transport, data is transferred from the RDMA NIC to host memory
and then host memory to the SSD (or vice versa), without any intermediate
copies. Data is never moved from one location in host memory to another. Other
transports in the future may require data copies.
Further, RDMA NICs expose different queue depths for READ/WRITE operations
than they do for SEND/RECV operations. The RDMA transport reports available
queue depth based on SEND/RECV operation limits and will queue in software as
necessary to accommodate (usually lower) limits on READ/WRITE operations.

# NVMe-oF Target Tracepoints {#nvmf_tgt_tracepoints}
# Introduction {#tracepoints_intro}
SPDK has a tracing framework for capturing low-level event information at runtime.
Tracepoints provide a high-performance tracing mechanism that is accessible at runtime.
They are implemented as a circular buffer in shared memory that is accessible from other
processes. The NVMe-oF target is instrumented with tracepoints to enable analysis of
both performance and application crashes. (Note: the SPDK tracing framework should still
be considered experimental. Work to formalize and document the framework is in progress.)
# Enabling Tracepoints {#enable_tracepoints}
Tracepoints are placed in groups. They are enabled and disabled as a group. To enable
the instrumentation of all the tracepoint groups in an SPDK target application, start the
target with the -e parameter set to 0xFFFF:
~~~
build/bin/nvmf_tgt -e 0xFFFF
~~~
To enable the instrumentation of just the NVMe-oF RDMA tracepoints in an SPDK target
application, start the target with the -e parameter set to 0x10:
~~~
build/bin/nvmf_tgt -e 0x10
~~~
When the target starts, a message is logged with the information you need to view
the tracepoints in a human-readable format using the spdk_trace application. The target
will also log information about the shared memory file.
~~~{.sh}
app.c: 527:spdk_app_setup_trace: *NOTICE*: Tracepoint Group Mask 0xFFFF specified.
app.c: 531:spdk_app_setup_trace: *NOTICE*: Use 'spdk_trace -s nvmf -p 24147' to capture a snapshot of events at runtime.
app.c: 533:spdk_app_setup_trace: *NOTICE*: Or copy /dev/shm/nvmf_trace.pid24147 for offline analysis/debug.
~~~
Note that when tracepoints are enabled, the shared memory files are not deleted when the application
exits. This ensures the file can be used for analysis after the application exits. On Linux, the
shared memory files are in /dev/shm, and can be deleted manually to free shm space if needed. A system
reboot will also free all of the /dev/shm files.
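For example, once a trace file has been copied elsewhere, the shared memory file
can be removed manually (the PID here is taken from the log output above):
~~~{.sh}
rm /dev/shm/nvmf_trace.pid24147
~~~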
# Capturing a snapshot of events {#capture_tracepoints}
Send I/Os to the SPDK target application to generate events. The following is
an example usage of perf to send I/Os to the NVMe-oF target over an RDMA network
interface for 10 minutes.
~~~
./perf -q 128 -s 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~
The spdk_trace program can be found in the app/trace directory. To analyze the tracepoints on the same
system running the NVMe-oF target, simply execute the command line shown in the log:
~~~{.sh}
build/bin/spdk_trace -s nvmf -p 24147
~~~
To analyze the tracepoints on a different system, first prepare the tracepoint file for transfer. The
tracepoint file can be large, but usually compresses very well. This step can also be used to prepare
a tracepoint file to attach to a GitHub issue for debugging NVMe-oF application crashes.
~~~{.sh}
bzip2 -c /dev/shm/nvmf_trace.pid24147 > /tmp/trace.bz2
~~~
After transferring the /tmp/trace.bz2 tracepoint file to a different system:
~~~{.sh}
bunzip2 /tmp/trace.bz2
build/bin/spdk_trace -f /tmp/trace
~~~
The following is a sample trace capture showing the cumulative time that each
I/O spends at each RDMA state. All the trace captures with the same id are for
the same I/O.
~~~
28: 6026.658 ( 12656064) RDMA_REQ_NEED_BUFFER id: r3622 time: 0.019
28: 6026.694 ( 12656140) RDMA_REQ_RDY_TO_EXECUTE id: r3622 time: 0.055
28: 6026.820 ( 12656406) RDMA_REQ_EXECUTING id: r3622 time: 0.182
28: 6026.992 ( 12656766) RDMA_REQ_EXECUTED id: r3477 time: 228.510
28: 6027.010 ( 12656804) RDMA_REQ_TX_PENDING_C_TO_H id: r3477 time: 228.528
28: 6027.022 ( 12656828) RDMA_REQ_RDY_TO_COMPLETE id: r3477 time: 228.539
28: 6027.115 ( 12657024) RDMA_REQ_COMPLETING id: r3477 time: 228.633
28: 6027.471 ( 12657770) RDMA_REQ_COMPLETED id: r3518 time: 171.577
28: 6028.027 ( 12658940) RDMA_REQ_NEW id: r3623
28: 6028.057 ( 12659002) RDMA_REQ_NEED_BUFFER id: r3623 time: 0.030
28: 6028.095 ( 12659082) RDMA_REQ_RDY_TO_EXECUTE id: r3623 time: 0.068
28: 6028.216 ( 12659336) RDMA_REQ_EXECUTING id: r3623 time: 0.189
28: 6028.408 ( 12659740) RDMA_REQ_EXECUTED id: r3505 time: 190.509
28: 6028.441 ( 12659808) RDMA_REQ_TX_PENDING_C_TO_H id: r3505 time: 190.542
28: 6028.452 ( 12659832) RDMA_REQ_RDY_TO_COMPLETE id: r3505 time: 190.553
28: 6028.536 ( 12660008) RDMA_REQ_COMPLETING id: r3505 time: 190.637
28: 6028.854 ( 12660676) RDMA_REQ_COMPLETED id: r3465 time: 247.000
28: 6029.433 ( 12661892) RDMA_REQ_NEW id: r3624
28: 6029.452 ( 12661932) RDMA_REQ_NEED_BUFFER id: r3624 time: 0.019
28: 6029.482 ( 12661996) RDMA_REQ_RDY_TO_EXECUTE id: r3624 time: 0.050
28: 6029.591 ( 12662224) RDMA_REQ_EXECUTING id: r3624 time: 0.158
28: 6029.782 ( 12662624) RDMA_REQ_EXECUTED id: r3564 time: 96.937
28: 6029.798 ( 12662658) RDMA_REQ_TX_PENDING_C_TO_H id: r3564 time: 96.953
28: 6029.812 ( 12662688) RDMA_REQ_RDY_TO_COMPLETE id: r3564 time: 96.967
28: 6029.899 ( 12662870) RDMA_REQ_COMPLETING id: r3564 time: 97.054
28: 6030.262 ( 12663634) RDMA_REQ_COMPLETED id: r3477 time: 231.780
28: 6030.786 ( 12664734) RDMA_REQ_NEW id: r3625
28: 6030.804 ( 12664772) RDMA_REQ_NEED_BUFFER id: r3625 time: 0.018
28: 6030.841 ( 12664848) RDMA_REQ_RDY_TO_EXECUTE id: r3625 time: 0.054
28: 6030.963 ( 12665104) RDMA_REQ_EXECUTING id: r3625 time: 0.176
28: 6031.139 ( 12665474) RDMA_REQ_EXECUTED id: r3552 time: 114.906
28: 6031.196 ( 12665594) RDMA_REQ_TX_PENDING_C_TO_H id: r3552 time: 114.963
28: 6031.210 ( 12665624) RDMA_REQ_RDY_TO_COMPLETE id: r3552 time: 114.977
28: 6031.293 ( 12665798) RDMA_REQ_COMPLETING id: r3552 time: 115.060
28: 6031.633 ( 12666512) RDMA_REQ_COMPLETED id: r3505 time: 193.734
28: 6032.230 ( 12667766) RDMA_REQ_NEW id: r3626
28: 6032.248 ( 12667804) RDMA_REQ_NEED_BUFFER id: r3626 time: 0.018
28: 6032.288 ( 12667888) RDMA_REQ_RDY_TO_EXECUTE id: r3626 time: 0.058
28: 6032.396 ( 12668114) RDMA_REQ_EXECUTING id: r3626 time: 0.166
28: 6032.593 ( 12668528) RDMA_REQ_EXECUTED id: r3570 time: 90.443
28: 6032.611 ( 12668564) RDMA_REQ_TX_PENDING_C_TO_H id: r3570 time: 90.460
28: 6032.623 ( 12668590) RDMA_REQ_RDY_TO_COMPLETE id: r3570 time: 90.473
28: 6032.707 ( 12668766) RDMA_REQ_COMPLETING id: r3570 time: 90.557
28: 6033.056 ( 12669500) RDMA_REQ_COMPLETED id: r3564 time: 100.211
~~~
# Capturing sufficient trace events {#capture_trace_events}
Since the tracepoint file generated directly by an SPDK application is a circular buffer in shared memory,
the trace events captured may be insufficient for further analysis.
The spdk_trace_record program can be found in the app/trace_record directory.
spdk_trace_record is used to poll the spdk tracepoint shared memory, record new entries from it,
and store all entries into a specified output file when it is shut down by SIGINT or SIGTERM.
After the SPDK NVMe-oF target is launched, simply execute the command line shown in the log:
~~~{.sh}
build/bin/spdk_trace_record -q -s nvmf -p 24147 -f /tmp/spdk_nvmf_record.trace
~~~
Also send I/Os to the SPDK target application to generate events, as in the previous perf example, for 10 minutes.
~~~{.sh}
./perf -q 128 -s 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~
After the perf example completes, shut down spdk_trace_record with SIGINT (Ctrl + C).
To analyze the tracepoint output file from spdk_trace_record, run the spdk_trace program:
~~~{.sh}
build/bin/spdk_trace -f /tmp/spdk_nvmf_record.trace
~~~
# Adding New Tracepoints {#add_tracepoints}
SPDK applications and libraries provide several trace points. You can add new
tracepoints to the existing trace groups. For example, to add new tracepoints
to the SPDK RDMA library (lib/nvmf/rdma.c) trace group TRACE_GROUP_NVMF_RDMA,
define the tracepoints and assign them unique IDs using the SPDK_TPOINT_ID macro:
~~~
#define TRACE_GROUP_NVMF_RDMA 0x4
#define TRACE_RDMA_REQUEST_STATE_NEW SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, 0x0)
...
#define NEW_TRACE_POINT_NAME SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, UNIQUE_ID)
~~~
You also need to register the new trace points in the SPDK_TRACE_REGISTER_FN macro call
within the application/library using the spdk_trace_register_description function
as shown below:
~~~
SPDK_TRACE_REGISTER_FN(nvmf_trace)
{
spdk_trace_register_object(OBJECT_NVMF_RDMA_IO, 'r');
spdk_trace_register_description("RDMA_REQ_NEW", "",
TRACE_RDMA_REQUEST_STATE_NEW,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 1, 1, "cmid: ");
...
spdk_trace_register_description("NEW_RDMA_REQ_NAME", "",
NEW_TRACE_POINT_NAME,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 0, 1, "cmid: ");
}
~~~
Finally, use the spdk_trace_record function at the appropriate point in the
application/library to record the current trace state for the new trace points.
The following example shows the usage of the spdk_trace_record function to
record the current trace state of several tracepoints.
~~~
case RDMA_REQUEST_STATE_NEW:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEW, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
break;
case RDMA_REQUEST_STATE_NEED_BUFFER:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEED_BUFFER, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
break;
case RDMA_REQUEST_STATE_TRANSFER_PENDING_HOST_TO_CONTROLLER:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_TRANSFER_PENDING_HOST_TO_CONTROLLER, 0, 0,
(uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
~~~
All the tracing functions are documented in the [Tracepoint library documentation](https://spdk.io/doc/trace_8h.html).

# SPDK Structural Overview {#overview}
# Overview {#dir_overview}
SPDK is composed of a set of C libraries residing in `lib` with public interface
header files in `include/spdk`, plus a set of applications built out of those
libraries in `app`. Users can use the C libraries in their software or deploy
the full SPDK applications.
SPDK is designed around message passing instead of locking, and most of the SPDK
libraries make several assumptions about the underlying threading model of the
application they are embedded into. However, SPDK goes to great lengths to remain
agnostic to the specific message passing, event, co-routine, or light-weight
threading framework actually in use. To accomplish this, all SPDK libraries
interact with an abstraction library in `lib/thread` (public interface at
`include/spdk/thread.h`). Any framework can initialize the threading abstraction
and provide callbacks to implement the functionality that the SPDK libraries
need. For more information on this abstraction, see @ref concurrency.
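As a sketch of what this looks like from a hosting framework's perspective
(function names are from include/spdk/thread.h; exact signatures may differ
between releases, and a real framework would integrate the loop into its own
event machinery):
~~~{.c}
#include "spdk/thread.h"

int
main(void)
{
    struct spdk_thread *thread;

    /* One-time library setup; no per-thread framework context needed here. */
    spdk_thread_lib_init(NULL, 0);

    thread = spdk_thread_create("app_thread", NULL);
    spdk_set_thread(thread);

    /* The hosting framework decides when and where to run this loop. */
    for (;;) {
        spdk_thread_poll(thread, 0, 0);
    }

    return 0;
}
~~~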
SPDK is built on top of POSIX for most operations. To make porting to non-POSIX
environments easier, all POSIX headers are isolated into
`include/spdk/stdinc.h`. However, SPDK requires a number of operations that
POSIX does not provide, such as enumerating the PCI devices on the system or
allocating memory that is safe for DMA. These additional operations are all
abstracted in a library called `env` whose public header is at
`include/spdk/env.h`. By default, SPDK implements the `env` interface using a
library based on DPDK. However, that implementation can be swapped out. See @ref
porting for additional information.
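A minimal sketch of using the `env` layer directly, assuming the default
DPDK-based implementation (error handling abbreviated; the option and function
names are from include/spdk/env.h):
~~~{.c}
#include <stdio.h>
#include <string.h>

#include "spdk/env.h"

int
main(void)
{
    struct spdk_env_opts opts;
    void *buf;

    /* Initialize the env layer (DPDK-based by default). */
    spdk_env_opts_init(&opts);
    opts.name = "env_example";
    if (spdk_env_init(&opts) < 0) {
        fprintf(stderr, "failed to initialize SPDK env\n");
        return 1;
    }

    /* 4 KiB of zeroed, DMA-safe memory, 4 KiB aligned. */
    buf = spdk_dma_zmalloc(4096, 4096, NULL);
    if (buf == NULL) {
        return 1;
    }
    memset(buf, 0xA5, 4096);

    spdk_dma_free(buf);
    return 0;
}
~~~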
## Applications {#dir_app}
The `app` top-level directory contains full-fledged applications, built out of the SPDK
components. For a full overview, see @ref app_overview.
SPDK applications can typically be started with a small number of configuration
options. Full configuration of the applications is then performed using
JSON-RPC. See @ref jsonrpc for additional information.
## Libraries {#dir_lib}
The `lib` directory contains the real heart of SPDK. Each component is a C library with
its own directory under `lib`. Some of the key libraries are:
- @ref bdev
- @ref nvme
## Documentation {#dir_doc}
The `doc` top-level directory contains all of SPDK's documentation. API Documentation
is created using Doxygen directly from the code, but more general articles and longer
explanations reside in this directory, as well as the Doxygen config file.
To build the documentation, just type `make` within the doc directory.
## Examples {#dir_examples}
The `examples` top-level directory contains a set of examples intended to be used
for reference. These are different from the applications, which perform a "real"
task that could reasonably be deployed. The examples are instead either heavily
contrived to demonstrate some facet of SPDK, or aren't considered complete enough
to warrant tagging them as a full-blown SPDK application.
This is a great place to learn about how SPDK works. In particular, check out
`examples/nvme/hello_world`.
## Include {#dir_include}
The `include` directory is where all of the header files are located. The public API
is all placed in the `spdk` subdirectory of `include` and we highly
recommend that applications set their include path to the top level `include`
directory and include the headers by prefixing `spdk/` like this:
~~~{.c}
#include "spdk/nvme.h"
~~~
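For example, compiling against SPDK headers then only requires pointing the
compiler at that directory (the path here is assumed):
~~~{.sh}
cc -I/path/to/spdk/include -c app.c -o app.o
~~~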
Most of the headers here correspond with a library in the `lib` directory. There
are a few headers that stand alone, however. They are:
- `assert.h`
- `barrier.h`
- `endian.h`
- `fd.h`
- `mmio.h`
- `queue.h` and `queue_extras.h`
- `string.h`
There is also an `spdk_internal` directory that contains header files widely included
by libraries within SPDK, but that are not part of the public API and would not be
installed on a user's system.
## Scripts {#dir_scripts}
The `scripts` directory contains convenient scripts for a number of operations. The two most
important are `check_format.sh`, which will use astyle and pep8 to check C, C++, and Python
coding style against our defined conventions, and `setup.sh` which binds and unbinds devices
from kernel drivers.
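Typical setup.sh usage, run as root: the first command binds NVMe devices to
userspace drivers before starting SPDK, and `reset` returns them to the kernel:
~~~{.sh}
sudo scripts/setup.sh
sudo scripts/setup.sh reset
~~~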
## Tests {#dir_tests}
The `test` directory contains all of the tests for SPDK's components and the subdirectories mirror
the structure of the entire repository. The tests are a mixture of unit tests and functional tests.

The key functions for managing a controller memory buffer (CMB) are given in the table below.
Key Functions | Description
------------------------------------------- | -----------
spdk_nvme_ctrlr_map_cmb() | @copybrief spdk_nvme_ctrlr_map_cmb()
spdk_nvme_ctrlr_unmap_cmb() | @copybrief spdk_nvme_ctrlr_unmap_cmb()
spdk_nvme_ctrlr_get_regs_cmbsz() | @copybrief spdk_nvme_ctrlr_get_regs_cmbsz()
spdk_nvme_ctrlr_alloc_cmb_io_buffer() | @copybrief spdk_nvme_ctrlr_alloc_cmb_io_buffer()
spdk_nvme_ctrlr_free_cmb_io_buffer() | @copybrief spdk_nvme_ctrlr_free_cmb_io_buffer()
# Determining device support {#p2p_support}
SPDK's identify example application displays whether a device has a controller
memory buffer and which operations it supports. Run it as follows:
~~~{.sh}
./build/examples/identify -r traddr:<pci id of ssd>
~~~
# cmb_copy: An example P2P Application {#p2p_cmb_copy}
Run the cmb_copy example application.
~~~{.sh}
./build/examples/cmb_copy -r <pci id of read ssd>-1-0-1 -w <pci id of write ssd>-1-0-1 -c <pci id of the ssd with cmb>
~~~
This should copy a single LBA (LBA 0) from namespace 1 on the read
NVMe SSD to LBA 0 on namespace 1 on the write SSD using the CMB as the
DMA buffer.
* Ideally, use a PCIe switch (such as one provided by Broadcom or Microsemi)
as that is known to provide good performance.
* Even with a PCIe switch there may be occasions where peer-2-peer
DMAs fail to work. This is probably due to PCIe Access Control
Services (ACS) being enabled by the BIOS and/or OS. You can disable
ACS using setpci or via out of tree kernel patches that can be found
on the internet.

# Performance Reports {#performance_reports}
## Release 21.01
- [SPDK 21.01 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2101.pdf)
- [SPDK 21.01 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2101.pdf)
- [SPDK 21.01 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2101.pdf)
- [SPDK 21.01 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2101.pdf)
## Release 20.10
- [SPDK 20.10 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2010.pdf)
- [SPDK 20.10 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2010.pdf)
- [SPDK 20.10 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2010.pdf)
- [SPDK 20.10 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2010.pdf)
## Release 20.07
- [SPDK 20.07 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2007.pdf)
- [SPDK 20.07 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2007.pdf)
- [SPDK 20.07 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2007.pdf)
## Release 20.04
- [SPDK 20.04 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2004.pdf)
- [SPDK 20.04 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2004.pdf)
- [SPDK 20.04 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2004.pdf)
## Release 20.01
- [SPDK 20.01 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2001.pdf)
- [SPDK 20.01 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2001.pdf)
- [SPDK 20.01 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2001.pdf)
## Release 19.10
- [SPDK 19.10 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_1910.pdf)
- [SPDK 19.10 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvmeof_tcp_perf_report_1910.pdf)
- [SPDK 19.10 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvmeof_rdma_perf_report_1910.pdf)
## Release 19.07
- [SPDK 19.07 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_19.07.pdf)
- [SPDK 19.07 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvmeof_tcp_perf_report_19.07.pdf)
## Release 19.04
- [SPDK 19.04 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_19.04_NVMeOF_RDMA_benchmark_report.pdf)
## Release 19.01
- [SPDK 19.01.1 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvmeof_perf_report_19.01.1.pdf)
## Release 18.04
- [SPDK 18.04 NVMe BDEV Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_18.04.pdf)
- [SPDK 18.04 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvmeof_perf_report_18.04.pdf)
## Release 17.07
- [SPDK 17.07 vhost-scsi Performance Report](https://ci.spdk.io/download/performance-reports/SPDK17_07_vhost_scsi_performance_report.pdf)

# Linking SPDK applications with pkg-config {#pkgconfig}
The SPDK build system generates pkg-config files to facilitate linking
applications with the correct set of SPDK and DPDK libraries. Using pkg-config
in your build system will ensure you do not need to make modifications
when SPDK adds or modifies library dependencies.
If your application is using the SPDK nvme library, you would use the following
to get the list of required SPDK libraries:
~~~
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_nvme
~~~
To get the list of required SPDK and DPDK libraries to use the DPDK-based
environment layer:
~~~
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_env_dpdk
~~~
When linking with static libraries, the dependent system libraries must also be
specified. To get the list of required system libraries:
~~~
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_syslibs
~~~
Note that SPDK libraries use constructor functions liberally, so you must surround
the library list with extra linker options to ensure these functions are not dropped
from the resulting application binary. With shared libraries this is achieved through
the `-Wl,--no-as-needed` parameters while with static libraries `-Wl,--whole-archive`
is used. Here is an example Makefile snippet that shows how to use pkg-config to link
an application that uses the SPDK nvme shared library:
~~~
PKG_CONFIG_PATH = $(SPDK_DIR)/build/lib/pkgconfig
SPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_nvme)
DPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_env_dpdk)
app:
$(CC) -o app app.o -pthread -Wl,--no-as-needed $(SPDK_LIB) $(DPDK_LIB) -Wl,--as-needed
~~~
If using the SPDK nvme static library:
~~~
PKG_CONFIG_PATH = $(SPDK_DIR)/build/lib/pkgconfig
SPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_nvme)
DPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_env_dpdk)
SYS_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs --static spdk_syslibs)
app:
$(CC) -o app app.o -pthread -Wl,--whole-archive $(SPDK_LIB) $(DPDK_LIB) -Wl,--no-whole-archive \
$(SYS_LIB)
~~~

# Programmer Guides {#prog_guides}
- [Public API header files](files.html)
- @subpage blob
- @subpage bdev_pg
- @subpage bdev_module
- @subpage nvmf_tgt_pg
- @subpage ftl
- @subpage gdb_macros
- @subpage reduce
- @subpage notify

# RPMs {#rpms}
# In this document {#rpms_toc}
* @ref building_rpms
# Building SPDK RPMs {#building_rpms}
To build a basic set of RPM packages out of the SPDK repo, simply run:
~~~{.sh}
# rpmbuild/rpm.sh
~~~
Additional configuration options can be passed directly as arguments:
~~~{.sh}
# rpmbuild/rpm.sh --with-shared --with-dpdk=/path/to/dpdk/build
~~~
There are several options that may be passed via environment as well:
- DEPS - Install all needed dependencies for building RPM packages.
Default: "yes"
- MAKEFLAGS - Flags passed to make
- RPM_RELEASE - Target release version of the RPM packages. Default: 1
- REQUIREMENTS - Extra set of RPM dependencies, if deemed necessary
- SPDK_VERSION - SPDK version. Default: currently checked out tag
~~~{.sh}
# DEPS=no MAKEFLAGS="-d -j1" rpmbuild/rpm.sh --with-shared
~~~
By default, all RPM packages are created under the $HOME directory of the
target user:
~~~{.sh}
# printf '%s\n' /root/rpmbuild/RPMS/x86_64/*
/root/rpmbuild/RPMS/x86_64/spdk-devel-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-dpdk-libs-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-libs-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-v21.01-1.x86_64.rpm
#
~~~
- spdk - provides all the binaries, common tooling, etc.
- spdk-devel - provides development files
- spdk-libs - provides target lib, .pc files (--with-shared)
- spdk-dpdk-libs - provides dpdk lib files (--with-shared|--with-dpdk)

# Scheduler {#scheduler}
SPDK's event/application framework (`lib/event`) now supports scheduling of
lightweight threads. Schedulers are provided as plugins, called
implementations. A default implementation is provided, but users may wish to
write their own scheduler to integrate into broader code frameworks or meet
their performance needs.
This feature should be considered experimental and is disabled by default. When
enabled, the scheduler framework gathers data for each spdk thread and reactor
and passes it to a scheduler implementation to perform one of the following
actions.
## Actions
### Move a thread
`spdk_thread`s can be moved to another reactor. Schedulers can examine the
suggested cpu_mask value for each lightweight thread to see if the user has
requested specific reactors, or choose a reactor using whatever algorithm they
deem fit.
### Switch reactor mode
Reactors by default run in a mode that constantly polls for new actions for the
most efficient processing. Schedulers can switch a reactor into a mode that
instead waits for an event on a file descriptor. On Linux, this is implemented
using epoll. This results in reduced CPU usage but may be less responsive when
events occur. A reactor cannot enter this mode if any `spdk_threads` are
currently scheduled to it. This limitation is expected to be lifted in the
future, allowing `spdk_threads` to enter interrupt mode.
### Set frequency of CPU core
The frequency of CPU cores can be modified by the scheduler in response to
load. Only CPU cores that match the application cpu_mask may be modified. The
mechanism for controlling CPU frequency is pluggable and the default provided
implementation is called `dpdk_governor`, based on the `rte_power` library from
DPDK.
#### Known limitation
When SMT (Hyperthreading) is enabled the two logical CPU cores sharing a single
physical CPU core must run at the same frequency. If one of the two logical
CPU cores is outside the application cpu_mask, its policy and frequency must
be managed by the administrator.
## Scheduler implementations
The scheduler in use may be controlled by JSON-RPC. Please use the
[framework_set_scheduler](jsonrpc.md/#rpc_framework_set_scheduler) RPC to
switch between schedulers or change their options.
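For example, to switch to the dynamic scheduler at runtime using that RPC:
~~~{.sh}
./scripts/rpc.py framework_set_scheduler dynamic
~~~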
[spdk_top](spdk_top.md#spdk_top) is a useful tool to observe the behavior of
schedulers in different scenarios and workloads.
### static [default]
The `static` scheduler is the default scheduler and does no dynamic scheduling.
Lightweight threads are distributed round-robin among reactors, respecting
their requested cpu_mask, and then they are never moved. This is equivalent to
the previous behavior of the SPDK event/application framework.
### dynamic
The `dynamic` scheduler is designed for power saving and reduction of CPU
utilization, especially in cases where workloads show large variations over
time.
Active threads are distributed equally among reactors, taking cpu_mask into
account. All idle threads are moved to the main core. Once an idle thread becomes
active, it is redistributed again.
When a reactor has no scheduled `spdk_thread`s it is switched into interrupt
mode and stops actively polling. After enough threads become active, the
reactor is switched back into poll mode and threads are assigned to it again.
The main core can contain active threads only when their execution time does
not exceed the sum of all idle threads. When no active threads are present on
the main core, the frequency of that CPU core will decrease as the load
decreases. All CPU cores corresponding to the other reactors remain at maximum
frequency.

# shfmt {#shfmt}
# In this document {#shfmt_toc}
* @ref shfmt_overview
* @ref shfmt_usage
* @ref shfmt_installation
* @ref shfmt_examples
# Overview {#shfmt_overview}
The majority of tests (and scripts overall) in the SPDK repo are written
in Bash (with a quite significant emphasis on "Bashism"), thus a style
formatter, shfmt, was introduced to help keep the .sh code consistent
across the entire repo. For more details on the tool itself, please see
[shfmt](https://github.com/mvdan/sh).
# Usage {#shfmt_usage}
On the CI pool, shfmt is run against all the updated .sh files that
have been committed but not yet merged. Additionally, shfmt will pick up
all .sh files present in the staging area when run locally from our pre-commit
hook (via check_format.sh). In case any style errors are detected, a
patch with the needed changes is generated and either the build (CI)
or the commit is aborted. Said patch can then be easily applied:
~~~{.sh}
# Run from the root of the SPDK repo
patch --merge -p0 <shfmt-3.1.0.patch
~~~
The name of the patch is derived from the version of shfmt that is
currently in use (3.1.0 is currently supported).
Please see ./scripts/check_format.sh for all the arguments shfmt
is run with. Additionally, @ref shfmt_examples has more details on how
each of the arguments behaves.
# Installation {#shfmt_installation}
shfmt can be easily installed via pkgdep.sh:
~~~{.sh}
./scripts/pkgdep.sh -d
~~~
This will install all the developer tools, including shfmt, on the
local system. The precompiled binary will be saved, by default, to
/opt/shfmt and then linked under /usr/bin. Both paths can be changed
by setting SHFMT_DIR and SHFMT_DIR_OUT in the environment. Example:
~~~{.sh}
SHFMT_DIR=/keep_the_binary_here \
SHFMT_DIR_OUT=/and_link_it_here \
./scripts/pkgdep.sh -d
~~~
# Examples {#shfmt_examples}
~~~{.sh}
#######################################
if foo=$(bar); then
echo "$foo"
fi
exec "$foo" \
--bar \
--foo
# indent_style = tab
if foo=$(bar); then
echo "$foo"
fi
exec foobar \
--bar \
--foo
######################################
if foo=$(bar); then
echo "$foo" && \
echo "$(bar)"
fi
# binary_next_line = true
if foo=$(bar); then
echo "$foo" \
&& echo "$(bar)"
fi
# Note that each break line is also being indented:
if [[ -v foo ]] \
&& [[ -v bar ]] \
&& [[ -v foobar ]]; then
echo "This is foo"
fi
# ->
if [[ -v foo ]] \
&& [[ -v bar ]] \
&& [[ -v foobar ]]; then
echo "This is foo"
fi
# Currently, newlines are being escaped even if syntax-wise
# they are not needed, thus watch for the following:
if [[ -v foo
&& -v bar
&& -v foobar ]]; then
echo "This is foo"
fi
#->
if [[ -v foo && -v \
bar && -v \
foobar ]]; then
echo "This is foo"
fi
# This, unfortunately, also breaks the -bn behavior.
# (see https://github.com/mvdan/sh/issues/565) for details.
######################################
case "$FOO" in
BAR)
echo "$FOO" ;;
esac
# switch_case_indent = true
case "$FOO" in
BAR)
echo "$FOO"
;;
esac
######################################
exec {foo}>bar
:>foo
exec {bar}<foo
# -sr
exec {foo}> bar
: > foo
exec {bar}< foo
######################################
# miscellaneous, enforced by shfmt
(( no_spacing_at_the_beginning & ~and_no_spacing_at_the_end ))
: $(( no_spacing_at_the_beginning & ~and_no_spacing_at_the_end ))
# ->
((no_spacing_at_the_beginning & ~and_no_spacing_at_the_end))
: $((no_spacing_at_the_beginning & ~and_no_spacing_at_the_end))
~~~

# spdk_top {#spdk_top}
The spdk_top application is designed to resemble the standard top utility in that it provides real-time insight into CPU core usage by SPDK lightweight threads and pollers. Have you ever wondered which CPU core is used most by your SPDK instance? Are you building your own bdev or library and want to know if your code is running efficiently? Are your new pollers busy most of the time? The spdk_top application uses RPC calls to collect performance metrics and displays them in a report that you can analyze to determine whether your code is running efficiently, so that you can tune your implementation and get more from SPDK.
Why doesn't the classic top utility work for SPDK? SPDK uses a polled-mode design; a reactor thread running on each CPU core assigned to an SPDK application schedules SPDK lightweight threads and pollers to run on the CPU core. Therefore, the standard Linux top utility is not effective for analyzing the CPU usage of polled-mode applications like SPDK because it just reports that they are using 100% of the CPU resources assigned to them. The spdk_top utility was developed to analyze and report the CPU cycles used to do real work vs just polling for work. The utility relies on instrumentation added to pollers to track when they are doing work vs. polling for work. The spdk_top utility gets the fine-grained metrics from the pollers, analyzes them, and reports the metrics on a per-poller, per-thread, and per-core basis. This information enables users to identify CPU cores that are busy doing real work so that they can determine if the application needs more or less CPU resources.
# Run spdk_top
Before running spdk_top you need to run the SPDK application whose performance you want to analyze using spdk_top.
Run the spdk_top application
~~~{.sh}
./build/bin/spdk_top
~~~
# Bottom menu
Menu at the bottom of SPDK top window shows many options for changing displayed data. Each menu item has a key associated with it in square brackets.
* Quit - quits the SPDK top application.
* TAB selection - allows selecting the THREADS/POLLERS/CORES tabs.
* Previous page/Next page - scrolls up/down to the next set of rows displayed. An indicator in the bottom-left corner shows the current page and the number of available pages.
* Columns - enables/disables chosen columns in a column pop-up window.
* Sorting - allows sorting the displayed data by column in a sorting pop-up.
* Refresh rate - takes user input from 0 to 255 and changes refresh rate to that value in seconds.
* Item details - displays a details pop-up window for the highlighted data row. The selection is changed by pressing the UP and DOWN arrow keys.
* Total/Interval - changes displayed values in all tabs to either Total time (measured since start of SPDK application) or Interval time (measured since last refresh).
# Threads Tab
The threads tab displays a line item for each spdk thread. The information displayed shows:
* Thread name - name of SPDK thread.
* Core - core on which the thread is currently running.
* Active/Timed/Paused pollers - number of pollers grouped by type on this thread.
* Idle/Busy - how many microseconds the thread was idle/busy.
\n
By pressing the ENTER key a pop-up window appears, showing the information above and a list of pollers running on the selected thread (with poller name, type, run count and period).
The pop-up can then be closed by pressing the ESC key.
To learn more about spdk threads see @ref concurrency.
# Pollers Tab
The pollers tab displays a line item for each poller. The information displayed shows:
* Poller name - name of currently selected poller.
* Type - type of poller (Active/Paused/Timed).
* On thread - thread on which the poller is running.
* Run count - how many times poller was run.
* Period - poller period in microseconds. If period equals 0 then it is not displayed.
* Status - whether poller is currently Busy (red color) or Idle (blue color).
\n
A poller pop-up window showing the above information can be displayed by pressing ENTER on a selected data row.
The pop-up can be closed by pressing the ESC key.
# Cores Tab
The cores tab provides insights into how the application is using the CPU cores assigned to it. The information displayed for each core shows:
* Core - core number.
* Thread count - number of threads currently running on core.
* Poller count - total number of pollers running on core.
* Idle/Busy - how many microseconds core was idle (including time when core ran pollers but did not find any work) or doing actual work.
\n
Pressing the ENTER key makes a pop-up window appear, showing the above information, along with a list of threads running on the selected core. The core details window allows you to select a thread and display a thread details pop-up on top of it. To close both pop-ups use the ESC key.

# SPDK CLI {#spdkcli}
All dependencies should be handled by the scripts/pkgdep.sh script.
Package dependencies at the moment include:
- configshell
### Run SPDK application instance
~~~{.sh}
./scripts/setup.sh
./build/bin/vhost -c vhost.json
~~~
### Run SPDK CLI
Spdkcli should be run with the same privileges as the SPDK application.
In order to use SPDK CLI in interactive mode please use:
~~~{.sh}
scripts/spdkcli.py
~~~{.sh}
virtualenv-3 ./venv
source ./venv/bin/activate
~~~
Then install the dependencies using pip. That way dependencies will be
installed only inside the virtual environment.
~~~{.sh}
(venv) pip install configshell-fb
