numam-spdk/doc/blobfs.md
Ben Walker d59f28c28a rocksdb: Add the RocksDB Env to the SPDK repository
This code was previously in our fork of RocksDB. Move it here
so that API breaking changes can update it.

Change-Id: Icae3e22380b9bd3de8c1ec5b6f82909f812d204b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/364531
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2017-06-13 11:38:06 -04:00

BlobFS (Blobstore Filesystem)

BlobFS Getting Started Guide

RocksDB Integration

  1. Clone and build the SPDK repository as per https://github.com/spdk/spdk

git clone https://github.com/spdk/spdk.git
cd spdk
./configure
make

  2. In a separate directory, clone the RocksDB git repo from the SPDK GitHub fork. Make sure you check out the spdk branch.

cd ..
git clone -b spdk https://github.com/spdk/rocksdb.git

  3. Build RocksDB. Only the db_bench benchmarking tool is integrated with BlobFS. (Note: add "DEBUG_LEVEL=0" for a release build.)

cd rocksdb
make db_bench SPDK_DIR=path/to/spdk

  4. Copy etc/spdk/rocksdb.conf.in from the spdk repository to /usr/local/etc/spdk/rocksdb.conf.

cd ../spdk
cp etc/spdk/rocksdb.conf.in /usr/local/etc/spdk/rocksdb.conf

  5. Append an NVMe section to the configuration file using SPDK's gen_nvme.sh script.

scripts/gen_nvme.sh >> /usr/local/etc/spdk/rocksdb.conf

  6. Verify that the configuration file specifies the correct NVMe SSD. If there are any NVMe SSDs you do not wish to use for RocksDB/SPDK testing, remove them from the configuration file.

  7. Make sure you have at least 5GB of memory allocated for huge pages. By default, the SPDK setup.sh script allocates only 2GB (1024 huge pages of 2MB each). The following command allocates 5GB worth of 2MB huge pages (in addition to binding the NVMe devices to uio/vfio). If using 1GB huge pages, adjust the NRHUGE value accordingly.

NRHUGE=2560 scripts/setup.sh

  8. Create an empty SPDK blobfs for testing.

test/lib/blobfs/mkfs/mkfs /usr/local/etc/spdk/rocksdb.conf Nvme0n1
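
The NRHUGE value in the huge page step is the target memory divided by the huge page size. A minimal sketch of that arithmetic follows; the helper name nrhuge_for is hypothetical and is not part of SPDK.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPDK): number of huge pages needed
# to cover a target amount of memory.
#   $1 = target memory in GB
#   $2 = huge page size in MB (for example 2 or 1024)
nrhuge_for() {
    target_mb=$(( $1 * 1024 ))
    # Round up so the allocation is never smaller than the target.
    echo $(( (target_mb + $2 - 1) / $2 ))
}

nrhuge_for 5 2       # 5GB of 2MB pages
nrhuge_for 5 1024    # 5GB of 1GB pages
```

With 2MB pages this yields the 2560 used above, so the setup call could equally be written as NRHUGE=$(nrhuge_for 5 2) scripts/setup.sh.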

At this point, RocksDB is ready for testing with SPDK. Three db_bench parameters are used to configure SPDK:

  1. spdk - Defines the name of the SPDK configuration file. If omitted, RocksDB will use the default PosixEnv implementation instead of SpdkEnv. (Required)
  2. spdk_bdev - Defines the name of the SPDK block device which contains the BlobFS to be used for testing. (Required)
  3. spdk_cache_size - Defines the amount of userspace cache memory used by SPDK, in megabytes (MB). Default is 4096 (4GB). (Optional)
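
As an illustration of how the three parameters fit together, the sketch below assembles a db_bench command line and prints it without running it. The helper name spdk_db_bench_flags is hypothetical, and the fillseq benchmark is only an example workload; the configuration file path and bdev name are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPDK or RocksDB): print the
# SPDK-related db_bench flags.
#   $1 = SPDK configuration file (required)
#   $2 = SPDK bdev containing the BlobFS (required)
#   $3 = cache size in MB (optional, falls back to 4096)
spdk_db_bench_flags() {
    conf=$1
    bdev=$2
    cache_mb=${3:-4096}
    if [ -z "$conf" ] || [ -z "$bdev" ]; then
        echo "usage: spdk_db_bench_flags CONF BDEV [CACHE_MB]" >&2
        return 1
    fi
    echo "--spdk=$conf --spdk_bdev=$bdev --spdk_cache_size=$cache_mb"
}

# Dry run: print the command instead of executing it.
echo "./db_bench $(spdk_db_bench_flags /usr/local/etc/spdk/rocksdb.conf Nvme0n1) --benchmarks=fillseq"
```

Printing the command first makes it easy to confirm that the spdk flag is present, since without it db_bench runs against the default PosixEnv rather than BlobFS.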

SPDK has a set of scripts which will run db_bench against a variety of workloads and capture performance and profiling data. The primary script is test/blobfs/rocksdb/run_tests.sh.

FUSE

BlobFS provides a FUSE plug-in to mount an SPDK BlobFS as a kernel filesystem for inspection or debugging. The FUSE plug-in requires fuse3 and is built automatically when fuse3 is detected on the system.

test/lib/blobfs/fuse/fuse /usr/local/etc/spdk/rocksdb.conf Nvme0n1 /mnt/fuse
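
Because the plug-in is only built when fuse3 is present, it can help to check for fuse3 before building. The sketch below assumes fuse3 is discoverable through pkg-config; that is an assumption, and the SPDK build may detect fuse3 by other means. The helper name fuse3_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of SPDK): report whether
# pkg-config can find fuse3. Assumes fuse3 installs a pkg-config file.
fuse3_status() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists fuse3; then
        echo "fuse3 found"
    else
        echo "fuse3 not found"
    fi
}

fuse3_status
```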

Note that the FUSE plug-in has some limitations - see the list below.

Limitations

  • BlobFS has primarily been tested with RocksDB so far, so use cases that differ from how RocksDB uses a filesystem may run into issues. BlobFS will be tested in a broader range of use cases after this initial release.
  • Only a synchronous API is currently supported. An asynchronous API has been developed but has not yet been thoroughly tested, so it is not part of the public interface. It will be added in a future release.
  • File renames are not atomic. This will be fixed in a future release.
  • BlobFS currently supports only a flat namespace for files, with no directory support. Filenames are stored as xattrs in each blob, so filename lookup is an O(n) operation. An SPDK btree implementation is underway and will underpin BlobFS directory support in a future release.
  • Writes to a file must always append to the end of the file. Support for writes to any location within the file will be added in a future release.