bdev: add raid bdev module

Raid module:
============
- The SPDK raid bdev module is a new bdev module that stripes data across
  multiple NVMe devices and exposes the resulting raid bdev to the bdev
  layer, increasing performance and capacity.
- It can theoretically support 256 base devices (currently tested with
  up to 8 base devices).
- Multiple strip sizes such as 32KB, 64KB, 128KB, 256KB and 512KB are
  supported. Most of the current testing is focused on the 64KB strip size.
- New RPC commands "construct_raid_bdev", "destroy_raid_bdev" and
  "get_raid_bdevs" are introduced to configure raid bdevs dynamically in a
  running SPDK system.
- Currently raid bdev configuration parameters are persisted in the
  SPDK configuration file so that they survive reboots. DDF will be
  introduced later.
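
As an illustration of the striping described above, here is a minimal sketch of generic RAID0 address arithmetic. The diff of bdev_raid.c is not shown on this page, so this is not necessarily the module's own code; the names `strip_target` and `map_lba` are hypothetical, and the strip size is assumed to be a power of two expressed in blocks (which is what the `strip_size_shift` field suggests).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which base device, and which block on it. */
struct strip_target {
	uint32_t base_index;	/* index of the base device */
	uint64_t base_lba;	/* block offset on that base device */
};

/*
 * Map a raid bdev block address to a base device, assuming RAID0 and a
 * strip size of (1 << strip_size_shift) blocks.
 */
static struct strip_target
map_lba(uint64_t lba, uint32_t strip_size_shift, uint32_t num_base_bdevs)
{
	struct strip_target t;
	/* global strip number across the whole raid bdev */
	uint64_t strip = lba >> strip_size_shift;
	/* block offset inside that strip */
	uint64_t offset = lba & ((1ULL << strip_size_shift) - 1);

	assert(num_base_bdevs > 0);
	/* strips are distributed round-robin over the base devices */
	t.base_index = (uint32_t)(strip % num_base_bdevs);
	/* each full round of strips advances every base device by one strip */
	t.base_lba = ((strip / num_base_bdevs) << strip_size_shift) + offset;
	return t;
}
```

With 512-byte blocks, a 64KB strip is 128 blocks (shift 7), so on 8 base devices block 128 of the raid bdev would land on block 0 of base device 1.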

High level testing done:
=======================
- A raid bdev is created with 8 base NVMe devices via the configuration
  file and is exposed to the initiator via existing methods. The initiator
  sees a single NVMe namespace with capacity equal to the sum of the
  minimum capacities of the 8 devices. The initiator was able to run raw
  read/write workloads and file system workloads (tested with XFS).
- Multiple raid bdevs are created and exposed to the initiator and
  tested with file system and other workloads for read/write IO.
- LVS / LVOL are created over a raid bdev and exposed to the initiator.
  Testing was done for raw read/write workloads and XFS file system
  workloads.
- RPC testing is done where raid bdevs are created out of NVMe base
  devices on a running SPDK system. These raid bdevs (and LVOLs over
  raid bdevs) are then exposed to the initiator, and IO was tested with
  raw read/write and XFS file system workloads.
- RPC testing is done for destroy_raid_bdev, where all raid bdevs
  are deleted in a running SPDK system.
- RPC testing is done for get_raid_bdevs, where the list of existing
  raid bdev names is printed (all raid bdevs, or only online, only
  configuring, or only offline ones).
- RPC testing is done where the relationship between raid bdevs and
  their underlying NVMe devices is returned by JSON RPC commands.
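
For reference, the construct RPC exercised in these tests takes the parameters "name", "strip_size" (in KB), "raid_level" and "base_bdevs", as listed in the decoder table in bdev_raid_rpc.c. The sketch below only formats a JSON-RPC 2.0 request body with those fields. The helper name `format_construct_raid_bdev`, the fixed `id` and the two-device limit are illustrative assumptions, and the strings are not JSON-escaped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a construct_raid_bdev request for two base bdevs into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 * Names are inserted verbatim, so they must not need JSON escaping.
 */
static int
format_construct_raid_bdev(char *buf, size_t len, const char *name,
			   unsigned strip_size_kb, const char *bdev0, const char *bdev1)
{
	int n;

	assert(buf != NULL && name != NULL && bdev0 != NULL && bdev1 != NULL);
	n = snprintf(buf, len,
		     "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"construct_raid_bdev\","
		     "\"params\":{\"name\":\"%s\",\"strip_size\":%u,\"raid_level\":0,"
		     "\"base_bdevs\":[\"%s\",\"%s\"]}}",
		     name, strip_size_kb, bdev0, bdev1);

	/* snprintf reports the untruncated length, so n >= len means truncation */
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A call such as `format_construct_raid_bdev(buf, sizeof(buf), "Raid0", 64, "Nvme0n1", "Nvme1n1")` would describe a two-device RAID0 bdev with a 64KB strip; the bdev names here are placeholders.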

Change-Id: I10ae1266f8f2cca3c106e4df8c1c0993ddf435d8
Signed-off-by: Kunal Sablok <kunal.sablok@intel.com>
Reviewed-on: https://review.gerrithub.io/410484
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Kunal Sablok 2018-05-08 07:30:29 -04:00 committed by Ben Walker
parent 7c57c0f2ad
commit 41586b0f1d
16 changed files with 4515 additions and 1 deletions


@ -2,6 +2,13 @@
## v18.07: (Upcoming Release)
### RAID module
A new bdev module called "raid" has been added as an experimental module which
aggregates underlying NVMe bdevs and exposes a single raid bdev to upper bdev
layers. LVS/LVOL can be created on top of the raid bdev as needed and exposed
to NVMe-oF subsystems. Please note that vhost will not work with the RAID
module, as the RAID module does not support multiple IO vectors yet.
### Log
The debug log component flag has been renamed from `-t` to `-L` to prevent confusion

3
CONFIG

@ -98,3 +98,6 @@ CONFIG_VPP?=n
# Requires libiscsi development libraries.
CONFIG_ISCSI_INITIATOR?=n
# Build with the raid bdev module.
CONFIG_RAID?=n

12
configure vendored

@ -49,6 +49,8 @@ function usage()
echo " No path required."
echo " iscsi-initiator [disabled]"
echo " No path required."
echo " raid [disabled]"
echo " No path required."
echo " vtune Required to profile I/O under Intel VTune Amplifier XE."
echo " example: /opt/intel/vtune_amplifier_xe_version"
echo ""
@ -136,6 +138,13 @@ for i in "$@"; do
--without-rbd)
CONFIG_RBD=n
;;
--with-raid)
CONFIG_RAID=y
echo "Warning: vhost will not work with RAID module as multiple IOV support is not there"
;;
--without-raid)
CONFIG_RAID=n
;;
--with-rdma)
CONFIG_RDMA=y
;;
@ -327,6 +336,9 @@ fi
if [ -n "$CONFIG_RBD" ]; then
echo "CONFIG_RBD?=$CONFIG_RBD" >> CONFIG.local
fi
if [ -n "$CONFIG_RAID" ]; then
echo "CONFIG_RAID?=$CONFIG_RAID" >> CONFIG.local
fi
if [ -n "$CONFIG_VTUNE" ]; then
echo "CONFIG_VTUNE?=$CONFIG_VTUNE" >> CONFIG.local
fi


@ -52,5 +52,6 @@ DIRS-$(CONFIG_PMDK) += pmem
endif
DIRS-$(CONFIG_RBD) += rbd
DIRS-$(CONFIG_RAID) += raid
include $(SPDK_ROOT_DIR)/mk/spdk.lib.mk

41
lib/bdev/raid/Makefile Normal file

@ -0,0 +1,41 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
CFLAGS += -I$(SPDK_ROOT_DIR)/lib/bdev/
C_SRCS = bdev_raid.c bdev_raid_rpc.c
LIBNAME = vbdev_raid
include $(SPDK_ROOT_DIR)/mk/spdk.lib.mk

1321
lib/bdev/raid/bdev_raid.c Normal file

File diff suppressed because it is too large

230
lib/bdev/raid/bdev_raid.h Normal file

@ -0,0 +1,230 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef SPDK_BDEV_RAID_INTERNAL_H
#define SPDK_BDEV_RAID_INTERNAL_H
#include "spdk/bdev_module.h"
/*
* Raid state describes the state of the raid bdev. A raid bdev is kept on the
* configured, configuring or offline list according to its state
*/
enum raid_bdev_state {
/* raid bdev is ready and is seen by upper layers */
RAID_BDEV_STATE_ONLINE,
/* raid bdev is configuring, not all underlying bdevs are present */
RAID_BDEV_STATE_CONFIGURING,
/*
* In offline state, raid bdev layer will complete all incoming commands without
* submitting to underlying base nvme bdevs
*/
RAID_BDEV_STATE_OFFLINE,
/* raid bdev max, new states should be added before this */
RAID_BDEV_MAX
};
/*
* raid_base_bdev_info contains information for the base bdevs which are part of some
* raid. This structure contains the per base bdev information. Whatever is
* required per base device for raid bdev will be kept here
*/
struct raid_base_bdev_info {
/* pointer to base spdk bdev */
struct spdk_bdev *base_bdev;
/* pointer to base bdev descriptor opened by raid bdev */
struct spdk_bdev_desc *base_bdev_desc;
/*
* When underlying base device calls the hot plug function on drive removal,
* this flag will be set and later after doing some processing, base device
* descriptor will be closed
*/
bool base_bdev_remove_scheduled;
};
/*
* raid_bdev contains the information related to any raid bdev, whether it is on
* the configured, configuring or offline list
*/
struct raid_bdev {
/* link of raid bdev to link it to configured, configuring or offline list */
TAILQ_ENTRY(raid_bdev) link_specific_list;
/* link of raid bdev to link it to global raid bdev list */
TAILQ_ENTRY(raid_bdev) link_global_list;
/* pointer to config file entry */
struct raid_bdev_config *raid_bdev_config;
/* array of base bdev info */
struct raid_base_bdev_info *base_bdev_info;
/* strip size of raid bdev in blocks */
uint32_t strip_size;
/* strip size bit shift for optimized calculation */
uint32_t strip_size_shift;
/* block length bit shift for optimized calculation */
uint32_t blocklen_shift;
/* state of raid bdev */
enum raid_bdev_state state;
/* number of base bdevs comprising raid bdev */
uint16_t num_base_bdevs;
/* number of base bdevs discovered */
uint16_t num_base_bdevs_discovered;
/* Raid Level of this raid bdev */
uint8_t raid_level;
/* Set to true if destruct is called for this raid bdev */
bool destruct_called;
};
/*
* raid_bdev_ctxt is the single entity structure for entire bdev which is
* allocated for any raid bdev
*/
struct raid_bdev_ctxt {
/* raid bdev device, this will get registered in bdev layer */
struct spdk_bdev bdev;
/* raid_bdev object, io device will be created on this */
struct raid_bdev raid_bdev;
};
/*
* raid_bdev_io is the context part of bdev_io. It contains the information
* related to bdev_io for a raid bdev
*/
struct raid_bdev_io {
/* WaitQ entry, used only in waitq logic */
struct spdk_bdev_io_wait_entry waitq_entry;
/* Original channel for this IO, used in queuing logic */
struct spdk_io_channel *ch;
/* current buffer location, used in queueing logic */
uint8_t *buf;
/* outstanding child completions */
uint16_t splits_comp_outstanding;
/* pending splits yet to happen */
uint16_t splits_pending;
/* status of parent io */
bool status;
};
/*
* raid_base_bdev_config is the per base bdev data structure which contains
* information about each base bdev gathered while parsing the config
*/
struct raid_base_bdev_config {
/* base bdev name from config file */
char *bdev_name;
};
/*
* raid_bdev_config contains the raid bdev config related information after
* parsing the config file
*/
struct raid_bdev_config {
/* base bdev config per underlying bdev */
struct raid_base_bdev_config *base_bdev;
/* Points to already created raid bdev */
struct raid_bdev_ctxt *raid_bdev_ctxt;
char *name;
/* strip size of this raid bdev in kilobytes */
uint32_t strip_size;
/* number of base bdevs */
uint8_t num_base_bdevs;
/* raid level */
uint8_t raid_level;
};
/*
* raid_config is the top level structure representing the raid bdev config as read
* from config file for all raids
*/
struct raid_config {
/* raid bdev context from config file */
struct raid_bdev_config *raid_bdev_config;
/* total raid bdev from config file */
uint8_t total_raid_bdev;
};
/*
* raid_bdev_io_channel is the context of spdk_io_channel for raid bdev device. It
* contains the relationship of raid bdev io channel with base bdev io channels.
*/
struct raid_bdev_io_channel {
/* Array of IO channels of base bdevs */
struct spdk_io_channel **base_bdevs_io_channel;
/* raid bdev context pointer */
struct raid_bdev_ctxt *raid_bdev_ctxt;
};
/* TAILQ heads for various raid bdev lists */
TAILQ_HEAD(spdk_raid_configured_tailq, raid_bdev);
TAILQ_HEAD(spdk_raid_configuring_tailq, raid_bdev);
TAILQ_HEAD(spdk_raid_all_tailq, raid_bdev);
TAILQ_HEAD(spdk_raid_offline_tailq, raid_bdev);
extern struct spdk_raid_configured_tailq g_spdk_raid_bdev_configured_list;
extern struct spdk_raid_configuring_tailq g_spdk_raid_bdev_configuring_list;
extern struct spdk_raid_all_tailq g_spdk_raid_bdev_list;
extern struct spdk_raid_offline_tailq g_spdk_raid_bdev_offline_list;
extern struct raid_config g_spdk_raid_config;
void raid_bdev_remove_base_bdev(void *ctx);
int raid_bdev_add_base_device(struct spdk_bdev *bdev);
#endif // SPDK_BDEV_RAID_INTERNAL_H
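
Because `struct raid_bdev_ctxt` embeds both the `spdk_bdev` and the `raid_bdev`, code that only holds a `raid_bdev` pointer (for example while walking one of the TAILQ lists) can recover the enclosing context, and therefore the bdev name, by pointer arithmetic. The RPC code does this with `SPDK_CONTAINEROF`. The sketch below shows the same pattern with a local macro and reduced stand-in structures; all `demo_*` names and `CONTAINEROF` are hypothetical and exist only so the example compiles without SPDK headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for SPDK_CONTAINEROF: step back from a member to its parent */
#define CONTAINEROF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins for spdk_bdev, raid_bdev and raid_bdev_ctxt */
struct demo_bdev {
	const char *name;
};

struct demo_raid_bdev {
	unsigned num_base_bdevs;
};

struct demo_raid_bdev_ctxt {
	struct demo_bdev bdev;			/* registered with the bdev layer */
	struct demo_raid_bdev raid_bdev;	/* what the lists link together */
};

/* Return the bdev name for a raid_bdev that is embedded in a ctxt */
static const char *
demo_raid_bdev_name(struct demo_raid_bdev *raid_bdev)
{
	struct demo_raid_bdev_ctxt *ctxt;

	assert(raid_bdev != NULL);
	ctxt = CONTAINEROF(raid_bdev, struct demo_raid_bdev_ctxt, raid_bdev);
	return ctxt->bdev.name;
}
```

This only works when the `raid_bdev` really is a member of a `raid_bdev_ctxt`; passing a standalone `raid_bdev` would yield an invalid pointer.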


@ -0,0 +1,632 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/rpc.h"
#include "spdk/bdev.h"
#include "bdev_raid.h"
#include "spdk/util.h"
#include "spdk/string.h"
#include "spdk_internal/log.h"
#include "spdk/env.h"
#define RPC_MAX_BASE_BDEVS 255
static void raid_bdev_config_destroy(struct raid_bdev_config *raid_bdev_config);
SPDK_LOG_REGISTER_COMPONENT("raidrpc", SPDK_LOG_RAID_RPC)
/*
* brief:
* check_raid_bdev_present function tells if the raid bdev with given name already
* exists or not.
* params:
* raid_bdev_name - raid bdev name
* returns:
* NULL - raid bdev not present
* non NULL - raid bdev present, returns raid_bdev_ctxt
*/
static struct raid_bdev_ctxt *
check_raid_bdev_present(char *raid_bdev_name)
{
struct raid_bdev *raid_bdev;
struct raid_bdev_ctxt *raid_bdev_ctxt;
TAILQ_FOREACH(raid_bdev, &g_spdk_raid_bdev_list, link_global_list) {
raid_bdev_ctxt = SPDK_CONTAINEROF(raid_bdev, struct raid_bdev_ctxt, raid_bdev);
if (strcmp(raid_bdev_ctxt->bdev.name, raid_bdev_name) == 0) {
/* raid bdev found */
return raid_bdev_ctxt;
}
}
return NULL;
}
/*
* Input structure for get_raid_bdevs RPC
*/
struct rpc_get_raid_bdevs {
/* category - all or online or configuring or offline */
char *category;
};
/*
* brief:
* free_rpc_get_raid_bdevs function frees RPC get_raid_bdevs related parameters
* params:
* req - pointer to RPC request
* returns:
* none
*/
static void
free_rpc_get_raid_bdevs(struct rpc_get_raid_bdevs *req)
{
free(req->category);
}
/*
* Decoder object for RPC get_raids
*/
static const struct spdk_json_object_decoder rpc_get_raid_bdevs_decoders[] = {
{"category", offsetof(struct rpc_get_raid_bdevs, category), spdk_json_decode_string},
};
/*
* brief:
* spdk_rpc_get_raid_bdevs function is the RPC for get_raid_bdevs. It lists
* the raid bdev names in the requested category. Category should be one of
* "all", "online", "configuring" or "offline". "all" covers every raid bdev,
* whether online, configuring or offline. "online" is a raid bdev which
* is registered with the bdev layer. "configuring" is a raid bdev which does not
* have its full configuration discovered yet. "offline" is a raid bdev which is
* not currently registered with the bdev layer, either because it has encountered
* an error or because the user has requested to offline it.
* params:
* request - pointer to json rpc request
* params - pointer to request parameters
* returns:
* none
*/
static void
spdk_rpc_get_raid_bdevs(struct spdk_jsonrpc_request *request, const struct spdk_json_val *params)
{
struct rpc_get_raid_bdevs req = {};
struct spdk_json_write_ctx *w;
struct raid_bdev *raid_bdev;
struct raid_bdev_ctxt *raid_bdev_ctxt;
if (spdk_json_decode_object(params, rpc_get_raid_bdevs_decoders,
SPDK_COUNTOF(rpc_get_raid_bdevs_decoders),
&req)) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
return;
}
if (!(strcmp(req.category, "all") == 0 ||
strcmp(req.category, "online") == 0 ||
strcmp(req.category, "configuring") == 0 ||
strcmp(req.category, "offline") == 0)) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
free_rpc_get_raid_bdevs(&req);
return;
}
w = spdk_jsonrpc_begin_result(request);
if (w == NULL) {
free_rpc_get_raid_bdevs(&req);
return;
}
spdk_json_write_array_begin(w);
/* Get raid bdev list based on the category requested */
if (strcmp(req.category, "all") == 0) {
TAILQ_FOREACH(raid_bdev, &g_spdk_raid_bdev_list, link_global_list) {
raid_bdev_ctxt = SPDK_CONTAINEROF(raid_bdev, struct raid_bdev_ctxt, raid_bdev);
spdk_json_write_string(w, raid_bdev_ctxt->bdev.name);
}
} else if (strcmp(req.category, "online") == 0) {
TAILQ_FOREACH(raid_bdev, &g_spdk_raid_bdev_configured_list, link_specific_list) {
raid_bdev_ctxt = SPDK_CONTAINEROF(raid_bdev, struct raid_bdev_ctxt, raid_bdev);
spdk_json_write_string(w, raid_bdev_ctxt->bdev.name);
}
} else if (strcmp(req.category, "configuring") == 0) {
TAILQ_FOREACH(raid_bdev, &g_spdk_raid_bdev_configuring_list, link_specific_list) {
raid_bdev_ctxt = SPDK_CONTAINEROF(raid_bdev, struct raid_bdev_ctxt, raid_bdev);
spdk_json_write_string(w, raid_bdev_ctxt->bdev.name);
}
} else {
TAILQ_FOREACH(raid_bdev, &g_spdk_raid_bdev_offline_list, link_specific_list) {
raid_bdev_ctxt = SPDK_CONTAINEROF(raid_bdev, struct raid_bdev_ctxt, raid_bdev);
spdk_json_write_string(w, raid_bdev_ctxt->bdev.name);
}
}
spdk_json_write_array_end(w);
spdk_jsonrpc_end_result(request, w);
free_rpc_get_raid_bdevs(&req);
}
SPDK_RPC_REGISTER("get_raid_bdevs", spdk_rpc_get_raid_bdevs, SPDK_RPC_RUNTIME)
/*
* Base bdevs in RPC construct_raid
*/
struct rpc_construct_raid_base_bdevs {
/* Number of base bdevs */
size_t num_base_bdevs;
/* List of base bdevs names */
char *base_bdevs[RPC_MAX_BASE_BDEVS];
};
/*
* Input structure for RPC construct_raid
*/
struct rpc_construct_raid_bdev {
/* Raid bdev name */
char *name;
/* RAID strip size */
uint32_t strip_size;
/* RAID raid level; 32 bits wide because it is decoded with spdk_json_decode_uint32 */
uint32_t raid_level;
/* Base bdevs information */
struct rpc_construct_raid_base_bdevs base_bdevs;
};
/*
* brief:
* free_rpc_construct_raid_bdev function is to free RPC construct_raid_bdev related parameters
* params:
* req - pointer to RPC request
* returns:
* none
*/
static void
free_rpc_construct_raid_bdev(struct rpc_construct_raid_bdev *req)
{
free(req->name);
for (size_t iter = 0; iter < req->base_bdevs.num_base_bdevs; iter++) {
free(req->base_bdevs.base_bdevs[iter]);
}
}
/*
* Decoder function for RPC construct_raid_bdev to decode base bdevs list
*/
static int
decode_base_bdevs(const struct spdk_json_val *val, void *out)
{
struct rpc_construct_raid_base_bdevs *base_bdevs = out;
return spdk_json_decode_array(val, spdk_json_decode_string, base_bdevs->base_bdevs,
RPC_MAX_BASE_BDEVS, &base_bdevs->num_base_bdevs, sizeof(char *));
}
/*
* Decoder object for RPC construct_raid
*/
static const struct spdk_json_object_decoder rpc_construct_raid_bdev_decoders[] = {
{"name", offsetof(struct rpc_construct_raid_bdev, name), spdk_json_decode_string},
{"strip_size", offsetof(struct rpc_construct_raid_bdev, strip_size), spdk_json_decode_uint32},
{"raid_level", offsetof(struct rpc_construct_raid_bdev, raid_level), spdk_json_decode_uint32},
{"base_bdevs", offsetof(struct rpc_construct_raid_bdev, base_bdevs), decode_base_bdevs},
};
/*
* brief:
* raid_bdev_config_cleanup function is used to free memory for one raid_bdev in configuration
* params:
* none
* returns:
* none
*/
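static void
raid_bdev_config_cleanup(void)
{
void *temp_ptr;
assert(g_spdk_raid_config.total_raid_bdev > 0);
if (g_spdk_raid_config.total_raid_bdev == 1) {
/* Removing the only entry: free explicitly, since realloc() to size 0 may return NULL */
free(g_spdk_raid_config.raid_bdev_config);
g_spdk_raid_config.raid_bdev_config = NULL;
g_spdk_raid_config.total_raid_bdev = 0;
return;
}
temp_ptr = realloc(g_spdk_raid_config.raid_bdev_config,
sizeof(struct raid_bdev_config) * (g_spdk_raid_config.total_raid_bdev - 1));
if (temp_ptr != NULL) {
g_spdk_raid_config.raid_bdev_config = temp_ptr;
} else {
SPDK_ERRLOG("Config memory allocation failed\n");
assert(0);
}
g_spdk_raid_config.total_raid_bdev--;
}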
/*
* brief:
* check_and_remove_raid_bdev function frees the base bdev descriptors, unclaims
* the base bdevs and frees the raid bdev. It is used for cleanup when a raid bdev
* cannot be created successfully while constructing it via RPC
* params:
* raid_bdev_config - pointer to raid_bdev_config structure
* returns:
* none
*/
static void
check_and_remove_raid_bdev(struct raid_bdev_config *raid_bdev_config)
{
struct raid_bdev *raid_bdev;
struct raid_bdev_ctxt *raid_bdev_ctxt;
/* Get the raid bdev context, if one has been allocated */
raid_bdev_ctxt = raid_bdev_config->raid_bdev_ctxt;
if (raid_bdev_ctxt == NULL) {
return;
}
/*
* raid should be in configuring state as this function is used to cleanup
* the raid during unsuccessful construction of raid
*/
assert(raid_bdev_ctxt->raid_bdev.state == RAID_BDEV_STATE_CONFIGURING);
raid_bdev = &raid_bdev_ctxt->raid_bdev;
for (uint32_t iter = 0; iter < raid_bdev->num_base_bdevs; iter++) {
assert(raid_bdev->base_bdev_info != NULL);
if (raid_bdev->base_bdev_info[iter].base_bdev) {
/* Release base bdev related resources */
spdk_bdev_module_release_bdev(raid_bdev->base_bdev_info[iter].base_bdev);
spdk_bdev_close(raid_bdev->base_bdev_info[iter].base_bdev_desc);
raid_bdev->base_bdev_info[iter].base_bdev_desc = NULL;
raid_bdev->base_bdev_info[iter].base_bdev = NULL;
assert(raid_bdev->num_base_bdevs_discovered);
raid_bdev->num_base_bdevs_discovered--;
}
}
/* Free raid */
assert(raid_bdev->num_base_bdevs_discovered == 0);
TAILQ_REMOVE(&g_spdk_raid_bdev_configuring_list, raid_bdev, link_specific_list);
TAILQ_REMOVE(&g_spdk_raid_bdev_list, raid_bdev, link_global_list);
free(raid_bdev->base_bdev_info);
free(raid_bdev_ctxt);
raid_bdev_config->raid_bdev_ctxt = NULL;
}
/*
* brief:
* spdk_rpc_construct_raid_bdev function is the RPC for construct_raid_bdev. It takes
* as input the raid bdev name, raid level, strip size in KB and list of base bdev names.
* params:
* request - pointer to json rpc request
* params - pointer to request parameters
* returns:
* none
*/
static void
spdk_rpc_construct_raid_bdev(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct rpc_construct_raid_bdev req = {};
struct spdk_json_write_ctx *w;
struct raid_bdev_ctxt *raid_bdev_ctxt;
void *temp_ptr;
struct raid_base_bdev_config *base_bdevs;
struct raid_bdev_config *raid_bdev_config;
struct spdk_bdev *base_bdev;
if (spdk_json_decode_object(params, rpc_construct_raid_bdev_decoders,
SPDK_COUNTOF(rpc_construct_raid_bdev_decoders),
&req)) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
return;
}
/* Fail the command if raid bdev is already present */
raid_bdev_ctxt = check_raid_bdev_present(req.name);
if (raid_bdev_ctxt != NULL) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
"raid bdev already present");
free_rpc_construct_raid_bdev(&req);
return;
}
/* Fail the command if input raid level is other than 0 */
if (req.raid_level != 0) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "invalid raid level");
free_rpc_construct_raid_bdev(&req);
return;
}
if (spdk_u32_is_pow2(req.strip_size) == false) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "invalid strip size");
free_rpc_construct_raid_bdev(&req);
return;
}
base_bdevs = calloc(req.base_bdevs.num_base_bdevs, sizeof(struct raid_base_bdev_config));
if (base_bdevs == NULL) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR, spdk_strerror(ENOMEM));
free_rpc_construct_raid_bdev(&req);
return;
}
/* Insert the new raid bdev config entry */
temp_ptr = realloc(g_spdk_raid_config.raid_bdev_config,
sizeof(struct raid_bdev_config) * (g_spdk_raid_config.total_raid_bdev + 1));
if (temp_ptr == NULL) {
free(base_bdevs);
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR, spdk_strerror(ENOMEM));
free_rpc_construct_raid_bdev(&req);
return;
}
g_spdk_raid_config.raid_bdev_config = temp_ptr;
for (size_t iter = 0; iter < g_spdk_raid_config.total_raid_bdev; iter++) {
g_spdk_raid_config.raid_bdev_config[iter].raid_bdev_ctxt->raid_bdev.raid_bdev_config =
&g_spdk_raid_config.raid_bdev_config[iter];
}
raid_bdev_config = &g_spdk_raid_config.raid_bdev_config[g_spdk_raid_config.total_raid_bdev];
memset(raid_bdev_config, 0, sizeof(*raid_bdev_config));
raid_bdev_config->name = req.name;
raid_bdev_config->strip_size = req.strip_size;
raid_bdev_config->num_base_bdevs = req.base_bdevs.num_base_bdevs;
raid_bdev_config->raid_level = req.raid_level;
g_spdk_raid_config.total_raid_bdev++;
raid_bdev_config->base_bdev = base_bdevs;
for (size_t iter = 0; iter < raid_bdev_config->num_base_bdevs; iter++) {
raid_bdev_config->base_bdev[iter].bdev_name = req.base_bdevs.base_bdevs[iter];
}
for (size_t iter = 0; iter < raid_bdev_config->num_base_bdevs; iter++) {
/* Check if base_bdev exists already, if not fail the command */
base_bdev = spdk_bdev_get_by_name(req.base_bdevs.base_bdevs[iter]);
if (base_bdev == NULL) {
check_and_remove_raid_bdev(&g_spdk_raid_config.raid_bdev_config[g_spdk_raid_config.total_raid_bdev -
1]);
raid_bdev_config_cleanup();
free(base_bdevs);
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR, "base bdev not found");
free_rpc_construct_raid_bdev(&req);
return;
}
/*
* Try to add base_bdev to this raid bdev, if not able to add fail the
* command. This might be because this base_bdev may already be claimed
* by some other module
*/
if (raid_bdev_add_base_device(base_bdev)) {
check_and_remove_raid_bdev(&g_spdk_raid_config.raid_bdev_config[g_spdk_raid_config.total_raid_bdev -
1]);
raid_bdev_config_cleanup();
free(base_bdevs);
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INTERNAL_ERROR,
"base bdev can't be added because of either memory allocation failed or not able to claim");
free_rpc_construct_raid_bdev(&req);
return;
}
}
w = spdk_jsonrpc_begin_result(request);
if (w == NULL) {
return;
}
spdk_json_write_bool(w, true);
spdk_jsonrpc_end_result(request, w);
}
SPDK_RPC_REGISTER("construct_raid_bdev", spdk_rpc_construct_raid_bdev, SPDK_RPC_RUNTIME)
/*
* Input structure for RPC destroy_raid
*/
struct rpc_destroy_raid_bdev {
/* raid bdev name */
char *name;
};
/*
* brief:
* free_rpc_destroy_raid_bdev function is used to free RPC destroy_raid_bdev related parameters
* params:
* req - pointer to RPC request
* returns:
* none
*/
static void
free_rpc_destroy_raid_bdev(struct rpc_destroy_raid_bdev *req)
{
free(req->name);
}
/*
* Decoder object for RPC destroy_raid
*/
static const struct spdk_json_object_decoder rpc_destroy_raid_bdev_decoders[] = {
{"name", offsetof(struct rpc_destroy_raid_bdev, name), spdk_json_decode_string},
};
/*
* brief:
* Destroying a raid bdev is an asynchronous operation, so this function is
* used to check whether the raid bdev still exists. If it does, the function
* reschedules itself to check again later; otherwise it proceeds with cleanup
* params:
* arg - pointer to raid bdev cfg
* returns:
* none
*/
static void
raid_bdev_config_destroy_check_raid_bdev_exists(void *arg)
{
struct raid_bdev_config *raid_cfg = arg;
assert(raid_cfg != NULL);
if (raid_cfg->raid_bdev_ctxt != NULL) {
/* If raid bdev still exists, schedule event and come back later */
spdk_thread_send_msg(spdk_get_thread(), raid_bdev_config_destroy_check_raid_bdev_exists, raid_cfg);
return;
} else {
/* If raid bdev does not exist now, go for raid bdev config cleanup */
raid_bdev_config_destroy(raid_cfg);
}
}
/*
* brief:
* This function will destroy the given raid bdev config entry
* params:
* raid_cfg - pointer to the raid bdev config to destroy
* returns:
* none
*/
static void
raid_bdev_config_destroy(struct raid_bdev_config *raid_cfg)
{
void *temp_ptr;
uint8_t iter;
struct raid_bdev_config *raid_cfg_next;
uint8_t slot;
assert(raid_cfg != NULL);
if (raid_cfg->raid_bdev_ctxt != NULL) {
/*
* If raid bdev exists for this config, wait for raid bdev to get
* destroyed and come back later
*/
spdk_thread_send_msg(spdk_get_thread(), raid_bdev_config_destroy_check_raid_bdev_exists, raid_cfg);
return;
}
/* Destroy raid bdev config and cleanup */
for (uint8_t iter2 = 0; iter2 < raid_cfg->num_base_bdevs; iter2++) {
free(raid_cfg->base_bdev[iter2].bdev_name);
}
free(raid_cfg->base_bdev);
free(raid_cfg->name);
slot = raid_cfg - g_spdk_raid_config.raid_bdev_config;
assert(slot < g_spdk_raid_config.total_raid_bdev);
if (slot != g_spdk_raid_config.total_raid_bdev - 1) {
iter = slot;
while (iter < g_spdk_raid_config.total_raid_bdev - 1) {
raid_cfg = &g_spdk_raid_config.raid_bdev_config[iter];
raid_cfg_next = &g_spdk_raid_config.raid_bdev_config[iter + 1];
raid_cfg->base_bdev = raid_cfg_next->base_bdev;
raid_cfg->raid_bdev_ctxt = raid_cfg_next->raid_bdev_ctxt;
raid_cfg->name = raid_cfg_next->name;
raid_cfg->strip_size = raid_cfg_next->strip_size;
raid_cfg->num_base_bdevs = raid_cfg_next->num_base_bdevs;
raid_cfg->raid_level = raid_cfg_next->raid_level;
iter++;
}
}
temp_ptr = realloc(g_spdk_raid_config.raid_bdev_config,
sizeof(struct raid_bdev_config) * (g_spdk_raid_config.total_raid_bdev - 1));
if (temp_ptr != NULL) {
g_spdk_raid_config.raid_bdev_config = temp_ptr;
g_spdk_raid_config.total_raid_bdev--;
for (iter = 0; iter < g_spdk_raid_config.total_raid_bdev; iter++) {
g_spdk_raid_config.raid_bdev_config[iter].raid_bdev_ctxt->raid_bdev.raid_bdev_config =
&g_spdk_raid_config.raid_bdev_config[iter];
}
} else {
if (g_spdk_raid_config.total_raid_bdev == 1) {
g_spdk_raid_config.total_raid_bdev--;
g_spdk_raid_config.raid_bdev_config = NULL;
} else {
SPDK_ERRLOG("Config memory allocation failed\n");
assert(0);
}
}
}
/*
* brief:
* spdk_rpc_destroy_raid_bdev function is the RPC for destroy_raid_bdev. It takes
* the raid bdev name as input and destroys that raid bdev, including freeing the
* base bdev resources.
* params:
* request - pointer to json rpc request
* params - pointer to request parameters
* returns:
* none
*/
static void
spdk_rpc_destroy_raid_bdev(struct spdk_jsonrpc_request *request, const struct spdk_json_val *params)
{
	struct rpc_destroy_raid_bdev req = {};
	struct spdk_json_write_ctx *w;
	struct raid_bdev_config *raid_bdev_config = NULL;
	struct spdk_bdev *base_bdev;

	if (spdk_json_decode_object(params, rpc_destroy_raid_bdev_decoders,
				    SPDK_COUNTOF(rpc_destroy_raid_bdev_decoders),
				    &req)) {
		spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
		return;
	}

	/* Find raid bdev config for this raid bdev */
	for (uint32_t iter = 0; iter < g_spdk_raid_config.total_raid_bdev; iter++) {
		if (strcmp(g_spdk_raid_config.raid_bdev_config[iter].name, req.name) == 0) {
			raid_bdev_config = &g_spdk_raid_config.raid_bdev_config[iter];
			break;
		}
	}

	if (raid_bdev_config == NULL) {
		spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
						 "raid bdev name not found");
		free_rpc_destroy_raid_bdev(&req);
		return;
	}

	/* Remove all the base bdevs from this raid bdev before destroying the raid bdev */
	for (uint32_t iter = 0; iter < raid_bdev_config->num_base_bdevs; iter++) {
		base_bdev = spdk_bdev_get_by_name(raid_bdev_config->base_bdev[iter].bdev_name);
		if (base_bdev != NULL) {
			raid_bdev_remove_base_bdev(base_bdev);
		}
	}

	/*
	 * Call to destroy the raid bdev, but it will only destroy raid bdev if underlying
	 * cleanup is done
	 */
	raid_bdev_config_destroy(raid_bdev_config);

	w = spdk_jsonrpc_begin_result(request);
	if (w == NULL) {
		free_rpc_destroy_raid_bdev(&req);
		return;
	}

	spdk_json_write_bool(w, true);
	spdk_jsonrpc_end_result(request, w);
	free_rpc_destroy_raid_bdev(&req);
}
SPDK_RPC_REGISTER("destroy_raid_bdev", spdk_rpc_destroy_raid_bdev, SPDK_RPC_RUNTIME)
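For orientation, the request this handler decodes and the reply it writes on success would look roughly like the following. This is a sketch that assumes the usual JSON-RPC 2.0 envelope; the raid bdev name `Raid0` and the `id` value are placeholders, not values from the patch.

```python
import json

# JSON-RPC 2.0 request for the method registered above.
# "Raid0" and the id are placeholder values chosen for illustration.
request = {
    "jsonrpc": "2.0",
    "method": "destroy_raid_bdev",
    "id": 1,
    "params": {"name": "Raid0"},
}
wire = json.dumps(request)

# On success the handler writes a bare boolean (spdk_json_write_bool(w, true)),
# so the reply carries "result": true under the same id.
expected_response = {"jsonrpc": "2.0", "id": 1, "result": True}
```

If `name` does not match any configured raid bdev, the handler instead sends an invalid-parameters error with the message "raid bdev name not found".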

@ -57,6 +57,10 @@ BLOCKDEV_MODULES_LIST += bdev_rbd
BLOCKDEV_MODULES_DEPS += -lrados -lrbd
endif
ifeq ($(CONFIG_RAID),y)
BLOCKDEV_MODULES_LIST += vbdev_raid
endif
ifeq ($(CONFIG_PMDK),y)
BLOCKDEV_MODULES_LIST += bdev_pmem
BLOCKDEV_MODULES_DEPS += -lpmemblk

@ -960,6 +960,45 @@ if __name__ == "__main__":
    p.add_argument('-l', '--lvs-name', help='lvol store name', required=False)
    p.set_defaults(func=get_lvol_stores)

    @call_cmd
    def get_raid_bdevs(args):
        print_array(rpc.bdev.get_raid_bdevs(args.client,
                                            category=args.category))

    p = subparsers.add_parser('get_raid_bdevs', help="""List the raid bdev names in the requested category.
    Category must be one of 'all', 'online', 'configuring' or 'offline'. 'all' lists every raid bdev regardless of state.
    'online' lists raid bdevs that are registered with the bdev layer. 'configuring' lists raid bdevs whose full
    configuration has not been discovered yet. 'offline' lists raid bdevs that are not currently registered with the
    bdev layer, either because they encountered an error or because the user requested that they be taken offline""")
    p.add_argument('category', help='all or online or configuring or offline')
    p.set_defaults(func=get_raid_bdevs)

    @call_cmd
    def construct_raid_bdev(args):
        # split() with no argument tolerates repeated whitespace between names
        base_bdevs = args.base_bdevs.split()

        rpc.bdev.construct_raid_bdev(args.client,
                                     name=args.name,
                                     strip_size=args.strip_size,
                                     raid_level=args.raid_level,
                                     base_bdevs=base_bdevs)

    p = subparsers.add_parser('construct_raid_bdev', help='Construct new raid bdev')
    p.add_argument('-n', '--name', help='raid bdev name', required=True)
    p.add_argument('-s', '--strip-size', help='strip size in KB', type=int, required=True)
    p.add_argument('-r', '--raid-level', help='raid level, only raid level 0 is supported', type=int, required=True)
    p.add_argument('-b', '--base-bdevs', help='base bdevs name, whitespace separated list in quotes', required=True)
    p.set_defaults(func=construct_raid_bdev)

    @call_cmd
    def destroy_raid_bdev(args):
        rpc.bdev.destroy_raid_bdev(args.client,
                                   name=args.name)

    p = subparsers.add_parser('destroy_raid_bdev', help='Destroy existing raid bdev')
    p.add_argument('name', help='raid bdev name')
    p.set_defaults(func=destroy_raid_bdev)

    # split
    @call_cmd
    def construct_split_vbdev(args):
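The `-b/--base-bdevs` option takes one quoted, whitespace-separated string, so how that string is split decides what ends up in the `base_bdevs` list sent over RPC. The standalone sketch below rebuilds only the `construct_raid_bdev` sub-parser from this hunk and compares two ways of splitting; the names `Raid0`, `Nvme0n1` and `Nvme1n1` are placeholder values.

```python
import argparse

# Rebuild just the construct_raid_bdev sub-parser from the hunk above.
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()
p = subparsers.add_parser('construct_raid_bdev')
p.add_argument('-n', '--name', required=True)
p.add_argument('-s', '--strip-size', type=int, required=True)
p.add_argument('-r', '--raid-level', type=int, required=True)
p.add_argument('-b', '--base-bdevs', required=True)

# Note the double space between the two names and the trailing space.
args = parser.parse_args(['construct_raid_bdev', '-n', 'Raid0', '-s', '64',
                          '-r', '0', '-b', 'Nvme0n1  Nvme1n1 '])

# Splitting on a single space keeps an empty string for each extra space.
single_space = args.base_bdevs.strip().split(" ")  # ['Nvme0n1', '', 'Nvme1n1']

# split() with no argument collapses any run of whitespace.
any_whitespace = args.base_bdevs.split()           # ['Nvme0n1', 'Nvme1n1']
```

An empty name in the list would be passed on as a base bdev name, so splitting on arbitrary whitespace is the safer choice for hand-typed input.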

@ -74,6 +74,49 @@ def delete_null_bdev(client, name):
    return client.call('delete_null_bdev', params)


def get_raid_bdevs(client, category):
    """Get list of raid bdevs based on category

    Args:
        category: one of 'all', 'online', 'configuring' or 'offline'

    Returns:
        List of raid bdev names
    """
    params = {'category': category}
    return client.call('get_raid_bdevs', params)


def construct_raid_bdev(client, name, strip_size, raid_level, base_bdevs):
    """Construct raid bdev

    Args:
        name: user defined raid bdev name
        strip_size: strip size of raid bdev in KB, e.g. 8, 16, 32, 64, 128, 256, 512, 1024
        raid_level: raid level of raid bdev, only 0 is supported
        base_bdevs: list of base bdev names, e.g. ["Nvme0n1", "Nvme1n1", "Nvme2n1"]

    Returns:
        None
    """
    params = {'name': name, 'strip_size': strip_size, 'raid_level': raid_level, 'base_bdevs': base_bdevs}

    return client.call('construct_raid_bdev', params)


def destroy_raid_bdev(client, name):
    """Destroy raid bdev

    Args:
        name: raid bdev name

    Returns:
        None
    """
    params = {'name': name}
    return client.call('destroy_raid_bdev', params)


def construct_aio_bdev(client, filename, name, block_size=None):
    """Construct a Linux AIO block device.
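To see what each of the three raid wrappers in this hunk hands to the client, they can be exercised against a stub. The sketch below copies the wrapper bodies from the hunk; `RecordingClient` is a hypothetical stand-in that only records the `(method, params)` pairs it receives, and the bdev names are placeholders.

```python
class RecordingClient:
    """Hypothetical stub client: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def call(self, method, params=None):
        self.calls.append((method, params))
        return None


def get_raid_bdevs(client, category):
    params = {'category': category}
    return client.call('get_raid_bdevs', params)


def construct_raid_bdev(client, name, strip_size, raid_level, base_bdevs):
    params = {'name': name, 'strip_size': strip_size,
              'raid_level': raid_level, 'base_bdevs': base_bdevs}
    return client.call('construct_raid_bdev', params)


def destroy_raid_bdev(client, name):
    params = {'name': name}
    return client.call('destroy_raid_bdev', params)


client = RecordingClient()
construct_raid_bdev(client, 'Raid0', 64, 0, ['Nvme0n1', 'Nvme1n1'])
get_raid_bdevs(client, 'online')
destroy_raid_bdev(client, 'Raid0')
methods = [m for m, _ in client.calls]
```

Note that `base_bdevs` reaches the wrapper as a list, not as the quoted string typed on the command line.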

@ -34,7 +34,7 @@
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../../../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y = bdev.c part.c scsi_nvme.c gpt vbdev_lvol.c mt
DIRS-y = bdev.c part.c scsi_nvme.c gpt vbdev_lvol.c mt bdev_raid.c
DIRS-$(CONFIG_PMDK) += pmem

@ -0,0 +1 @@
bdev_raid_ut

@ -0,0 +1,56 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../../../../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
SPDK_LIB_LIST = log
CFLAGS += -I$(SPDK_ROOT_DIR)/test
CFLAGS += -I$(SPDK_ROOT_DIR)/lib/bdev
LIBS += $(SPDK_LIB_LINKER_ARGS)
LIBS += -lcunit
APP = bdev_raid_ut
C_SRCS = bdev_raid_ut.c
all: $(APP)

$(APP): $(OBJS) $(SPDK_LIB_FILES)
	$(LINK_C)

clean:
	$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk

File diff suppressed because it is too large

@ -48,6 +48,7 @@ fi
$valgrind $testdir/include/spdk/histogram_data.h/histogram_ut
$valgrind $testdir/lib/bdev/bdev.c/bdev_ut
$valgrind $testdir/lib/bdev/bdev_raid.c/bdev_raid_ut
$valgrind $testdir/lib/bdev/part.c/part_ut
$valgrind $testdir/lib/bdev/scsi_nvme.c/scsi_nvme_ut
$valgrind $testdir/lib/bdev/gpt/gpt.c/gpt_ut