numam-spdk/lib/idxd/idxd.c

1177 lines
27 KiB
C
Raw Normal View History

lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/util.h"
#include "spdk/memory.h"
#include "spdk/likely.h"
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
#include "spdk/log.h"
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
#include "spdk_internal/idxd.h"
#include "idxd.h"
#define ALIGN_4K 0x1000
#define USERSPACE_DRIVER_NAME "user"
#define KERNEL_DRIVER_NAME "kernel"
/*
* Need to limit how many completions we reap in one poller to avoid starving
* other threads as callers can submit new operations on the polling thread.
*/
#define MAX_COMPLETIONS_PER_POLL 16
static STAILQ_HEAD(, spdk_idxd_impl) g_idxd_impls = STAILQ_HEAD_INITIALIZER(g_idxd_impls);
static struct spdk_idxd_impl *g_idxd_impl;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
/*
* g_dev_cfg gives us 2 pre-set configurations of DSA to choose from
* via RPC.
*/
struct device_config *g_dev_cfg = NULL;
/*
* Pre-built configurations. Variations depend on various factors
* including how many different types of target latency profiles there
* are, how many different QOS requirements there might be, etc.
*/
struct device_config g_dev_cfg0 = {
.config_num = 0,
lib/idxd: change config #0 to something more sane Config #1 remains what is shown as an example in the spec. Change config #0 to just have 1 work group and 1 work queue all backed by 4 engines. As the majority of initial use cases will not be implementing separate priorities and/or different back end targets (mem, pmem, etc) having just 1 group and work queue makes the most sense as it allows the silicon to decide which engine to use. Also, having multiple work queues spreads out the available entires such that if we're not using all of the work queues then we're not using all of the resources. As channels are created they are assigned the next available device. As a channel is assigned a device that is already in use it will round robin work queues. If then, for example, we have 16 devices then only the first work queue will ever be used for the first 16 threads which seems and if there are even just 2 work queues per device it would take 32 threads to use all of the resources at the device. By haing just one work queue per device we always have the max number of work queue entries available regardless of how many threads are being used. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie15ff6bdea12525fe3bfc769613084ddd2de50bf Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5845 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2021-01-09 21:29:34 +00:00
.num_groups = 1,
.total_wqs = 1,
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
.total_engines = 4,
};
struct device_config g_dev_cfg1 = {
.config_num = 1,
.num_groups = 2,
.total_wqs = 4,
.total_engines = 4,
};
static inline void
_submit_to_hw(struct spdk_idxd_io_channel *chan, struct idxd_ops *op)
{
TAILQ_INSERT_TAIL(&chan->ops_outstanding, op, link);
movdir64b(chan->portal + chan->portal_offset, op->desc);
chan->portal_offset = (chan->portal_offset + chan->idxd->chan_per_device * PORTAL_STRIDE) &
PORTAL_MASK;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
struct spdk_idxd_io_channel *
spdk_idxd_get_channel(struct spdk_idxd_device *idxd)
{
struct spdk_idxd_io_channel *chan;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
struct idxd_batch *batch;
int i, num_batches;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
assert(idxd != NULL);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
chan = calloc(1, sizeof(struct spdk_idxd_io_channel));
if (chan == NULL) {
SPDK_ERRLOG("Failed to allocate idxd chan\n");
return NULL;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
num_batches = idxd->queues[idxd->wq_id].wqcfg.wq_size / idxd->chan_per_device;
chan->batch_base = calloc(num_batches, sizeof(struct idxd_batch));
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (chan->batch_base == NULL) {
SPDK_ERRLOG("Failed to allocate batch pool\n");
free(chan);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
return NULL;
}
pthread_mutex_lock(&idxd->num_channels_lock);
if (idxd->num_channels == idxd->chan_per_device) {
/* too many channels sharing this device */
pthread_mutex_unlock(&idxd->num_channels_lock);
free(chan->batch_base);
free(chan);
return NULL;
}
/* Have each channel start at a different offset. */
chan->portal_offset = (idxd->num_channels * PORTAL_STRIDE) & PORTAL_MASK;
idxd->num_channels++;
pthread_mutex_unlock(&idxd->num_channels_lock);
chan->idxd = idxd;
TAILQ_INIT(&chan->ops_pool);
TAILQ_INIT(&chan->batch_pool);
TAILQ_INIT(&chan->ops_outstanding);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
batch = chan->batch_base;
for (i = 0 ; i < num_batches ; i++) {
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
TAILQ_INSERT_TAIL(&chan->batch_pool, batch, link);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
batch++;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
return chan;
}
void
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
spdk_idxd_put_channel(struct spdk_idxd_io_channel *chan)
{
struct idxd_batch *batch;
assert(chan != NULL);
pthread_mutex_lock(&chan->idxd->num_channels_lock);
assert(chan->idxd->num_channels > 0);
chan->idxd->num_channels--;
pthread_mutex_unlock(&chan->idxd->num_channels_lock);
spdk_free(chan->ops_base);
spdk_free(chan->desc_base);
while ((batch = TAILQ_FIRST(&chan->batch_pool))) {
TAILQ_REMOVE(&chan->batch_pool, batch, link);
spdk_free(batch->user_ops);
spdk_free(batch->user_desc);
}
free(chan->batch_base);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
free(chan);
}
/* returns the total max operations for channel. */
int
spdk_idxd_chan_get_max_operations(struct spdk_idxd_io_channel *chan)
{
assert(chan != NULL);
return chan->idxd->total_wq_size / chan->idxd->chan_per_device;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
inline static int
_vtophys(const void *buf, uint64_t *buf_addr, uint64_t size)
{
uint64_t updated_size = size;
*buf_addr = spdk_vtophys(buf, &updated_size);
if (*buf_addr == SPDK_VTOPHYS_ERROR) {
SPDK_ERRLOG("Error translating address\n");
return -EINVAL;
}
if (updated_size < size) {
SPDK_ERRLOG("Error translating size (0x%lx), return size (0x%lx)\n", size, updated_size);
return -EINVAL;
}
return 0;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
int
spdk_idxd_configure_chan(struct spdk_idxd_io_channel *chan)
{
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
struct idxd_batch *batch;
int rc, num_descriptors, i;
struct idxd_hw_desc *desc;
struct idxd_ops *op;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
assert(chan != NULL);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
/* Round robin the WQ selection for the chan on this IDXD device. */
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
chan->idxd->wq_id++;
if (chan->idxd->wq_id == g_dev_cfg->total_wqs) {
chan->idxd->wq_id = 0;
}
num_descriptors = chan->idxd->queues[chan->idxd->wq_id].wqcfg.wq_size / chan->idxd->chan_per_device;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
chan->desc_base = desc = spdk_zmalloc(num_descriptors * sizeof(struct idxd_hw_desc),
0x40, NULL,
SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
if (chan->desc_base == NULL) {
SPDK_ERRLOG("Failed to allocate descriptor memory\n");
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
return -ENOMEM;
}
chan->ops_base = op = spdk_zmalloc(num_descriptors * sizeof(struct idxd_ops),
0x40, NULL,
SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
if (chan->ops_base == NULL) {
SPDK_ERRLOG("Failed to allocate completion memory\n");
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
rc = -ENOMEM;
goto err_op;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
for (i = 0; i < num_descriptors; i++) {
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
op->desc = desc;
rc = _vtophys(&op->hw, &desc->completion_addr, sizeof(struct idxd_hw_comp_record));
if (rc) {
SPDK_ERRLOG("Failed to translate completion memory\n");
rc = -ENOMEM;
goto err_op;
}
op++;
desc++;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
/* Populate the batches */
TAILQ_FOREACH(batch, &chan->batch_pool, link) {
batch->user_desc = desc = spdk_zmalloc(DESC_PER_BATCH * sizeof(struct idxd_hw_desc),
0x40, NULL,
SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (batch->user_desc == NULL) {
SPDK_ERRLOG("Failed to allocate batch descriptor memory\n");
rc = -ENOMEM;
goto err_user_desc_or_op;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
batch->user_ops = op = spdk_zmalloc(DESC_PER_BATCH * sizeof(struct idxd_ops),
0x40, NULL,
SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
if (batch->user_ops == NULL) {
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
SPDK_ERRLOG("Failed to allocate user completion memory\n");
rc = -ENOMEM;
goto err_user_desc_or_op;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
for (i = 0; i < DESC_PER_BATCH; i++) {
rc = _vtophys(&op->hw, &desc->completion_addr, sizeof(struct idxd_hw_comp_record));
if (rc) {
SPDK_ERRLOG("Failed to translate batch entry completion memory\n");
rc = -ENOMEM;
goto err_user_desc_or_op;
}
op++;
desc++;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
chan->portal = chan->idxd->impl->portal_get_addr(chan->idxd);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return 0;
err_user_desc_or_op:
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
TAILQ_FOREACH(batch, &chan->batch_pool, link) {
spdk_free(batch->user_desc);
batch->user_desc = NULL;
spdk_free(batch->user_ops);
batch->user_ops = NULL;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
spdk_free(chan->ops_base);
chan->ops_base = NULL;
err_op:
spdk_free(chan->desc_base);
chan->desc_base = NULL;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return rc;
}
static inline struct spdk_idxd_impl *
idxd_get_impl_by_name(const char *impl_name)
{
struct spdk_idxd_impl *impl;
assert(impl_name != NULL);
STAILQ_FOREACH(impl, &g_idxd_impls, link) {
if (0 == strcmp(impl_name, impl->name)) {
return impl;
}
}
return NULL;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
/* Called via RPC to select a pre-defined configuration. */
void
spdk_idxd_set_config(uint32_t config_num, bool kernel_mode)
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
{
if (kernel_mode) {
g_idxd_impl = idxd_get_impl_by_name(KERNEL_DRIVER_NAME);
} else {
g_idxd_impl = idxd_get_impl_by_name(USERSPACE_DRIVER_NAME);
}
if (g_idxd_impl == NULL) {
SPDK_ERRLOG("Cannot set the idxd implementation with %s mode\n",
kernel_mode ? KERNEL_DRIVER_NAME : USERSPACE_DRIVER_NAME);
return;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
switch (config_num) {
case 0:
g_dev_cfg = &g_dev_cfg0;
break;
case 1:
g_dev_cfg = &g_dev_cfg1;
break;
default:
g_dev_cfg = &g_dev_cfg0;
SPDK_ERRLOG("Invalid config, using default\n");
break;
}
g_idxd_impl->set_config(g_dev_cfg, config_num);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
static void
idxd_device_destruct(struct spdk_idxd_device *idxd)
{
assert(idxd->impl != NULL);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
idxd->impl->destruct(idxd);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
int
spdk_idxd_probe(void *cb_ctx, spdk_idxd_attach_cb attach_cb)
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
{
if (g_idxd_impl == NULL) {
SPDK_ERRLOG("No idxd impl is selected\n");
return -1;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
return g_idxd_impl->probe(cb_ctx, attach_cb);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
void
spdk_idxd_detach(struct spdk_idxd_device *idxd)
{
assert(idxd != NULL);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
idxd_device_destruct(idxd);
}
static int
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
_idxd_prep_command(struct spdk_idxd_io_channel *chan, spdk_idxd_req_cb cb_fn,
void *cb_arg, struct idxd_hw_desc **_desc, struct idxd_ops **_op)
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
if (!TAILQ_EMPTY(&chan->ops_pool)) {
op = *_op = TAILQ_FIRST(&chan->ops_pool);
desc = *_desc = op->desc;
TAILQ_REMOVE(&chan->ops_pool, op, link);
} else {
/* The application needs to handle this, violation of flow control */
return -EBUSY;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
desc->flags = IDXD_FLAG_COMPLETION_ADDR_VALID | IDXD_FLAG_REQUEST_COMPLETION;
op->cb_arg = cb_arg;
op->cb_fn = cb_fn;
op->batch = NULL;
return 0;
}
int
spdk_idxd_submit_copy(struct spdk_idxd_io_channel *chan, void *dst, const void *src,
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(dst != NULL);
assert(src != NULL);
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
goto error;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
goto error;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_MEMMOVE;
desc->src_addr = src_addr;
desc->dst_addr = dst_addr;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
desc->xfer_size = nbytes;
desc->flags |= IDXD_FLAG_CACHE_CONTROL; /* direct IO to CPU cache instead of mem */
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
/* Submit operation. */
_submit_to_hw(chan, op);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
return 0;
error:
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
return rc;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
/* Dual-cast copies the same source to two separate destination buffers. */
int
spdk_idxd_submit_dualcast(struct spdk_idxd_io_channel *chan, void *dst1, void *dst2,
const void *src, uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst1_addr, dst2_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(dst1 != NULL);
assert(dst2 != NULL);
assert(src != NULL);
if ((uintptr_t)dst1 & (ALIGN_4K - 1) || (uintptr_t)dst2 & (ALIGN_4K - 1)) {
SPDK_ERRLOG("Dualcast requires 4K alignment on dst addresses\n");
return -EINVAL;
}
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
goto error;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst1, &dst1_addr, nbytes);
if (rc) {
goto error;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst2, &dst2_addr, nbytes);
if (rc) {
goto error;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_DUALCAST;
desc->src_addr = src_addr;
desc->dst_addr = dst1_addr;
desc->dest2 = dst2_addr;
desc->xfer_size = nbytes;
desc->flags |= IDXD_FLAG_CACHE_CONTROL; /* direct IO to CPU cache instead of mem */
/* Submit operation. */
_submit_to_hw(chan, op);
return 0;
error:
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
return rc;
}
int
spdk_idxd_submit_compare(struct spdk_idxd_io_channel *chan, void *src1, const void *src2,
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src1_addr, src2_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(src1 != NULL);
assert(src2 != NULL);
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src1, &src1_addr, nbytes);
if (rc) {
goto error;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src2, &src2_addr, nbytes);
if (rc) {
goto error;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_COMPARE;
desc->src_addr = src1_addr;
desc->src2_addr = src2_addr;
desc->xfer_size = nbytes;
/* Submit operation. */
_submit_to_hw(chan, op);
return 0;
error:
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
return rc;
}
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
int
spdk_idxd_submit_fill(struct spdk_idxd_io_channel *chan, void *dst, uint64_t fill_pattern,
uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t dst_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
assert(chan != NULL);
assert(dst != NULL);
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
return rc;
}
/* Command specific. */
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
desc->opcode = IDXD_OPCODE_MEMFILL;
desc->pattern = fill_pattern;
desc->dst_addr = dst_addr;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
desc->xfer_size = nbytes;
desc->flags |= IDXD_FLAG_CACHE_CONTROL; /* direct IO to CPU cache instead of mem */
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
/* Submit operation. */
_submit_to_hw(chan, op);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
return 0;
}
int
spdk_idxd_submit_crc32c(struct spdk_idxd_io_channel *chan, uint32_t *crc_dst, void *src,
uint32_t seed, uint64_t nbytes,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(crc_dst != NULL);
assert(src != NULL);
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (rc) {
return rc;
}
rc = _vtophys(src, &src_addr, nbytes);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (rc) {
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_CRC32C_GEN;
desc->dst_addr = 0; /* Per spec, needs to be clear. */
desc->src_addr = src_addr;
desc->flags &= IDXD_CLEAR_CRC_FLAGS;
desc->crc32c.seed = seed;
desc->xfer_size = nbytes;
op->crc_dst = crc_dst;
/* Submit operation. */
_submit_to_hw(chan, op);
return 0;
}
int
spdk_idxd_submit_copy_crc32c(struct spdk_idxd_io_channel *chan, void *dst, void *src,
uint32_t *crc_dst, uint32_t seed, uint64_t nbytes,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst_addr;
int rc;
assert(chan != NULL);
assert(dst != NULL);
assert(src != NULL);
assert(crc_dst != NULL);
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
}
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
goto error;
}
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
goto error;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_COPY_CRC;
desc->dst_addr = dst_addr;
desc->src_addr = src_addr;
desc->flags &= IDXD_CLEAR_CRC_FLAGS;
desc->crc32c.seed = seed;
desc->xfer_size = nbytes;
op->crc_dst = crc_dst;
/* Submit operation. */
_submit_to_hw(chan, op);
return 0;
error:
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
return rc;
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
uint32_t
spdk_idxd_batch_get_max(void)
{
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
/* TODO: consider setting this via RPC. */
return DESC_PER_BATCH;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
struct idxd_batch *
spdk_idxd_batch_create(struct spdk_idxd_io_channel *chan)
{
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
struct idxd_batch *batch;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
assert(chan != NULL);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
if (!TAILQ_EMPTY(&chan->batch_pool)) {
batch = TAILQ_FIRST(&chan->batch_pool);
batch->index = 0;
batch->chan = chan;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
TAILQ_REMOVE(&chan->batch_pool, batch, link);
} else {
/* The application needs to handle this. */
return NULL;
}
return batch;
}
static bool
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
_is_batch_valid(struct idxd_batch *batch, struct spdk_idxd_io_channel *chan)
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
{
return batch->chan == chan;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
static void
_free_batch(struct idxd_batch *batch, struct spdk_idxd_io_channel *chan)
{
SPDK_DEBUGLOG(idxd, "Free batch %p\n", batch);
batch->index = 0;
batch->chan = NULL;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
TAILQ_INSERT_TAIL(&chan->batch_pool, batch, link);
}
int
spdk_idxd_batch_cancel(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch)
{
assert(chan != NULL);
assert(batch != NULL);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (_is_batch_valid(batch, chan) == false) {
SPDK_ERRLOG("Attempt to cancel an invalid batch.\n");
return -EINVAL;
}
if (batch->index > 0) {
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
SPDK_ERRLOG("Cannot cancel batch, already submitted to HW.\n");
return -EINVAL;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
_free_batch(batch, chan);
return 0;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
static int _idxd_batch_prep_nop(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
int
spdk_idxd_batch_submit(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
uint64_t desc_addr;
int i, rc;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
assert(chan != NULL);
assert(batch != NULL);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (_is_batch_valid(batch, chan) == false) {
SPDK_ERRLOG("Attempt to submit an invalid batch.\n");
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return -EINVAL;
}
if (batch->index < MIN_USER_DESC_COUNT) {
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
/* DSA needs at least MIN_USER_DESC_COUNT for a batch, add a NOP to make it so. */
if (_idxd_batch_prep_nop(chan, batch)) {
return -EINVAL;
}
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
/* Common prep. */
rc = _idxd_prep_command(chan, cb_fn, cb_arg, &desc, &op);
if (rc) {
return rc;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
/* TODO: pre-tranlate these when allocated for max batch size. */
rc = _vtophys(batch->user_desc, &desc_addr, batch->index * sizeof(struct idxd_hw_desc));
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (rc) {
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
return rc;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
/* Command specific. */
desc->opcode = IDXD_OPCODE_BATCH;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
desc->desc_list_addr = desc_addr;
desc->desc_count = batch->index;
op->batch = batch;
assert(batch->index <= DESC_PER_BATCH);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
/* Add the batch elements completion contexts to the outstanding list to be polled. */
for (i = 0 ; i < batch->index; i++) {
TAILQ_INSERT_TAIL(&chan->ops_outstanding, (struct idxd_ops *)&batch->user_ops[i],
link);
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
/* Submit operation. */
_submit_to_hw(chan, op);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
SPDK_DEBUGLOG(idxd, "Submitted batch %p\n", batch);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return 0;
}
static int
_idxd_prep_batch_cmd(struct spdk_idxd_io_channel *chan, spdk_idxd_req_cb cb_fn,
void *cb_arg, struct idxd_batch *batch,
struct idxd_hw_desc **_desc, struct idxd_ops **_op)
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
if (_is_batch_valid(batch, chan) == false) {
SPDK_ERRLOG("Attempt to add to an invalid batch.\n");
return -EINVAL;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
assert(batch != NULL); /* suppress scan-build warning. */
if (batch->index == DESC_PER_BATCH) {
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
SPDK_ERRLOG("Attempt to add to a batch that is already full.\n");
return -EINVAL;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
desc = *_desc = &batch->user_desc[batch->index];
op = *_op = &batch->user_ops[batch->index];
op->desc = desc;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
SPDK_DEBUGLOG(idxd, "Prep batch %p index %u\n", batch, batch->index);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
batch->index++;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
desc->flags = IDXD_FLAG_COMPLETION_ADDR_VALID | IDXD_FLAG_REQUEST_COMPLETION;
op->cb_arg = cb_arg;
op->cb_fn = cb_fn;
op->batch = batch;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return 0;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
static int
_idxd_batch_prep_nop(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
int rc;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, NULL, NULL, batch, &desc, &op);
if (rc) {
return rc;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_NOOP;
if (chan->idxd->impl->nop_check && chan->idxd->impl->nop_check(chan->idxd)) {
desc->xfer_size = 1;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
return 0;
}
int
spdk_idxd_batch_prep_copy(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
void *dst, const void *src, uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(dst != NULL);
assert(src != NULL);
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_MEMMOVE;
desc->src_addr = src_addr;
desc->dst_addr = dst_addr;
desc->xfer_size = nbytes;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
return 0;
}
int
spdk_idxd_batch_prep_fill(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
void *dst, uint64_t fill_pattern, uint64_t nbytes,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t dst_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(dst != NULL);
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_MEMFILL;
desc->pattern = fill_pattern;
desc->dst_addr = dst_addr;
desc->xfer_size = nbytes;
return 0;
}
int
spdk_idxd_batch_prep_dualcast(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
void *dst1, void *dst2, const void *src, uint64_t nbytes, spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst1_addr, dst2_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(dst1 != NULL);
assert(dst2 != NULL);
assert(src != NULL);
if ((uintptr_t)dst1 & (ALIGN_4K - 1) || (uintptr_t)dst2 & (ALIGN_4K - 1)) {
SPDK_ERRLOG("Dualcast requires 4K alignment on dst addresses\n");
return -EINVAL;
}
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst1, &dst1_addr, nbytes);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(dst2, &dst2_addr, nbytes);
if (rc) {
return rc;
}
desc->opcode = IDXD_OPCODE_DUALCAST;
desc->src_addr = src_addr;
desc->dst_addr = dst1_addr;
desc->dest2 = dst2_addr;
desc->xfer_size = nbytes;
return 0;
}
int
spdk_idxd_batch_prep_crc32c(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
uint32_t *crc_dst, void *src, uint32_t seed, uint64_t nbytes,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(crc_dst != NULL);
assert(src != NULL);
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_CRC32C_GEN;
desc->dst_addr = 0; /* per specification */
desc->src_addr = src_addr;
desc->flags &= IDXD_CLEAR_CRC_FLAGS;
desc->crc32c.seed = seed;
desc->xfer_size = nbytes;
op->crc_dst = crc_dst;
return 0;
}
int
spdk_idxd_batch_prep_copy_crc32c(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
void *dst, void *src, uint32_t *crc_dst, uint32_t seed, uint64_t nbytes,
spdk_idxd_req_cb cb_fn, void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src_addr, dst_addr;
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(crc_dst != NULL);
assert(src != NULL);
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
rc = _vtophys(src, &src_addr, nbytes);
if (rc) {
return rc;
}
rc = _vtophys(dst, &dst_addr, nbytes);
if (rc) {
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_COPY_CRC;
desc->dst_addr = dst_addr;
desc->src_addr = src_addr;
desc->flags &= IDXD_CLEAR_CRC_FLAGS;
desc->crc32c.seed = seed;
desc->xfer_size = nbytes;
op->crc_dst = crc_dst;
return 0;
}
int
spdk_idxd_batch_prep_compare(struct spdk_idxd_io_channel *chan, struct idxd_batch *batch,
void *src1, void *src2, uint64_t nbytes, spdk_idxd_req_cb cb_fn,
void *cb_arg)
{
struct idxd_hw_desc *desc;
struct idxd_ops *op;
uint64_t src1_addr, src2_addr;
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
int rc;
assert(chan != NULL);
assert(batch != NULL);
assert(src1 != NULL);
assert(src2 != NULL);
/* Common prep. */
rc = _idxd_prep_batch_cmd(chan, cb_fn, cb_arg, batch, &desc, &op);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src1, &src1_addr, nbytes);
if (rc) {
return rc;
}
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
rc = _vtophys(src2, &src2_addr, nbytes);
if (rc) {
return rc;
}
/* Command specific. */
desc->opcode = IDXD_OPCODE_COMPARE;
desc->src_addr = src1_addr;
desc->src2_addr = src2_addr;
desc->xfer_size = nbytes;
return 0;
}
static inline void
_dump_sw_error_reg(struct spdk_idxd_io_channel *chan)
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
{
struct spdk_idxd_device *idxd = chan->idxd;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
assert(idxd != NULL);
idxd->impl->dump_sw_error(idxd, chan->portal);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
/* TODO: more performance experiments. */
#define IDXD_COMPLETION(x) ((x) > (0) ? (1) : (0))
#define IDXD_FAILURE(x) ((x) > (1) ? (1) : (0))
#define IDXD_SW_ERROR(x) ((x) &= (0x1) ? (1) : (0))
int
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
spdk_idxd_process_events(struct spdk_idxd_io_channel *chan)
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
{
struct idxd_ops *op, *tmp;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
int status = 0;
int rc = 0;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
assert(chan != NULL);
TAILQ_FOREACH_SAFE(op, &chan->ops_outstanding, link, tmp) {
if (rc == MAX_COMPLETIONS_PER_POLL) {
break;
}
if (IDXD_COMPLETION(op->hw.status)) {
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
TAILQ_REMOVE(&chan->ops_outstanding, op, link);
rc++;
if (spdk_unlikely(IDXD_FAILURE(op->hw.status))) {
status = -EINVAL;
_dump_sw_error_reg(chan);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
switch (op->desc->opcode) {
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
case IDXD_OPCODE_BATCH:
SPDK_DEBUGLOG(idxd, "Complete batch %p\n", op->batch);
lib/idxd: refactor batching for increased performance And to eliminate an artificial constraint on # of user descriptors. The main idea here was to move from a single ring that covered all user descriptors to a pre-allocated ring per pre-allocated batch. In addition, the other major change here is in how we poll for completions. We used to poll the batch rings then the main ring. Now when commands are prepared their completion address is added to a per channel list and the poller simply runs through that list not caring which ring the completion address belongs too. This simplifies the completion logic considerably and will avoid polling locations that can't potentially have a completion. Some minor rework was included as well, mainly getting rid of the ring_ctrl struct as it didn't serve much of a purpose anyway and with how things are setup now its easier to read with all the elements in the channel struct. Also, a change that came in while this was WIP needed a few fixes to function correctly. Addressed those and moved them to a helper function so we have one point of control for xlations. Added support for NOP in cases where a batch is submitted with only 1 descriptor. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Ie201b28118823100e908e0d1b08e7c10bb8fa9e7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3654 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2020-08-03 15:54:55 +00:00
break;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
case IDXD_OPCODE_CRC32C_GEN:
case IDXD_OPCODE_COPY_CRC:
if (spdk_likely(status == 0)) {
*op->crc_dst = op->hw.crc32c_val;
*op->crc_dst ^= ~0;
}
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
break;
case IDXD_OPCODE_COMPARE:
if (spdk_likely(status == 0)) {
status = op->hw.result;
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
break;
}
if (op->cb_fn) {
op->cb_fn(op->cb_arg, status);
idxd: add batch capability to accel framework and IDXD back-end This patch only includes the basic framework for batching and the ability to batch one type of command, copy. Follow-on patches will add the ability to batch other commands and include an example of how to do so via the accel perf tool. SW engine support for batching will also come in a future patch. Documentation will also be coming. Batching allows the application to submit a list of independent descriptors to DSA with one single "batch" descriptor. This is beneficial when the application is in a position to have several operations ready at once; batching saves the overhead of submitting each one separately. The way batching works in SPDK is as follows: 1) The app gets a handle to a new batch with spdk_accel_batch_create() 2) The app uses that handle to prepare a command to be included in the batch. For copy the command is spdk_accel_batch_prep_copy(). The app many continue to prep commands for the batch up to the max via calling spdk_accel_batch_get_max() 3) The app then submits the batch with spdk_accel_batch_submit() 4) The callback provided for each command in the batch will be called as they complete, the callback provided to the batch submit itself will be called then the entire batch is done. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I4102e9291fe59a245cedde6888f42a923b6dbafd Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2248 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-05-07 18:45:15 +00:00
}
op->hw.status = status = 0;
if (op->desc->opcode == IDXD_OPCODE_BATCH) {
_free_batch(op->batch, chan);
} else if (op->batch == NULL) {
TAILQ_INSERT_TAIL(&chan->ops_pool, op, link);
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
} else {
/*
* oldest locations are at the head of the list so if
* we've polled a location that hasn't completed, bail
* now as there are unlikely to be any more completions.
*/
break;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
}
return rc;
lib/idxd: add low level idxd library Module, etc., will follow. Notes: * IDXD is an Intel silicon feature available in future Intel CPUs. Initial development is being done on a simulator. Once HW is available and the code fully tested the experimental label will be lifted. Spec can be found here: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification * The current implementation will only work with VFIO. * DSA has a number of engines that can be grouped based on application need such as type of memory being served or QoS. Engines are processing units and are assigned to groups. Work queues are on device structures that act as front-end groups for queueing descriptors. Full details on what is configurable & how will come in later doc patches. * There is a finite number of work queue slots that are divided amongst the number of desired work queues in some fashion (ie evenly). * SW (outside of the idxd lib) is required to manage flow control, to not over-run the work queues.This is provided in the accel plug-in module. The upper layers use public API to manage this. * Work queue submissions are done with a 64 byte atomic instruction * The design here creates a set of descriptor rings per channel that match the size of the work queues. Then, an spdk_bit_array is used to make sure we don't overrun a queue. If there are not slots available, the operation is put on a linked list to be retried later from the poller. * As we need to support any number of channels (we can't limit ourselves to the number of work queues) we need to dynamically size/resize our per channel descriptor rings based on the number of current channels. This is done from upper layers via public API into the lib. * As channels are created, the total number of work queue slots is divided across the channels evenly. Same thing when they are destroyed, remaining channels with see the ring sizes increase. This is done from upper layers via public API into the lib. * The sim has 64 total work queue entries (WQE) that get dolled out to the work queues (WQ) evenly. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I899bbeda3cef3db05bea4197b8757e89dddb579d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1809 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-04-10 15:29:01 +00:00
}
void
idxd_impl_register(struct spdk_idxd_impl *impl)
{
STAILQ_INSERT_HEAD(&g_idxd_impls, impl, link);
}
SPDK_LOG_REGISTER_COMPONENT(idxd)