The previous version of this function precluded one target name from
being a leading substring of another. i.e. if "nvmf_tgt_1" was already
used as a name "nvmf_tgt_11" could not be used subsequently.
Just an odd quirk that shouldn't be the case.
Change-Id: Iea59b6757512f01070e48074e35a11d942e399bb
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468522
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Also update the changelog for the previous few changes.
Change-Id: I79ac330b4992ccc3e41fd1643b09128c6de6c86d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468391
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The current connection scheduling mechanism (RoundRobin) doesn't take into account the qpair type and assigns each new qpair to the next poll group. As a side effect there might occur a disbalance when some poll group handles more IO qpairs than others. In RDMA transport it is possible to get the qpair type before the controller creation using a private data from the rdma_cm event, this allows to schedule admin and IO qpairs in the balanced way.
Change-Id: I90c368a41c4cd0f5347a83cab7511e4494f05b29
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468993
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Operations with poll groups list must be protected by rtransport->lock.
Make rtranposrt->lock recursive to avoid unnecessary mutex operations when
the poll group is being destroyed within spdk_nvmf_rdma_poll_group_create
Change-Id: If0856429c10ad3bfcc9942da613796cc86d68d8d
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468992
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
On a DIF verification error, fail the read command with a status code
of APPLICATION_TAG_CHECK_ERROR, GUARD_CHECK_ERROR, or
REFERENCE_TAG_CHECK_ERROR and a status code type of SCT_MEDIA_ERROR.
The state of the request is TCP_REQUEST_STATE_TRANSFERRING_CONTROLLER_TO_HOST
when a DIF verification error is detected. So dequeue the request
from C2H data queue, return the response PDU, and then send the command
response.
This was an item on the TODO list. RDMA transport do this right
behavior from the start and so TCP transport follows it by this patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I102bbd253cc8c1379d0937c9536bf2bfe04cbf6a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468911
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
tcp_req->orig_length had been set just before I/O submission but
the value is already fixed in spdk_nvmf_tcp_req_parse_sgl().
Hence move setting tcp_req->orig_length accordingly.
This follows the good practice of RDMA transport.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I99f6e266d8f7027bce810864314f3ee24a1af10c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468910
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Algorithm and some code from: https://github.com/aklomp/base64
Get ~2.3x speedup for encoding and ~1.7x speedup for decoding on
AArch64.
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: Ifce07299aea722337b0b4886117d1f616c5c03ef
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465733
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_bdev_zone_management() allows to perform
management action on a zone. Zone is specified
by start logical block address. Available zone
actions: open, close, reset and finish.
Change-Id: Ie7eaed3e2cc7b9b49dd51ee2d6c28b4ef2f23eb9
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460647
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
spdk_bdev_get_zone_info() is used for retriving
information about zones inside zoned namespace.
Change-Id: I8f931505245e984c0b1ee35ed6592c978ee47544
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460643
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Some internal bdev.c static function will be shared
with new zoned bdev module.
Change-Id: Ifbb8bf443f67b2daf97858b15d474ecce98a9efb
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469100
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Added new public header for zoned bdev. Zoned bdev is an
extension of the bdev interface. Generic concept comes
from ATA/SCSI and is also being worked as an NVMe TP.
Zoned device logical blocks space is divided into fixed-sized
zones. Each zone is described by its start logical block address
and capacity. Writes to a single zone need to be sequential.
After zone is fully written it need to be reset to write to it
again. Such writing schema could be very beneficial in terms of
write amplification factor for NAND based devices.
SPDK Flash Translation Layer library will be consuming this
interface in the future.
Extending SPDK bdev interface will allow to use existing bdev
infrastructure for this new type of devices.
Zoned device have several properties defined in spdk_bdev
structure:
- zone_size: default size of each zone
- max_open_zone: maximum number of open zones
- optimal_open_zones: optimal number of open zones to get
best performance on writes
Single zone properties are defined in spdk_bdev_zone_info
structure:
- start_lba: first logical block of z zone
- write_pointer: logical block address in the zone at
which next write shall occur.
- capacity: maximum number of logical blocks that may
be written in the zone when zone is empty.
- state: zone state
Several zone states are defined: Empty, Open, Full, Closed,
Read Only and Offline.
To change zone state zone actions are defined: Close, Finish,
Open and Reset.
Change-Id: I5fcc22d548c15743329344cae96f5ff73e268504
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
This patch is entry point for extending bdev
interface to support devices with zoned namespace
semantics.
spdk_bdev_is_zoned() will allow user to check if
bdev is zoned bdev.
Change-Id: Id9ea9898d406d1d942bf3081b00ebcb574ac2b5e
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460641
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
used for creating a new spdk_nvmf_tgt structure in the application.
Change-Id: Ib0182ea6d935b84b4fe4fcad79e173cb46859669
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468387
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Functions added in this patch:
spdk_nvmf_tgt_get_name - get human readable name from target.
spdk_nvmf_get_first_tgt - start iterating over global list of targets
spdk_nvmf_get_next_tgt - get next target in iteration
These functions will facilitate the following RPC
nvmf_get_targets - get the names of all active NVMe-oF targets.
In this series, I will also add two more RPCs, nvmf_create_target, and
nvmf_destroy_target, as wrappers around the create and destroy
functions. Since all of these changes are pretty minor and closely
related, I will just do one big changelog entry at the end.
Change-Id: Ia9f1248fbf9726fa3889998a169211fb25e724f2
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468386
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This issue can be reproduced on fedora30.
Add assert here is enough to fix this kind of warning.
Error log:
rdma.c:3070:20: warning: Access to field 'data_buf_pool' results in a
dereference of a null pointer (loaded from field 'transport')
spdk_mempool_put(group->transport->data_buf_pool, buf);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
This is to fix issue #965.
Change-Id: Ifb742ab914ee9a0381dca0bb769ba8aa564c816f
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468908
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Added write_unit_size field to bdev structure.
It describes required number of logical blocks
for write operation. For legacy bdevs this value
will be equal to logical block size.
For bdevs working on top of Open Channel/Zoned
Namespace SSDs or RAID 5 volumes write size unit
could be greater than logical block size and
upper layer should perform write operations
with size of multiple of write unit size.
Change-Id: I55eb6687912a7d0d1157874b2778e7d6c2d44e37
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463802
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Update transaction length wrt to medata size
Change buffers handling in the case of enabled DIF - add function nvmf_rdma_fill_buffer_with_md_interleave to split SGL into several parts with metadata blocks between them in order to perform RDMA operation with appropriate offsets
Add DIF generation before executing bdev IO operation
Add parsing of DifInsertOrStrip config parameter.
Since there is a limitation on the number of entries in SG list (16), the current approach has a limitation on the max transaction size which depends on the data block size. E.g. if data block size is 512 bytes then the maximum transaction size will be 512 * 16 = 8192 bytes.
In adiition, the size of IO buffer (IOUnitSize conf param) must be aligned to metadata size for better perfromance since metadata is treated as part of this buffer. E.g. if the initiator uses transaction size = 4096, data block size on nvme disk is 512 then IO buffer size should be aligned to (512 + 8) which is 4160. In other case an extra IO buffer will be consumed which will increase the number of entries in SGL and in iov.
Change-Id: I7ad2270fe9dcceb114ece34675eac44e5783a0d5
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465248
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Weighted Round Robin can be enabled for users, and users
can allocate different priority IO queues for different
purpose. For now we will enable this feature in the
NVMe driver first, following patches will enable this
feature in bdev layer.
Change-Id: I0f799236ca04eb85ef3c9f972ed63ff2718563ba
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466852
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Parameter 'MinConnectionsPerCore' was removed in last release and marked
as deprecated, now we will deprecate 'MinConnectionsPerCore' finally.
Change-Id: I613a371e8b5352dfb84f8e4293805b792020c643
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468789
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Now code always return 0 , do this like nvme_rdma_mr_map_notify.
That callback can get the right return.
Change-Id: Ief2924e14321b2062f6001e7ae3f50d507206594
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468663
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
In the spdk_vhost_scsi_dev_remove() it takes a period of time
to remove all the tgts but before it is completed the scsi dev
has been freed. So don't free the scsi dev until all the tgts
have been removed.
Fix Github issue #932
Change-Id: Idf9293c70b8d5f82091db6dd5e018a5cb40eea36
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464654
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
By splitting all cm_event handling into a single function, we can create
a single point of contact for cm_events, whether we want to process them
synchronously or asynchronously.
Change-Id: I053a850358605115362f424de55e66806a769320
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467546
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is paving the way for additional changes to enable polling for
cm_events in the initiator.
For now, just present the same blocking API on top of the now polled
file descriptor. Later, we will change this API to be more useful.
Change-Id: I174dac028720f95c30100f6dc2ed49b5bb2a7e40
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467545
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
spdk_pci_device_claim() could create a file on the
filesystem that couldn't be deleted programatically.
It could only be overwritten - e.g. by another spdk
instance - but this didn't really work if that
another instance had less privileges and hence no
access to the previous file.
This is exactly the case we're seeing on our CI when
running SPDK as non-root. In general it's a good idea
not to leave any leftover files, so now we'll delete
the pci claim file when the spdk process exits.
spdk_pci_device_claim() used to return a file descriptor
that could be simply closed to "un-claim" the device.
It'll now return only a return code. The fd will be
stored inside spdk_pci_device and will be closed either
when user calls the newly introduced spdk_pci_device_unclaim(),
or when the device is detached.
We'll still need to clean up those files somewhere in
our test scripts (probably ./setup.sh cleanup) to
clean up after crashed processes or so - but we don't
necessarily want to run such scripts inside the autotest
whenever a non-root spdk is about to be started.
Change-Id: I797e079417bb56491013cc5b92f0f0d14f451d18
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467107
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Purpose: Prepare for the further optimization work
to use one bigger buffer to read more data for
reducing system calls.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ie92603b09308bd3149263269fdec355b67251b37
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468538
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Get all of the important stuff into the first cache line.
Change-Id: I5bbfb031bb1d693019abb9e5145579d0b867eaf5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465994
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
If the CPU reorders the eventidx read before the shadow doorbell
write, it is indeterminate whether the controller will read the
updated shadow doorbell without an MMIO write. See
https://lkml.org/lkml/2018/8/14/1031 for details.
Signed-off-by: Benjamin Saunders <bsaunders@google.com>
Change-Id: I5aa08fdd5b32c7b81e8048ca6efe546318d80b5c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468188
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
It is a very rare thing for a buffer to be split over two memory
regions. In fact, it is only possible in dpdk versions where
--match-allocations is not passed as a startup parameter to dpdk but
dynamic memory allocation is enabled.
By adding a small helper function, we avoid failing an I/O because it
was assigned one of these improperly aligned buffers. Also, we try to
remove the buffer from circulation so that it doesn't get picked up
again by another request.
Also, add a unit test to catch this case.
Change-Id: Ia09865c2f77160a960571665b29c4533b11758ae
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467446
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Just cleaning up a few things like variable names and ordering to make
the whole function more readable.
Change-Id: I1503cdb43ddd73e063d6e57e9ff0cf2a06e79728
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467445
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
IB Architecture Specification vol.1 rel.13. in ch.10.3.1 "QUEUE PAIR
AND EE CONTEXT STATES" suggests the following destroy procedure for
QPs associated with SRQ:
- Put the QP in the Error State;
- wait for the Affiliated Asynchronous Last WQE Reached Event;
- either:
* drain the CQ by invoking the Poll CQ verb and either wait for CQ
to be empty or the number of Poll CQ operations has exceeded CQ
capacity size; or
* post another WR that completes on the same CQ and wait for this WR
to return as a WC;
- and then invoke a Destroy QP or Reset QP.
Without the drain step it is possible that LAST_WQE_REACHED event is
received and QP is destroyed before the last receive WR completion is
polled from the CQ.
In SPDK there is no risk of resource leakage in this case. So, instead
of draining we can destroy QP and then just ignore receive completions
without QP and post receive WRs back to SRQ.
Fixes#903
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ice6d3d5afc205c489f768e3b51c6cda8809bee9a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465747
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Function multiversioning conflicts with LTO when applied
to a function defined in a header file included from
multiple compilation units.
Change-Id: I65bed3903a717b7e982ab185c314d2118ae0e795
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465972
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Support for these options was not introduced until DPDK commit
7f0bb634a1406b132ff15c9cd56a0a9f33e5f11d
Signed-off-by: Benjamin Saunders <bsaunders@google.com>
Change-Id: Id6db73dd48ac01aa1b05eca4c920c5753e8cc6f0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467703
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The default behavior is to set it to 2MB, so this isn't
required anymore.
Change-Id: I62d7605cd4d5bc41347128f32f9a1aa373a15744
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466993
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
The socket now automatically sets the recvbuf size
to 2MB, so this isn't necessary.
Change-Id: Id2196f4038f6835118047233f18c0395fa3f2670
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466992
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The dependencies between vhost and rte_vhost were not added during
earlier changes. This change moves the rte_vhost directory up to the
level of the other libraries and adds the proper dependencies for when
it is linked.
Change-Id: I089de1cd945062b64975a0011887700c0e38bb0f
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467700
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This reverts commit 6129e78d26.
When the initiator sends the discovery log page, if the log page
exceeds the size of its data buffer, it will break it up into
multiple log page commands with appropriate offsets. However,
supporting offsets in log pages is an optional feature in NVMe
and reported by the EDLP bit in the identify data.
This commit changed the discovery process to no longer send an
identify command prior to doing the discovery log page command,
so the values in the identify data are always 0. If the discovery
log page exceeds the size of the data buffer (4k), it will then
fail to send the second log page with an offset because it
believes the controller does not support the feature.
Revert this change to fix it. An identify should always be sent
as part of the discovery process. A test case is included in a
follow up patch the demonstrates the bug.
Reported-by: Zahra Khatami <zahra.k.khatami@oracle.com>
Reported-by: Akshay Shah <akshay.shah@oracle.com>
Change-Id: Iefd512a7521e0fea90541b3eb547671cfa816ea6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466819
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
We used to call a dpdk function to do it, but using
a function for something that simple doesn't make sense.
The function also does its internal queue lookup by vid
and queue number, which could potentially fail, return an
error and technically require SPDK to handle it.
The function makes some sense for vhost-net applications
which don't touch vrings directly but rely on rte_vhost's
API for enqueueing/dequeuing mbufs. SPDK touches DPDK's
rings directly for the entire I/O handling, so it might
just as well for initialization.
This serves as cleanup.
Change-Id: Ifb44fa22ea5fc3633aa85f075aa1a5cd02f5423c
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466745
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The reported github issue #938 has been reported intermettently.
The issue is that the bdev descriptor passed to spdk_bdev_reset()
is not valid and causes seg. fault.
Current implementation of LUN hot plug is that putting IO channel
and removing LUN are done by different poller. Hence if any task
management command is issued between the gap, the reported issue
is likely to occur.
The flag removing is set at the start of LUN hot plug and so
spdk_scsi_dev_get_lun() can return NULL even before completing
removal by referring the flag removing.
Fixes#938.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1a51d90cc700134e8c0ec399a3ce62620c84ef73
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467212
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Since we use pdu->data_iovcnt to
build the iov in nvme_tcp_build_iovs, so
send out pdu has the maximal iov number
equals to: 2 + pdu->data_iovcnt,
so we change the comparison.
This makes sure that we can handle all the data
owned by one pdu.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I2b9258cc5716d706c0fa38af609726c439708768
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467207
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
This will be consistent with TCP and RDMA transport, and we will use
ctrlr->flags in nvme_ctrlr_init_cap() in next patch, the flags will
be cleared to 0 for now.
Change-Id: Ic360cd0c00d60c77452d19cdc1e7a32a5fc34df0
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466678
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Change the way we increase poll group reference counts
for round-robin scheduling.
So far we used to increase them whenever someone called
vhost_get_poll_group() and this worked fine for Vhost-Block
which picks a new poll group for each session. Vhost-SCSI,
however, picks only one poll group for all sessions on
a vhost device. This means that some threads will have
multiple Vhost-SCSI pollers but will still appear to the
vhost scheduler as if they had only one.
To fix it, increase poll group refcnt only when sessions
are really being started - in vhost_session_start_done().
Change-Id: I60f0d2101239e5a91138a5afd30c51dc1ccf7c2e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466733
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Currently vhost_dev_foreach_session() accepts a single
callback function for both iterating through all active
sessions and for signaling the end of iteration (called
last time with vsession param == NULL). Now that the
final signal has completely different semantics and is
called on a specific thread, it makes sense to put it in
a separate function.
While here, remove the one-line description of
spdk_vhost_session_fn typepef. It wasn't helpful anyway.
Change-Id: I56b97180110874a813e666f964bb51c39a8ce6bb
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466732
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Currently vhost_dev_foreach_session() accepts a single
callback function for both iterating through all active
sessions and for signaling the end of iteration (called
last time with vsession param == NULL). Now that the
final signal has completely different semantics and is
called on a specific thread, it makes sense to put in
a separate function.
In this patch we prepare separate functions for the final
call, but still call them in the original callback. In
a separate patch we'll start passing both functions
directly to foreach_session().
Change-Id: I9f4338d9696f7bd15ca2d6655c6a3851569aff75
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466731
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The function could never fail, so make it return void
rather than int. This serves as cleanup.
Change-Id: I16a857ecee8d162f546fd097acaa2e66d51ebffa
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466730
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Historically the callbacks from vhost_dev_foreach_session()
could be called with vdev argument == NULL, which would
mean that device was removed after enqueuing the event
and before consuming it. Now we keep track of pending
asynchronous operations on each vhost device and don't
allow removing it if there are any unconsumed events,
so the the vdev == NULL checks are redundant. Remove them.
Change-Id: I7aa3785080d20ed06e008c081d3f37a949228f5a
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466729
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Remove them all at once. spdk_ prefix should be
only applied to publicly exported functions.
Change-Id: Ib6d2bd0954ec5cb7c8cf253d79b9d3cd8aa0eeef
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466728
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>