Commit Graph

4671 Commits

Author SHA1 Message Date
Ziye Yang
4e331534f1 event/subsystem: solve the subsystem init and destroy conflict
We have conflict to handle the NVMf subsystem shut
down. The situation is that:

If there is shutdown request (e.g., ctrlr+c),
we may have subsystem finalization and subsystem
initialization conflict (e.g., have NVMf subsystem fini and
intialization together), we will have coredump
issue like #682.

If we interrupt the initialization of the subsystem,
following works should do:

1  Do not initilize the next subsystem.
2  Recycle the resources in each subsystem via the
spdk_subsystem_fini related function. And this patch will
do the general thing, but will not consider the detailed
interrupt policy in each subsystem.

Change-Id: I2438b4a2462acb05d8c8e06dfff3da3d388d4b70
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446189
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-03-01 04:36:41 +00:00
Xiaodong Liu
1d29b23134 bdev/raid: extend unmap process to null data io
Other io_type, like FLUSH, has a similar character with
UNMAP, that has a range description (offset and length),
but has no data payload. So the process for UNMAP io_type
can be extended to io_type like FLUSH.

Change-Id: I9467dfc3cc4fc1431b79359b0c477807ec138ac7
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446491
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Piotr Pelpliński <piotr.pelplinski@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-03-01 01:07:22 +00:00
Seth Howell
59f0d22e40 rdma: Fix misordered assert and decrement.
In the error path, we were first decrementing a variable and then
asserting that it must be >0. These operations should occur in the
opposite order.

Change-Id: I6cec544faf17bb75cbfca3d3a3c173dc5db14f99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-28 21:20:38 +00:00
Seth Howell
756ce464f6 rdma: update default number of shared buffers.
When the decision was made to uncouple the number of shared buffers from
the queue depth and allow the user to decide for themselves, the default
was also significantly lowered, which caused some issues when trying
torun performance tests (See https://github.com/spdk/spdk/issues/699).
While this is a user modifiable variable, it is still best to keep the
higher default value.

The original value was equivalent to max_queue_depth *
SPDK_NVMF_MAX_SGL_ENTRIES * 2 with the defaults for max_queue depth and
max_sgl_entries being 128 and 16 respectively. Hence 4096

fixes: 0b20f2e552

Change-Id: I809e97a10973093a2b485b85bca7160091166f70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446525
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-28 21:09:50 +00:00
Tomasz Zawadzki
ca87060dcc lvol: add option to change clear method for lvol store creation
Default 'unmap' option stays as it was.

'Write_zeroes' comes useful when one wants to make sure
that data presented from lvol bdevs on initial creation presents 0's.

'None' will be used for performance tests,
when whole device is preconditioned before creating lvol store.
Instead of performing preconditioning on each lvol bdev after its creation.

Change-Id: Ic5a5985e42a84f038a882bbe6f881624ae96242c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442881
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-28 20:50:27 +00:00
Xiaodong Liu
3d951cd321 bdev/raid: enable unmap support
Change-Id: If0e3c483ce16680ecea0252c389e134c59b2793e
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441309
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-28 04:22:39 +00:00
Xiaodong Liu
48b4a2545a bdev/raid: add io_expected in struct raid_bdev_io
base_bdev_io_expected can be used for the situation
that IO requries multiple and uncertain number of
base bdevs.

Change-Id: I912400f839c02c95606bc94e7c8ad4946e90b6bf
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446009
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-27 20:51:09 +00:00
Seth Howell
009868730c init: add --match-allocations to init params.
This feature was added to DPDK by Jim to avoid the failures that can
come from splitting a buffer over memory regions in RDMA.

Change-Id: I13b646e22a4e2a4ccf915b0274061d31d02c03f7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446166
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-27 17:27:59 +00:00
Ziye Yang
7ae5b8649e event/nvmf: remove the unnecessary check in spdk_nvmf_subsystem_fini
Since we already checked the core info in _spdk_subsystem_fini_next
function.

Change-Id: I6ab28d8fb11a7a07ae8c14c27357db236bf51b3e
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446190
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: qun wan <qun.wan@intel.com>
2019-02-27 08:17:52 +00:00
Shuhei Matsumoto
0f01513359 bdev: process failure of spdk_bdev_io_get_buf_cb in each bdev module
If success is false in each bdev module's spdk_bdev_io_get_buf_cb,
call spdk_bdev_io_complete with SPDK_BDEV_IO_STATUS_FAILED, and
then return.

Change-Id: I6f106d8d39a3616f7305201fa2efc4805d4d00ee
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/446046
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-27 07:28:15 +00:00
Xiaodong Liu
427fd1d76c bdev/raid: extract reset failure code
Break out the failure handling code to a separate
function.

Change-Id: Ic530bb4d33c19edb62360e06afe3946b963445b1
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446008
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
2019-02-27 05:46:08 +00:00
Shuhei Matsumoto
4b92ffb3f1 bdev: Not assert but pass completion status to spdk_bdev_io_get_buf_cb
When the specified buffer size to spdk_bdev_io_get_buf() is greater
than the permitted maximum, spdk_bdev_io_get_buf() asserts simply and
doesn't call the specified callback function.

SPDK SCSI library doesn't allocate read buffer and specifies
expected read buffer size, and expects that it is allocated by
spdk_bdev_io_get_buf().

Bdev perf tool also doesn't allocate read buffer and specifies
expected read buffer size, and expects that it is allocated by
spdk_bdev_io_get_buf().

When we support DIF insert and strip in iSCSI target, the read
buffer size iSCSI initiator requests and the read buffer size iSCSI target
requests will become different.

Even after that, iSCSI initiator and iSCSI target will negotiate correctly
not to cause buffer overflow in spdk_bdev_io_get_buf(), but if iSCSI
initiator ignores the result of negotiation, iSCSI initiator can request
read buffer size larger than the permitted maximum, and can cause
failure in iSCSI target. This is very flagile and should be avoided.

This patch do the following
- Add the completion status of spdk_bdev_io_get_buf() to
  spdk_bdev_io_get_buf_cb(),
- spdk_bdev_io_get_buf() calls spdk_bdev_io_get_buf_cb() by setting
  success to false, and return.
- spdk_bdev_io_get_buf_cb() in each bdev module calls assert if success
  is false.

Subsequent patches will process the case that success is false
in spdk_bdev_io_get_buf_cb().

Change-Id: I76429a86e18a69aa085a353ac94743296d270b82
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/446045
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-27 01:59:11 +00:00
Jim Harris
518c8add8a nvme: add SHST_COMPLETE quirk for VMWare emulated SSDs
VMWare Workstation NVMe emulation does not seem to write the
SHST_COMPLETE bit within 10 seconds, resulting in an ERRLOG
during detach/shutdown.  So add a quirk to cover these VMWare
SSDs.  But rather than squashing the ERRLOG completely for
these SSDs, just add a message instead indicating this is
somewhat expected on these VMWare emulated SSDs.

Fixes issue #676.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3dfcb631feda639926fd712f1f41abb66cbf2096

Reviewed-on: https://review.gerrithub.io/c/445942
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-27 01:46:32 +00:00
Darek Stojaczyk
0aa926c0c0 rte_vhost: introduce get/set vring base idx APIs
Adapted our custom rte_vhost APIs to the upstream DPDK
version which has independently added similar APIs.
This will potentially allow us to remove our internal
rte_vhost copy.

rte_vhost_set_vhost_vring_last_idx() was renamed to
rte_vhost_set_vring_base() and the last vring indices
have to be acquired with a newly introduced rte_vhost_get_vring_base()
rather than rte_vhost_get_vhost_vring().

This is only a refactor, no functionality is changed.

Change-Id: I1ca2c1216635c117832c9d9c784d5661145c04cd
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446081
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-27 01:43:16 +00:00
Jim Harris
3c90b3ddb7 ioat: add device IDs for new CB-DMA engines
Icelake SP Xeon and Snowridge Xeon-D will share
the same IOAT (CB-DMA) PCI device ID.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia2da3a0923c4db73db5d224a3db4f6913e7e1891

Reviewed-on: https://review.gerrithub.io/c/446157
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-26 06:50:52 +00:00
Xiaodong Liu
31c528414f bdev/raid: change related names for base_bdev involved
The elements and functions which are used for raid reset io,
can also be used for other potential raid IO requests which
need multiple base_bdev involved.

Change-Id: Ide7ea190fdbd29da9f9fa22862a0a7c162509697
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441308
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-26 05:53:15 +00:00
Vitaliy Mysak
08e4ced116 bdev/ocf: synchronize env_allocator creation
Make modyfication of global allocator index tread safe
  by using atomic operation

This patch also changes mempool size to be 2^n - 1
  which makes it more efficient

Change-Id: I5b7426f2feef31471d3a4e6c6d2c7f7474200d68
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442695
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-25 23:13:38 +00:00
Wojciech Malikowski
c5c102ce55 lib/ftl: Fix band picking for write pointer
Removing band from "free list" is moved from FTL_BAND_STATE_OPENING
to FTL_BAND_STATE_PREP state's change actions.
This will fix race condition when one band is prepared (erased)
and write pointer is trying to get next active band.

Change-Id: I9e4fe9482a01ee732271736e4a0e6fcedf2582d8
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445118
Reviewed-by: Jakub Radtke <jakub.radtke@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-25 22:37:44 +00:00
Wojciech Malikowski
0f12c406d1 lib/ftl: Propagate ENOMEM error during read to upper layer
ENOMEM is expected when nvme_qpair will be out of resources.
In such a case ENOMEM shall be propagated to allow upper (bdev)
layer proper handling.

Change-Id: Ie647c2d3efff24a8de949a22ac42a31dfd0e78b7
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445580
Reviewed-by: Jakub Radtke <jakub.radtke@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-25 22:37:13 +00:00
Jim Harris
6ff6f6d6f8 blob: pass NULL or SPDK_BLOBID_INVALID when bserrno != 0
When an operation fails, we shouldn't pass a handle or
a 'valid' blob ID to the caller's completion function.
The caller *should* ignore it when bserrno != 0, but
it's best to not take that chance.

Fixes #685.

Note: #685 seems to have a broader issue related to
a possibly locked NVMe SSD in the submitter's system.
This only fixes the assert() that was hit.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3fb3368ccfe0580f0c505285d4b1e9aca797b6a6

Reviewed-on: https://review.gerrithub.io/c/445941
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-25 07:06:04 +00:00
GangCao
120825c91c QoS: enable rate limit when opening the bdev
There are some cases that virtual bdev open and close
the device and QoS will be disabled at the last close.
In this case, when a new bdev open operation comes again,
the QoS needs to be enabled again.

Change-Id: I792e610f4592bad1cac55c6c55261d4946c6b3e2
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442953
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-25 04:35:14 +00:00
Zahra Khatami
a55b2109bb nvmf: remaning changes related to nvmf hooks
Change-Id: I6780fa43cebd9f48d1ae0ea6fbeb92a95c4dfa15
Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/443653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 21:16:36 +00:00
Wojciech Malikowski
1442b5f28a lib/ftl: Fix size of write buffer submission queue
SPDK ring size used for write buffer submission queue
must be increased if required number of batches is a
power of two.

Change-Id: I9b9f885064cf6f0f5fe94b0ed4f9d49a4e5c0cd0
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445721
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 19:13:40 +00:00
Changpeng Liu
de3eb36a61 bdev/nvme: exit controller removal callback if the destruction was started
For real PCIe drives, if we removed one drive, existing hotplug
monitor will trigger the remove callback twice, there is one
workaround for vfio-attached device hot remove detection which
will also trigger the hot removal callback.  For now we add
the check in the bdev_nvme layer so that coredump will not happen.

Fix issue #606.

Change-Id: I0605fbdf391fed20c4aa9a2d54b4f059f29dc483
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 18:32:17 +00:00
Darek Stojaczyk
5c1c946c7a bdev/crypto: compile with DPDK 19.02
It seems like DPDK 19.02 has split the "session mempool"
into two separate mempools but this isn't really described
in the DPDK release notes, so this patch only makes our
crypto code behave just like DPDK crypto examples.

rte_cryptodev_queue_pair_setup() no longer accepts
a separate mempool parameter but instead requires it
to be passed through a new field in struct
rte_cryptodev_qp_conf, which is also passed as a param
to rte_cryptodev_queue_pair_setup(). It's referred to as
"session private mempool" instead of "session mempool",
which makes some sense since we already use
rte_cryptodev_sym_get_private_session_size() (with the
word "private" in name) to calculate its size.

The other mempool - "session mempool" - now has to be
allocated with rte_cryptodev_sym_session_pool_create()
instead of regular rte_mempool_create().

Change-Id: I3bc6185855988b864ca59bc1972beaf4f7ea8925
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443738
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 18:31:52 +00:00
Seth Howell
b38e3a60c6 rdma: change the logic of rdma_qpair_process_pending
I think this simplifies the process a little bit.

Change-Id: Icc87a59c9f6fd965ef35531975b7036d85c4bc95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445916
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
80eecdd881 rdma: use an stailq for incoming_queue
Change-Id: Ib1e59db4c5dffc9bc21f26461dabeff0d171ad22
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
bfdc957c75 rdma: remove the state_cntr variable.
We were only using one value from this array to tell us if the qpair was
idle or not. Remove this array and all of the functions that are no
longer needed after it is removed.

This series is aimed at reverting
fdec444aa8 which has been tied to
performance decreases on master.

Change-Id: Ia3627c1abd15baee8b16d07e436923d222e17ffe
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
04ebc6ea28 RDMA: Remove the state_queues
Since we no longer rely on the state queues for draining qpairs, we can
get rid of most of them. We cn keep just a few, and since we don't ever
remove arbitrary elements, we can use stailqs to perform those
operations. Operations on Stailqs carry about half the overhead as
operations on tailqs

Change-Id: I8f184e6269db853619a3581d387d97a795034798
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Changpeng Liu
30bbf3d944 nvme: move probe context as a internal data structure
Users should not access the internal probe context fields when
using the asynchronous probe API, so change spdk_nvme_probe_async()
to let it can only return the probe context pointer.

Change-Id: I0413c2d8db6cbe4539ad80919ed34dd621a9df70
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445870
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:13:39 +00:00
Shuhei Matsumoto
8696cd4288 dif: Add seed value for guard to avoid 0 in case of all zero data.
Allow user to add seed value for guard compuation to DIF context.
This will avoid the guard being zero in case of all zero data.

NVMe controller doesn't support seed value for guard computation
explicitly, and hence if we want to use such a seed value in
NVMe controller, we have to format metadata more than 8 byte,
and add seed value into the reserved metadata field.

But some popular iSCSI/FC HBAs and SAS controllers have supported
seed value for guard computation, and so supporting seed value
in the SPDK DIF library is very helpful for some use cases.

Hence this patch makes the DIF library possible to specify seed
value for those use cases.

Change-Id: I7e9e87cb441bf263e64605c7820409fdc22dd977
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444334
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
2019-02-22 17:52:51 +00:00
Jim Harris
7739a1f338 vhost: use mmap_size to check for 2MB hugepage multiple
Older versions of QEMU (<= 2.11) expose the VGA BIOS
hole (0xA0000-0xBFFFF) by specifying two separate memory
regions - one before and one after the hole.  This results
in the "size" not being a 2MB multiple.  But the underlying
memory is still mmaped at a 2MB multiple - so that's what
we should be checking to ensure the memory is hugepage backed.

Fixes #673.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1644bb6d8a8fb1fd51a548ae7a17da061c18c669

Reviewed-on: https://review.gerrithub.io/c/445764
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-22 10:24:16 +00:00
Darek Stojaczyk
b42cf6eaff env/dpdk: allow changing DPDK loglevels
spdk_env_opts->env_context may now contain a DPDK-specific
string that will be appended directly into rte_eal_init().
It can be used to e.g. override the default EAL loglevel,
which was hardcoded to RTE_LOG_NOTICE so far.

This is primarily meant to be used during development.

As a test for this feature, the vtophys test app will now
set the highest possible EAL loglevel which will give us
a ton of additional debug logs.

Note: the opts->env_context field is implementation-specific
and hence the vtophys app needs to check if it's run with
our env_dpdk. As SPDK_CONFIG_ENV is a raw text not even
surrounded with quotation marks, the vtophys app needs to
do a bit of #define magic to make it a string.

Change-Id: I0b2196770e5b59a6c33d0170337c34f9f8b8466e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445111
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 08:49:45 +00:00
Darek Stojaczyk
9858408b55 env/dpdk: fix potential memleak on init failure
When we were trying to push a newly allocated string
into the arg array and the array realloc() failed,
the string we were about to insert was leaked.

Change-Id: I31ccd5a09956d5407b2938792ecc9b482b2419d1
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445149
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-02-22 08:49:45 +00:00
Shuhei Matsumoto
df99e28158 nvmf: Expose bdev's PI setting to NVMe-oF Initiator
This patch expose backend's bdev's PI setting to the corresponding
NVMe-oF Initiator by Ideintify command, and removes the check if
block size is 512 multiple.

These change enables NVMe-oF Initiator to send extended LBA payload.

Change-Id: Ia7aa8332d36f056872a515b6da90c83112edb909
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/445056
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-22 00:36:55 +00:00
lorneli
815f82b17b nvme: mv submit_tick assignments to generic qpair code
Move req->submit_tick assignments from specific transports to generic
qpair code.

Check whether submit_tick has been assigned before doing the actual
assignment, because a request may be submitted several times and the
original submit_tick shouldn't be covered.

Change-Id: I2de8018dc21763eb5a19bb9d48dfbdef764b036e
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/444702
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-21 20:29:59 +00:00
Shuhei Matsumoto
95fc5d3581 iscsi: Remove SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH
In iSCSI, SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH was an alias
of SPDK_BDEV_LARGE_BUF_MAX_SIZE.

iSCSI had used both interchangeably.

SPDK_BDEV_LARGE_BUF_MAX_SIZE means the buffer size of the large
buffer pool in generic bdev layer, and will be changed to be
configurable.

SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH had been used to negotiate
MaxRecvDataSegmentLength with iSCSI initiator and to split large
read data, but both are determined by not iSCSI target but generic
bdev layer.

Hence this patch replaces SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH
by SPDK_BDEV_LARGE_BUF_MAX_SIZE.

Change-Id: I822a5203a5092fe8b2d1ca3f93423f1acbfc782e
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444539
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-21 18:38:07 +00:00
Shuhei Matsumoto
d0efddd281 iscsi: Move macro constant DEFAULT_MAX_QUEUE_DEPTH to the appropriate location
This macro constant is not related with data size and should be moved
to the separate location.

Change-Id: I73b337f5750c39d1f87591c2e372664019e50b95
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444545
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-21 18:38:07 +00:00
Ziye Yang
2da86de69f nvmf/tcp: fix error message printing in spdk_nvmf_tcp_qpair_set_recv_state
If the current recv_state of qpair is same with the state to be set,
we will print error message. And checked the current code,
we should add a check to avoid this.

Change-Id: I49334f637c48e565e785d1fe6d0f000e18b2048a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-21 18:04:10 +00:00
heyang
7cd3a6f5e0 nvme: add memory barrier in completion path for arm64
Add a memory barrier for arm64 to prevent possible reordering
of tracker and cpl access,
because arm64 has less strict memory ordering behavior than x86.

Change-Id: I0a8716f7bfeffb0bbce27ee3174e214c8e4566b4
Signed-off-by: heyang <heyang18@huawei.com>
Reviewed-on: https://review.gerrithub.io/c/442964
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-02-21 18:02:31 +00:00
Changpeng Liu
f8dfbc5e9f bdev/nvme: set hotplug poller with default period value
If users didn't set the "HotplugPollRate" field, the value
will be set to NVME_HOTPLUG_POLL_PERIOD_MAX, which isn't
aligned with our design purpose.

Change-Id: I9795d7a16a1cc44ed4de7c40f376c563d977b455
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445077
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-21 09:05:55 +00:00
Robert Bałdyga
ec50de0957 bdev/ocf: Add missing error handling in bottom adapter
Signed-off-by: Robert Bałdyga <r.baldyga@hackerion.com>
Change-Id: Iffa18e578511ad656cc4aae097f0066c0a2709eb
Reviewed-on: https://review.gerrithub.io/c/445032
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-21 07:39:59 +00:00
Ziye Yang
a1c5442d16 nvmf/tcp: remove the tqpair->group = NULL statement
Purpose: solve the coredump issue for the buffer
return later in spdk_nvmf_tcp_request_free_buffers.

If keep this statement, we cannot return the buffer
to the polling group.

Change-Id: Ib5c95ba54b37540950e654110fe6317cab507076
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445435
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-21 03:37:47 +00:00
Ziye Yang
3a486ab6be nvme/tcp: remove the unnecessary active_r2t_reqs
Change-Id: I3ce4c8cfce5f3e7c2e05b4fa11322805a08ec688
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445240
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-20 21:47:02 +00:00
Ziye Yang
14e1d0c747 nvme/tcp: call nvme_ctrlr_add_process in construct function.
Purpose: to make the timeout work for NVMe TCP transport,
we miss this for TCP transport.

Change-Id: Iab4af988cc4796b4d6d98430453f3dbce1fcf313
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445117
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-20 20:27:25 +00:00
paul luse
ba82b412cb bdev/crypto: fix error path memory leak in driver init
This patch refactors driver init and in doing so eliminates the mem
leak described in the GitHub issue.  Also it is now consistent with
how the pending compression driver does init.

Fixes #633

Change-Id: Ia2d55d9e98fb9470ff8f9b34aeb4ee9f3d0478f5
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442896
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-20 20:22:16 +00:00
Ziye Yang
73c5108684 bdev/nvme: Enable the timeout function if timeout value is provided
We should not add addtional check since we already have this
option in timeout_cb function, the addtional check is unnecessary.

Change-Id: I77c89303155e0c14072a1838994f9e76a0ffc0f4
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445319
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-20 20:20:21 +00:00
Ziye Yang
7bf5e1dee3 nvme/tcp: Implement nvme_tcp_qpair_fail function.
This patch is used to implement this function.
Since we need to call nvme_tcp_req_complete in this
function, so we need to adjust the location of the
nvme_tcp_rep_complete funtion.

Change-Id: I5fc3693aec8dc166ac1eb03babcd2d73d7b00e63
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/444489
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2019-02-20 20:18:46 +00:00
Shuhei Matsumoto
e7dc23696b scsi: Inline spdk_bdev_scsi_read/write into spdk_bdev_scsi_read_write
In this patch series, spdk_bdev_scsi_read and spdk_bdev_scsi_write
became almost identical. Hence squash them into spdk_bdev_scsi_read_write.

Change-Id: Ibbaddf74c1bf2dac37a0133eac27086af650a061
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444780
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-20 20:17:56 +00:00
Shuhei Matsumoto
07e9a00b60 scsi: Use spdk_bdev_writev_blocks instead of spdk_bdev_writev
This is in a effort to consolidate SCSI read and write I/O
for the upcoming transparent DIF support.

Previously conversion of bytes and blocks are done both in
SCSI layer and BDEV layer. After the patch series, conversion is
consolidated into SCSI layer.

Change-Id: Ib964a41ec22757f2a09cea22f398903f78d0781f
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444779
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-02-20 20:17:56 +00:00