In NVME TCP initiator zero copy is enabled for IO qpairs
and disabled for admin qpairs
Change-Id: Ibdf521dccde9b95ec5dd15a5eb2baed8fcf8b88e
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
A preparation step for enabling zero copy in NVMEoF TCP initiator.
This option will be used to disable zero copy
for admin qpair. This is needed since the admin
qpair's socket is not connected to socket poll group
and we can't receive buffer reclaim notification.
Change-Id: Ibfbb8a156aafcd7ba8975a50f790da7fbd37d96f
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4210
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We have some RPCs defined in the bdev library itself,
others in a separate bdev_rpc library. There's no need
for the separate library - just move them all into the
bdev library.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I298eedb88924197e64eb315369efb10f402903a5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4364
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
There is no need to have the application-level RPCs
defined separately from the event library itself
(which defines the application framework).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic264ed761f5ec1a40d604e63395c5740af4be1a6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4363
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This RPC was originally put into the app_rpc library,
but the log_rpc library is a better home for it, since
other log-related RPCs are already there.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I7ba5ac6cdeb57fb4219244690590c8fabbc3f59a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4361
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This was not used by anything. It was intended for use by user-space
TCP stacks.
Change-Id: I416589e421784882c693bcc5b03fe1dbcc4b1bd3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4297
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The bdev layer nicely handles the case where we call this
function with the buffers already present - it just
immediately calls the get_buf_cb. But this adds extra
overhead in the case where the buffer is already present.
Since nvme has no alignment restrictions, we can just
check the iovs directly and avoid the extra call to
spdk_bdev_io_get_buf when possible.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I66df0fde574a35e995a3432999d75bdbf9b27212
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4317
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
write instead of writev
Call spdk_nvme_ns_cmd_write_with_md if there is only one iov element.
Use spdk_nvme_ns_cmd_writev_with_md only if there is more than one.
This is about a 15% improvement in I/Ops.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I1a99b5507d37cde04e81feada65b14554ad01a17
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4319
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
readv API
It's cheaper to call spdk_nvme_ns_cmd_read_with_md than it is to call
spdk_nvme_ns_cmd_readv_with_md, so do a quick check of the iovcnt and
use the best one for the job.
This is about a 15% improvement in I/Ops.
Change-Id: I82e6677d1ac47abf9919f95e651e7a7595c5e9a3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4316
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The poller is now created internally to the library whenever a target
is constructed. Applications are not expected to poll for connections
any longer.
Change-Id: I523eb6adcc042c1ba2ed41b1cb41256b8bf63772
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3583
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
When net.core.optmem_max is not set high enough, a call to sendmsg()
might fail with ENOBUFS. Currently this is treated as an error.
When we have no more buffer space left, we should continue to process
any completions and by doing so, free up the auxiliary buffers we ran out
of.
With this change I was able to run perf against the spdk target with a
purposely set to a low, value of optmem_max, where previously it would
fail.
This fixes github issue #1592
Signed-off-by: Jeffry Molanus <jeffry.molanus@gmail.com>
Change-Id: Ieeeed4fbecd827d0da815456b57fbe81495fe54d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
It aims to avoid bdev starvation by setting separate
io_context to each bdev_aio io channel and link them
into group channel.
Previous, each ch of aio_bdev on a same spdk_thread shares
same io_context. If one of them submits too many IO to the
io_context, then other channels will be starved.
Moreover, if the first bdev_io gets NOMEM error, then the
following IO of this bdev have no chance to get processed.
The bdev_aio starvation can be reproduced by bdevperf in
v20.04, by:
./bdevperf -q 100 -w randwrite -t 5 -c hdd.conf -o 1048576
(hdd.conf defines 2 aio_bdevs of HDD)
Change-Id: Ic709323f5baeb2f8f7250b75ad872ec4156b5e78
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3808
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Community-CI: Broadcom CI
This is important to avoid doubling up on state changes
and hitting asserts.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: If8797ea13a5c224cee85e53e9b2542012423b37f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3759
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Not sure how this one was missed back during the big rename party.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ia00819a2c2e8c81ad0499f4081667dde04fa726e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4140
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch is used to enable placement_id getting
in sock layer and also add the rpc support.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I70de57b0ed392a0aefce9d3ff1f61ef924015a87
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4146
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In most cases, there is single element for iov array,
so using rbd_aio_read/write is enough.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Id3ccbb4a4ff10be9d4546c8e89e6dc596d0f3a72
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4122
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
The timer poller could add the latency since we poll it every 50ns.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia3d0ec56658b9365fdd34b67663a3365eb738451
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4081
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI
It is a good enhancement for the GPT vbdev module to support
interleaved metadata but interleaved metadata is not critical and
the priority is low.
However the following error log is always collected when there is
any bdev with interleaved metadata.
vbdev_gpt.c: 552:vbdev_gpt_examine: *ERROR*: GPT module does not
support block size 4224 for bdev Null0
This does not cause any failure but confuses user. Hence, change
the corresponding print from ERRLOG to DEBUGLOG.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id58507cbdc830d2b139fb34faa29e08d73c16141
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4058
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
When using --dif-type option --md-size should be
required as well.
Update & improve bdev_null_create rpc.py help
messages as well.
Change-Id: I6588a97aef6c8792bab7a41ece17c0461bb36844
Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3949
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Allow toggling log timestamps on and off by adding new RPC call.
Change-Id: I34c84bf89fae352ade266fbf7fd20594ff67bced
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2024
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The type of sendmsg_idx is uint32_t, so the maximal
is 2^32 -1, so it could be overflow and get 0, so
we should fix it.
PS: I think that our code may have potential defect.
In my experiment, I try to init sendmsg_idx with 2^32 -1,
so the first req->internal.offset = 2^32 - 1.
But for the ee_info and ee_data in "struct sock_extended_err"
got from _sock_check_zcopy is all 0 in the target side.
So it means that the this req will never be completed.
With the increase of sendmsg_idx (the type is
uint32_t), sendmsg_idx will finally goto 2^32 - 1, so I
think it will still kick the issue I described.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ic9aaf629d73d5b7e2c81800a4f7f92c728adbc34
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3948
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
In function's error processing, the io_channel should
also be put back.
Change-Id: Ibd3a616244a77fcc29a5ec5d1d1ca53f5412e8c6
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3859
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The one large global mempool was a waste of memory for apps that
don't use the accel framework as its always allocated a pool sized
to handle a heavy load with multiple threads.
Instead move to a per channel list of just 1024 tasks greatly
decreasing the memory footprint but still able to scale as more
threads are added.
Also renamed all accel_req to acccel_taak and simply task to
accel_task as this was being touched anyways and not consistent.
fixes issue #1510
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I0e93ca6270323e2df4b739711c5d9b667a52e1eb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3740
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
if mountpoint doesn't exist, fuse mount will fail and @bfuse
will be null, blobfs_fuse_stop() try to access it without
check which cause segmentation finally.
Patch also improve test messages output to include
terminator properly.
Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Change-Id: Idba23e3922778f1985d1f4841a1701cdc272ef17
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3863
Community-CI: Mellanox Build Bot
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch removes implementation of VPP socket abstraction
along with ways to compile it.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I089f7703cfc4fb517f8f80f4368e544bced549b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3734
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch is used to implement the group polling
policy instead of each rbd has one poller.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ieb975e656240bcdaf2657410f010d72b156639ed
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3698
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We can use rbd_aio_readv and rbd_aio_writev to simplify the code.
Since it rbd_aio_readv/rbd_aio_writev are supported more than 2
years, so can replace rbd_aio_read/write without using "ifdef".
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I924a039110ceed59836c1b904b6e7f8db271ad62
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3697
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This will allow us to use this RPC to detach only specific
paths from controllers.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Ib52e38aa7d4ea096418a6dc0328481c2e8db6c54
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3070
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This RPC was using snprintf which could potentially
truncate a bad argument silently without alerting the
caller.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I1795b2e4086cbaa45390bd11b189755818b5c9e7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3613
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
According to customers's report, there still exists
the Rados related thread contention with SPDK's reactor thread.
So also meove bdev_rados_context_init into bdev_rbd_handle.
With this patch, QEMU + SPDK vhost + Bdev RBD 's performance
can be improved.
Change-Id: Ie26a140f8d2cfe1a98cd1737f4ff70e9319cdc11
Signed-off-by: Yaowei Bai <baiyaowei_yewu@cmss.chinamobile.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3699
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add a helper function iscsi_parse_redirect_addr() to validate the
passed IP address-port pair.
iSCSI login redirection will support only numeric IP address and
TCP port, and add AI_NUMERICSERV and AI_NUMERICHOST.
This function is almost same as nvme_tcp_parse_addr() and
nvme_rdma_parse_addr().
Besides, update error log in posix_sock_create() to use
gai_strerror(). gai_strerror() will provide more accurate
information as done by nvme_tcp_parse_addr() and nvme_rdma_parse_addr().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I65c6de81a64dcb26551ce796172d0458e1c298a7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3357
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
TCP delayed ACK can be disabled or enabled by enabling or disabling
quick ACK, respectively.
The recently added spdk_sock_impl_opts is helpful for sock library
to control quick ACK.
Hence this patch adds and uses an option enable_quickack. The option
is effective only for the POSIX sock module.
We have spdk_sock_opts now too but spdk_sock_impl_opts will be better
for this case.
This option is not supported on FreeBSD. FreeBSD users can set the
option globally via sysctl if desired.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic89620267acce5872dc8ecaf7a99bb70ae97e993
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3603
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This will allow us to further develop the ecosystem for multipath
failover support.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I24a8cf13e60e6cc0d5b6374da33c8a4e5b6c499a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3069
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This allows us to avoid creating a separate rpc just for multipath TRIDs.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I4e83167eaf16e50a72efbd513333a4d09c52be61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2884
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When we fail to process admin completions on a controller
attempt to failover to a previously registered trid
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I547bd010f4b339b2af7f2b33027cddad4b4926bc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3045
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We are about to instrument some checks in this function
that will access the nvme_bdev_ctrlr struct directly so
pass that directly since the ctrlr is still accessible from
the nvme_bdev_ctrlr struct.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I4e0d0d6c433149368e84275426362f8283903027
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3340
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is part of a larger series enabling failover at the bdev
layer for NVMe.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Ic5c128244699c1a47275145ca7e41aa5f1366259
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3044
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch paves the way for introducing a tailq containing
multiple alternative paths to the same controller.
Change-Id: I13d30c12b8e0ce38eae687f9e76740be1d11e4d1
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3259
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Because of the Linux kernel has limitation, IORING_SETUP_IOPOLL is only
used for local devices (e.g., local files, pcie NVMe SSDs etc.). However,
it does not work for devices atttached from the remote. So in order to
make bdev uring generic, Let's do not use IORING_SETUP_IOPOLL to create the
uring.
Change-Id: I6aea1ff222a8a0d67ab040ada75aa0ef6730e725
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3587
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The pipe buffer has obvious performance influence on arm64. The
following is my test result with 1core, we can also enable it
on arm64 currently like the posix socket. And later we can find
the optimal pipe size that won't cause a degradation for large
payloads.
randwrite randread
512 byte 61% 97%
4096 byte 84% 16%
16384 byte -13% -17%
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: Ib4df60751c5e06ef9bd7fc7bb7efafa5ad4de211
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
During virtio_pci_dev_probe, if enum_cb fails, hw needs
to be released. But in bdev_virtio, if vdev fails after
initialization, it will enter the bdev destruction process
which call the modern_destruct_dev function and hw will
be released during the process. So we will encounter the
problem of hw being released twice.
Change-Id: I1e8116283cfd810dfb050f8928f4ecd4bb2d815b
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3566
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
During virtio_pci_dev_probe, if enum_cb fails, hw needs
to be released. But in bdev_virtio, if vdev fails after
initialization, it will enter the bdev destruction process
which call the modern_destruct_dev function and hw will
be released during the process. So we will encounter the
problem of hw being released twice.
Change-Id: Ifba35284c072355ba0e10428b597a1894d32d59e
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3564
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Fixes#1498
When shutting down the application, it was possible to
reference stale ocf_cache pointer. This was the case
when two or more vbdev_ocf devices were based on top
of single cache bdev.
This issue did not occur outside of the shutdown case,
since RPC only allows deletion of the vbdev_ocf.
This erases on disk metadata and next run of the application,
would not detect such vbdev_ocf.
Shutdown meanwhile works different, by first stopping
the instance of running "ocf_mngt_cache" and later detaching
"core" devices (the ones being cached). This prevented
erasing the on disk metadata and allowed for restarted
application to detect vbdev_ocf.
See patch (1292ef2) for details.
Since references to ocf_cache are copied between vbdev_ocf
[see start_cache()], the reference count inside ocf_cache
was limited to original ocf_mngt_cache_start() and
management queue creation. First call into ocf_mngt_cache_stop()
released all references to ocf_cache. Leaving other
vbdev_ocfs pointing to released memory.
This patch works around this issue by increasing ref cnt
on ocf_cache for each vbdev based on top of it.
It allows to call into ocf_mngt_cache_stop(), but not
release the memory for ocf_cache until last vbdev.
Note:
A proper redesign here is in order:
- either rearranging structures to be based around single ocf_cache,
rather than multiple vbdev_ocf instances
- better use of OCF API to reduce book keeping logic in vbdev
There are plans to implement detach/attach in RPC,
so it should be a focus during that effort.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I560a7fbb1c052bf53970e655bdb60803c561a252
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3574
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
There are additional conditions which SPDK tracks,
that are known before issuing call to OCF.
Two main ones are:
- if vbdev->ocf_cache was not yet created [start_cache()]
- if the cache bdev was opened [attach_base()]
Both happen for the first cache bdev once. Then for
consecutive vbdev_ocf on same cache bdev, reference
will be copied.
This call will simplify checking both conditions.
Calling into OCF with NULL or stale ocf_cache pointer,
rightly so will cause issues with ASAN.
Related #1498
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib202c15bda4cbbffa1516c69168e8bfb80370047
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3573
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Added to the framework as well as all 3 engines. Needed by apps
in the event that they have to fail following the creation of a
batch, allows them to tell the framework to forget about the batch
as they have no intent to send it.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Id94754ab1350e5a969a5fd2306bd59c38f0a0120
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3389
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>