Rename it to _is_core_at_limit(). This function
currently returns true if the core is at the limit
(instead of over the limit) which is really the semantics
that we want - so just change the name of the function
to make it more precise.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idf815f67c71463c3b98bc00211aafdc291abdbd2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9582
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We have _is_core_over_limit() which determines if a core is
currently over its busy:total tsc ratio. We use this to determine
if we need to move threads off of a core that is too busy.
But when we pick a core to move a thread *to* we were allowing the
dst core to fill to 100%, rather than the SCHEDULER_CORE_LIMIT.
This patch fixes that, which has the nice effect of keeping
thread-to-core assignments much more stable when running
I/O workloads.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id98b08803939d2a25104082e6436bb8d4727d7c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9578
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This will lead the scheduler to be quicker to move
threads to an unused core - favoring performance over
power savings.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibaa5edc61a4bdca5550bd23a562c3645fded25e9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9551
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If a core has a very high busy percentage, we should
not assume that moving a thread will gain that
thread's busy tsc as newly idle cycles for the
current core.
So if the current core's percentage is above
SCHEDULER_CORE_BUSY (95%), do not adjust the
current core's busy/idle tsc when moving a thread
off of it. If moving the thread does actually
result in some newly idle tsc, it will get adjusted
next scheduling period.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I26a0282cd8f8e821809289b80c979cf94335353d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9581
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For the src thread, add the busy_tsc of the thread
we are moving to the idle_tsc of the current core.
This is consistent with how are accounting for the
cycles in the target core too.
We will disable the load_balancing.sh script for now.
We will reenable it later in this patch set once
a few other changes are made, along with some updates
to the load_balancing.sh script based on the changes
made in this patch set.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8af82610804e97dabf62ccd90f75a0e6e37d276f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9550
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This will be useful in some upcoming patches where we will
be calculating these percentages in more places.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If7d84c00fe1b666988fe06537836ba7b9cb161aa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9580
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: Better to put this varable into the bdev_rbd_io structure
instead of on the stack. And we will use this variable in next patch,
when the callback function is completed, so we should not put it on the stack.
Change-Id: I11ff46ef07908084012bc1ce040eceb667334a40
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9334
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The purpose is that we will remove the group reaping of
rbd_io later, so we need to know the original thread info of the
rbd_io.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I69f60261447fdac0b0885fdb213e92c246439047
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9585
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Allow to return more than one memory domain.
This change aligns bdev and nvme API and provides
more flexibility for custom transports.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ica9b12ad8463c361be6cb62ee2c0513eec0b486d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Enable dump of transport stats in functional test.
Update unit tests to support the new statistics
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815aeea7d07bd33a915f19537d60611ba7101361
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8885
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch enables us to aggrete multiple ctrlrs in the same NVM
subsystem into a single bdev ctrlr to create multipath.
This patch has a critical limitation that ctrlrs which are aggregated
need to have no namespace. Hence any nvme bdev is not created.
However it will be removed in the next patch.
The design is as follows.
A nvme_bdev_ctrlr is created to aggregate multiple nvme_ctrlrs in
the same NVM subsystem. The name of the nvme_ctrlr is changed to be
the name of the nvme_bdev_ctrlr.
NVMe bdev module has both the failover feature and the multipath
feature now. To choose which of failover or multipath to use, add an new
parameter multipath to the RPC bdev_nvme_attach_controller.
When we attach a new trid to the existing nvme_bdev_ctrlr, we use the failover
feature if multipath is false, we use the multipath feature if multipath is
false.
nvme_bdev_ctrlr has a list for nvme_ctrlr and it is guarded by the
global mutex. Callers can query nvme_ctrlrs from a nvme_bdev_ctrlr via
trid as a key. nvme_bdev_ctrlr is not registered as io_device.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I20571bf89a65d53a00fb77236ad1b193e88b8153
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8119
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously, if an I/O qpair is disconnected, we tried reconnecting
the qpair. However, this reconnect operation was very likely to fail
and will not match the upcoming asynchronous connect/reconnect
operation. We need an extra callback to make this reconnect operation
asynchronous, but we do not want to have it.
Hence if an I/O qpair is disconnected, we free the I/O qpair and then
reset the corresponding nvme_ctrlr immediately. If the admin qpair is
also disconnected, the nvme_ctrlr is reset immediately. However this
event may never happen. So we do not wait for the error of the admin
qpair.
The NVMf host may disconnect connections by itself intentionally.
In this case, resetting the nvme_ctrlr will surely fail. But resetting
the nvme_ctrlr frees all I/O qpairs of the nvme_ctrlr and these I/O
qpairs are not created again until resetting the nvme_ctrlr succeeds.
Resetting the nvme_ctrlr once at most is more efficient than repeating
reconnecting the I/O qpair. So this change is valuable even for such
intentional disconnection. However, it is helpful to know the event that
I/O qpair is disconnected. Hence change DEBUGLOG to NOTICELOG in the
disconnected callback. The disconnected callback is not repeated, and
we do not need to worry about NOTICELOG flooding.
Refine the unit test case to verify this change.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I376b749c2f55d010692bf916370e8bb4249b795f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9515
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is similar to how we name other module library
directories.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iadaf59231323180b48b5d0cf2e6acb3d8bfc9807
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9549
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Options modified on sock after connected is also moved to a
function.
Signed-off-by: Liu Qing <winglq@gmail.com>
Change-Id: I4c2a9ae9858c102764959d055bed208b4b0621d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9516
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This reverts commit 2cd948c4a6.
This commit caused drop in performance tests.
More info in issue https://github.com/spdk/spdk/issues/2158
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: Id5d353535323c79e773e33377af388dae47238cb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9510
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Poll group has a list of the associated ctrlr channels to update
them when the corresponding I/O qpair is disconnected to do path
failover.
Another idea is that poll group has a list of the associated bdev
channels. However, two or more bdev channels may share a single
ctrlr channel and I/O qpair is per ctrlr channel. What we want to do
by this addition is to stop I/O submission to any failed I/O qpair
and choose alternative I/O qpair. Hence we take the first idea.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia287bd1b803313e66b8505a19694a40133b675f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8124
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously nvme_ctrlr_channel had a pointer to a nvme_ctrlr and used it
throughout. However the use cases are not performance critical and
are better to convert from nvme_ctrlr_channel to nvme_ctrlr by
spdk_io_channel_get_ctx() and spdk_io_channel_get_io_device().
Provide a convenient macro nvme_ctrlr_channel_get_ctrlr() to do it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie22e8c318af88e09b95824c67b3e874d85425f1a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9195
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_find_admin_path() is used only in a place and it's role
is to find a non-failed ctrlr even after multipath is supported.
Inline it into bdev_nvme_admin_passthru() will be better.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If58507c49b43d047e1f3ef25bbdfb571c36a1956
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9194
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Previously bdev_nvme_reset() returned -EBUSY if ctrlr is being
destructed and returned -EAGAIN if ctrlr is being reset.
These did not match what spdk_nvme_ctrlr_reset() returned.
Reset operation will be more important than current when multipath
is supported and reset operation is made asynchronous.
Hence change bdev_nvme_reset() to follow spdk_nvme_ctrlr_reset().
bdev_nvme_reset() returns -ENXIO if ctrlr is being destructed and
returns -EBUSY if ctrlr is being reset.
Additionally change the return value of bdev_nvme_failover()
accordingly. After the change bdev_nvme_failover() returns -ENXIO
if being destructed and returns -EBUSY if ctrlr is being reset.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie2c6f8601050b1043d83de9cf01490751784e4e5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8859
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Following the last patch, include hostid into ctrlr_opts rather than
passing it as a parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0d04db1c5767ec76a9a7cd255c3a8d56b0b8f583
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9344
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
nvme_ctrlr_populate_namespace_done() needs nvme_ns, probe_ctx, and completion
status.
When multipath is supported, nvme_ctrlr_populate_namespace_done() will
be called from the spdk_for_each_channel() callback. The spdk_for_each_channel()
callback can have one context and one completion status.
Hence nvme_ns or probe_ctx need to have another.
If an nvme_ctrlr has multiple namespaces, a probe_ctx is shared by
multiple nvme_ns.
So let nvme_ns has probe_ctx.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iedaaab80616d34d01935f4ebc31e1f3b84e09e32
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9047
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
For multipath, when an new nvme_ns is added to an existing nvme_bdev,
we need to add the corresponding qpair to all existing nvme_bdev_channels
dynamically. The RPC bdev_nvme_attach_controller has to return after that.
Hence populating namespace needs to be asynchronous.
On the other hand, when an nvme_ns is removed from an existing nvme_bdev,
we need to remove the corresponding qpair from all existing nvme_bdev_channels
dynamically. Hence depopulating namespace needs to be asynchronous.
By the recent refactoring, two callbacks, nvme_ctrlr_populate_namespace_done()
and nvme_ctrlr_depopulate_namespace_done() were removed accidentally.
We need to restore these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I37ea2420588cef3a18648dec053f8bd2b884e86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9392
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
A few functions can be made private in bdev_nvme.c. Hence delete
these function declarations from bdev_nvme.h.
Besides, bdev_nvme_remove_trid() is only a function declaration.
Hence delete it together in this patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7bf9ae6da332c6426f6eb40926e7c4130d9acd8a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9444
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
An error might occur after succesful transport creation
when the new transport is added to nvmf poll groups, e.g.
in nvmf_transport_poll_group_create. In that case
transport is not detroyed and poll groups are not fully
functional. To correct this behaviour, destroy transport if
spdk_nvmf_tgt_add_transport fails. Also update nvmf_tgt
initialization step to check that all poll groups were
created.
Change-Id: I116e6944729d846c1755c2844c77825f65db8c12
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9255
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This only existed to share code between OCSSD and regular NVM
namespaces. Now OCSSD is gone, so just merge the files into bdev_nvme.
Change-Id: Idb73cc05d67144de5dd20af8db24c8f6974d10a7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9337
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
nvme_ctrlr_populate_namespaces
Avoid relying on this number. Different targets have interpreted its
meaning in different ways and it cannot be used anymore in practice. It
may also be very, very large.
Change-Id: I94e8eae49d6ccdbd8be302b30a120d89242b6d39
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9316
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Try to use these accessors instead of directly using the namespaces
array. This will make changing the data structure easier later on.
Change-Id: I3367d0e0065894f3aa199ed1698d27976b4cbbb5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9315
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If the number of namespaces is very large, this can cause excessive
memory allocation. This is especially true because when the number of
namespaces is large, it is almost always very sparsely populated.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I27d94956c222ae3c49c6a7422164ae3a8ec8d963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9302
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Do simple validity checks first, then check for duplicate controllers.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ic21d64574b37a3d9148e5cd6d5a7599449ea8fe1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9341
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
It is more flexible now as it is possible to get nvme ns handle to do
additional management or queries, however if nvme ctrlr handle is needed
there is already public nvme API for that with nvme ns as an input.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I5493168ad31cc95687962288d57fb5457f2d7dd6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9357
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When preparing for a reset, use this new call to tell
the driver to avoid sending DELETE_CQ/SQ commands to a
PCIe controller when they aren't needed.
Fixes issue #2073.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9ebb7d5c3f7cbb1c3192f162f32edbbea41acde1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9250
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Added writing out JSON configuration for the scheduler
subsystem.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I51b1f94b3f56d0bfb8a87127163c8e248d6846b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7119
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch moves schedueler and governor related API from
the internal event.h to public scheduler.h.
With this it is possible to create subsystem responsible
for handling the schedulers.
Three schedulers and a governor were moved to scheduler modules
from event framework.
This will allow next patch to add JSON RPC configuration
to the whole subsystem.
Along with easier addition of other schedulers.
Removed debug logs from gscheduler, as they serve little purpose.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I98ca1ea4fb281beb71941656444267842a8875b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6995
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When we rotate the socket in the list, we did not check whether the uc pointer
is NULL, then we cause the coredump when using uc pointer.
When we add a new socket (with pollin event) into the pending_recv list,
we add it into the end of the pending_recv list, then it delays the execution
of the newly socket, and it can cause the timeout especially when a socket
is in a connection phase.
So the purpose of this patch is:
1 Revise the rotation logic to handle the two cases, i.e., (1)sock is in the
beginning of the list; (2)sock is in the end of list. The purpose is
to avoid NULL pointer access, and efficently handle the exceptional case.
2 When there is new pollin event of the socket, we should add socket in the beginning
of the list. And this can avoid the new socket handling starvation.
Since max poll event num is 32 from upper layer and if we always put the new socket
in the end of the list, then starvation will occur if there are many socket connection events.
Because if we add the new socket into the end of the pending list, we will always handle the
existing socks first, then the later coming socket(with relatively pollin event) will always be
handled late. Then in the sock connection initialization phase, it will consume a relatively
long time, then the upper layer connection based on this socket will cause timeout,.e.g.,
ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:
uuid:af56cce7-2008-408c-a8a0-9c710857febf from subsystem nqn.2019-02.io.spdk:cnode0 due to
keep alive timeout.
[2021-08-25 20:13:42.201139] ctrlr.c: 579:_nvmf_ctrlr_add_io_qpair:
*ERROR*: Unknown controller ID 0x1
Fixes#2097
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I171b83ffd800539e86660c7607538e120fbe1a91
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9223
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There's only one type now.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I8fbf330797e772b1c45a04970c95bf4894c26639
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9348
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
There's only one kind of namespace now that OCSSD is gone, so simplify
those whole path.
Change-Id: I721de11c3e7be8b3a13ada25b6d6163a67c6659f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
As far as we're aware, this is not in use by anyone. OCSSD has largely
been replaced by ZNS and no OCSSD drives made it to the market.
Change-Id: I020ee277da5292f8c4777f224acafd87586f8238
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The values returned weren't actually used, and instead
make the code unclear especially in the error cases.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iee2661545a6831f6151a183d4ac723c835e84d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9349
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Shouldn't really ever happen but is somehow things get out of whack
and we get a busy back from the low level lib we don't want to
increment our flow control counter.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I49176e8ad2bf6f658ac970efea5db89eebc747f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9232
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
bdev_nvme_create() is called only by a single caller and hostnqn is
just copied to ctrlr_opts even if it is passed separately.
Hence include hostnqn into ctrlr_opts rather than passing it as a
parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I75b640bcecefa94950b0c19936fab0571c428125
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9332
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The NVMe bdev module will have two similar features, multipath and
failover when it supports multipath.
Take a case that we add two different trids with the same name by the
bdev_nvme_attach_controller RPC as an example.
The failover adds secondary trid to an existing nvme_ctrlr. The multipath
feature creates another nvme_ctrlr and adds it to the same nvme_bdev_ctrlr
which has an existing nvme_ctrlr.
We want to use bdev_nvme_attach_controller for both failover and multipath.
To do it cleanly, separate callback to spdk_nvme_connect_async() between
creating ctrlr and setting failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id9bc175af6201cdd74e12d4903fc81afe4f91189
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9225
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We get ext_opts by getting spdk_bdev_io from nvme_bdev_io in
bdev_nvme_readv() or bdev_nvme_writev() now. But this is not aligned
with the current design pattern and not so efficient.
We pass contents of bdev_io via parameters to bdev_nvme_readv() or
bdev_nvme_writev().
Let's follow it about ext_opts.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8a43f31934a36fa2d43800ec8bf17916edfca477
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9292
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
When an abort command completes successfully,
cdw0 bit 0 may be set to indicate that the controller
was unable to abort the specified cid. In that case
the bdev nvme abort completion callback needs
to reset the controller.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I42a5e96e19e113c38dec67c2d8575a11f39d41a9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9320
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Converted log message to debug that is called too many times
during a hot remove, filling up the log file.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I08f7ff7c36c6388270878291df8a1e83646e8aee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9258
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When a bdev is being unregistered, after all
channels have been closed, the bdev layer calls
the module's destruct callback for the bdev before
calling the bdev unregister callback.
For the rbd module, the destruct callback is
bdev_rbd_destruct. This callback unregisters the
rbd io_device which is an asynchronous operation.
We need to return >0 from bdev_rbd_destruct to
inform the bdev layer that this is an asynchronous
operation, so that it does not immediately call
the bdev unregister callback. Once the rbd io_device
is unregistered, we can call spdk_bdev_destruct_done()
which will trigger the bdev layer to finally call
the bdev unregister callback.
Without this fix, deleting an rbd bdev would
complete before the backing cluster reference
had been released. This meant that even
if you had deleted all rbd bdevs, there might still
be cluster references in place for a short period of
time. It's better to wait to complete the delete
operation until the cluster reference has been
released to avoid this issue (which this patch
now does).
Fixes issue #2069.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8ac156c89d3e235a95ef196308cc349e6078bfd7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9115
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
The bdev layer doesn't call the destruct callback until
all channels have been released, but because the channel
delete callback passes message to the main thread, we can
end up with a complicated race condition. Currently we
have a deferred_free code path to handle this race, but
we can handle this a bit more cleanly by doing the
construct operation on the main_td as well.
This also simplifies the next patch which will
asynchronously destruct the bdev to fix an RPC bug.
Here's the race:
1) first channel was created on thread A, so disk->main_td = thread A
2) second channel was created on thread B
3) first channel is freed (but disk->main_td is still thread A)
4) spdk_bdev_unregister is called on thread C
5) bdev layer gives callback on thread B to upper layer
6) upper layer on thread B frees channel
7) bdev_rbd_destroy_cb runs on thread B and has to send msg to thread A
for processing
8) bdev layer calls bdev_rbd_destruct on thread C (since step #4 was on
thread C)
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I25ede2dc56e24dac0919aed05b9def2560823ee7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9158
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>