The semaphore was a part of struct spdk_vhost_session_fn_ctx
so far, but since there's only one pthread waiting on that
semaphore and hence only one event using it, we could just
use a single global sem_t. Same thing with response code
for those callbacks - there's only one needed.
Going a step further, the function complete_session_event()
was removed - it would only operate on global variables now,
and its signature wouldn't make much sense after this
refactor, so it's been inlined.
This serves as cleanup.
Change-Id: I63ef41d7e1564fff5e785de101d887bc1014aad9
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459160
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Enforce spdk_vhost_fini() to be called on the same
thread which called spdk_vhost_init(). We'll also use
the newly added g_vhost_init_thread for other purposes
later on.
Change-Id: I99aebeda2d8ddaf42554aa422c32ed935634595f
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459159
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
With all the pieces in place we can finally remove
the legacy cross thread messages from vhost.
We replace spdk_vhost_allocate_reactor() with
spdk_vhost_get_poll_group(). The returned poll_group
has to be passed to spdk_vhost_session_send_event(),
where it will be assigned to the session. After the
session it started, that poll group will be used for
all the internal vhost cross-thread messaging.
Change-Id: I17f13d3cc6e2b64e4b614c3ceb1eddb31056669b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452207
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reported by clang:
rte_vhost_compat.c:114:36: error: taking address
of packed member 'payload' of class or structure
'vhost_user_msg' may result in an unaligned
pointer value.
To fix it, just remove the extra unaligned pointer
and inline all its accesses.
Change-Id: I7e4ab536b87ab02a4ea12c55d55a6e495c3091ca
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457559
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
struct ether_addr was renamed to struct rte_ether_addr
in latest DPDK master, but our internal fork of rte_vhost
still used the old name, which can be now a non-defined type.
Together with the struct, the RTE_ETHER_ADDR_LEN define
was renamed as well, so we'll now check if it's defined and
we'll manually define struct ether_addr to keep the old
rte_vhost working.
Change-Id: I78b8104ed3bfe03397881a94f0f8bee14f9efae8
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457609
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
rte_vhost_vring_call() from upstream DPDK can read some
unitialized memory and crash if it's called on invalid
queue ids. The implementation in our internal rte_vhost
fork ends up wiritng to a random descriptor number, which
doesn't cause any crashes but is a bug nevertheless.
To fix it, just check if the queue is initialized before
interrupting it during the session start. It's not a hot
I/O path and there's no performance impact.
Change-Id: I830c1be98ef00d4ece9a6bd88cf79b9dfe29d2a9
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457247
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
AIO backend requires aligned data buffers, and the maximum
IOVs supported in bdev module is defined to 32, there are
cases for Windows Guest which will send data segments more
than 32, SPDK can't process such cases, so here we can set
the 'seg_max' parameter based on bdev module capability.
Also set the maximum segment size for those requests.
Fix issue #625.
Change-Id: I0ff61e55872af17115c0b6b28425e70cb8769790
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452378
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The memory API has been refactored. It is not possible anymore to
register a memory region more than once. This has been introduced in
this patch: https://review.gerrithub.io/426085
In case of vhost with vvu transport, it often happens that two
consequtive vhost memory regions are mapped to virtual addresses that
lie within the same 2MB address range. This means that the vhost memory
regions may not be 2MB-aligned in the process virtual address space. As
a result, the `FLOOR_2MB()` of those addresses gives the same address.
Thus, we end up trying to register the same 2MB memory range twice.
This issue does not appear in case of AF_UNIX transport. Vhost memory
regions in case of AF_UNIX transport are hugepage backed. Therefore, the
mmapped virtual addresses of those memory regions are always
2MB-aligned. On the contrary, in case of vvu transport, the vhost memory
regions are segments of the PCI memory address space of the
virtio-vhost-user PCI device. This MMIO space is mapped in its entirety
by the DPDK vfio interface along with the other PCI BARs. Ultimately,
the vhost memory regions correspond to offsets in this mmapped PCI
memory region and thus there is no warranty that the mmapped virtual
addresses are 2MB-aligned.
This issue is fixed by skipping the already-registered 2MB memory
regions.
Change-Id: I62c9c257e6f172c894cd3454d2cbeee1986e6189
Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/441057
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
vring notification mechanism is transport-specific. At present, vhost
dataplane code in `lib/vhost/vhost.c` triggers guest notifications with
`eventfd_write()` system call. But this is an AF_UNIX specific
notification mechanism. This patch replaces `eventfd_write()` with the
existing generic `rte_vhost_vring_call()` function that is part of
DPDK's librte_vhost public API.
`rte_vhost_vring_call()` takes a vring_idx as an argument to associate
the `struct spdk_vhost_virtqueue` instance with the relevant `struct
vhost_virtqueue` instance. We introduce a new `vring_idx` field in
`struct spdk_vhost_virtqueue` to enable this association. This field is
initialized in `start_device()`. In addition, a stub for
`rte_vhost_vring_call()` is added in the vhost unit test file.
SPDK's internal `rte_vhost` copy will not be updated in order to support
the virtio-vhost-user transport. However, an `rte_vhost_vring_call()`
function is introduced in SPDK's `rte_vhost` in order to have a solid
API. This function is just a wrapper of `eventfd_write()`.
Change-Id: Ic93e25cd3f06e92f04766521bc850f1ee80b8ec8
Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454373
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
when qemu connect to vhost, but don't send msg to vhost. We use
kill -15 to destroy vhost process. it will lead to deadlock.
(A)
* rte_vhost_driver_unregister()
* pthread_mutex_lock hold vhost_user.mutex (1)
* wait TAILQ_FIRST(&vsocket->conn_list) is NULL
(B)
* fdset_event_dispatch()
* vhost_user_read_cb() start
* vhost_user_msg_handler() start
* dev->notify_ops is NULL because qemu just connect, no message recv.
* vhost_driver_callback_get()
* pthread_mutex_lock hold vhost_user.mutex (2)
(A) & (B) deadlock
To avoid this scenes, when qemu connect in vhost_new_device()
initialize dev->notify_ops
Change-Id: Iaf699da41dfa3088cfc0f09688b50fada6b2c8d6
Signed-off-by: Tianyu yang <yangtianyu2@huawei.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454832
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We no longer have any assumptions about vhost memory regions
size being a 2MB multiple, so we can get rid of the security
check preventing some vhost sessions from being initialized.
It will be necessary for virtio-vhost-user, whose memory comes
from PCI BARs and its size may not be a 2MB multiple.
Change-Id: I48f9bc20f4c61aefdddf39ade875867148f0ed75
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454879
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Currently, we translate each 2MB chunk to manually check
if it's contiguous with the previous one, but there are
rte_vhost APIs that do it way more efficiently.
rte_vhost_va_from_guest_pa() was introduced in DPDK 18.02,
but was backported to 17.11 as well, so we don't even need
any RTE_VERSION ifdefs to use it now. This function
calculates the remaining region size instead of trying to
translate subsequent 2MB chunks over and over.
The previous rte_vhost_gpa_to_vva() was deprecated a long
time ago and after this patch we no longer make any use of
it.
DPDK usages of this new function check if the translated
memory region has 0 length, which seems very silly, but
let's just do it in SPDK as well.
Change-Id: Ifae8daa5f810b5a2ba1524958ad2399af700b532
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454878
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Now that sessions have a separate flag to check if the
pollers are started, we can set the lcore field on any
thread we want. We currently assign it from within the
session thread to spdk_env_get_current_core(), but we
won't be able to use an equivalent get_current_poll_group()
function after we switch to poll groups. We will only
have a poll group object inside spdk_vhost_session_send_event(),
so that's where we move the lcore assignment for now.
Change-Id: Ib5fb37ec488de80e9d79432120c81500c297b608
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452395
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We used to rely on lcore >= 0 for sessions that are
started (have their pollers running) and in order to
prevent data races, that lcore field had to be set from
the same thread that runs the pollers, directly after
registering/unregistering them. The lcore was always
set to spdk_env_get_current_core(), but we won't be able
to use an equivalent get_current_poll_group() function
after we switch to poll groups. We will have a poll group
object only inside spdk_vhost_session_send_event() that's
called from the DPDK rte_vhost thread.
In order to change the lcore field (or a poll group one)
from spdk_vhost_session_send_event(), we'll need a separate
field to maintain the started/stopped status that's only
going to be modified from the session's thread.
Change-Id: Idb09cae3c4715eebb20282aad203987b26be707b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452394
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Prepare to switch to spdk_thread_send_msg() which
accepts only one context parameter.
Change-Id: Iea3e8d1e715957d9b3fea12e969f29084a2948dc
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452393
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The goal is to remove legacy event messages from vhost.
The new message passing API accepts thread objects instead
of lcore numbers and poll groups are meant to simplify
the transition.
Eventually we'd like vhost to spawn its own threads and
do message passing only within those, but SPDK libraries
can't spawn their own threads just yet. As a stopgap, vhost
will now maintain a list of all available threads (in form
of "poll groups" to mimic nvmf) and will start pollers on
them using its own round robin scheduler.
This patch only adds the poll groups list, it doesn't
change any existing functionality.
Change-Id: I89cc5da5df3612827c6fc9015f03c94b5f4a10ad
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452206
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Prepare vhost lib init to be asynchronous. We'll need
it for setting up the upcoming poll groups.
Change-Id: I3c66b3f17f8635d4b705dd988393431193938971
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452205
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Put all shutdown functions in a single place. This also
lets us remove one forward declaration.
Change-Id: I8c8c602e67e3dafd3cd5e80bc9dd90f23381711e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452392
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Switch to the new spdk_thread_send_msg() API instead.
Change-Id: I810465cc49d5c4ef23e04953aa29d369f48f68b1
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452391
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We do technically support initiators without eventq or
controlq, but the lun hotplug/hotremove path expected the
eventq to be always present.
This was causing vhost to randomly crash in the fuzz tests.
Specifically, the crash happened if lun hotplug was handled
while a VM was in the middle of switching from BIOS to OS.
We fix it by checking if eventq is set before putting
any event there. LUN hotplug and hotremove won't work
without an eventq, but the entire session will be restarted
after new queues are initialized. This will make the VM
retrieve all up-to-date luns after OS initialization is
complete.
Change-Id: I5d28cbedad8fb2a35ede5a491aeb7fdc52faad06
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451789
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This is the end of the patch series. After this patch,
delete_target_node RPC will wait for the completion of
removal of the SCSI device and then free the iSCSI target.
SCSI device holds passed callback and calls it in free_dev().
free_dev() is ensured to be called after all iSCSI sessions
are closed. So iSCSI target resource can be freed safely
after that.
Change-Id: I25921b4014207092b7b3845dfeae58bcdffa2edc
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450607
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
spdk_dma_malloc() is not required here, as the device
object is neither DMA-able nor shared between processes.
The device structures used to be aligned to cache line
size, but that's just a leftover from before sessions
were introduced. The device object is just a generic
device information that can be accessed from any thread
holding the proper mutex. The hot data used in the I/O
path sits in the session structure, which is now allocated
with posix_memalloc() to ensure proper alignment.
Vhost NVMe is an exception, as the device struct is used
as hot I/O data for the one and only session it supports,
so it's also allocated with posix_memalloc().
While here, also allocate various vhost buffers using
spdk_zmalloc() instead of spdk_dma_zmalloc(), as
spdk_dma_*malloc() is about to be deprecated.
Change-Id: Ic7f63185639b7b98dc1ef756166c826a0af87b44
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450551
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The previous patches described as optimizations also
fixed some issues. They seem sufficient to cover all
the error cases, but the real source of the problem
lies in foreach_session() initiated by the device backend,
which can use sessions that were never seen by the
backend.
The backends are only notified when a session is
*started*, but foreach_session() iterates through
all the sessions - even those that were never started.
Vhost SCSI, for example, in the foreach_session() callbacks
used to expect svsession->svdev to be always set, but
that field is only set when the session gets started.
A perfect solution would to introduce a new backend
callback to be called on new connection. Vhost SCSI
could set e.g. svsession->svdev inside. For now we go
with much easier solution that prevents sessions from
being used in foreach-session() unless they were
started at least once. (...and e.g. got their ->svdev set)
Change-Id: Ida30a1f27f99977360d08a71a64fc92931b25b75
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449394
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Before SCSI target is removed, all vhost sessions need
to drain their pending I/O and put their I/O channels.
After a session puts it channel, it sends an async
notification to the entire vhost device. The device
will check if there are any other sessions still
referencing the SCSI target and if not - it will
continue removing the spdk_scsi_dev object. There may
be multiple sessions sending those async events at the
same time, and while we do protect from removing the
same spdk_scsi_dev twice, we can still remove
a different spdk_scsi_dev that was hot-attached in the
meantime with the same target ID.
1. SCSI target hotremove (e.g. via RPC or bdev hotremove)
/ \
/ \
session A session B
drain I/O drain I/O
| |
v |
done v
send event done
\ send event*
\
All sessions have detached the SCSI target, remove
it from the entire vhost device. From this point
a new target can be hot-attached (e.g. via RPC).
2. Attach a SCSI target with with same target ID.
3. Hotremove event* from the previous SCSI target gets
finally executed. SCSI target with that ID is
occupied (again) and may be hotremoved by mistake.
The role of that hotremove event is just to kick the
vhost device and make it remove any scsi targets that
can be removed, so add a check preventing it from
removing devices in states other than REMOVING.
Change-Id: Ia1cc7cae797fd8859d485e63f0ef37aeac2945d0
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449990
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Always unset the VHOST_SCSI_DEV_REMOVED status on
session stop, so that we won't send hotremove SCSI
sense codes after e.g. a VM gets rebooted. The VM
should generally enumerate the SCSI devices again
in such case. We already unset the REMOVED status
for devices which were still attached at the time
of the session stop, but the devices hotremoved
before the session stop retained their REMOVED
status, giving us inconsistent behavior.
Change-Id: I7c5876e29f4bdc99cc060f1d891e24ac57051f37
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449709
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Vhost sessions currently inherit the SCSI target
status from their vhost devices when started. So
if a session is started while an asynchronous SCSI
target hotplug is in progress, the newly started
session will inherit the VHOST_SCSI_DEV_ADDING
state, which was not meant to be used in sessions
and will likely cause vhost to misbehave. The
ADDING status is used by the entire vhost device
to indicate that some sessions are still hotplugging
the SCSI target and that target can't be hotremoved
just yet. The sessions set their targets' state to
PRESENT when hotplugging them, so newly started
sessions should do the same.
This patch also prevents the same SCSI target to be
hotplugged twice to a single session. It wouldn't
cause any problems, but some resources could've been
leaked.
Change-Id: Icdbff78c167fc1f2f65137087334bd5512e81546
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450052
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This an optimization that slightly simplifies the
SCSI target management. Currently if a session is
started while an asynchronous SCSI target hotremove
request is pending, the newly started session will
inherit the target in the REMOVING state. It will be
probably removed from that session in the next
management poller tick, but all that complication
is completely unnecessary. The session shouldn't
have picked up the removed SCSI target when started.
It could have simply checked that the target is
being removed and could have ignored it. That's what
this patch does.
Since the hotremove event used the active session
counter to determine if the removal was additionally
deferred, it had to be refactored to use a separate
per-request context, as there's no longer a direct
relation between started sessions and sessions that
still need to remove the target.
Change-Id: Ib78765290fa337a7d0614e5efc271760e81e4e63
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449393
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is just a cleanup. There's no need to hotplug
or hotremove SCSI targets from stopped sessions, because
those sessions can't access any targets anyway. When
session is started, it already inherits all SCSI targets
from the vhost device. When it's stopped, it releases
resources of all targets. Intermediate changes have
no effect whatsoever, so don't do them.
Change-Id: Ibf283bcf8260e71dec8d9ea39a9461a978031ab3
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449392
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It is theoretically possible for an asynchronous
hotremove request to be finished before the hotplug
request that was started first. This is obviously
not expected and will most likely result in a resource
leak.
For SCSI target hotplug, we immediately update the
whole vhost device object and then asynchronously
ask each vhost session to poll the changes.
For hotremove, we see the device attached in the
whole vhost device object, so we immediately mark
it as "still being removed" and proceed aynchronously
asking the sessions to hotremove. When session
receives the hotremove event first, it will either
fail an assertion (when debug is on), or do nothing.
The subsequent hotplug event will attach the target
again - and that target won't be ever freed.
Change-Id: I784c979fb47127a4238038ad9fb5ed1cac3ced04
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449391
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
For session context we need only a few fields from
spdk_scsi_dev_vhost_state structure, so introduce
its stripped variant as a separate structure.
Change-Id: I1be4e77447443d156f86033450892cb7cb464cb9
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447072
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In cases where initiator closes the connection as soon
as it receives a hotremove event, there is a possibility
of SPDK vhost stopping the session before finishing up
the asynchronous target hotremoval. The target would be
either hotremoved once the session is started again
(and it registers its management poller again) or it
could cause a potential memory leak if that session is
destroyed. Even though the SCSI target itself is always
freed, the hotremoval completion callback is only called
from the management poller. At least in our RPC case,
not calling that callback results in leaking the context
structure and some json data.
We fix the above by calling all hotremove callbacks just
before stopping the device.
Change-Id: Ibfd773e1ab82b63643c57d7a9d37304e3007e38b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/439445
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
There is currently a small window after we stop
session's pollers and before we mark the session
as stopped (by setting vsession->lcore to -1). If
spdk_vhost_dev_foreach_session() is called within
this window, its callback could assume the session
is still running and for example in vhost scsi
target hotremove case, could destroy an io_channel
for the second time - as it'd first done when the
session was stopped. That's a bug.
A similar case exists for session start.
We fix the above by setting vsession->lcore directly
after starting or stopping the session, hence
eliminating the possible window for data races.
This has a few implications:
* spdk_vhost_session_send_event() called before
session start can't operate on vsession->lcore,
so it needs to be provided with the lcore as
an additional parameter now.
* the vsession->lcore can't be accessed until
spdk_vhost_session_start_done() is called, so
its existing usages were replaced with
spdk_env_get_current_core()
* active_session_num is decremented right after
spdk_vhost_session_stop_done() is called and
before spdk_vhost_session_send_event() returns,
so some active_session_num == 1 checks meaning
"the last session gets stopped now" needed to be
changed to check against == 0, as if "the last
session has been just stopped"
Change-Id: I5781bb0ce247425130c9672e0df27d06b6234317
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448229
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Split spdk_vhost_session_event_done() into two separate
functions. This is just a preparation for the next patch.
Change-Id: I05e046e4b963387f058d2b822d7493c761eebbbb
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448228
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
In the next patch we will put much more responsibility
on spdk_vhost_session_event_done(), so here we make
sure it's always called under the global vhost mutex.
Specifically, spdk_vhost_session_event_done() will set
vsession->lcore, which any other thread might try to
concurrently access via spdk_vhost_dev_foreach_session().
Change-Id: I7a5fde4be4e8bdfdbbb24ac955af964f516bdb68
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448227
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
We'll make use of it inside the vhost device backend
code. The function itself is generic enough to be put
in the public vhost.h header rather than vhost_internal.h.
Change-Id: I60602c61d8bba665dcf9c6d27af2e910c208a7be
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448226
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
spdk_dma_*malloc() is about to be deprecated.
Change-Id: Iacf9f6536ba5baca7b245e639d0d42a89720ba58
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448173
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
First of all, this struct was used when stopping
a session and wasn't directly related to any vhost
device despite its name.
Second, the struct contained just a single poller.
Instead of renaming it, we remove it. We can use
that poller pointer directly.
Change-Id: I66ad0826f7e809365c07662e59979b1942243c2e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448225
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
* don't iterate through g_nvme_ctrlrs when it's unnecessary
* fixup a potential deadlock on session stop error
(which can't practically happen unless the SPDK generic
vhost layer is malfunctioning)
* add a FIXME note to wait for pending I/Os before putting
bdev io channels and stopping the vhost pollers.
Change-Id: I576c4771f51e432fbbab244fd1b91668436004bf
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448224
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The context had to be previously carried around by
particular vhost backend code and now it's embedded
inside the generic vsession struct. This serves mostly
as a cleanup.
Change-Id: I7b6ac2c3cb5d60a035d56affbf42fe5d4697f0f6
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448223
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Nothing actually needs this to be asynchronous. If something
comes up, we can make it asynchronous again.
Change-Id: Icde3af3f8f9efebe75b08471b4afcce3a70da541
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447114
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Windows Virtio drivers use indirect descriptors without
negotiating their feature flag, which is explicitly
forbidden by the Virtio 1.0 spec. "(2.4.5.3.1 Driver
Requirements: Indirect Descriptors) The driver MUST NOT
set the VIRTQ_DESC_F_INDIRECT flag unless the
VIRTIO_F_INDIRECT_DESC feature was negotiated.".
Violating this rule doesn't cause any issues for SPDK
vhost, but triggers an assert, so we can only run Windows
VMs with non-debug SPDK builds.
This patch removes the assert and allows Windows VMs
to be run with debug versions of SPDK vhost.
Fixes#650
Change-Id: I95f534c33c384a4e1126a8c343c21eb63ec7bcef
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447803
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Introduce EMPTY, PRESENT and REMOVED states for Vhost SCSI targets. This
does not introduce any functional changes but opens a way to stop using
both removed flag and spdk_scsi_dev pointer as indicator if device is
removed or not.
Change-Id: Iecd76ffe9e8121cc1359b1e268eb21679d13598e
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447070
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
rte_vhost has rejected a patch with this feature, so
we implement it using the external rte_vhost msg handling
hooks directly in SPDK.
Change-Id: Ib072fc19b921fe0fa01c7f4892e60430232e3a1c
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447025
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Make the vdev initialization happen before calling
any vdev related functions. This is mostly needed
for an upcomming patch where additional step is
required after initializing the vdev and before
starting rte vhost.
On the other hand, this patch also fixes a technically
possible scenario where rte vhost starts processing
vhost-user messages and calling our ops before the
related vdev was initialized.
Change-Id: I8fbc7e7bc0b364327cfcec60faa74d4f64d6fad8
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447024
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
rte_vhost requires all queues to be fully initialized
in order to start I/O processing. This behavior is not
compliant with the vhost-user specification and doesn't
work with QEMU 2.12+, which will only initialize 1 I/O
queue for the SeaBIOS boot. Theoretically, we should
start polling each virtqueue individually after
receiving its SET_VRING_KICK message, but rte_vhost is
not designed to poll individual queues. So we use
a workaround to detect when a vhost session could be
potentially at that SeaBIOS stage and we mark it to
start polling as soon as its first virtqueue gets
initialized. This doesn't hurt any non-QEMU vhost slaves
and allows QEMU 2.12+ to boot correctly. SET_FEATURES
could be sent at any time, but QEMU will send it at
least once on SeaBIOS initialization - whenever
powered-up or rebooted.
Vhost sessions are still mostly started/stopped from
within rte_vhost callbacks, but now there's additional
concept of "forced" polling, in which SPDK starts
sessions manually, while rte_vhost still thinks the
sessions are stopped. This can potentially lead to cases
where a session is "started" twice, or gets destroyed
while it's still being polled (by force). Those cases
also need to be handled within this patch.
Change-Id: I70636d63e27914906ddece59cec34f1dd37ec5cd
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446086
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
VHOST_USER_SET_VRING_CALL invalidates the previous
file descriptor that SPDK may be using on a different
thread. The new descriptor is stored inside rte_vhost
internals and is queryable with rte_vhost APIs, but
those APIs are too expensive to be called every tick
or every time we need to use that fd. Hence, we will
now stop the entire vhost session before processing
SET_VRING_CALL msg and restart it right after. SPDK
will query the most recent call descriptor on session
start.
We do not necessarily have to stop the device - just
letting the session know that its callfd has changed
would be enough. That's an area for future optimization.
Change-Id: Idccf56fccd21ad0d3c2307eefee7bf35e350fec6
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447639
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
DPDK 19.05+ gives us an ability to pre or post-process
any single vhost-user message. The user can either perform
additional actions upon some generic events, or can
implement handling for brand new message types that
rte_vhost doesn't even know about.
In order to smoothly switch to the upstream rte_vhost
and drop our internal copy, we introduce an SPDK wrapper
function to register SPDK-specific message handlers. For
DPDK 19.05+ this will use the new rte_vhost API to
register those message handlers, and for older DPDKs
this function simply won't do anything - as w assume the
internal rte_copy already contains all the necessary
changes and does not need any "external" hooks.
For now we use the message handlers to stop the vhost
device and wait for any pending DMA ops before letting
rte_vhost to process the SET_MEM_TABLE message and unmap
the current shared memory.
Change-Id: Ic0fefa9174254627cb3fc0ed30ab1e54be4dd654
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446085
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
It's disabled by default, so no functionality is changed yet.
The intention is to use the upstream rte_vhost from DPDK,
which - starting from DPDK 19.05 - is finally capable of
running with storage device backends.
SPDK still requires a lot of changes in order to support
that upstream version, but the most fundamental change is
dropping vhost-nvme support. It'll remain usable only with
the internal rte_vhost copy and with the upstream rte_vhost
it simply won't be compiled. This allows us at least to
compile with that upstream rte_vhost, where we can pursue
adding the full integration.
Change-Id: Ic8bc5497c4d77bfef77c57f3d5a1f8681ffb6d1f
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446082
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>