DPDK 18.11+ multi-process hotplug isn't robust.
Multiple secondary processes starting at the same
time might cause the internal IPC to misbehave.
Just retry hotplugging/hotremoving the device
in such case.
Change-Id: I1f830c2c0dbe1d63eca9a116101b3d202172b2ca
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434539
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
With all the error checks and segfault preventions in place,
we can finally enable hotplug in a multi-process scenario
for DPDK 18.11+.
If a device is attached in the primary process, it will send
an attach IPC request to the secondary process which needs
to succeed. Until now it would get rejected, and the attach
would fail in all the processes.
The device in secondary process will be now probed by DPDK
and will be put into the process local SPDK list of devices
to be locally attached. Either SPDK will attach it sometime
later on any attach/enumerate request, or DPDK will remove
it automatically once the same device in the primary process
gets removed.
We also allow the surprise attach in primary processes, as
it's technically possible for the pci devices (NVMe) to
be attached exclusively from the secondary process. The
fact that the NVMe stack doesn't support it is another story.
Currently the NVMe stack will handle the failure by itself
just fine.
Change-Id: Ia24a8b4610cc7c659f59a2fdda9d8a78e58af873
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434416
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
DPDK 18.11+ does its best to ensure all devices are
equally attached or detached in all processes within
a shared memory group. For SPDK it means that if
a device is hotplugged in the primary, then DPDK will
automatically send an IPC hotplug request to all other
processes. Those other processes may not have the same
SPDK PCI driver registered and may fail to attach the
device. DPDK will send back the failure status and the
primary process will also fail to hotplug its device.
To prevent that, we need to pre-register the pci
drivers on env init.
We register the drivers just after the EAL init
because we don't want the matching devices to be picked
up by the initial bus probe in DPDK. That's for 2 reasons:
1) we don't want to attach *all* available devices
2) devices attached from non-SPDK context (that is,
outside of the spdk attach or enumerate functions)
will still fail to attach - the entire attaching
process will only take significant amount of time
and will bloat the log with useless status messages
Change-Id: I7b4c3a2e355f98ea755649f789137f5a727bc935
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434415
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Although the struct is used as an enumeration context,
it really is a pci driver. The subsuequent patch introduces
a few functions around the pci driver, so rename the struct
to make it align nicely with those functions.
Change-Id: I919c30e55d9f42d795ecd8e20e5d29f3918c17a5
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434414
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Upon detaching a device in a secondary process, DPDK 18.11
will try to detach it from the primary process as well.
SPDK doesn't support such hot-detach and will reject it
in the primary process. That will cause the secondary
process to also reject its detach. The device in the
secondary process will be still there in DPDK, but for
SPDK it will remain inaccessible - neither attach, nor
enumerate will work on it.
To fix it, we make our attach and enumerate functions
always check the process local list of devices probed
by DPDK, but not attached in SPDK.
Looking at the patch from a different perspective, it
simply introduces error handling for the DPDK detach
function. If a device failed to detach, we'll now maintain
it locally in SPDK to make it attach-able again.
Change-Id: I8c509a571bea7a9fb413c9c2bfd64c62ad91074b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434413
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
It's handy to store the SPDK structs within the device
structure. The subsequent patch will make us use
spdk_pci_addr much more frequently, so it makes sense
to keep it around rather than build it up from rte_pci_addr
everytime.
The upcoming VMD driver will also benefit from this patch
by being able to fill the spdk_pci_device struct with any
custom PCI details.
Change-Id: I236a19e28beba9a593b29f23b79b1b0b92ef1fa7
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434418
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
In DPDK 18.11, a device can be potentially detached not only
upon an SPDK request, but also directly from within the DPDK
itself. In a multi-process scenario, when one process detaches
the PCI device, an IPC message - detach request - will be sent
to every other process in the same shared memory group. As we
don't propagate the removal notification to upper layers, the
still-referenced rte_pci_device object will just disappear at
one moment.
SPDK is still not ready for supporting the above case and will
try to avoid it, but just in case some detach request slips
through, then this patch provides the sanity checks preventing
SPDK from crashing.
Change-Id: I3e35d8efb33085163b9acd8a565e86a4221df844
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434412
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Very minor cleanup before we start refactoring the code.
Change-Id: I00d768ec0c84f2a37c54b7575de695281c5ebb22
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434411
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
DPDK already prints at least one error message, so
there's no need to print a yet another one.
Change-Id: I1c7bdfe5ca2095b93ec282bf193a717627d5fa27
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434410
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Prepare for storing additional per-device data.
The struct doesn't store any interesting data yet,
but already has a TAILQ_ENTRY that allows us to
put it into a global pci device list. Right now
we use the list only to find the SPDK device once
the corresponding DPDK device gets removed, but
more usages will be implemented soon.
Change-Id: If3abc1da60446e0a647d8d4c642f111ebfbcdb9e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434409
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Now that even DPDK 16.11 (LTS) reaches its end of life in
November 2018, we can surely drop support for DPDK
versions older than that.
The PCI code will go through a major refactor soon, so this
patch cleans it up first.
Since this is the very first SPDK patch that drops support
for older DPDK versions, it also introduces an #error
directive that'll directly fail the build if the used DPDK
lib is too old.
Change-Id: I9bae30c98826c75cc91cda498e47e46979a08ed1
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433865
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We assumed spdk_mem_map_translate() translates only 2MB-aligned
addresses, but that's not true. Both vtophys and NVMf can use it
with any user-provided address and that breaks our contiguous memory
length calculations. Right now each buffer appeared to have the
first n * 2MB of memory always contiguous.
This is a bugfix for NVMf which does check the mapping length
internally. It will also become handy when adding the similar
functionality to spdk_vtophys().
Change-Id: I3bc8e0b2b8d203cb90320a79264effb7ea7037a7
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433076
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
This keeps us from having to deal with ALLOC and FREE events
for mismatching regions - which necessitated splitting new
regions into individual pages. This caused all kinds of
problems with NVMe-oF - for example, buffers that spanned
memory regions, or bumping up against MR limits on RDMA
NICs.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I18dcdae148436b55d4481bb9fb8799f4832c7de1
Reviewed-on: https://review.gerrithub.io/434895
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Despite the scary commit title, this patch just unifies
per-driver mutexes into a single pci mutex.
On each hotplug we modify some DPDK global resources,
which per-driver locks aren't sufficient for. If
multiple threads try to attach devices at the same time,
then we'll likely have a data race. DPDK hotplug APIs
don't provide any kind of thread safety on their own.
Change-Id: I89cca9fea04ecf576ec5854c662bae1d3712b3fb
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433864
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We need to do it only for DPDK 16.11, which leaks the
mappings otherwise. DPDK was fixed in version 17.02 with
the following commit:
e84ad157 (pci: unmap resources if probe fails)
Unmapping the resources twice doesn't actually cause
us any trouble, but prints an ambiguous error message.
Change-Id: I8b62e86d5fff8fe924dbf9ae2e37cff29298d412
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433863
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This was dead code. Registering the same device more than
once has never been possible within a single process.
Change-Id: I04fa2a62cd9f3d7f99ff799ea4504c49a2232da4
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433866
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Previously we used VFIO if only the vfio-pci kernel module
was loaded, which is different from what our setup.sh script
did. On a fairly usual system configuration, setup.sh could
have bound devices to UIO, but SPDK would still try to map
memory to an (empty) DPDK VFIO container just because its fd
was available. That would fail obviously.
setup.sh checks for IOMMU presence in order to use vfio-pci
and SPDK should probably do the same. We could check the
kernel driver of each attached PCI device, but there's no
chance right now of supporting both UIO and VFIO devices
at the same time with IOMMU passthrough. That's not
a reasonable configuration anyway. To keep things simple,
we just add a single check on vtophys initialization.
Fixes#462
Change-Id: Ica653f117743be322291a1b7e37ed00e34ef5035
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/432518
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: <wuzhouhui@kingsoft.com>
BSD implementation for config access in DPDK seems to
return 0 on success while Linux implementation returns 0
only on failure. The env wrapper was always treating 0 as
an error and caused some of our PCI initialization code
to fail prematurely.
At one point DPDK harmonized this BSD behavior with Linux,
but only for config reads.
Fixes#484
Change-Id: I4ea850ea50f5e667fad28e8125209b21c377a2a3
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/432401
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Allow specifying a custom hugetlbfs directory.
This can be useful e.g. when trying to use hugepages
with fixed size, different size limit, or different
access permissions.
Change-Id: I418cbab99ed183383300b3c3d9945095a03478db
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/432105
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
DPDK 18.11 sets the default base-virtaddr to an address
that falls into an area reserved by ASAN. DPDK will try
to remap its memory over and over with the closest
base-virtaddr hint and for ASAN case this would take
a huge amount of time.
This was already raised on DPDK mailing list [1] and
might be eventually fixed or worked around in upstream,
but for now let's just override the default base-virtaddr
to a value that ASAN is known not to occupy.
[1] http://patches.dpdk.org/patch/46130/#88395
Change-Id: Ieada30e82355e8ead458e53795ab98cd12692c1c
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/431257
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The previous functions were deprecated and now removed.
Change-Id: I076125aaf80b97c627ca45b860700fdf6d87e925
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/430557
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
In early development of crypto we built the lirbary using
dpdkbuild/Makefile but later due to a DPDK change we needed
to move to a newer version of the ipsec library that is now
built and installed in pkgdep.sh btu we never removed it
from the makefile.
This patch removes the extra building of the ipsec module.
Change-Id: I367618440074d952ff13c6e2098fc3a572cc785e
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/429519
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
With all SPDK patches in place, we can now enable dynamic
memory management for all DPDK versions >= 18.05.1
Hugepages can be still reserved with [-s|--mem-size <size>]
option at application startup, but once we use them all up,
instead of failing user allocations with -ENOMEM, we'll try
to dynamically reserve even more. This allows starting DPDK
with `--mem-size 0` and using only as many hugepages as it
was actually necessary.
Change-Id: I9e6f58ea50af2234f96a53e7a32d9e14d2df65ff
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/426828
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is an NVMe-specific issue and I/OA or VirtIO devices don't
need it. Additionally, the delay is now asynchronous, meaning
that potentially multiple NVMe controllers can wait all at once.
The drawback of this change is that we're needlessly waiting
even when using uio_pci_generic. However, since the delay does
not block anymore, its impact is significantly minimized.
Change-Id: I5d16a7fd7cb66c785acb687f14690e95f6188b9e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/429414
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
All PCI device management is done only by the primary process,
so there's no need to delay device initialization in secondary
processes. If device is being initialized in a secondary
process, then it must have been already initialized by the
primary.
Change-Id: I087da77f981018dabf3feed59c76b294a16ca88d
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/429413
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
While here, change spdk_lib_list_to_files to
spdk_lib_list_to_static_libs to differentiate it from
the new spdk_lib_list_to_shared_libs.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I6e5913addfbdd556fae2451d4e2b2c43feaf33ab
Reviewed-on: https://review.gerrithub.io/429286
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Lance Hartmann <lance.hartmann@oracle.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Minor cleanup.
Change-Id: I9554163ae22836b50b954ec27ed27bcb848cb193
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428883
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
The ENV_LINKER_ARGS was employing both the linker --[start|end]-group
and --[whole|no-whole]-archive options around the DPDK_LIBs. With
the use of whole/no-whole, the start/end bracketing is unnecessary.
Change-Id: I97a2ac22df8c6b48ba674b9b292f5eea01823901
Signed-off-by: Lance Hartmann <lance.hartmann@oracle.com>
Reviewed-on: https://review.gerrithub.io/428737
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
The problem with registering the entire hotplugged memory
region is that it won't necessarily be unregistered in one
go. Registering each hugepage separately solves that
problem.
This puts a limitation on the number of pages that can
be allocated when using RDMA. We'll hopefully lift this
limitation sometime in future - probably levereging
ibv_rereg_mr, but for now we'll have to resort to either:
a) using 1GB hugepages
b) preallocating memory (with [-s|--mem-size <size>] app
param) as it will be registered as just one region no
matter what size it is. This memory won't be returned
to the system until the SPDK app exits.
Change-Id: I6de997fb4901b772730ba6fe995dcc0640b85749
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428716
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
All mem_maps will still receive separate unregister
notification for each registered region, but the
public memory unregister API is more flexible now.
This follows the VFIO_TYPE1v2_IOMMU interface, which
allows the same.
Change-Id: Ifc008afdc6bff39d9b3b4c892c379ade10c3098e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428715
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Added sanity checks to prevent unregistering a memory
range that wasn't registered as a one, complete region.
Change-Id: I819b57560b2e48b0802113ffff9f72949d7a148a
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/425556
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Removed the reference count from the registrations map.
Although technically supported, registering a single memory
region more than once had a lot of unhandled cases and could
easily lead to a segfault.
RDMA maps require all memory to be unregistered in the same
chunks the memory was registered, which is often impossible
to achieve if a region was registered more than once:
1. register region 0x0 - 0x3 -> it gets mapped to
a single ibv_mr
2. register region 0x1 - 0x2 -> nothing happens, this region
is already registered
3. unregister region 0x0 - 0x3 -> 0x0-0x1 gets unregistered as
one region. 0x2-0x3 gets
unregistered as another
(leading to segfault in the
the current RDMA implementation)
The problem is that the last two regions share the same ibv_mr,
which SPDK tries to free twice. The second free causes a segfault.
vtophys map handles this case by registering each 2MB chunk
separately, but this solution cannot be applied for RDMA, as
NICs put a limitation (~2048) on the number of regions registered.
Another option is to keep a refcount of each ibv_mr allocated,
and free it only when the entire region was unregistered from the
SPDK mem map. This is however very tricky and RDMAmojo mentions
that freeing a memory buffer before unregistering its ibv_mr
may lead to a segfault.
Change-Id: I545c56e24ffa55bda211dea22aeb8a55d9631fe5
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/426085
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Prevents us from unregistering two or more separately
registered regions with a single notification.
This fixes an ibv_mr leak in RDMA. When multiple registrations
were unregistered with a single notification, only the first
ibv_mr one would be freed and the remaining memory would
possibly still remain DMA-able.
As of now, unregistering multiple complete regions with
a single unregister call is not possible. It will be
implemented later, after the rest of the code is cleaned up.
Change-Id: I7d61867fa61fd7a4a8a644ff45cab17125d63e1b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/425555
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Newly registered mem maps weren't notified at vaddr 0
as we used value "0" to denote an unitialized, unset
variable. Since we *do* register vaddr 0 in our memory
unit tests let's switch unitialized vaddr value to
something different - like UINT64_MAX.
Change-Id: I9c902165e76155e068642abb9a656f3ae8ca1105
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428713
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In spdk_mem_map_alloc() we only do the memory walk when
notify_cb is provided, but spdk_mem_map_free() does the
memory walk undonditionally. Not anymore.
Change-Id: Ic8dfdc5cb2c99dc58e62ab0523cf5a18ba8691cc
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428722
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
If during mem_map creation we managed to register only a part
of the total registered memory and then failed in the middle,
we will now undo our registrations and gracefully return
an error code leading to a spdk_mem_map_alloc() failure().
Change-Id: Id549f35109bd381318adc676392d9ee26d035359
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/423538
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Didn't enable this in initial patch, probably could have, but we
need it for performance testing now and will need it to start getting
the hardware into CI.
Change-Id: I688cc94713146380933e5cddd7ed5b2d168c1fb2
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/427274
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
eal_get_runtime_dir() that we used so far is not
available in DPDK shared libraries, so we remove
its usages in SPDK together with the functionality
of removing files left over by DPDK. Luckily for
us, this only affects DPDK 18.05, as for DPDK 18.08+
the --no-shconf option that we use prevents all
those files from being created whatsoever.
Change-Id: I078fb7686d2445a6acb067b0c3762a9c99f9a429
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/426218
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Lance Hartmann <lance.hartmann@oracle.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Dynamic memory management was broken in DPDK 18.05
and got fixed in 18.08/18.05.1, but SPDK still needs
a couple of patches to ensure we support it. While
those patches slowly make their way through the review
process, let's stick with legacy mem mgmt for all DPDK
versions.
Change-Id: I6a0bc7b46b28dd75bef6847dde1ef57dc60a829e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/426817
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This function will now check for whether or not a memory region is
contiguous accross 2MB map entries and return the total length of that
contiguous buffer up to the size specified by the user.
Also includes unittests
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I2ce582427d451be5a317808d0825c770e12e9a69
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I90da6d4d31c669a3bf046f7721923dd743c5ef21
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The spdk_ring creation does not support "multi-producer/multi-consumer" type
and does not honor all of the types during enqueue/dequeue operations.
- Add SPDK_RING_TYPE_MC and update spdk_ring_create() to support it.
- Update spdk_ring_enqueue() to call rte_ring_enqueue_bulk() instead of
rte_ring_mp_enqueue_bulk().
- Update spdk_ring_dequeue() to call rte_ring_dequeue_burst() instead of
rte_ring_mp_dequeue_burst().
Change-Id: I15219513f9c45a8ec8a0af19cdc35428ba728454
Signed-off-by: John Barnard <john.barnard@broadcom.com>
Reviewed-on: https://review.gerrithub.io/426143
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We recently increased the top level map from 128TB
(47 bits) to 256TB (48 bits) but missed fixing one
comment that still referred to the old number of bits.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic993bd136cd09789ea2a65b480151805d231c3d5
Reviewed-on: https://review.gerrithub.io/426002
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Initial support for softare AESNI_MB DPDK driver only.
Have tested (both aesni and QAT seprately and concurrrently) on underlying NVMe devices
with bdevio and a bdevperf script that runs IOs from 512B to 128K each with Q depths from
1 to 512 in powers of 2 for 30 seconds each run.
QAT can be included in the code (but not makefile) and marked as experimental
until we are ready to test in CI. It works well on 2 systems but is a big PITA to get
the hardware setup and configured for use with DPDK (IMHO).
Change-Id: If518c3df8e74e00efa18afdf194824c5e69778fc
Signed-off-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/403107
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This struct will hold the unique operations for the mem_map.
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: Ifdd82497f238d99345033f2615c718802a591438
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425327
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The function now takes a pointer as it's last argument, and copies the
size of the memory region for which the translation is validinto that
pointer.
For now, that will always be 2MB. However that behavior can change in
the future.
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I8686c166ec956507f5ae55cf602341281482cb89
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/424888
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The underlying DPDK function we use reads an array
at the provided index without checking for any out
of bounds access. The array is RTE_MAX_LCORE elements
long, so always manually check against that to keep
our APIs safe.
This fixes potential crashes with lcore == SPDK_ENV_LCORE_ID_ANY
Change-Id: I3081b888275fbecba8ab95feb20d2074341e2fc7
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/425042
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is an artifact from before SPDK had a configure
script or a DPDK submodule. Make configure the
only supported way for specifying the location of the
DPDK installation to use with SPDK.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5c197c46220928bb18b97c8807755967d76ea42c
Reviewed-on: https://review.gerrithub.io/424893
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
the function pointer .get_iommu_class was not defined until v17.11 of
dpdk so this function causes packaged versions of dpdk <17.11 to fail to
compile with SPDK. Adding a couple preporocessor directives to avoid
this problem.
Change-Id: I70cf44877ddd712d42d117e6fa5f82494675d603
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/424609
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
To use virtual address IOVAs, DPDK has to ensure there
are absolutely no predispositions to require physical
addresses neither now, nor in future. All available buses,
including PCI, report whether they require physical
addresses. For PCI, the registered drivers may report
RTE_PCI_DRV_IOVA_AS_VA to mark a driver as RTE_IOVA_VA
compliant. However, this model assumes that all (PCI)
drivers are registered on DPDK init, which is not the case
in SPDK. With no devices attached, most buses will report
RTE_IOVA_DC (don't care) and EAL will default to
RTE_IOVA_PA.
SPDK needs a hook to report whether it supports RTE_IOVA_VA,
and the fake rte_bus provides it.
Change-Id: Iba2d904200d1b70140d81943a57b98fd290783f9
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/423491
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>