2141 Commits

Author SHA1 Message Date
Stephen Hemminger
e37aad5ed3 eal: drop unused macros for primary process check
No usage in current DPDK code base.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-03-01 18:17:36 +01:00
Luca Boccassi
a9933bb1de build: improve libbsd dependency handling
Use dependency() instead of manual append to ldflags.

Move libbsd inclusion to librte_eal, so that all other libraries and
PMDs will inherit it.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-02-27 12:28:03 +01:00
Bruce Richardson
b543d1a715 compat: merge compat library into EAL
Since compat library is only a single header, we can easily move it into
the EAL common headers instead of tracking it separately. The downside of
this is that it becomes a little more difficult to have any libs that are
built before EAL depend on it. Thankfully, this is not a major problem as
the only library which uses rte_compat.h and is built before EAL (kvargs)
already has the path to the compat.h header file explicitly called out as
an include path.

However, to ensure that we don't hit problems later with this, we can add
EAL common headers folder to the global include list in the meson build
which means that all common headers can be safely used by all libraries, no
matter what their build order.

As a side-effect, this patch also fixes an issue with building on BSD using
meson, due to compat lib no longer needing to be listed as a dependency.

Fixes: a8499f65a1d1 ("log: add missing experimental tag")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-02-25 16:03:31 +01:00
Bruce Richardson
146e57627f eal: support strlcat function
Add the strlcat function to DPDK to exist alongside the strlcpy one.
While strncat is generally safe for use for concatenation, the API for the
strlcat function is perhaps a little nicer to use, and supports truncation
detection.

See commit 5364de644a4b ("eal: support strlcpy function") for more
details on the function selection logic, since we only should be using the
DPDK-provided version when no system-provided version is present.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-02-12 10:04:28 +01:00
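The truncation detection mentioned in 146e57627f can be sketched as follows. This is a minimal illustration of the usual strlcat contract, not the code added to DPDK; the name `sketch_strlcat` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcat: append src to dst without writing
 * more than size bytes in total (including the terminating NUL).
 * Returns the length the concatenated string would have had. */
static size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen = strlen(src);

    /* Bounded scan for the end of dst. */
    while (dlen < size && dst[dlen] != '\0')
        dlen++;

    /* dst is not terminated within size bytes: append nothing. */
    if (dlen == size)
        return size + slen;

    size_t room = size - dlen - 1;
    size_t n = slen < room ? slen : room;

    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;
}
```

A return value greater than or equal to `size` means the result was truncated, which is the check that plain strncat does not offer.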
Thomas Monjalon
cae0d722d6 version: 19.05-rc0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
2019-02-06 11:20:06 +01:00
Thomas Monjalon
8b937bae24 version: 19.02.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2019-02-01 15:25:17 +01:00
Thomas Monjalon
a2f9c0d417 version: 19.02-rc4
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2019-01-28 02:53:53 +01:00
Ilya Maximets
0a703f0f36 eal/linux: fix parsing zero socket memory and limits
Modern memory mode allows not reserving any memory via the
'--socket-mem' option, i.e. it should be possible to specify
zero preallocated memory like '--socket-mem 0'.
Also, it should be possible to configure unlimited memory
allocations by '--socket-limit 0'.

Both cases are currently impossible and block starting the DPDK
application:

    ./dpdk-app --socket-limit 0 <...>
    EAL: invalid parameters for --socket-limit
    EAL: Invalid 'command line' arguments.
    Unable to initialize DPDK: Invalid argument

Fixes: 6b42f75632f0 ("eal: enable non-legacy memory mode")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-23 23:02:07 +01:00
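The log for 0a703f0f36 does not show the actual fix. As a rough sketch of the idea only, a parser for a comma-separated per-socket list can report malformed input separately from a legitimate value of zero. The function name and the choice of unit are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: stores one value per socket in out[].
 * Returns the number of values parsed, or -1 on malformed input.
 * A value of 0 is accepted and is not treated as an error. */
static int sketch_parse_socket_values(const char *arg, uint64_t *out, int max)
{
    const char *p = arg;
    int n = 0;

    if (arg == NULL || *arg == '\0')
        return -1;

    while (n < max) {
        char *end;
        unsigned long long v;

        /* Reject signs, spaces and empty fields up front. */
        if (*p < '0' || *p > '9')
            return -1;

        errno = 0;
        v = strtoull(p, &end, 10);
        if (errno != 0)
            return -1;

        out[n++] = (uint64_t)v;
        if (*end == '\0')
            return n;
        if (*end != ',')
            return -1;
        p = end + 1;
    }
    return -1; /* more values than sockets */
}
```

With this shape, `"0"` parses to a single zero entry, while an empty string or `"12,x"` is rejected.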
Anatoly Burakov
47f4fe0595 vfio: allow secondary process to query IOMMU type
It is only possible to know IOMMU type of a given VFIO container
by attempting to initialize it. Since secondary process never
attempts to set up VFIO container itself (because they're shared
between primary and secondary), it never knows which IOMMU type
the container is using, and never sets up the appropriate config
structures. This results in inability to perform DMA mappings in
secondary process.

Fix this by allowing secondary process to query IOMMU type of
primary's default container at device initialization.

Note that this fix is assuming we're only interested in default
container.

Bugzilla ID: 174
Fixes: 6bcb7c95fe14 ("vfio: share default container in multi-process")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-01-21 16:13:59 +01:00
Thomas Monjalon
84a1d4a873 version: 19.02-rc3
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2019-01-20 22:39:20 +01:00
Ilya Maximets
6406d70561 eal: fix clang build with intrinsics forced
This fixes x86_64-native-linuxapp-clang build with
CONFIG_RTE_FORCE_INTRINSICS=y:

    include/generic/rte_atomic.h:218:9: error:
        implicit declaration of function '__atomic_exchange_2'
        is invalid in C99 [-Werror,-Wimplicit-function-declaration]

    include/generic/rte_atomic.h:501:9: error:
        implicit declaration of function '__atomic_exchange_4'
        is invalid in C99 [-Werror,-Wimplicit-function-declaration]

    include/generic/rte_atomic.h:783:9: error:
        implicit declaration of function '__atomic_exchange_8'
        is invalid in C99 [-Werror,-Wimplicit-function-declaration]

We didn't catch this issue previously on other platforms because
CONFIG_RTE_FORCE_INTRINSICS is enabled by default only for armv8.

Fixes: 7bdccb93078e ("eal: fix ARM build with clang")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-01-17 18:39:55 +01:00
Anatoly Burakov
2383d8e909 eal: check string parameter lengths
When specifying parameters such as hugefile prefix from the
command-line, it is possible to supply an empty string. This may
lead to various problems: for example, if hugefile prefix is
empty, the runtime config path construction may end up
looking like "/var/run/dpdk//_config", which will technically
work, but is wrong and places files in the wrong place.

To fix it, check lengths of such user-specified parameters for
hugefile prefix, as well as hugepage dir and user-specified
mbuf pool ops string.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-17 18:39:55 +01:00
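The length check described in 2383d8e909 might look roughly like this. It is a sketch under assumptions: the helper name and the upper bound `SKETCH_PARAM_MAX` are hypothetical, and the commit message itself only mentions rejecting empty values.

```c
#include <stddef.h>

/* Hypothetical upper bound; the real limit is not given in the log. */
#define SKETCH_PARAM_MAX 4096

/* Returns 0 if val is a usable, non-empty string parameter, -1 otherwise. */
static int sketch_check_str_param(const char *val)
{
    size_t len = 0;

    if (val == NULL)
        return -1;

    /* Bounded scan so an unterminated or huge argument is rejected early. */
    while (len < SKETCH_PARAM_MAX && val[len] != '\0')
        len++;

    if (len == 0 || len == SKETCH_PARAM_MAX)
        return -1;
    return 0;
}
```

Rejecting the empty string up front avoids constructing paths such as "/var/run/dpdk//_config".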
David Marchand
7b55015e14 eal: fix out of bound access when no CPU available
In the unlikely case when the dpdk application is started with no cpu
available in the [0, RTE_MAX_LCORE - 1] range, the master_lcore is
automatically chosen as RTE_MAX_LCORE which triggers an out of bound
access.

Either you have a crash then, or the initialisation fails later when
trying to pin the master thread on it.
In my test, with RTE_MAX_LCORE == 2:

$ taskset -c 2 ./master/app/testpmd --no-huge -m 512 --log-level *:debug
[...]
EAL: pthread_setaffinity_np failed
PANIC in eal_thread_init_master():
cannot set affinity
7: [./master/app/testpmd() [0x47f629]]

Bugzilla ID: 19
Fixes: 2eba8d21f3c9 ("eal: restrict cores auto detection")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-01-17 18:39:55 +01:00
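The out-of-bound case in 7b55015e14 comes down to what a "first available core" search returns when nothing is available. A minimal sketch, assuming a hypothetical `SKETCH_MAX_LCORE` and a plain enabled-flag array rather than DPDK's real structures:

```c
#define SKETCH_MAX_LCORE 4

/* Returns the index of the first enabled lcore, or -1 if none is enabled.
 * Returning SKETCH_MAX_LCORE instead would hand callers an index that is
 * one past the end of any per-lcore array. */
static int sketch_first_enabled_lcore(const unsigned char enabled[SKETCH_MAX_LCORE])
{
    int i;

    for (i = 0; i < SKETCH_MAX_LCORE; i++) {
        if (enabled[i])
            return i;
    }
    return -1;
}
```

A caller can then fail initialisation cleanly when the result is negative instead of indexing with it.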
Hari Kumar Vemula
b38693b612 eal: fix core number validation
When an incorrect core value or range is provided
as part of the -l command line option, a crash occurs.

Added valid range checks to fix the crash.

Added unit test check for negative core values.
Added unit test case for invalid core number range.

Fixes: d888cb8b9613 ("eal: add core list input format")
Cc: stable@dpdk.org

Signed-off-by: Hari Kumar Vemula <hari.kumarx.vemula@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-01-17 17:22:04 +01:00
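The range checks described in b38693b612 can be illustrated with a small core-list parser. This is a sketch, not DPDK's parser: all names are hypothetical, `SKETCH_MAX_LCORE` is an arbitrary bound, and rejecting reversed ranges such as "5-3" is a choice made here for simplicity.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_MAX_LCORE 128

/* Parse one non-negative core id at *p and advance *p past it.
 * Returns the id, or -1 if it is missing, negative or out of range. */
static int sketch_parse_core(const char **p)
{
    char *end;
    long v;

    if (!isdigit((unsigned char)**p))
        return -1;

    errno = 0;
    v = strtol(*p, &end, 10);
    if (errno != 0 || v < 0 || v >= SKETCH_MAX_LCORE)
        return -1;

    *p = end;
    return (int)v;
}

/* Parse a list such as "1,3-5" into set[]. Returns the number of cores
 * newly marked, or -1 on invalid input (set[] may be partially filled). */
static int sketch_parse_corelist(const char *s, unsigned char set[SKETCH_MAX_LCORE])
{
    int count = 0;

    if (s == NULL || *s == '\0')
        return -1;

    for (;;) {
        int lo = sketch_parse_core(&s);
        int hi = lo;
        int i;

        if (lo < 0)
            return -1;
        if (*s == '-') {
            s++;
            hi = sketch_parse_core(&s);
            if (hi < lo) /* also covers hi == -1 */
                return -1;
        }
        for (i = lo; i <= hi; i++) {
            if (!set[i]) {
                set[i] = 1;
                count++;
            }
        }
        if (*s == '\0')
            return count;
        if (*s != ',')
            return -1;
        s++;
    }
}
```

Every index written to `set[]` has already been checked against the bound, so negative or oversized values cannot reach the array.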
Thomas Monjalon
05853e1784 version: 19.02-rc2
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2019-01-15 03:08:43 +01:00
Gaetan Rivet
68c4768d36 eal: return error when option register fails
Make rte_option_register return a negative value when
an error occurs.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
e48839afff eal: improve option API documentation
Use doxygen to describe the main structure and describe a little more
why it exists.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
d3bdefef22 eal: fix log level of error in option register
INFO is not correct when logging an error.

Fixes: 2395332798d0 ("eal: add option register infrastructure")
Cc: stable@dpdk.org

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
f87471c3f1 eal: check against common option on register
Not only check against other registered options, but also common EAL
options. This will mitigate user confusion.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
42f6dbda09 eal: rename option name field
option->opt_* is redundant.
The field should also be constant.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
b8fe14b7cf eal: add option usage string
Add a usage string field in rte_option, allowing to display
help to the user and describe which options are currently available.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
ce6448fa01 eal: do not use static option iterator
This is rather weird. Someone should have caught that during review.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Gaetan Rivet
4c3bf26c19 eal: use bare option string as name
Currently, option names can be passed with arbitrary format.
Force the use of "--" prefix and thus POSIX long options format.

This restricts the ability to introduce surprising options and will help
future additional checks.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-01-15 02:40:40 +01:00
Ilya Maximets
9726aa9907 eal: fix build of external app with clang on armv8
When DPDK is built using GCC, RTE_TOOLCHAIN_CLANG is not defined.
But 'rte_atomic.h' is a generic header that is included by
external apps like OVS when building with DPDK. As a result, a
clang build of OVS fails on armv8 if DPDK was built using gcc:

    include/generic/rte_atomic.h:215:9: error:
            implicit declaration of function '__atomic_exchange_2'
            is invalid in C99
    include/generic/rte_atomic.h:494:9: error:
            implicit declaration of function '__atomic_exchange_4'
            is invalid in C99
    include/generic/rte_atomic.h:772:9: error:
            implicit declaration of function '__atomic_exchange_8'
            is invalid in C99

We need to check for current compiler, not the compiler used for
DPDK build.

Fixes: 7bdccb93078e ("eal: fix ARM build with clang")
Cc: stable@dpdk.org

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-01-14 19:49:48 +01:00
Anatoly Burakov
ba07193e03 mem: fix storing old policy
The original code was supposed to overwrite the value pointed to
by the pointer, but the new one is instead overwriting the
pointer value itself, which has no effect outside that function.
Fix it by adding a pointer dereference.

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-14 15:50:52 +01:00
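The class of bug fixed in ba07193e03 (assigning to a pointer parameter instead of through it) can be reproduced in a few lines. The functions and the `SKETCH_POLICY_DEFAULT` constant are hypothetical and only illustrate the pattern.

```c
#define SKETCH_POLICY_DEFAULT 0

/* Buggy shape: only the local copy of the pointer is changed,
 * so the caller's variable keeps its old value. */
static void sketch_store_policy_buggy(int *oldpolicy)
{
    static int fallback = SKETCH_POLICY_DEFAULT;

    oldpolicy = &fallback;
    (void)oldpolicy;
}

/* Fixed shape: dereference so the caller's variable is overwritten. */
static void sketch_store_policy_fixed(int *oldpolicy)
{
    *oldpolicy = SKETCH_POLICY_DEFAULT;
}
```

After the buggy call the caller's value is untouched, while the fixed call resets it to the default.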
Anatoly Burakov
199629022c mem: fix variable shadowing
A local variable ``flags`` was shadowing another variable from outer
scope. Fix this by renaming the variable and make it const.

Fixes: c127be93f619 ("mem: support using memfd segments for in-memory mode")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-14 15:42:40 +01:00
Anatoly Burakov
c0f8d50d1c vfio: do not unregister callback in secondary process
Callbacks are only registered in the primary, so do not attempt to
unregister callbacks in secondary processes.

Fixes: 43e463137154 ("vfio: support memory event callbacks")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-14 15:31:51 +01:00
Anatoly Burakov
97257eee2d eal/bsd: remove clean up of files at startup
On FreeBSD, closing the file descriptor drops the lock even if the
file descriptor was mmap'ed. This leads to the cleanup at the end
of EAL init to remove fbarray files that are still in use by the
process itself.

However, instead of working around this issue, we can take advantage
of the fact that FreeBSD doesn't really create any per-process
files in the first place, so no cleanup is actually needed.

Fixes: 0a529578f162 ("eal: clean up unused files on initialization")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-14 15:23:12 +01:00
Anatoly Burakov
66d9f61de0 eal: fix strdup usages in internal config
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.

First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.

Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.

Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.

To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.

Then, fix all usages of strdup to check return value, and add
a cleanup function that will free the memory occupied by
these strings, as well as freeing them before assigning a new
value to prevent leaks when parameter is specified multiple
times.

And finally, add an internal API to query hugefile prefix, so
that, absent of a valid value, a default value will be
returned, and also fix up all usages of hugefile prefix to
use this API instead of accessing hugefile prefix directly.

Bugzilla ID: 108

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-14 15:05:19 +01:00
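The three steps described in 66d9f61de0 (non-const storage, checked duplication with free-before-assign, and a getter that falls back to a default) can be sketched as below. The struct, function names and the placeholder default prefix are assumptions for illustration; `sketch_dup` stands in for strdup so that the example needs only standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder default, chosen for this sketch only. */
#define SKETCH_DEFAULT_PREFIX "rte"

struct sketch_config {
    char *hugefile_prefix; /* owned by the config, hence not const */
};

/* Stand-in for strdup: returns a heap copy of s, or NULL on failure. */
static char *sketch_dup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Replace the stored prefix; the old value is freed so that passing the
 * option several times does not leak. Returns 0 on success, -1 on failure. */
static int sketch_set_prefix(struct sketch_config *cfg, const char *val)
{
    char *copy = sketch_dup(val);

    if (copy == NULL)
        return -1;
    free(cfg->hugefile_prefix);
    cfg->hugefile_prefix = copy;
    return 0;
}

/* Getter that never returns NULL. */
static const char *sketch_get_prefix(const struct sketch_config *cfg)
{
    return cfg->hugefile_prefix != NULL ?
            cfg->hugefile_prefix : SKETCH_DEFAULT_PREFIX;
}

/* Cleanup hook releasing the owned string. */
static void sketch_cleanup(struct sketch_config *cfg)
{
    free(cfg->hugefile_prefix);
    cfg->hugefile_prefix = NULL;
}
```

Routing every read through the getter is what lets the stored pointer stay NULL without callers ever seeing it.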
Thomas Monjalon
7637518249 version: 19.02-rc1
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-12-23 00:21:13 +01:00
Anatoly Burakov
ba731ea1dd malloc: fix deadlock when reading stats
Currently, malloc statistics and external heap creation code
use memory hotplug lock as a way to synchronize accesses to
heaps (as in, locking the hotplug lock to prevent list of heaps
from changing under our feet). At the same time, malloc
statistics code will also lock the heap because it needs to
access heap data and does not want any other thread to allocate
anything from that heap.

In such scheme, it is possible to enter a deadlock with the
following sequence of events:

thread 1                        thread 2
rte_malloc()
                                rte_malloc_dump_stats()
take heap lock
                                take hotplug lock
failed to allocate,
attempt to take
hotplug lock
                                attempt to take heap lock

Neither thread will be able to continue, as both of them are
waiting for the other one to drop the lock. Adding an
additional lock will require an ABI change, so instead of
that, make malloc statistics calls thread-unsafe with
respect to creating/destroying heaps.

Fixes: 72cf92b31855 ("malloc: index heaps using heap ID rather than NUMA node")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 15:26:43 +01:00
Qi Zhang
85d6815fa6 eal: close multi-process socket during cleanup
When a secondary process quits, the mp_socket* file still exists, which
causes rte_mp_request_sync to fail when it tries to send a message on a
floating socket.

This patch fixes the issue by introducing a function,
rte_mp_channel_cleanup. It is called by rte_eal_cleanup and closes the
mp socket and deletes the mp_socket* file.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
2018-12-21 01:15:41 +01:00
Anatoly Burakov
9d65053761 eal: add 64-bit log2 function
Add missing implementation for 64-bit log2 function, and extend
the unit test to test this new function. Also, remove duplicate
reimplementation of this function from testpmd and memalloc.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:23:49 +01:00
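A portable sketch of a 64-bit log2 in the spirit of 9d65053761. It assumes round-up semantics (the smallest n with 2^n >= v) and returns 0 for an input of 0; the function name is hypothetical and the loop is written for clarity rather than speed.

```c
#include <stdint.h>

/* Returns the smallest n such that 2^n >= v, or 0 when v is 0. */
static uint32_t sketch_log2_u64(uint64_t v)
{
    uint32_t n = 0;

    if (v == 0)
        return 0;

    /* Stop at 64 so the shift count never reaches the type width. */
    while (n < 64 && ((uint64_t)1 << n) < v)
        n++;
    return n;
}
```

Exact powers of two map to their exponent, and any value in between rounds up to the next one.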
Anatoly Burakov
43c9e6c205 eal: add 64-bit fls function
Add missing implementation for 64-bit fls function, and extend
unit test to test the new function as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:17:43 +01:00
Anatoly Burakov
4e261f5519 eal: add 64-bit bsf and 32-bit safe bsf functions
Add an rte_bsf64 function that follows the convention of existing
rte_bsf32 function. Also, add missing implementation for safe
version of rte_bsf32, and implement unit tests for all recently
added bsf varieties.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-21 00:00:58 +01:00
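The "safe" flavour of bit-scan-forward mentioned in 4e261f5519 differs from the plain one in how it treats an input of zero, where the least significant set bit is undefined. A minimal portable sketch, with a hypothetical name and an assumed return convention (1 when a bit was found, 0 otherwise):

```c
#include <stdint.h>

/* Stores the index of the least significant set bit of v in *pos and
 * returns 1, or returns 0 and leaves *pos untouched when v is 0. */
static int sketch_bsf64_safe(uint64_t v, uint32_t *pos)
{
    uint32_t i = 0;

    if (v == 0)
        return 0;

    while ((v & 1) == 0) {
        v >>= 1;
        i++;
    }
    *pos = i;
    return 1;
}
```

Callers test the return value before using `*pos`, instead of having to guard against zero themselves.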
Anatoly Burakov
cc7ddb00da bitmap: remove deprecated 64-bit bsf function
The function rte_bsf64 was deprecated in a previous release, so
remove the function, and the deprecation notice associated with
it.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:44:56 +01:00
Anatoly Burakov
307315d457 eal: fix runtime directory cleanup in noshconf mode
When using --no-shconf or --in-memory modes, there is no runtime
directory to be created, so there is no point in attempting to
clean it.

Fixes: 0a529578f162 ("eal: clean up unused files on initialization")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 23:27:35 +01:00
Anatoly Burakov
c75f535ac5 mem: use memfd for no-huge mode
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with vhost_user
backend.

To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using rte_memseg_get_fd() API with --no-huge EAL flag.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:58:25 +01:00
Anatoly Burakov
df7722c75b mem: allow setting up segment list fd
Currently, only segment fd's for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:55:56 +01:00
Anatoly Burakov
d75eea3145 mem: check for memfd support in segment fd API
If memfd support was not compiled, or hugepage memfd support
is not available at runtime, the API will now return proper
error code, indicating that this API is unsupported. This
changes the API, so document the changes.

Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:54:37 +01:00
Anatoly Burakov
525670756a mem: fix segment fd API error code for external segment
Segment fd API does not support getting segment fd's from
externally allocated memory, so return proper error code
on any attempts to do so. This changes API behavior, so
document the change as well.

Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-12-20 22:51:49 +01:00
Anatoly Burakov
bed7941886 mem: allow usage of non-heap external memory in multiprocess
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
Anatoly Burakov
950e8fb4e1 mem: allow registering external memory areas
The general use-case of using external memory is well covered by
existing external memory API's. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that API's like
``rte_virt2memseg`` would work on such external memory segments.

This commit adds such an API to DPDK, along with documentation for
it. The new functions allow registering and unregistering externally
allocated memory areas.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:14:55 +01:00
Anatoly Burakov
39ff94e71c malloc: separate destroying memseg list and heap data
Currently, destroying external heap chunk and its memseg list is
part of one process. Once we gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we need to be able to find and destroy
memseg lists as well as heap data separately.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:10:08 +01:00
Anatoly Burakov
0f526d674f malloc: separate creating memseg list and malloc heap
Currently, creating external malloc heap involves also creating
a memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 18:09:55 +01:00
Anatoly Burakov
646e5260ee malloc: make alignment requirements more stringent
The external heaps API already implicitly expects start address
of the external memory area to be page-aligned, but it is not
enforced or documented. Fix this by implementing additional
parameter checks at memory add call, and document the page
alignment requirement explicitly.

Fixes: 7d75c31014f7 ("malloc: allow adding memory to named heaps")
Cc: stable@dpdk.org

Suggested-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-12-20 15:34:03 +01:00
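The parameter checks described in 646e5260ee might take roughly this shape. The commit message only states that the start address must be page-aligned; the power-of-two and length checks below are additional assumptions of this sketch, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the external memory area is acceptable, -1 otherwise. */
static int sketch_check_ext_mem(const void *va, size_t len, size_t page_sz)
{
    if (va == NULL || len == 0 || page_sz == 0)
        return -1;

    /* Assumption: page size is a power of two. */
    if ((page_sz & (page_sz - 1)) != 0)
        return -1;

    /* Start address must be page-aligned (the requirement from the log). */
    if (((uintptr_t)va & (page_sz - 1)) != 0)
        return -1;

    /* Assumption: length is a whole number of pages. */
    if ((len & (page_sz - 1)) != 0)
        return -1;

    return 0;
}
```

Doing the check at the point where memory is added means a misaligned area is rejected before any internal bookkeeping refers to it.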
Anatoly Burakov
b3e735e16e malloc: fix duplicate mem event notification
We already trigger a mem event notification inside the walk function,
no need to do it twice.

Fixes: f32c7c9de961 ("malloc: enable event callbacks for external memory")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:28:55 +01:00
Seth Howell
fba0ca2274 malloc: notify primary process about hotplug in secondary
When secondary process hotplugs memory, it sends a request
to primary, which then performs the real mmap() and sends
sync requests to all secondary processes. Upon receiving
such sync request, each secondary process will notify the
upper layers of hotplugged memory (and will call all
locally registered event callbacks).

In the end we'll end up with memory event callbacks fired
in all the processes except the primary, which is a bug.

This gets critical if memory is hotplugged while a VFIO
device is attached, as the VFIO memory registration -
which is done from a memory event callback present in the
primary process only - is never called.

After this patch, a primary process fires memory event
callbacks before secondary processes start their
synchronizations - both for hotplug and hotremove.

Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Seth Howell <seth.howell@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:25:34 +01:00
Yongseok Koh
6d09256148 malloc: fix finding maximum contiguous IOVA size
malloc_elem_find_max_iova_contig() could return invalid size due to a
missing sanity check. The following gdb output shows how 'cur_size' can be
invalid in find_biggest_element().

	(gdb) p/x cur_size
	$4 = 0xffffffffffe42900
	(gdb) p elem
	$1 = (struct malloc_elem *) 0x12e842000
	(gdb) p *elem
	$2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next =
		0x12e842900, free_list = {le_next = 0x109538000, le_prev =
		0x7ffff7ff3894}, msl = 0x7ffff7ff107c, state = ELEM_FREE,
		pad = 0, size = 2304}
	(gdb) p *elem->msl
	$5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz =
		2097152, socket_id = 0, version = 790, len = 17179869184,
		external = 0, memseg_arr = {name = "memseg-2048k-0-0",
		'\000' <repeats 47 times>, count = 493, len = 8192, elt_sz
		= 48, data = 0x10002e000, rwlock = {cnt = 0}}}

Fixes: 9fe6bceafd51 ("malloc: add finding biggest free IOVA-contiguous element")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 15:17:48 +01:00
Jim Harris
476c847ab6 malloc: add option --match-allocations
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.

DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This requires SPDK to check each DPDK buffer to
see if it crosses an MR boundary, and if so, would have to
add considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC. It is somewhat
analogous to rte_malloc returning a buffer that is not
IOVA-contiguous.

As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.

To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.

Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-12-20 13:01:08 +01:00
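The merge rule behind --match-allocations in 476c847ab6 can be pictured with a toy element that remembers which allocation it came from. This is a conceptual sketch only: the struct layout, field names and function are hypothetical and do not reflect the real malloc_elem.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a malloc element. */
struct sketch_elem {
    const void *orig; /* identifies the originating hugepage allocation */
    size_t size;
};

/* Two adjacent free elements may be merged unless match-allocations mode
 * is on and they came from different original allocations. */
static bool sketch_can_merge(const struct sketch_elem *a,
                             const struct sketch_elem *b,
                             bool match_allocations)
{
    if (!match_allocations)
        return true;
    return a->orig == b->orig;
}
```

Keeping elements from different allocations apart is what prevents a later buffer from spanning two separately registered memory regions.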