The code that picks up the mempool ops name from the EAL command
line arguments does not copy the string; it stores the pointer
provided by getopt_long() directly. That memory could be clobbered
later, causing the rte_mempool library to read a wrong mbuf pool
ops name.
This flaw can be avoided by using strdup() to keep a private copy
of the option's string value.
Fixes: a103a97e7191 ("eal: allow user to override default mempool driver")
Cc: stable@dpdk.org
Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
In function 'rte_try_tm':
rte_spinlock.h:82:2:
warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
int retries = RTE_RTM_MAX_RETRIES;
Fixes: ba7468997ea6 ("spinlock: add HTM lock elision for x86")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
rte_lcore.h: In function 'rte_lcore_index':
rte_lcore.h:122:14:
warning: conversion to 'int' from 'unsigned int' may change
the sign of the result [-Wsign-conversion]
lcore_id = rte_lcore_id();
Fixes: 5583037a7950 ("eal: get relative core index")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
rte_common.h:416:9:
warning: conversion to 'uint32_t' {aka 'unsigned int'} from
'int' may change the sign of the result [-Wsign-conversion]
return __builtin_ctz(v);
^~~~~~~~~~~~~~~~
The builtin is defined to return int, but we want to
return it as uint32_t. Its only defined valid return
values are positive integers or zero, which is OK for
uint32_t. So just add an explicit cast.
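A minimal sketch of the explicit cast, using a hypothetical helper name (ctz32) rather than the actual function in rte_common.h:

```c
#include <stdint.h>

/*
 * Count trailing zero bits of a 32-bit value.
 * __builtin_ctz() (a GCC/Clang builtin) returns int and is undefined
 * for 0, so callers must pass a non-zero value.
 */
static inline uint32_t
ctz32(uint32_t v)
{
	/* the result is never negative, so the cast cannot change the value */
	return (uint32_t)__builtin_ctz(v);
}
```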
Fixes: 03f6bced5bba ("eal: use intrinsic function")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
It may be useful to pass arbitrary data to the callback (such
as device pointers), so add this to the mem event callback API.
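The idea can be sketched as follows. All type and function names below are hypothetical and only illustrate storing an opaque argument at registration time and handing it back to the callback; they are not the actual EAL API:

```c
#include <stddef.h>

/* Hypothetical callback type: the last parameter is opaque user data. */
typedef void (*mem_event_cb_t)(int event, const void *addr, size_t len,
		void *arg);

struct mem_event_entry {
	mem_event_cb_t cb;
	void *arg;	/* stored at registration, passed back on every event */
};

static struct mem_event_entry entry;

static void
mem_event_register(mem_event_cb_t cb, void *arg)
{
	entry.cb = cb;
	entry.arg = arg;
}

static void
mem_event_notify(int event, const void *addr, size_t len)
{
	if (entry.cb != NULL)
		entry.cb(event, addr, len, entry.arg);
}

/* Example callback: arg points to a running byte counter. */
static void
count_bytes_cb(int event, const void *addr, size_t len, void *arg)
{
	size_t *total = arg;

	(void)event;
	(void)addr;
	*total += len;
}
```

A driver would pass its device pointer as arg in the same way the example passes a counter.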
Suggested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Currently, when deallocating pages, malloc will fix up other
elements' headers if there is not enough space to store a full
element in the leftover space. This leads to race conditions because
some functions check the pad size with an unlocked heap, expecting
the pad size to be constant.
Fix it by being more conservative and only freeing pages when
there is enough space before and after the page to store a free
element.
Fixes: 1403f87d4fb8 ("malloc: enable memory hotplug support")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The pad value is not used unless element is in pad state, but it
will show up in heap dumps and may be confusing.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Since the commit below, we have encountered some strange issues:
1) Deadlock, as described here:
http://dpdk.org/ml/archives/dev/2018-April/099806.html
2) SIGSEGV when starting testpmd in a VM.
Since the commit below switched the barrier from stack memory to
dynamic memory, we suspect these issues are caused by a use-after-free.
Fixes: 3d09a6e26d8b ("eal: fix threads block on barrier")
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reported-by: Lei Yao <lei.a.yao@intel.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
params is not freed if pthread_create() fails. The fix is
straightforward: free it on the error path.
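A minimal sketch of the pattern, with a hypothetical parameter structure and function names (not the actual EAL control thread code):

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parameter block handed to the new thread. */
struct thread_params {
	int value;
};

static void *
thread_main(void *arg)
{
	struct thread_params *params = arg;
	int value = params->value;

	free(params);	/* the new thread owns params once it is running */
	return (void *)(intptr_t)value;
}

/* Returns 0 on success or a negative errno-style value on failure. */
static int
spawn_thread(pthread_t *tid, int value)
{
	struct thread_params *params = malloc(sizeof(*params));
	int ret;

	if (params == NULL)
		return -ENOMEM;
	params->value = value;

	ret = pthread_create(tid, NULL, thread_main, params);
	if (ret != 0) {
		free(params);	/* thread never started, nobody else frees it */
		return -ret;
	}
	return 0;
}
```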
Fixes: 3d09a6e26d8b ("eal: fix threads block on barrier")
Reported-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
When heap initializes, we need to add already allocated segments
onto the heap. However, in doing that, we never increased total
heap size. Fix it by adding segment length to total heap length
when initializing the heap.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
At hugepage info initialization, EAL takes out a write lock on
hugetlbfs directories, and drops it after the memory init is
finished. However, in non-legacy mode, if "-m" or "--socket-mem"
switches are passed, this leads to a deadlock because EAL tries
to allocate pages (and thus take out a write lock on hugedir)
while still holding a separate hugedir write lock in EAL.
Fix it by checking if write lock in hugepage info is active, and
not trying to lock the directory if the hugedir fd is valid.
Fixes: 1a7dc2252f28 ("mem: revert to using flock and add per-segment lockfiles")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because
fcntl() locks allow locking parts of a file, which is useful
for single-file segments mode, where locking the entire file
isn't as useful because we still need to grow and shrink it.
However, according to fcntl()'s Ubuntu manpage [1], semantics of
fcntl() locks have a giant oversight:
This interface follows the completely stupid semantics of System
V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
locks associated with a file for a given process are removed
when any file descriptor for that file is closed by that process.
This semantic means that applications must be aware of any files
that a subroutine library may access.
Basically, closing *any* fd with an fcntl() lock (which we do because
we don't want to leak fd's) will drop the lock completely.
So, in this commit, we revert to using flock() locks everywhere.
However, that still leaves the problem of locking parts of a memseg
list file in single file segments mode; we solve it by creating a
separate lock file for each page, and tracking those with flock().
We will also be removing all of this tailq business and replacing it
with a simple array - saving a few bytes is not worth the extra
hassle of dealing with pointers and potential memory allocation
failures. Also, remove the tailq lock since it is not needed - these
fd lists are per-process, and within a given process, it is always
only one thread handling access to hugetlbfs.
So, first one to allocate a segment will create a lockfile, and put
a shared lock on it. When we're shrinking the page file, we will be
trying to take out a write lock on that lockfile, which would fail if
any other process is holding onto the lockfile as well. This way, we
can know if we can shrink the segment file. Also, if no other locks
are found in the lock list for a given memseg list, the memseg list
fd is automatically closed.
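The shared/exclusive lockfile idea can be sketched as below. This is a simplified illustration with hypothetical function names, not the EAL implementation: the check opens the lockfile a second time, so it conflicts with any holder of a shared lock, including one in the same process.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open (creating if needed) a lockfile and take a shared lock on it.
 * Returns the fd holding the lock, or -1 on failure.
 */
static int
lockfile_acquire_shared(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_SH | LOCK_NB) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Try to take an exclusive lock without blocking.
 * Returns 1 if nobody holds the lockfile, 0 if it is in use, -1 on error.
 */
static int
lockfile_is_unused(const char *path)
{
	int fd = open(path, O_RDWR);
	int ret;

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		ret = 1;
	else
		ret = (errno == EWOULDBLOCK) ? 0 : -1;
	close(fd);	/* drops only the lock taken through this descriptor */
	return ret;
}
```

Unlike fcntl() locks, closing the second descriptor here does not release the shared lock held through the first one.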
One other thing to note is, according to flock() Ubuntu manpage [2],
upgrading the lock from shared to exclusive is implemented by dropping
and reacquiring the lock, which is not atomic and thus would have
created race conditions. So, on attempting to perform operations in
hugetlbfs, we will take out a write lock on the hugetlbfs directory, so
that only one process can perform hugetlbfs operations at a time.
[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Currently, memseg lists for secondary process are allocated on
sync (triggered by init), when they are accessed for the first
time. Move this initialization to a separate init stage for
memalloc.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
For non-legacy mode, we are preallocating space for hugepages, so
we know in advance which pages we will be able to allocate, and
which we won't. However, the init procedure was using hugepage
counts gathered from sysfs and paid no attention to hugepage
sizes that were actually available for reservation, and failed
on attempts to reserve unavailable pages.
Fix this by limiting total page counts by number of pages
actually preallocated.
Also, VA preallocate procedure only looks at mountpoints that are
available, and expects pages to exist if a mountpoint exists. That
might not necessarily be the case, so also check if there are
hugepages available for a particular page size on a particular
NUMA node.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Previously, if we couldn't preallocate VA space on 32-bit for
one page size, we simply bailed out, even though we could've
tried allocating VA space with other page sizes.
For example, if user had both 1G and 2M pages enabled, and
has asked DPDK to allocate memory on both sockets, DPDK
would've tried to allocate VA space for 1x1G page on both
sockets, failed and never tried again, even though it
could've allocated the same 1G of VA space for 512x2M pages.
Fix this by retrying with different page sizes if VA space
reservation failed.
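The retry logic can be sketched roughly as follows. The reservation hook and all names are hypothetical; the real code reserves VA space per memseg list, which this illustration does not attempt to model:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reservation hook: returns 0 on success, -1 on failure. */
typedef int (*reserve_fn)(uint64_t page_sz, unsigned int n_pages);

/*
 * Try to reserve `total` bytes of VA space, walking the page sizes in
 * the given order and moving on to the next size when one fails.
 * Returns the page size that worked, or 0 if none did.
 */
static uint64_t
reserve_with_fallback(const uint64_t *page_sizes, size_t n_sizes,
		uint64_t total, reserve_fn reserve)
{
	size_t i;

	for (i = 0; i < n_sizes; i++) {
		uint64_t n_pages = total / page_sizes[i];

		if (n_pages == 0)
			continue;
		if (reserve(page_sizes[i], (unsigned int)n_pages) == 0)
			return page_sizes[i];
	}
	return 0;
}

/* Example hook for illustration: pretend 1G reservations always fail. */
static unsigned int last_n_pages;

static int
reserve_small_only(uint64_t page_sz, unsigned int n_pages)
{
	if (page_sz >= (1ULL << 30))
		return -1;
	last_n_pages = n_pages;
	return 0;
}
```

With 1G and 2M page sizes and a 1G request, the example hook rejects the single 1G page and the loop falls back to 512 pages of 2M.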
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
32-bit mode has an upper limit on amount of VA space it can preallocate,
but the original implementation used the wrong constant, resulting in
failure to initialize due to integer overflow. Fix it by using the
correct constant.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Previous code checked for both first/last elements being NULL,
but if they weren't, the expectation was that they're both
non-NULL, which will be the case under normal conditions, but
may not be the case due to heap structure corruption.
Coverity issue: 272566
Fixes: bb372060dad4 ("malloc: make heap a doubly-linked list")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Technically, while the pointer would've been invalid if msl_idx
were invalid, we wouldn't have actually attempted to access the
pointer until verifying the index. Fix it by moving array access
to after we've verified validity of the index.
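A minimal sketch of the reordering, using a hypothetical array and bound in place of the real memseg list table:

```c
#include <stddef.h>

#define MAX_LISTS 4	/* hypothetical bound, standing in for the real limit */

struct seg_list {
	int in_use;
};

static struct seg_list lists[MAX_LISTS];

/* Return the list for idx, or NULL if idx is out of range. */
static struct seg_list *
get_list(int idx)
{
	struct seg_list *msl;

	if (idx < 0 || idx >= MAX_LISTS)
		return NULL;

	/* form the element address only once the index is known to be valid */
	msl = &lists[idx];
	return msl;
}
```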
Coverity issue: 272574
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
If user has specified a flag to unmap the area right after mapping it,
we were passing an already-unmapped pointer to RTE_LOG. This is not an
issue since RTE_LOG doesn't actually dereference the pointer, but fix
it anyway by moving call to RTE_LOG to before unmap.
Coverity issue: 272584
Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Coverity reports these lines as having no effect. Technically, we do
want those lines to leave memory contents unchanged; however, the
compiler would likely have optimized them out entirely. Add volatile
qualifiers to ensure the memory accesses are actually performed.
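A minimal sketch of such an access, with a hypothetical helper name. It reads a word and writes the same value back, so the contents do not change, while the volatile qualifier obliges the compiler to perform both accesses:

```c
/* Touch the memory at addr without changing its contents. */
static void
touch_page(void *addr)
{
	volatile int *p = addr;

	*p = *p;	/* volatile read followed by volatile write */
}
```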
Coverity issue: 272608
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Previously, if mmap failed to map page address at requested
address, we were attempting to unmap the wrong address. Fix it
by unmapping our actual mapped address, and jump further to
avoid unmapping memory that is not allocated.
Coverity issue: 272602
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Previous code had a leftover from an old rebase, from the time when
oldpolicy was an actual int instead of a pointer. Fix it by
dereferencing the pointer in the comparison.
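The difference can be sketched as below, with a hypothetical constant standing in for the real policy value:

```c
#define POLICY_DEFAULT 0	/* hypothetical stand-in for the real constant */

/* Returns non-zero if the saved policy differs from the default. */
static int
policy_was_changed(const int *oldpolicy)
{
	/* compare the value pointed to, not the pointer itself */
	return *oldpolicy != POLICY_DEFAULT;
}
```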
Coverity issue: 272589
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Normally, tailq entry should have a valid fd by the time we attempt
to map the segment. However, in case it doesn't, we're leaking fd,
so fix it.
Coverity issue: 272570
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We close fd if we managed to find it in the list of allocated
segment lists (which should always be the case under normal
conditions), but if we didn't, the fd was leaking. Close it if
we couldn't find it in the segment list. Closing it is harmless:
if the segment is zero length, we're getting rid of it anyway,
so there's no harm in not storing the fd anywhere.
Coverity issue: 272568
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
We were closing descriptor before checking if mapping has
failed, but if it did, we did a second close afterwards. Fix
it by moving closing descriptor to after we've done all error
checks.
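A minimal sketch of the ordering, using a hypothetical helper that maps a regular file (not the actual hugepage mapping code). The descriptor is closed exactly once on each path, and only after the mmap() result has been checked:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the file at `path`; returns NULL on failure. */
static void *
map_file(const char *path, size_t len)
{
	void *addr;
	int fd = open(path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0)
		goto fail;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto fail;

	close(fd);	/* the mapping stays valid after the fd is closed */
	return addr;

fail:
	close(fd);	/* single close on the error path */
	return NULL;
}
```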
Coverity issue: 272560
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
resize_hugefile() returns either 0 (which indicates success) or -1
(which indicates failure). The return value was not checked
correctly when the --single-file-segments option is used.
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The commit below introduced a pthread barrier for synchronization,
but two IPC threads block on the barrier and never wake up.
(gdb) bt
#0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
#3 rte_thread_init (arg=0x7fffffffcfe0)
at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
#4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
#5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Analysis suggests that defining the barrier on the stack could be the
root cause. This patch allocates the barrier on the heap instead.
Fixes: d651ee4919cd ("eal: set affinity for control threads")
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides an API for IOMMU
programming on a specified container.
A driver could use "rte_vfio_container_create" helper to create a new
container from eal, use "rte_vfio_container_group_bind" to bind a device
to the newly created container. During rte_vfio_setup_device the container
bound with the device will be used for IOMMU setup.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently eal vfio framework binds vfio group fd to the default
container fd during rte_vfio_setup_device, while in some cases,
e.g. vDPA (vhost data path acceleration), we want to put vfio group
to a separate container and program IOMMU via this container.
This patch extends the vfio_config structure to contain per-container
user_mem_maps and defines an array of vfio_config. The next patch
builds on this to add the container API.
Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The auxiliary vector read is implemented only for Linux.
It could be done with procstat_getauxv() for FreeBSD.
Since the commit below, the auxiliary vector functions
are compiled for every architecture, including x86
which is tested with FreeBSD.
This patch moves the Linux implementation into the Linux directory,
and adds a fake/empty implementation for FreeBSD.
Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The fake getauxval function does not use its parameter.
So the compiler raised this error:
lib/librte_eal/common/eal_common_cpuflags.c:25:25: error:
unused parameter 'type'
Fixes: 2ed9bf330709 ("eal: abstract away the auxiliary vector")
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This message looks suspicious and is seen on a healthy testpmd:
EAL: WARNING: Master core has no memory on local socket!
The message is wrong: the master lcore is 0 and its socket is 0
and there are multiple available memory segments on socket 0.
At that point in the startup process, the count value is zero,
meaning they are not used yet so the check_socket gets confused.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
rte_lcore_has_role() returns 0 if role of lcore matches requested
role. The return value of the API is confusing, and this is a known
problem with a deprecation notice announcing the change to more
intuitive semantics:
Commit 064518f68d48 ("doc: announce EAL API change to lcore role function")
Implement changes announced in the deprecation notice, and remove it.
Also, fix usages of this API to reflect the change. Control thread patches
expected new behavior and were broken before, now they are fixed as well.
Fixes: d651ee4919cd ("eal: set affinity for control threads")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This commit removes the experimental tags from the
service cores functions; they now become part of the
main DPDK API/ABI.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Coverity was complaining about not checking result of call to
fcntl() for unlocking the file. Disregarding the fact that error
value returned from fcntl() unlock call is highly unlikely in the
first place, we are subsequently calling close() on that same fd,
which will drop the lock, which makes call to fcntl() unnecessary.
Fix this by removing a call to fcntl() altogether.
Coverity issue: 272607
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Regular expressions are not the best way to match a hierarchical
pattern like dynamic log levels. Moreover, the separator for dynamic
log levels is a period, which is the regex wildcard character.
A better solution is to use filename matching 'globbing' so
that log levels match like file paths. For compatibility,
use colon to separate pattern match style arguments. For
example:
--log-level 'pmd.net.virtio.*:debug'
This also makes the documentation match what really happens
internally.
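A minimal sketch of glob-style matching with POSIX fnmatch(). The helper name and the way the argument is split at the colon are assumptions for illustration and do not reproduce the EAL option parser:

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Split "pattern:level" at the last colon and test whether `logtype`
 * matches the pattern part as a shell-style glob.
 * Returns 1 on match, 0 on no match, -1 if the argument is malformed.
 */
static int
log_arg_matches(const char *arg, const char *logtype)
{
	char pattern[128];
	const char *colon = strrchr(arg, ':');
	size_t len;

	if (colon == NULL)
		return -1;
	len = (size_t)(colon - arg);
	if (len == 0 || len >= sizeof(pattern))
		return -1;
	memcpy(pattern, arg, len);
	pattern[len] = '\0';

	/* fnmatch() returns 0 when the string matches the pattern */
	return fnmatch(pattern, logtype, 0) == 0;
}
```

With globbing, the periods in the pattern are matched literally and only `*` acts as a wildcard.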
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
We don't want format of eal log level saved values to be visible
in ABI. Move to private storage in eal_common_log.
Includes minor optimization. Compile the regular expression for
each log match once, rather than each time it is used.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Rather than attempting to load the contents of the auxv directly,
prefer to use an exposed API - and if that doesn't exist then attempt
to load the vector. This is because on some systems, when a user
is downgraded, the /proc/self/auxv file retains the old ownership
and permissions. The original method of /proc/self/auxv is retained.
This also removes a potential abort() in the code when compiled with
NDEBUG. A quick read of the code shows that much (if not all) of
the CPU flag parsing isn't used internally, so it should be okay.
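A rough sketch of the two lookup paths on Linux with glibc, using hypothetical helper names. As a simplification, the fallback here is chosen at run time when getauxval() reports nothing, whereas the patch describes falling back when the API itself is not available:

```c
#include <elf.h>
#include <fcntl.h>
#include <link.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Fallback: scan /proc/self/auxv for `type`; returns 0 if not found. */
static unsigned long
auxv_from_file(unsigned long type)
{
	ElfW(auxv_t) entry;
	unsigned long val = 0;
	int fd = open("/proc/self/auxv", O_RDONLY);

	if (fd < 0)
		return 0;
	while (read(fd, &entry, sizeof(entry)) == (ssize_t)sizeof(entry)) {
		if (entry.a_type == type) {
			val = entry.a_un.a_val;
			break;
		}
	}
	close(fd);
	return val;
}

/* Prefer the libc API; use the file only when it reports nothing. */
static unsigned long
auxv_lookup(unsigned long type)
{
	unsigned long val = getauxval(type);

	return val != 0 ? val : auxv_from_file(type);
}
```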
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Add the priority RTE_PRIORITY_LAST, used for initialization routines
meant to be run after all other constructors.
This priority becomes the default priority for all DPDK constructors.
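How constructor priorities order initialization can be sketched with the GCC/Clang constructor attribute. The macro names and values below are assumptions for illustration and are not the DPDK definitions; lower numbers run first, and values up to 100 are reserved for the implementation:

```c
/* Hypothetical priority values; lower numbers run first. */
#define PRIO_EARLY 101
#define PRIO_LAST  65535

static int init_order[2];
static int init_count;

/* Declared first, but runs last because of its priority. */
__attribute__((constructor(PRIO_LAST)))
static void
init_late(void)
{
	init_order[init_count++] = 2;
}

__attribute__((constructor(PRIO_EARLY)))
static void
init_early(void)
{
	init_order[init_count++] = 1;
}
```

Both functions run before main(), with init_early() first regardless of declaration order.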
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Build a central list to quickly see each priority used for
constructors, making it possible to verify that they are both above
100 and in the proper order.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>