snprintf() guarantees that the destination buffer is always
null-terminated. Manually placing a null terminator in the buffer
right after a call to snprintf() is therefore redundant.
Additionally, there is no need to pass 'sizeof(buffer) - 1' to
snprintf(), as that needlessly gives up the last character of the
buffer; 'sizeof(buffer)' is enough.
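For illustration, a minimal before/after sketch (the buffer and string
are placeholders):

    #include <stdio.h>

    int main(void)
    {
        const char *some_string = "example";
        char buffer[32];

        /* Old pattern: shrink the size and terminate manually, even
         * though snprintf() already guarantees termination. */
        snprintf(buffer, sizeof(buffer) - 1, "%s", some_string);
        buffer[sizeof(buffer) - 1] = '\0';

        /* Equivalent and simpler: pass the full size, no manual
         * terminator needed. */
        snprintf(buffer, sizeof(buffer), "%s", some_string);

        printf("%s\n", buffer);
        return 0;
    }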
Cc: stable@dpdk.org
Signed-off-by: Michael Santana <msantana@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which can
leave user applications unable to use system calls such as
select().
The single file segments option stores lock files per page to
ensure that pages are deleted when there are no more users. However,
this is not necessary, because the processes hold onto the pages
anyway due to mmap(), so removing pages from the filesystem is safe
even though they may be used by some other secondary process. As a
result, single file segments mode no longer stores inordinate
amounts of segment fd's, and the above issue with fd limits is
solved.
However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
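For reference, one common way this manifests (an illustration only,
not DPDK code, and assuming the usual fd_set-based select()): fd_set
cannot represent descriptors numbered at or above FD_SETSIZE, so once
enough fds have been consumed, newly opened descriptors can no longer
be monitored:

    #include <stdio.h>
    #include <sys/select.h>

    int main(void)
    {
        /* Hypothetical descriptor number, e.g. a socket opened after
         * more than a thousand fds are already in use. */
        int fd = 1500;

        if (fd >= FD_SETSIZE) {
            /* Passing such an fd to FD_SET()/select() is undefined
             * behaviour, so the application cannot select() on it. */
            printf("fd %d is beyond FD_SETSIZE (%d)\n", fd, FD_SETSIZE);
        }
        return 0;
    }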
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.
First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.
Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.
Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.
To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.
Then, fix all usages of strdup to check the return value, and add
a cleanup function that frees the memory occupied by these strings.
The strings are also freed before a new value is assigned, to
prevent leaks when a parameter is specified multiple times.
And finally, add an internal API to query the hugefile prefix, so
that, in the absence of a valid value, a default is returned, and
fix up all usages of the hugefile prefix to go through this API
instead of accessing the value directly.
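A minimal sketch of the resulting pattern (the names below are
illustrative, not the actual EAL internals):

    #include <stdlib.h>
    #include <string.h>

    /* Now 'char *' rather than 'const char *', so it can be freed. */
    static char *hugefile_prefix;

    #define DEFAULT_HUGEFILE_PREFIX "rte"

    /* Parsing: free any previous value so repeated options don't leak,
     * and check strdup() for failure. */
    static int
    set_hugefile_prefix(const char *arg)
    {
        free(hugefile_prefix);
        hugefile_prefix = strdup(arg);
        return hugefile_prefix == NULL ? -1 : 0;
    }

    /* Accessor: always returns a usable value, even if nothing was
     * set or strdup() failed. */
    static const char *
    get_hugefile_prefix(void)
    {
        return hugefile_prefix != NULL ?
            hugefile_prefix : DEFAULT_HUGEFILE_PREFIX;
    }

    /* Cleanup at EAL exit. */
    static void
    cleanup_config_strings(void)
    {
        free(hugefile_prefix);
        hugefile_prefix = NULL;
    }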
Bugzilla ID: 108
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When creating process data structures, EAL will create many files
in the EAL runtime directory. Because we allow multiple secondary
processes to run, each secondary process gets its own unique files.
With many secondary processes running and exiting on the system,
the runtime directory will, over time, accumulate enormous numbers
of sockets, fbarray files and other leftovers that just sit there
unused because the process that allocated them died a long time
ago. This may lead to exhaustion of disk (or RAM) space in the
runtime directory.
Fix this by removing, at initialization, every unlocked file that
matches either the socket or the fbarray naming convention. We
cannot be sure about any other files, so we leave them alone. Also,
remove similar code from the mp socket code.
We do it at the end of init, rather than at the beginning, because
a secondary process will use the primary process' data structures
even if the primary itself has died, and we don't want to remove
those before we lock them.
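Conceptually, the per-file check looks like the following hedged
sketch (not the actual EAL code):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Return 1 if 'path' was unlocked (and thus removed), 0 if it is
     * still in use by a live process, -1 on error. */
    static int
    remove_if_unlocked(const char *path)
    {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* A live process keeps a lock on the files it owns, so failing
         * to take an exclusive non-blocking lock means the file is
         * still in use and must be left alone. */
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
            close(fd);
            return 0;
        }
        unlink(path);
        close(fd); /* also releases the lock */
        return 1;
    }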
Bugzilla ID: 106
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The rte_eal_get_runtime_dir() function is currently declared in two
header files.
This API was made public in commit 6911c9fd8f ("eal: export function to
get runtime directory"), which added it to rte_eal.h. To make it public,
the 'rte' prefix was added to the function, so it also had to be modified
in the original location of the declaration, eal_filesystem.h. Because
the declaration was only modified there, and not removed, it is now a
duplicate.
This patch removes the declaration from eal_filesystem.h.
Fixes: 6911c9fd8f ("eal: export function to get runtime directory")
Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch makes the eal_get_runtime_dir() API public so it can be used
from outside EAL.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.
[1] http://dpdk.org/patch/40418
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Fix all calls to functions in eal_filesystem to produce paths
residing inside the dedicated DPDK runtime directory. The DPDK
runtime config is left in place, as 3rd-party applications within
the DPDK ecosystem might rely on its path to determine whether DPDK
is running, so moving it will be postponed to the next release
cycle.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, during runtime, DPDK will store a bunch of files here
and there (in /var/run, /tmp or in $HOME). Fix it by creating a
DPDK-specific runtime directory, under which all runtime data
will be placed. The template for creating this runtime directory
is the following:
<base path>/dpdk/<DPDK prefix>/
Where <base path> is set to either "/var/run" if run as root, or
$XDG_RUNTIME_DIR if run as non-root, with a fallback to /tmp if
$XDG_RUNTIME_DIR is not defined. So, for example, if run as root,
by default all runtime data will be stored at /var/run/dpdk/rte/.
There is no C library equivalent of "mkdir -p", so we create the
path step by step.
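A hedged sketch of the idea (not the actual EAL code; the helper
names and buffer sizes are illustrative):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const char *
    runtime_base_dir(void)
    {
        const char *xdg;

        if (getuid() == 0)
            return "/var/run";
        xdg = getenv("XDG_RUNTIME_DIR");
        return xdg != NULL ? xdg : "/tmp";
    }

    /* "mkdir -p"-style creation of one component: tolerate components
     * that already exist. */
    static int
    mkdir_step(const char *path)
    {
        if (mkdir(path, 0700) < 0 && errno != EEXIST)
            return -1;
        return 0;
    }

    int main(void)
    {
        char dpdk_dir[256], run_dir[256];
        const char *prefix = "rte"; /* illustrative default prefix */

        snprintf(dpdk_dir, sizeof(dpdk_dir), "%s/dpdk",
                runtime_base_dir());
        snprintf(run_dir, sizeof(run_dir), "%s/%s", dpdk_dir, prefix);

        if (mkdir_step(dpdk_dir) < 0 || mkdir_step(run_dir) < 0) {
            perror("mkdir");
            return 1;
        }
        printf("runtime dir: %s\n", run_dir);
        return 0;
    }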
Nothing uses this new path yet; changes for that will come in the
next commit.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
The original name for this path was not very descriptive and was
confusing. Rename it to a more appropriate and descriptive name:
it stores data about hugepages, so name it eal_hugepage_data_path().
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
The define was a leftover from the IVSHMEM library.
Fixes: c711ccb309 ("ivshmem: remove library and its EAL integration")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because fcntl()
locks allow locking parts of a file, which is useful for single-file
segments mode, where locking the entire file isn't practical since
we still need to grow and shrink it.
However, according to fcntl()'s Ubuntu manpage [1], the semantics of
fcntl() locks have a giant oversight:
This interface follows the completely stupid semantics of System
V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
locks associated with a file for a given process are removed
when any file descriptor for that file is closed by that process.
This semantic means that applications must be aware of any files
that a subroutine library may access.
Basically, closing *any* fd referring to a file with an fcntl() lock
on it (which we do because we don't want to leak fd's) will drop the
lock completely.
So, in this commit, we revert to using flock() locks everywhere.
However, that still leaves the problem of locking parts of a memseg
list file in single file segments mode, and we solve it by creating
a separate lock file for each page and tracking those with flock().
We also remove all of the tailq business and replace it with a
simple array - saving a few bytes is not worth the extra hassle of
dealing with pointers and potential memory allocation failures.
Also, remove the tailq lock since it is not needed - these fd lists
are per-process, and within a given process, only one thread ever
handles access to hugetlbfs.
So, the first process to allocate a segment will create a lockfile
and put a shared lock on it. When shrinking the page file, we try to
take out a write lock on that lockfile, which fails if any other
process is holding onto the lockfile as well. This way, we know
whether we can shrink the segment file. Also, if no other locks are
found in the lock list for a given memseg list, the memseg list fd
is automatically closed.
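A hedged sketch of this per-page lock protocol (the helper names and
paths are illustrative, not the actual EAL code):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* On page allocation: take a shared lock on the page's lock file,
     * marking this process as a user of the page. */
    static int
    lock_page(const char *lockfile_path)
    {
        int fd = open(lockfile_path, O_CREAT | O_RDWR, 0600);

        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_SH) < 0) {
            close(fd);
            return -1;
        }
        return fd; /* keep open: closing the fd releases the lock */
    }

    /* Before shrinking the segment file: try an exclusive lock.
     * Success means no other process holds a shared lock on this
     * page, so its space can be reclaimed; failure means the page
     * is still in use elsewhere. */
    static int
    can_reclaim_page(int lock_fd)
    {
        return flock(lock_fd, LOCK_EX | LOCK_NB) == 0;
    }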
One other thing to note is that, according to the flock() Ubuntu
manpage [2], upgrading a lock from shared to exclusive is implemented
by dropping and reacquiring the lock, which is not atomic and thus
would have created race conditions. So, when attempting to perform
operations in hugetlbfs, we take out a write lock on the hugetlbfs
directory, so that only one process can perform hugetlbfs operations
at a time.
[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60f ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
fbarray stores its data in a shared file, which is not hidden.
This pollutes the user's HOME directory with visible files when
running DPDK as non-root. Change fbarray to always
create hidden files by default.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
rte_fbarray is a simple indexed array stored in shared memory
by mapping files into memory. The rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, the page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is set the maximum capacity to
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - synchronizing those is up to the user application).
In addition, since we will frequently need to scan this array for
free space, and iterating over the array linearly can become slow,
rte_fbarray provides facilities to index the array's usage. The
following use cases are covered:
- find the next free/used slot (useful either for adding new
elements to the fbarray, or for walking the list)
- find the starting index of the next N free/used slots (useful when
we want to allocate a chunk of VA-contiguous memory composed of
several pages)
- find how many contiguous free/used slots there are, starting
from a specified index (useful when we want to figure out how many
pages we have until the next hole in allocated memory, to speed up
some bulk operations where we would otherwise have to walk the
array and add pages one by one)
This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
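A hedged usage sketch follows; the function names and signatures are
my reading of the fbarray API and should be treated as approximate
rather than authoritative:

    #include <rte_fbarray.h>

    /* Illustrative element type. */
    struct my_elem { int val; };

    static int
    fbarray_demo(void)
    {
        struct rte_fbarray arr;
        struct my_elem *e;
        int idx;

        /* Capacity is decided up front; the backing file is mapped
         * in full. */
        if (rte_fbarray_init(&arr, "demo", 1024, sizeof(*e)) < 0)
            return -1;

        /* Find the next free slot starting at index 0, use it, and
         * mark it as used so later lookups skip it. */
        idx = rte_fbarray_find_next_free(&arr, 0);
        if (idx < 0)
            return -1;
        e = rte_fbarray_get(&arr, idx);
        e->val = 42;
        rte_fbarray_set_used(&arr, idx);

        /* How many contiguous used slots start at this index? */
        return rte_fbarray_find_contig_used(&arr, idx);
    }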
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Previously, there were three channels for multi-process
(i.e., primary/secondary) communication.
1. A config-file based channel, in which the primary process writes
info into a pre-defined config file and the secondary process
reads the info out.
2. The vfio submodule has its own channel, based on a unix socket,
for the secondary process to get the container fd and group fd
from the primary process.
3. The pdump submodule also has its own unix-socket-based channel
for packet dump.
It'd be good to have a generic communication channel for
multi-process communication that accommodates requirements such as:
a. A secondary wants to send info to the primary; for example, a
secondary would like to send a request (about some specific vdev)
to the primary.
b. Sending info at any time, instead of just at initialization time.
c. Sharing FDs with the other side; for a vdev like vhost, the
related FDs (memory region, kick) should be shared.
d. A sent message may need the other side to respond immediately.
This patch proposes to create a communication channel, based on a
datagram unix socket, for the above requirements. Each process will
block on a unix socket, waiting for messages from its peers.
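For illustration, here is the underlying mechanism using plain POSIX
calls (a hedged sketch of the concept only, not the EAL
implementation; the socket path and message handling are
placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Bind a datagram unix socket to 'path', as each process
     * conceptually does so that peers can send it messages. */
    static int
    open_mp_socket(const char *path)
    {
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

        if (fd < 0)
            return -1;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        unlink(path); /* remove a stale socket from a previous run */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    /* Receive loop: block waiting for messages from peers. */
    static void
    mp_recv_loop(int fd)
    {
        char buf[256];
        ssize_t len;

        while ((len = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL)) > 0)
            printf("received a %zd-byte message\n", len);
    }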
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component on the receiver side would
like to respond to messages from a peer process.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component no longer wants to respond to the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
RTE_EAL_SINGLE_FILE_SEGMENTS was introduced with the ivshmem
integration. Now that ivshmem has been removed (commit c711ccb309)
and a simple git grep shows nothing else references it,
I think we can now remove it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Some guards are missing or have the wrong name.
Others have LINUXAPP in their name but the headers are now common.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
No need to have different headers for Linux and BSD.
These files are identical, with the exception of the internal config,
which has uio and vfio fields that are only useful for Linux.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>