2017-12-19 15:49:03 +00:00
|
|
|
/* SPDX-License-Identifier: BSD-3-Clause
|
eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
1. Config-file based channel, in which, the primary process writes
info into a pre-defined config file, and the secondary process
reads the info out.
2. vfio submodule has its own channel based on unix socket for the
secondary process to get container fd and group fd from the
primary process.
3. pdump submodule also has its own channel based on unix socket for
packet dump.
It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
a. Secondary wants to send info to primary, for example, secondary
would like to send request (about some specific vdev to primary).
b. Sending info at any time, instead of just initialization time.
c. Share FDs with the other side, for vdev like vhost, related FDs
(memory region, kick) should be shared.
d. A send message request needs the other side to response immediately.
This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component at receiver side would like
to response the messages from the peer processe.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component does not want to response the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 06:58:08 +00:00
|
|
|
* Copyright(c) 2010-2018 Intel Corporation
|
2014-02-10 11:49:10 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @file
|
|
|
|
* Stores functions and path defines for files and directories
|
|
|
|
* on the filesystem for Linux, that are used by the Linux EAL.
|
|
|
|
*/
|
|
|
|
|
2014-11-21 14:26:17 +00:00
|
|
|
#ifndef EAL_FILESYSTEM_H
|
|
|
|
#define EAL_FILESYSTEM_H
|
2014-02-10 11:49:10 +00:00
|
|
|
|
|
|
|
/** Path of rte config file. */
|
|
|
|
|
|
|
|
#include <stdint.h>
|
|
|
|
#include <limits.h>
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
|
|
|
|
#include <rte_string_fns.h>
|
|
|
|
#include "eal_internal_cfg.h"
|
|
|
|
|
eal: add directory for runtime data
Currently, during runtime, DPDK will store a bunch of files here
and there (in /var/run, /tmp or in $HOME). Fix it by creating a
DPDK-specific runtime directory, under which all runtime data
will be placed. The template for creating this runtime directory
is the following:
<base path>/dpdk/<DPDK prefix>/
Where <base path> is set to either "/var/run" if run as root, or
$XDG_RUNTIME_DIR if run as non-root, with a fallback to /tmp if
$XDG_RUNTIME_DIR is not defined. So, for example, if run as root,
by default all runtime data will be stored at /var/run/dpdk/rte/.
There is no equivalent of "mkdir -p", so we will be creating the
path step by step.
Nothing uses this new path yet, changes for that will come in
next commit.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
2018-05-14 16:27:41 +00:00
|
|
|
/* sets up platform-specific runtime data dir */
|
|
|
|
int
|
|
|
|
eal_create_runtime_dir(void);
|
|
|
|
|
eal: clean up unused files on initialization
When creating process data structures, EAL will create many files
in EAL runtime directory. Because we allow multiple secondary
processes to run, each secondary process gets their own unique
file. With many secondary processes running and exiting on the
system, runtime directory will, over time, create enormous amounts
of sockets, fbarray files and other stuff that just sits there
unused because the process that allocated it has died a long time
ago. This may lead to exhaustion of disk (or RAM) space in the
runtime directory.
Fix this by removing every unlocked file at initialization that
matches either socket or fbarray naming convention. We cannot be
sure of any other files, so we'll leave them alone. Also, remove
similar code from mp socket code.
We do it at the end of init, rather than at the beginning, because
secondary process will use primary process' data structures even
if the primary itself has died, and we don't want to remove those
before we lock them.
Bugzilla ID: 106
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-11-13 15:54:44 +00:00
|
|
|
int
|
|
|
|
eal_clean_runtime_dir(void);
|
|
|
|
|
eal: fix strdup usages in internal config
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.
First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.
Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.
Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.
To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.
Then, fix all usages of strdup to check return value, and add
a cleanup function that will free the memory occupied by
these strings, as well as freeing them before assigning a new
value to prevent leaks when parameter is specified multiple
times.
And finally, add an internal API to query hugefile prefix, so
that, absent of a valid value, a default value will be
returned, and also fix up all usages of hugefile prefix to
use this API instead of accessing hugefile prefix directly.
Bugzilla ID: 108
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-10 13:38:59 +00:00
|
|
|
/** Function to return hugefile prefix that's currently set up */
|
|
|
|
const char *
|
|
|
|
eal_get_hugefile_prefix(void);
|
|
|
|
|
2018-07-13 10:44:48 +00:00
|
|
|
#define RUNTIME_CONFIG_FNAME "config"
|
2014-02-10 11:49:10 +00:00
|
|
|
static inline const char *
|
|
|
|
eal_runtime_config_path(void)
|
|
|
|
{
|
|
|
|
static char buffer[PATH_MAX]; /* static so auto-zeroed */
|
|
|
|
|
2019-05-10 14:53:12 +00:00
|
|
|
snprintf(buffer, sizeof(buffer), "%s/%s", rte_eal_get_runtime_dir(),
|
2018-07-13 10:44:48 +00:00
|
|
|
RUNTIME_CONFIG_FNAME);
|
2014-02-10 11:49:10 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
1. Config-file based channel, in which, the primary process writes
info into a pre-defined config file, and the secondary process
reads the info out.
2. vfio submodule has its own channel based on unix socket for the
secondary process to get container fd and group fd from the
primary process.
3. pdump submodule also has its own channel based on unix socket for
packet dump.
It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
a. Secondary wants to send info to primary, for example, secondary
would like to send request (about some specific vdev to primary).
b. Sending info at any time, instead of just initialization time.
c. Share FDs with the other side, for vdev like vhost, related FDs
(memory region, kick) should be shared.
d. A send message request needs the other side to response immediately.
This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component at receiver side would like
to response the messages from the peer processe.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component does not want to response the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 06:58:08 +00:00
|
|
|
/** Path of primary/secondary communication unix socket file. */
|
2018-05-14 16:27:42 +00:00
|
|
|
#define MP_SOCKET_FNAME "mp_socket"
|
eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
1. Config-file based channel, in which, the primary process writes
info into a pre-defined config file, and the secondary process
reads the info out.
2. vfio submodule has its own channel based on unix socket for the
secondary process to get container fd and group fd from the
primary process.
3. pdump submodule also has its own channel based on unix socket for
packet dump.
It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
a. Secondary wants to send info to primary, for example, secondary
would like to send request (about some specific vdev to primary).
b. Sending info at any time, instead of just initialization time.
c. Share FDs with the other side, for vdev like vhost, related FDs
(memory region, kick) should be shared.
d. A send message request needs the other side to response immediately.
This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component at receiver side would like
to response the messages from the peer processe.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component does not want to response the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 06:58:08 +00:00
|
|
|
static inline const char *
|
|
|
|
eal_mp_socket_path(void)
|
|
|
|
{
|
|
|
|
static char buffer[PATH_MAX]; /* static so auto-zeroed */
|
|
|
|
|
2019-05-10 14:53:12 +00:00
|
|
|
snprintf(buffer, sizeof(buffer), "%s/%s", rte_eal_get_runtime_dir(),
|
2018-05-14 16:27:42 +00:00
|
|
|
MP_SOCKET_FNAME);
|
eal: add channel for multi-process communication
Previouly, there are three channels for multi-process
(i.e., primary/secondary) communication.
1. Config-file based channel, in which, the primary process writes
info into a pre-defined config file, and the secondary process
reads the info out.
2. vfio submodule has its own channel based on unix socket for the
secondary process to get container fd and group fd from the
primary process.
3. pdump submodule also has its own channel based on unix socket for
packet dump.
It'd be good to have a generic communication channel for multi-process
communication to accommodate the requirements including:
a. Secondary wants to send info to primary, for example, secondary
would like to send request (about some specific vdev to primary).
b. Sending info at any time, instead of just initialization time.
c. Share FDs with the other side, for vdev like vhost, related FDs
(memory region, kick) should be shared.
d. A send message request needs the other side to response immediately.
This patch proposes to create a communication channel, based on datagram
unix socket, for above requirements. Each process will block on a unix
socket waiting for messages from the peers.
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component at receiver side would like
to response the messages from the peer processe.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component does not want to response the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2018-01-30 06:58:08 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
2018-05-14 16:27:42 +00:00
|
|
|
#define FBARRAY_NAME_FMT "%s/fbarray_%s"
|
eal: add shared indexed file-backed array
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. Rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is have a maximum capacity as
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to user application to synchronize).
In addition, in understanding that we will frequently need to scan
this array for free space and iterating over array linearly can
become slow, rte_fbarray provides facilities to index array's
usage. The following use cases are covered:
- find next free/used slot (useful either for adding new elements
to fbarray, or walking the list)
- find starting index for next N free/used slots (useful for when
we want to allocate chunk of VA-contiguous memory composed of
several pages)
- find how many contiguous free/used slots there are, starting
from specified index (useful for when we want to figure out
how many pages we have until next hole in allocated memory, to
speed up some bulk operations where we would otherwise have to
walk the array and add pages one by one)
This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 12:30:23 +00:00
|
|
|
static inline const char *
|
|
|
|
eal_get_fbarray_path(char *buffer, size_t buflen, const char *name) {
|
2018-10-27 09:17:40 +00:00
|
|
|
snprintf(buffer, buflen, FBARRAY_NAME_FMT, rte_eal_get_runtime_dir(),
|
|
|
|
name);
|
eal: add shared indexed file-backed array
rte_fbarray is a simple indexed array stored in shared memory
via mapping files into memory. Rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is have a maximum capacity as
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - that is up to user application to synchronize).
In addition, in understanding that we will frequently need to scan
this array for free space and iterating over array linearly can
become slow, rte_fbarray provides facilities to index array's
usage. The following use cases are covered:
- find next free/used slot (useful either for adding new elements
to fbarray, or walking the list)
- find starting index for next N free/used slots (useful for when
we want to allocate chunk of VA-contiguous memory composed of
several pages)
- find how many contiguous free/used slots there are, starting
from specified index (useful for when we want to figure out
how many pages we have until next hole in allocated memory, to
speed up some bulk operations where we would otherwise have to
walk the array and add pages one by one)
This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
2018-04-11 12:30:23 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
2014-02-10 11:49:10 +00:00
|
|
|
/** Path of hugepage info file. */
|
2018-05-14 16:27:42 +00:00
|
|
|
#define HUGEPAGE_INFO_FNAME "hugepage_info"
|
2014-02-10 11:49:10 +00:00
|
|
|
static inline const char *
|
|
|
|
eal_hugepage_info_path(void)
|
|
|
|
{
|
|
|
|
static char buffer[PATH_MAX]; /* static so auto-zeroed */
|
|
|
|
|
2019-05-10 14:53:12 +00:00
|
|
|
snprintf(buffer, sizeof(buffer), "%s/%s", rte_eal_get_runtime_dir(),
|
2018-05-14 16:27:42 +00:00
|
|
|
HUGEPAGE_INFO_FNAME);
|
2014-02-10 11:49:10 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
2018-05-14 16:27:42 +00:00
|
|
|
/** Path of hugepage data file. */
|
|
|
|
#define HUGEPAGE_DATA_FNAME "hugepage_data"
|
2018-04-11 12:30:33 +00:00
|
|
|
static inline const char *
|
2018-05-14 16:27:40 +00:00
|
|
|
eal_hugepage_data_path(void)
|
2018-04-11 12:30:33 +00:00
|
|
|
{
|
|
|
|
static char buffer[PATH_MAX]; /* static so auto-zeroed */
|
|
|
|
|
2019-05-10 14:53:12 +00:00
|
|
|
snprintf(buffer, sizeof(buffer), "%s/%s", rte_eal_get_runtime_dir(),
|
2018-05-14 16:27:42 +00:00
|
|
|
HUGEPAGE_DATA_FNAME);
|
2018-04-11 12:30:33 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
2014-02-10 11:49:10 +00:00
|
|
|
/** String format for hugepage map files. */
|
|
|
|
#define HUGEFILE_FMT "%s/%smap_%d"
|
|
|
|
static inline const char *
|
|
|
|
eal_get_hugefile_path(char *buffer, size_t buflen, const char *hugedir, int f_id)
|
|
|
|
{
|
2014-06-24 18:15:28 +00:00
|
|
|
snprintf(buffer, buflen, HUGEFILE_FMT, hugedir,
|
eal: fix strdup usages in internal config
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.
First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.
Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.
Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.
To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.
Then, fix all usages of strdup to check return value, and add
a cleanup function that will free the memory occupied by
these strings, as well as freeing them before assigning a new
value to prevent leaks when parameter is specified multiple
times.
And finally, add an internal API to query hugefile prefix, so
that, absent of a valid value, a default value will be
returned, and also fix up all usages of hugefile prefix to
use this API instead of accessing hugefile prefix directly.
Bugzilla ID: 108
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-01-10 13:38:59 +00:00
|
|
|
eal_get_hugefile_prefix(), f_id);
|
2014-02-10 11:49:10 +00:00
|
|
|
return buffer;
|
|
|
|
}
|
|
|
|
|
|
|
|
/** define the default filename prefix for the %s values above */
|
|
|
|
#define HUGEFILE_PREFIX_DEFAULT "rte"
|
|
|
|
|
|
|
|
/** Function to read a single numeric value from a file on the filesystem.
|
|
|
|
* Used to read information from files on /sys */
|
|
|
|
int eal_parse_sysfs_value(const char *filename, unsigned long *val);
|
|
|
|
|
2014-11-21 14:26:17 +00:00
|
|
|
#endif /* EAL_FILESYSTEM_H */
|