snprintf() guarantees that the destination buffer is always
null-terminated. Manually placing a null terminator in the buffer
right after a call to snprintf() is therefore redundant.
Additionally, there is no need to pass 'sizeof(buffer) - 1' to
snprintf(), as that needlessly gives up the last character of the
buffer; 'sizeof(buffer)' is enough.
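For illustration, a minimal before/after sketch (the buffer and string
are placeholders):

    #include <stdio.h>

    int main(void)
    {
        const char *some_string = "example";
        char buffer[32];

        /* Old pattern: shrink the size and terminate manually, even
         * though snprintf() already guarantees termination. */
        snprintf(buffer, sizeof(buffer) - 1, "%s", some_string);
        buffer[sizeof(buffer) - 1] = '\0';

        /* Equivalent and simpler: pass the full size, no manual
         * terminator needed. */
        snprintf(buffer, sizeof(buffer), "%s", some_string);

        printf("%s\n", buffer);
        return 0;
    }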
Cc: stable@dpdk.org
Signed-off-by: Michael Santana <msantana@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which can
leave user applications unable to use system calls such as
select().
The single file segments option stores lock files per page to
ensure that pages are deleted when there are no more users. However,
this is not necessary, because the processes hold onto the pages
anyway due to mmap(), so removing pages from the filesystem is safe
even though they may be used by some other secondary process. As a
result, single file segments mode no longer stores inordinate
amounts of segment fd's, and the above issue with fd limits is
solved.
However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
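For reference, one common way this manifests (an illustration only,
not DPDK code, and assuming the usual fd_set-based select()): fd_set
cannot represent descriptors numbered at or above FD_SETSIZE, so once
enough fds have been consumed, newly opened descriptors can no longer
be monitored:

    #include <stdio.h>
    #include <sys/select.h>

    int main(void)
    {
        /* Hypothetical descriptor number, e.g. a socket opened after
         * more than a thousand fds are already in use. */
        int fd = 1500;

        if (fd >= FD_SETSIZE) {
            /* Passing such an fd to FD_SET()/select() is undefined
             * behaviour, so the application cannot select() on it. */
            printf("fd %d is beyond FD_SETSIZE (%d)\n", fd, FD_SETSIZE);
        }
        return 0;
    }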
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, we use strdup in a few places to store command-line
parameter values for certain internal config values. There are
several issues with that.
First of all, they're never freed, so memory ends up leaking
either after EAL exit, or when these command-line options are
supplied multiple times.
Second of all, they're defined as `const char *`, so they
*cannot* be freed even if we wanted to.
Finally, strdup may return NULL, which will be stored in the
config. For most fields, NULL is a valid value, but for the
default prefix, the value is always expected to be valid.
To fix all of this, three things are done. First, we change
the definitions of these values to `char *` as opposed to
`const char *`. This does not break the ABI, and previous
code assumes constness (which is more restrictive), so it's
safe to do so.
Then, fix all usages of strdup to check the return value, and add
a cleanup function that frees the memory occupied by these strings.
The strings are also freed before a new value is assigned, to
prevent leaks when a parameter is specified multiple times.
And finally, add an internal API to query the hugefile prefix, so
that, in the absence of a valid value, a default is returned, and
fix up all usages of the hugefile prefix to go through this API
instead of accessing the value directly.
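A minimal sketch of the resulting pattern (the names below are
illustrative, not the actual EAL internals):

    #include <stdlib.h>
    #include <string.h>

    /* Now 'char *' rather than 'const char *', so it can be freed. */
    static char *hugefile_prefix;

    #define DEFAULT_HUGEFILE_PREFIX "rte"

    /* Parsing: free any previous value so repeated options don't leak,
     * and check strdup() for failure. */
    static int
    set_hugefile_prefix(const char *arg)
    {
        free(hugefile_prefix);
        hugefile_prefix = strdup(arg);
        return hugefile_prefix == NULL ? -1 : 0;
    }

    /* Accessor: always returns a usable value, even if nothing was
     * set or strdup() failed. */
    static const char *
    get_hugefile_prefix(void)
    {
        return hugefile_prefix != NULL ?
            hugefile_prefix : DEFAULT_HUGEFILE_PREFIX;
    }

    /* Cleanup at EAL exit. */
    static void
    cleanup_config_strings(void)
    {
        free(hugefile_prefix);
        hugefile_prefix = NULL;
    }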
Bugzilla ID: 108
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When creating process data structures, EAL will create many files
in the EAL runtime directory. Because we allow multiple secondary
processes to run, each secondary process gets its own unique files.
With many secondary processes running and exiting on the system,
the runtime directory will, over time, accumulate enormous numbers
of sockets, fbarray files and other leftovers that just sit there
unused because the process that allocated them died a long time
ago. This may lead to exhaustion of disk (or RAM) space in the
runtime directory.
Fix this by removing, at initialization, every unlocked file that
matches either the socket or the fbarray naming convention. We
cannot be sure about any other files, so we leave them alone. Also,
remove similar code from the mp socket code.
We do it at the end of init, rather than at the beginning, because
a secondary process will use the primary process' data structures
even if the primary itself has died, and we don't want to remove
those before we lock them.
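Conceptually, the per-file check looks like the following hedged
sketch (not the actual EAL code):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Return 1 if 'path' was unlocked (and thus removed), 0 if it is
     * still in use by a live process, -1 on error. */
    static int
    remove_if_unlocked(const char *path)
    {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        /* A live process keeps a lock on the files it owns, so failing
         * to take an exclusive non-blocking lock means the file is
         * still in use and must be left alone. */
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
            close(fd);
            return 0;
        }
        unlink(path);
        close(fd); /* also releases the lock */
        return 1;
    }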
Bugzilla ID: 106
Cc: stable@dpdk.org
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The rte_eal_get_runtime_dir() function is currently declared in two
header files.
This API was made public in commit 6911c9fd8f ("eal: export function to
get runtime directory"), which added it to rte_eal.h. To make it public,
the 'rte' prefix was added to the function, so it also had to be modified
in the original location of the declaration, eal_filesystem.h. Because
the declaration was only modified there, and not removed, it is now a
duplicate.
This patch removes the declaration from eal_filesystem.h.
Fixes: 6911c9fd8f ("eal: export function to get runtime directory")
Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch makes the eal_get_runtime_dir() API public so it can be used
from outside EAL.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.
[1] http://dpdk.org/patch/40418
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Fix all calls to functions in eal_filesystem to produce paths
residing inside the dedicated DPDK runtime directory. The DPDK
runtime config is left in place, as 3rd-party applications within
the DPDK ecosystem might rely on its path to determine whether DPDK
is running, so moving it will be postponed to the next release
cycle.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, during runtime, DPDK will store a bunch of files here
and there (in /var/run, /tmp or in $HOME). Fix it by creating a
DPDK-specific runtime directory, under which all runtime data
will be placed. The template for creating this runtime directory
is the following:
<base path>/dpdk/<DPDK prefix>/
Where <base path> is set to either "/var/run" if run as root, or
$XDG_RUNTIME_DIR if run as non-root, with a fallback to /tmp if
$XDG_RUNTIME_DIR is not defined. So, for example, if run as root,
by default all runtime data will be stored at /var/run/dpdk/rte/.
There is no C library equivalent of "mkdir -p", so we create the
path step by step.
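A hedged sketch of the idea (not the actual EAL code; the helper
names and buffer sizes are illustrative):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const char *
    runtime_base_dir(void)
    {
        const char *xdg;

        if (getuid() == 0)
            return "/var/run";
        xdg = getenv("XDG_RUNTIME_DIR");
        return xdg != NULL ? xdg : "/tmp";
    }

    /* "mkdir -p"-style creation of one component: tolerate components
     * that already exist. */
    static int
    mkdir_step(const char *path)
    {
        if (mkdir(path, 0700) < 0 && errno != EEXIST)
            return -1;
        return 0;
    }

    int main(void)
    {
        char dpdk_dir[256], run_dir[256];
        const char *prefix = "rte"; /* illustrative default prefix */

        snprintf(dpdk_dir, sizeof(dpdk_dir), "%s/dpdk",
                runtime_base_dir());
        snprintf(run_dir, sizeof(run_dir), "%s/%s", dpdk_dir, prefix);

        if (mkdir_step(dpdk_dir) < 0 || mkdir_step(run_dir) < 0) {
            perror("mkdir");
            return 1;
        }
        printf("runtime dir: %s\n", run_dir);
        return 0;
    }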
Nothing uses this new path yet; changes for that will come in the
next commit.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
The original name for this path was not very descriptive and was
confusing. Rename it to a more appropriate and descriptive name:
it stores data about hugepages, so name it eal_hugepage_data_path().
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
The define was a leftover from the IVSHMEM library.
Fixes: c711ccb309 ("ivshmem: remove library and its EAL integration")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@6wind.com>
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because fcntl()
locks allow locking parts of a file, which is useful for single-file
segments mode, where locking the entire file isn't practical since
we still need to grow and shrink it.
However, according to fcntl()'s Ubuntu manpage [1], the semantics of
fcntl() locks have a giant oversight:
This interface follows the completely stupid semantics of System
V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
locks associated with a file for a given process are removed
when any file descriptor for that file is closed by that process.
This semantic means that applications must be aware of any files
that a subroutine library may access.
Basically, closing *any* fd referring to a file with an fcntl() lock
on it (which we do because we don't want to leak fd's) will drop the
lock completely.
So, in this commit, we revert to using flock() locks everywhere.
However, that still leaves the problem of locking parts of a memseg
list file in single file segments mode, and we solve it by creating
a separate lock file for each page and tracking those with flock().
We also remove all of the tailq business and replace it with a
simple array - saving a few bytes is not worth the extra hassle of
dealing with pointers and potential memory allocation failures.
Also, remove the tailq lock since it is not needed - these fd lists
are per-process, and within a given process, only one thread ever
handles access to hugetlbfs.
So, the first process to allocate a segment will create a lockfile
and put a shared lock on it. When shrinking the page file, we try to
take out a write lock on that lockfile, which fails if any other
process is holding onto the lockfile as well. This way, we know
whether we can shrink the segment file. Also, if no other locks are
found in the lock list for a given memseg list, the memseg list fd
is automatically closed.
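A hedged sketch of this per-page lock protocol (the helper names and
paths are illustrative, not the actual EAL code):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* On page allocation: take a shared lock on the page's lock file,
     * marking this process as a user of the page. */
    static int
    lock_page(const char *lockfile_path)
    {
        int fd = open(lockfile_path, O_CREAT | O_RDWR, 0600);

        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_SH) < 0) {
            close(fd);
            return -1;
        }
        return fd; /* keep open: closing the fd releases the lock */
    }

    /* Before shrinking the segment file: try an exclusive lock.
     * Success means no other process holds a shared lock on this
     * page, so its space can be reclaimed; failure means the page
     * is still in use elsewhere. */
    static int
    can_reclaim_page(int lock_fd)
    {
        return flock(lock_fd, LOCK_EX | LOCK_NB) == 0;
    }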
One other thing to note is that, according to the flock() Ubuntu
manpage [2], upgrading a lock from shared to exclusive is implemented
by dropping and reacquiring the lock, which is not atomic and thus
would have created race conditions. So, when attempting to perform
operations in hugetlbfs, we take out a write lock on the hugetlbfs
directory, so that only one process can perform hugetlbfs operations
at a time.
[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
Fixes: 66cc45e293 ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60f ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66 ("eal: add single file segments option")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
fbarray stores its data in a shared file, which is not hidden.
This pollutes the user's HOME directory with visible files when
running DPDK as non-root. Change fbarray to always
create hidden files by default.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Since we are going to need to map hugepages in both primary and
secondary processes, we need to know where we should look for
hugetlbfs mountpoints. So, share those with secondary processes,
and map them on init.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
rte_fbarray is a simple indexed array stored in shared memory
by mapping files into memory. The rationale for its existence is the
following: since we are going to map memory page-by-page, there
could be quite a lot of memory segments to keep track of (for
smaller page sizes, the page count can easily reach thousands). We
can't really make page lists truly dynamic and infinitely expandable,
because that involves reallocating memory (which is a big no-no in
multiprocess). What we can do instead is set the maximum capacity to
something really, really large, and decide at allocation time how
big the array is going to be. We map the entire file into memory,
which makes it possible to use fbarray as shared memory, provided
the structure itself is allocated in shared memory. Per-fbarray
locking is also used to avoid index data races (but not contents
data races - synchronizing those is up to the user application).
In addition, since we will frequently need to scan this array for
free space, and iterating over the array linearly can become slow,
rte_fbarray provides facilities to index the array's usage. The
following use cases are covered:
- find the next free/used slot (useful either for adding new
elements to the fbarray, or for walking the list)
- find the starting index of the next N free/used slots (useful when
we want to allocate a chunk of VA-contiguous memory composed of
several pages)
- find how many contiguous free/used slots there are, starting
from a specified index (useful when we want to figure out how many
pages we have until the next hole in allocated memory, to speed up
some bulk operations where we would otherwise have to walk the
array and add pages one by one)
This is accomplished by storing a usage mask in-memory, right
after the data section of the array, and using some bit-level
magic to figure out the info we need.
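A hedged usage sketch follows; the function names and signatures are
my reading of the fbarray API and should be treated as approximate
rather than authoritative:

    #include <rte_fbarray.h>

    /* Illustrative element type. */
    struct my_elem { int val; };

    static int
    fbarray_demo(void)
    {
        struct rte_fbarray arr;
        struct my_elem *e;
        int idx;

        /* Capacity is decided up front; the backing file is mapped
         * in full. */
        if (rte_fbarray_init(&arr, "demo", 1024, sizeof(*e)) < 0)
            return -1;

        /* Find the next free slot starting at index 0, use it, and
         * mark it as used so later lookups skip it. */
        idx = rte_fbarray_find_next_free(&arr, 0);
        if (idx < 0)
            return -1;
        e = rte_fbarray_get(&arr, idx);
        e->val = 42;
        rte_fbarray_set_used(&arr, idx);

        /* How many contiguous used slots start at this index? */
        return rte_fbarray_find_contig_used(&arr, idx);
    }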
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Previously, there were three channels for multi-process
(i.e., primary/secondary) communication.
1. A config-file based channel, in which the primary process writes
info into a pre-defined config file and the secondary process
reads the info out.
2. The vfio submodule has its own channel, based on a unix socket,
for the secondary process to get the container fd and group fd
from the primary process.
3. The pdump submodule also has its own unix-socket-based channel
for packet dump.
It'd be good to have a generic communication channel for
multi-process communication that accommodates requirements such as:
a. A secondary wants to send info to the primary; for example, a
secondary would like to send a request (about some specific vdev)
to the primary.
b. Sending info at any time, instead of just at initialization time.
c. Sharing FDs with the other side; for a vdev like vhost, the
related FDs (memory region, kick) should be shared.
d. A sent message may need the other side to respond immediately.
This patch proposes to create a communication channel, based on a
datagram unix socket, for the above requirements. Each process will
block on a unix socket, waiting for messages from its peers.
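For illustration, here is the underlying mechanism using plain POSIX
calls (a hedged sketch of the concept only, not the EAL
implementation; the socket path and message handling are
placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Bind a datagram unix socket to 'path', as each process
     * conceptually does so that peers can send it messages. */
    static int
    open_mp_socket(const char *path)
    {
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

        if (fd < 0)
            return -1;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
        unlink(path); /* remove a stale socket from a previous run */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    /* Receive loop: block waiting for messages from peers. */
    static void
    mp_recv_loop(int fd)
    {
        char buf[256];
        ssize_t len;

        while ((len = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL)) > 0)
            printf("received a %zd-byte message\n", len);
    }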
Three new APIs are added:
1. rte_eal_mp_action_register() is used to register an action,
indexed by a string, when a component on the receiver side would
like to respond to messages from a peer process.
2. rte_eal_mp_action_unregister() is used to unregister the action
if the calling component no longer wants to respond to the messages.
3. rte_eal_mp_sendmsg() is used to send a message, and returns
immediately. If there are n secondary processes, the primary
process will send n messages.
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
RTE_EAL_SINGLE_FILE_SEGMENTS was introduced with the ivshmem
integration. Now that ivshmem has been removed (commit c711ccb309)
and a simple git grep shows nothing else references it,
I think we can now remove it.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
Some guards are missing or have the wrong name.
Others have LINUXAPP in their name but the headers are now common.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
No need to have different headers for Linux and BSD.
These files are identical, with the exception of the internal config,
which has uio and vfio fields that are only useful for Linux.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>