numam-dpdk

Author	SHA1	Message	Date
Stephen Hemminger	5f4eb82f3c	log: close in cleanup stage When application calls rte_eal_cleanup on shutdown, the DPDK log should be closed and cleaned up. This helps reduce false reports from tools like ASAN and valgrind that track memory leaks. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-02-11 19:49:22 +01:00
Stephen Hemminger	6e97b5fc1a	eal: move Unix filesystem functions into one file Both Linux and FreeBSD have same code for creating runtime directory and reading sysfs files. Put them in the new lib/eal/unix subdirectory. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2022-02-09 19:12:53 +01:00
Stephen Hemminger	1835a22f34	support systemd service convention for runtime directory Systemd.exec supports configuring the runtime directory of a service via RuntimeDirectory=. This creates the directory with the necessary permissions which actual service may not have if running in container. The change to DPDK is to look for the environment RUNTIME_DIRECTORY first and use that in preference to the fallback alternatives. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>	2022-02-09 19:12:40 +01:00
Stephen Hemminger	36514d8dfa	eal: remove size for setting runtime directory The size argument to eal_set_runtime_dir is useless and was being used incorrectly in strlcpy. It worked only because all callers passed PATH_MAX which is same as sizeof the destination runtime_dir. Note: this is an internal API so no user exposed change. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2022-02-09 16:42:31 +01:00
Dmitry Kozlyuk	32b4771cd8	eal/linux: allow hugepage file reuse Linux EAL ensured that mapped hugepages are clean by always mapping from newly created files: existing hugepage backing files were always removed. In this case, the kernel clears the page to prevent data leaks, because the mapped memory may contain leftover data from the previous process that was using this memory. Clearing takes the bulk of the time spent in mmap(2), increasing EAL initialization time. Introduce a mode to keep existing files and reuse them in order to speed up initial memory allocation in EAL. Hugepages mapped from such files may contain data left by the previous process that used this memory, so RTE_MEMSEG_FLAG_DIRTY is set for their segments. If multiple hugepages are mapped from the same file: 1. When fallocate(2) is used, all memory mapped from this file is considered dirty, because it is unknown which parts of the file are holes. 2. When ftruncate(3) is used, memory mapped from this file is considered dirty unless the file is extended to create a new mapping, which implies clean memory. Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2022-02-08 21:32:53 +01:00
Dmitry Kozlyuk	52d7d91ed4	eal: refactor --huge-unlink storage In preparation to extend --huge-unlink option semantics refactor how it is stored in the internal configuration. It makes future changes more isolated. Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com> Acked-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2022-02-08 21:32:53 +01:00
Stephen Hemminger	8a5a91401d	eal/linux: log hugepage create errors with filename While debugging running DPDK service in a container, it is useful to see which file creation failed. Don't hide this failure with DEBUG. Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2022-01-21 15:40:58 +01:00
Josh Soref	7be78d0279	fix spelling in comments and strings The tool comes from https://github.com/jsoref Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2022-01-11 12:16:53 +01:00
Maciej Szwed	aeed570a21	interrupt: fix request notifier interrupt processing We should call read() on RTE_INTR_HANDLE_VFIO_REQ event to confirm that event. Fixes: `0eb8a1c4c7` ("vfio: add request notifier interrupt") Cc: stable@dpdk.org Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>	2021-11-08 18:26:07 +01:00
Harman Kalra	7e2083e462	eal/linux: check interrupt file descriptor validity This patch fixes coverity issue by adding a check for negative event fd value. Coverity issue: 373711, 373694 Fixes: `c2bd9367e1` ("lib: remove direct access to interrupt handle") Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-08 17:32:42 +01:00
Harman Kalra	3fcca9fac6	interrupts: check file descriptor validity This patch fixes coverity issues by adding a check for negative event fd value. Coverity issue: 373716, 373699, 373693, 373688 Fixes: `bbbac4cd6e` ("interrupts: remove direct access to interrupt handle") Signed-off-by: Harman Kalra <hkalra@marvell.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-08 17:32:42 +01:00
Anatoly Burakov	84e03bde1c	vfio: drop fallback Linux implementation Currently, VFIO support for Linux is compiled unconditionally, and supported kernel versions start with 4.4, so VFIO is assumed to always be enabled. There is no way of disabling VFIO support at compile time anyway, so just drop the "VFIO not available" fallback code altogether. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Chenbo Xia <chenbo.xia@intel.com>	2021-11-08 16:27:15 +01:00
Olivier Matz	9bffc92850	mem: fix dynamic hugepage mapping in container Since its introduction in 2018, the SIGBUS handler was never registered, and all related functions were unused. A SIGBUS can be received by the application when accessing to hugepages even if mmap() was successful, This happens especially when running inside containers when there is not enough hugepages. In this case, we need to recover. A similar scheme can be found in eal_memory.c. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: David Marchand <david.marchand@redhat.com>	2021-11-05 15:28:55 +01:00
David Marchand	5633173341	eal/linux: fix device hotplug The device event interrupt handler was always freed. Bugzilla ID: 845 Fixes: `c2bd9367e1` ("lib: remove direct access to interrupt handle") Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
David Marchand	4847122aab	eal/linux: fix uevent message parsing Caught with ASan: ==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578 READ of size 1 at 0x7f0daa2fc0d0 thread T1 #0 0x7f0daeefacb1 (/lib64/libasan.so.5+0xbacb1) #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167 #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248 #3 0x1169b91 in eal_intr_process_interrupts ../lib/eal/linux/eal_interrupts.c:1026 #4 0x116a3a2 in eal_intr_handle_interrupts ../lib/eal/linux/eal_interrupts.c:1100 #5 0x116a7f0 in eal_intr_thread_main ../lib/eal/linux/eal_interrupts.c:1172 #6 0x112640a in ctrl_thread_init ../lib/eal/common/eal_common_thread.c:202 #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159) #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72) Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192 in frame #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226 This frame has 2 object(s): [32, 48) 'uevent' [96, 4192) 'buf' <== Memory access at offset 4192 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions are supported) Thread T1 created by T0 here: #0 0x7f0daee92ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3) #1 0x1126542 in rte_ctrl_thread_create ../lib/eal/common/eal_common_thread.c:228 #2 0x116a8b5 in rte_eal_intr_init ../lib/eal/linux/eal_interrupts.c:1200 #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044 #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105 #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802) Bugzilla ID: 792 Fixes: `0d0f478d04` ("eal/linux: add uevent parse and process") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Yan Xia <yanx.xia@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-11-04 15:13:41 +01:00
Jim Harris	628bac7df1	eal/linux: remove unused variable for socket memory clang-13 rightfully complains that the total_mem variable in eal_parse_socket_arg is set but not used, since the final accumulated total_mem result isn't used anywhere. So just remove the total_mem variable. Fixes: `0a703f0f36` ("eal/linux: fix parsing zero socket memory and limits") Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-11-04 13:27:18 +01:00
Anatoly Burakov	ab910a8068	vfio: fix partial unmap Partial unmap support was introduced in commit `c13ca4e81c` ("vfio: fix DMA mapping granularity for IOVA as VA"), and with it was added a check that dereferenced the IOMMU type to determine whether partial ummapping is supported for currently configured IOMMU type. In certain circumstances (such as when VFIO is supported, but no devices were bound to the VFIO driver), the IOMMU type pointer can be NULL. However, dereferencing of IOMMU type was guarded by access to the user maps list - that is, we were always checking the user map list first, and then, if we found a memory region that encloses the one we're trying to unmap, we would have performed the IOMMU type check. This ensured that the IOMMU type check will not cause any NULL pointer dereferences, because in order for an IOMMU type check to have been performed, there necessarily must have been at least one memory region that was previously mapped successfully, and that implies having a defined IOMMU type. When commit `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") was introduced, the IOMMU type check was moved to before we were traversing the user mem maps list, thereby introducing a potential NULL dereference, because the IOMMU type access was no longer guarded by the user mem maps list traversal. Fix the issue by moving the IOMMU type check to after the user mem maps traversal, thereby ensuring that by the time the check happens, the IOMMU type is always valid. Fixes: `56259f7fc0` ("vfio: allow partially unmapping adjacent memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Xuan Ding <xuan.ding@intel.com>	2021-10-28 09:51:55 +02:00
Harman Kalra	c2bd9367e1	lib: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	90b13ab8d4	alarm: remove direct access to interrupt handle Removing direct access to interrupt handle structure fields, rather use respective get set APIs for the same. Making changes to all the libraries access the interrupt handle fields. Implementing alarm cleanup routine, where the memory allocated for interrupt instance can be freed. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Harman Kalra	bbbac4cd6e	interrupts: remove direct access to interrupt handle Making changes to the interrupt framework to use interrupt handle APIs to get/set any field. Signed-off-by: Harman Kalra <hkalra@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Tested-by: Raslan Darawsheh <rasland@nvidia.com>	2021-10-25 21:20:12 +02:00
Honnappa Nagarahalli	3596537005	eal: fix memory ordering around lcore task accesses Ensure that the memory operations before the call to rte_eal_remote_launch are visible to the worker thread. Use the function pointer to execute in worker thread as the guard variable. Ensure that the memory operations in worker thread, that happen before it returns the status of the assigned function, are visible to the main thread. Use the variable containing the lcore's state as the guard variable. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	f6c6c686f1	eal: remove FINISHED lcore state FINISHED state seems to be used to indicate that the worker's update of the 'state' is not visible to other threads. There seems to be no requirement to have such a state. Since the FINISHED state is removed, the API rte_eal_wait_lcore is updated to always return the status of the last function that ran in the worker core. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Honnappa Nagarahalli	33969e9c61	eal: reset lcore task callback and argument In the rte_eal_remote_launch function, the lcore function pointer is checked for NULL. However, the pointer is never reset to NULL. Reset the lcore function pointer and argument after the worker has completed executing the lcore function. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>	2021-10-25 18:20:59 +02:00
Xuan Ding	56259f7fc0	vfio: allow partially unmapping adjacent memory Currently, if we map a memory area A, then map a separate memory area B that by coincidence happens to be adjacent to A, current implementation will merge these two segments into one, and if partial unmapping is not supported, these segments will then be only allowed to be unmapped in one go. In other words, given segments A and B that are adjacent, it is currently not possible to map A, then map B, then unmap A. Fix this by adding a notion of "chunk size", which will allow subdividing segments into equally sized segments whenever we are dealing with an IOMMU that does not support partial unmapping. With this change, we will still be able to merge adjacent segments, but only if they are of the same size. If we keep with our above example, adjacent segments A and B will be stored as separate segments if they are of different sizes. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Xuan Ding <xuan.ding@intel.com> Tested-by: Yvonne Yang <yvonnex.yang@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2021-10-21 14:24:21 +02:00
Bruce Richardson	e89463a366	eal: limit telemetry to primary processes Telemetry interface should be exposed for primary processes only, since secondary processes will conflict on socket creation, and since all data in secondary process is generally available to primary. For example, all device stats for ethdevs, cryptodevs, etc. will all be common across processes. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Ciara Power <ciara.power@intel.com> Tested-by: Conor Walsh <conor.walsh@intel.com>	2021-10-14 20:31:10 +02:00
John Levon	24d5a1ce6b	eal/linux: allow hugetlbfs sub-directories get_hugepage_dir() was implemented in such a way that a --huge-dir option had to exactly match the mountpoint, but there's no reason for this restriction: DPDK might not be the only user of hugepages, and shouldn't assume it owns an entire mountpoint. For example, if I have /dev/hugepages/myapp, and /dev/hugepages/dpdk, I should be able to specify: --huge-dir=/dev/hugepages/dpdk/ and have DPDK only use that sub-directory. Fix the implementation to allow a sub-directory within a suitable hugetlbfs mountpoint to be specified, preferring the closest match. Signed-off-by: John Levon <john.levon@nutanix.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-10-12 21:07:46 +02:00
William Tu	f1f6ebc0ea	eal: remove sys/queue.h from public headers Currently there are some public headers that include 'sys/queue.h', which is not POSIX, but usually provided by the Linux/BSD system library. (Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.) The file is missing on Windows. During the Windows build, DPDK uses a bundled copy, so building a DPDK library works fine. But when OVS or other applications use DPDK as a library, because some DPDK public headers include 'sys/queue.h', on Windows, it triggers an error due to no such file. One solution is to install the 'lib/eal/windows/include/sys/queue.h' into Windows environment, such as [1]. However, this means DPDK exports the functionalities of 'sys/queue.h' into the environment, which might cause symbols, macros, headers clashing with other applications. The patch fixes it by removing the "#include <sys/queue.h>" from DPDK public headers, so programs including DPDK headers don't depend on the system to provide 'sys/queue.h'. When these public headers use macros such as TAILQ_xxx, we replace it by the ones with RTE_ prefix. For Windows, we copy the definitions from <sys/queue.h> to rte_os.h in Windows EAL. Note that these RTE_ macros are compatible with <sys/queue.h>, both at the level of API (to use with <sys/queue.h> macros in C files) and ABI (to avoid breaking it). Additionally, the TAILQ_FOREACH_SAFE is not part of <sys/queue.h>, the patch replaces it with RTE_TAILQ_FOREACH_SAFE. [1] http://mails.dpdk.org/archives/dev/2021-August/216304.html Suggested-by: Nick Connolly <nick.connolly@mayadata.io> Suggested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>	2021-10-01 13:09:43 +02:00
Bruce Richardson	ce382fdddb	eal: create runtime dir even when shared data is not used When multi-process is not wanted and DPDK is run with the "no-shconf" flag, the telemetry library still needs a runtime directory to place the unix socket for telemetry connections. Therefore, rather than not creating the directory when this flag is set, we can change the code to attempt the creation anyway, but not error out if it fails. If it succeeds, then telemetry will be available, but if it fails, the rest of DPDK will run without telemetry. This ensures that the "in-memory" flag will allow DPDK to run even if the whole filesystem is read-only, for example. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2021-07-07 15:23:09 +02:00
Bruce Richardson	99a2dd955f	lib: remove librte_ prefix from directory names There is no reason for the DPDK libraries to all have 'librte_' prefix on the directory names. This prefix makes the directory names longer and also makes it awkward to add features referring to individual libraries in the build - should the lib names be specified with or without the prefix. Therefore, we can just remove the library prefix and use the library's unique name as the directory name, i.e. 'eal' rather than 'librte_eal' Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2021-04-21 14:04:09 +02:00

29 Commits
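
Illustration for commit 1835a22f34 (systemd runtime directory): the message says the environment variable RUNTIME_DIRECTORY is consulted first and used in preference to the fallback alternatives. The lookup order might be sketched roughly as below. This is not the code from the commit. The helper name `pick_runtime_dir`, the caller-supplied `fallback` argument, and the choice to treat an empty value as unset are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a DPDK function): return the directory named by
 * RUNTIME_DIRECTORY when it is set, otherwise the caller's fallback.
 * Treating an empty value as "unset" is an assumption of this sketch.
 */
static const char *
pick_runtime_dir(const char *fallback)
{
	const char *dir = getenv("RUNTIME_DIRECTORY");

	if (dir != NULL && dir[0] != '\0')
		return dir;

	return fallback;
}
```

With a systemd unit that sets RuntimeDirectory=, the service manager creates the directory and exports RUNTIME_DIRECTORY to the service, so a call such as `pick_runtime_dir("/tmp")` would return the systemd-provided path there and "/tmp" elsewhere. The "/tmp" fallback is only a placeholder; the commit message does not name the real fallback alternatives.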