numam-dpdk

Author	SHA1	Message	Date
Olivier Matz	dec7b1884a	use sizeof to avoid double use of a length define Only a cosmetic change: the *_LEN defines are already used when defining the buffer. Using sizeof() ensures that the length stays consistent, even if the definition is modified. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-25 00:51:31 +02:00
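A minimal sketch of the pattern this commit describes. It assumes a hypothetical `NAME_LEN` define and `set_name()` helper, neither of which is taken from DPDK. The define is used once, to declare the buffer, and every copy is bounded by `sizeof()`, so the bound follows the declaration if the define ever changes.

```c
#include <stdio.h>

/* Hypothetical length define, used only to declare the buffer. */
#define NAME_LEN 16

static char name[NAME_LEN];

/* Copy src into name. The bound comes from sizeof(name), not NAME_LEN,
 * so it stays correct even if the declaration above is modified.
 * Returns the length snprintf() wanted to write (>= sizeof(name) means
 * the string was truncated). */
int set_name(const char *src)
{
	return snprintf(name, sizeof(name), "%s", src);
}
```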
Jianfeng Tan	79967252c3	eal: bring forward multi-process channel init Adjust the init sequence: put mp channel init before bus scan so that we can init the vdev bus through mp channel in the secondary process before the bus scan. Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Reviewed-by: Qi Zhang <qi.z.zhang@intel.com>	2018-04-24 12:31:26 +02:00
Jianfeng Tan	b8c835909e	ipc: fix timeout handling in async In the original implementation, the timeout event for an async request was ignored. As a result, an async request would never trigger its action if it could not receive any more replies. We fix this by counting a timeout as a processed reply. Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-23 22:45:05 +02:00
Jianfeng Tan	2147c09505	ipc: clean up code Following the commit below, we rename some internal functions and variables: commit ce3a7312357b ("eal: rename IPC request as synchronous one") Also use calloc instead of malloc + memset to clean up the code. Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-23 22:44:26 +02:00
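The async timeout fix can be pictured with a small model. It assumes a hypothetical `struct async_req` that counts the peers a request was sent to and how many of them have been processed; this is not the actual EAL structure. A reply and a timeout both advance the processed count, so the completion action still fires when some peers never answer.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one asynchronous request. */
struct async_req {
	unsigned int n_sent;      /* peers the request was sent to */
	unsigned int n_processed; /* replies received plus timeouts */
	unsigned int n_replies;   /* actual replies received */
	bool action_done;         /* completion action already triggered */
};

static void maybe_complete(struct async_req *r)
{
	/* Trigger the action exactly once, after every peer is accounted for. */
	if (!r->action_done && r->n_processed == r->n_sent)
		r->action_done = true;
}

void async_on_reply(struct async_req *r)
{
	r->n_replies++;
	r->n_processed++;
	maybe_complete(r);
}

/* A timeout counts as a processed (but empty) reply. */
void async_on_timeout(struct async_req *r)
{
	r->n_processed++;
	maybe_complete(r);
}
```

Without the `n_processed++` in the timeout path, a request sent to two peers where only one answers would never reach completion.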
Anatoly Burakov	441d676777	ipc: fix resource leak in init failure Coverity issue: 272609 Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-23 22:44:25 +02:00
Anatoly Burakov	dd7b7f9a52	ipc: fix return without mutex unlock gettimeofday() returning a negative value is highly unlikely, but if it ever happens, we will exit without unlocking the mutex. Arguably at that point we'll have bigger problems, but fix this issue anyway. Coverity issue: 272595 Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-23 22:44:24 +02:00
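A common way to avoid this class of bug is a single exit label that releases the lock. The sketch below assumes a hypothetical `timed_request()` function and `req_lock` mutex; it is not the EAL code, only an illustration of the unlock-on-every-path structure.

```c
#include <pthread.h>
#include <sys/time.h>

static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;

/* Single exit path: every return after taking the lock goes through
 * "out", so an early failure cannot leave the mutex held. */
int timed_request(long *out_sec)
{
	struct timeval now;
	int ret = 0;

	pthread_mutex_lock(&req_lock);
	if (gettimeofday(&now, NULL) < 0) {
		ret = -1; /* unlikely, but must not return with the lock held */
		goto out;
	}
	*out_sec = (long)now.tv_sec;
	/* ... build and send the request here ... */
out:
	pthread_mutex_unlock(&req_lock);
	return ret;
}
```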
Anatoly Burakov	505721e170	ipc: use strlcpy where applicable This also silences (or should silence) a few Coverity false positives where we used strcpy before (Coverity complained about not checking buffer size, but source buffers were always known to be sized correctly). Coverity issue: 260407, 272565, 272582 Fixes: bacaa2754017 ("eal: add channel for multi-process communication") Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Fixes: 783b6e54971d ("eal: add synchronous multi-process communication") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-23 22:44:23 +02:00
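strlcpy() is not available in every C library, so the sketch below uses a hypothetical `sketch_strlcpy()` that follows the BSD strlcpy() contract: it copies at most `size - 1` bytes, always NUL-terminates when `size > 0`, and returns `strlen(src)`. Unlike strcpy(), the destination size is an explicit argument, which is what lets a static analyzer see that the copy is bounded.

```c
#include <string.h>

/* Hypothetical stand-in with BSD strlcpy() semantics. A return value
 * >= size tells the caller that the copy was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size != 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
```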
Anatoly Burakov	7508be4cce	fbarray: check sysconf failure sysconf() may return a negative value, check for it. Coverity issue: 272586 Fixes: c44d09811b40 ("eal: add shared indexed file-backed array") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-23 22:44:22 +02:00
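sysconf() returns -1 on error, so its result has to be checked before it is used as a size. A minimal sketch, with `page_size_checked()` as a hypothetical helper name:

```c
#include <unistd.h>

/* Return the system page size, or -1 if sysconf() reports an error.
 * A negative result must never be used as a length or alignment. */
long page_size_checked(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	if (sz < 0)
		return -1;
	return sz;
}
```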
Anatoly Burakov	f9a4f1b462	fbarray: fix potential null-dereference We get the pointer to the mask before checking whether fbarray is NULL. Fix by getting the mask pointer only after the NULL check. Coverity issue: 272579 Fixes: c44d09811b40 ("eal: add shared indexed file-backed array") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-23 22:44:21 +02:00
Anatoly Burakov	2bcbc4d12c	fbarray: check for open failure Coverity issue: 272564 Fixes: c44d09811b40 ("eal: add shared indexed file-backed array") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-23 22:44:21 +02:00
Anatoly Burakov	9d3ba1e0ad	fbarray: use strlcpy instead of snprintf Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>	2018-04-23 22:44:20 +02:00
Anatoly Burakov	2c8663f9d0	fbarray: make all fbarrays hidden files fbarray stores its data in a shared file, which is not hidden. This pollutes the user's HOME directory with visible files when running DPDK as non-root. Change fbarray to always create hidden files by default. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-23 22:44:17 +02:00
Xiao Wang	b3a022b17c	vfio: fix boundary check in region search A previously mapped region is skipped during the search, leading to DMA unmap failures. This patch fixes it and rewords the comment. Fixes: 73a639085938 ("vfio: allow to map other memory regions") Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-23 21:24:22 +02:00
Thomas Monjalon	91c6de7eb7	eal/linux: use strlcpy in uevent parsing Support of strlcpy has recently been added to DPDK. This replacement has been generated by the coccinelle script: devtools/cocci.sh devtools/cocci/strlcpy.cocci Fixes: 0d0f478d0483 ("eal/linux: add uevent parse and process") Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-23 16:23:15 +02:00
Yangchao Zhou	fb338b80e5	mem: fix leaks of hugedir and replace snprintf The hugedir returned by get_hugepage_dir is allocated by strdup but not released. Replace snprintf with a more suitable strlcpy. Coverity issue: 272585 Fixes: cb97d93e9d3b ("mem: share hugepage info primary and secondary") Signed-off-by: Yangchao Zhou <zhouyates@gmail.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-18 10:58:10 +02:00
Junjie Chen	1c9467a6ef	eal/x86: force inlining of memcpy sub-functions Sometimes gcc does not inline a function despite the inline keyword; performance profiling showed that rte_movX was not inlined, so use the always_inline attribute to force gcc to inline these functions. Signed-off-by: Junjie Chen <junjie.j.chen@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-04-18 09:22:56 +02:00
Jianfeng Tan	83a73c5fef	vfio: use generic multi-process channel Previously, vfio used its own private channel for the secondary process to get the container fd and group fd from the primary process. This patch switches it to the generic mp channel. Test: 1. Bind two NICs to vfio-pci. 2. Start the primary and secondary process. $ (symmetric_mp) -c 2 -- -p 3 --num-procs=2 --proc-id=0 $ (symmetric_mp) -c 4 --proc-type=auto -- -p 3 --num-procs=2 --proc-id=1 Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-18 01:26:06 +02:00
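Plain `inline` is only a hint to the compiler; the GCC/Clang `always_inline` attribute makes inlining mandatory. A sketch with a hypothetical `mov16()` helper, loosely modeled on the rte_movX naming but not taken from DPDK (DPDK wraps this attribute in its own macro; the raw attribute is used here to keep the example self-contained):

```c
#include <stdint.h>
#include <string.h>

/* Force inlining even when the optimizer's heuristics would decline.
 * Copies exactly 16 bytes from src to dst. */
static inline __attribute__((always_inline)) void
mov16(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}
```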
Adrien Mazarguil	6b298c6285	eal: fix signed integers in fbarray While debugging startup issues encountered with Clang (see "eal: fix undefined behavior in fbarray"), I noticed that fbarray stores indices, sizes and masks on signed integers involved in bitwise operations. Such operations almost invariably cause undefined behavior with values that cannot be represented by the result type, as is often the case with bit-masks and left-shifts. This patch replaces them with unsigned integers as a safety measure and promotes a few internal variables to larger types for consistency. Coverity issue: 272598, 272599 Fixes: c44d09811b40 ("eal: add shared indexed file-backed array") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-17 14:38:16 +02:00
Adrien Mazarguil	f2e5e85824	eal: fix undefined behavior in fbarray According to GCC documentation [1], the __builtin_clz() family of functions yield undefined behavior when fed a zero value. There is one instance in the fbarray code where this can occur. Clang (at least version 3.8.0-2ubuntu4) seems much more sensitive to this than GCC and yields random results when compiling optimized code, as shown below: #include <stdio.h> int main(void) { volatile unsigned long long moo; int x; moo = 0; x = __builtin_clzll(moo); printf("%d\n", x); return 0; } $ gcc -O3 -o test test.c && ./test 63 $ clang -O3 -o test test.c && ./test 1742715559 $ clang -O0 -o test test.c && ./test 63 Even 63 can be considered an unexpected result given the number of leading zeroes should be the full width of the underlying type, i.e. 64. In practice it causes find_next_n() to sometimes return negative values interpreted as errors by caller functions, which prevents DPDK applications from starting due to inability to find free memory segments: # testpmd [...] EAL: Detected 32 lcore(s) EAL: Detected 2 NUMA nodes EAL: No free hugepages reported in hugepages-1048576kB EAL: Multi-process socket /var/run/.rte_unix EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list EAL: FATAL: Cannot init memory EAL: Cannot init memory PANIC in main(): Cannot init EAL 4: [./build/app/testpmd(_start+0x29) [0x462289]] 3: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f19d54fc830]] 2: [./build/app/testpmd(main+0x8a3) [0x466193]] 1: [./build/app/testpmd(__rte_panic+0xd6) [0x4efaa6]] Aborted This problem appears with commit 66cc45e293ed ("mem: replace memseg with memseg lists") however the root cause is introduced by a prior patch. 
[1] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html Fixes: c44d09811b40 ("eal: add shared indexed file-backed array") Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-17 14:37:27 +02:00
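Since `__builtin_clzll(0)` is undefined, the usual remedy is to handle zero before calling the builtin. A sketch with a hypothetical `safe_clzll()` wrapper (not the actual fbarray patch): a zero value is reported as having as many leading zeroes as the type has bits, which is the 64 the commit message expects on common platforms.

```c
#include <limits.h>

/* __builtin_clzll(0) is undefined behavior, so treat zero explicitly:
 * it has as many leading zero bits as the type is wide. */
static unsigned int safe_clzll(unsigned long long x)
{
	if (x == 0)
		return (unsigned int)(sizeof(x) * CHAR_BIT);
	return (unsigned int)__builtin_clzll(x);
}
```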
Anatoly Burakov	079527f069	malloc: fix not unlocking hotplug on fail to init We lock the hotplug during init, but do not unlock it if we couldn't register multiprocess callbacks. Add the missing unlock. Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-17 12:36:40 +02:00
Anatoly Burakov	48e9728898	ipc: fix missing mutex unlocks on failed send Earlier fix for race condition introduced a bug where mutex wasn't unlocked if message failed to be sent. Fix all of this by moving locking out of mp_request_sync() altogether. Fixes: da5957821bdd ("eal: fix race condition in IPC request") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-17 10:23:05 +02:00
Anatoly Burakov	7d863e253e	ipc: fix missing ignore message name We are trying to notify sender that response from current process should be ignored, but we didn't specify which request this response was for. Fix by copying request name from the original message. Fixes: 579a4ccc345c ("eal: ignore IPC messages until init is complete") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-17 01:27:45 +02:00
Anatoly Burakov	35ae44d1e2	ipc: fix use-after-free in asynchronous requests Previously, we removed a request from the list only if we had succeeded in sending it. This left an invalid pointer in the request list. Fix this by only adding new requests to the request list if we have succeeded in sending them. Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-17 01:27:27 +02:00
Anatoly Burakov	fe98e52a52	ipc: fix use-after-free in synchronous requests Previously, we added synchronous requests to the request list after checking whether the request existed. However, we only removed the request from the request list if we had succeeded in sending it. If sending failed, we left an invalid pointer in the request list. Fix this by only adding a request to the list once we succeed in sending it. Fixes: 783b6e54971d ("eal: add synchronous multi-process communication") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-17 01:27:21 +02:00
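The ordering rule behind both use-after-free fixes (publish a request only after the send succeeds) can be sketched with a BSD TAILQ. Everything here is hypothetical: `struct pending_req`, `fake_send()` (a stand-in transport that fails for negative ids) and `submit_request()` are illustrative only, and the sketch is single-threaded, whereas real code would hold the request-list lock.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical pending-request record kept on a TAILQ. */
struct pending_req {
	TAILQ_ENTRY(pending_req) next;
	int id;
};
TAILQ_HEAD(pending_list, pending_req);
static struct pending_list pending = TAILQ_HEAD_INITIALIZER(pending);

/* Stand-in transport: negative ids simulate a failed send. */
static int fake_send(int id)
{
	return id >= 0 ? 0 : -1;
}

/* Insert into the list only once the send has succeeded, so a failed
 * send never leaves a dangling pointer behind for others to find. */
int submit_request(int id)
{
	struct pending_req *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return -1;
	r->id = id;
	if (fake_send(id) < 0) {
		free(r); /* never published, so it is safe to free */
		return -1;
	}
	TAILQ_INSERT_TAIL(&pending, r, next);
	return 0;
}

int pending_count(void)
{
	struct pending_req *r;
	int n = 0;

	TAILQ_FOREACH(r, &pending, next)
		n++;
	return n;
}
```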
Anatoly Burakov	2ae831fb42	ipc: stop async IPC loop on callback request EAL did not stop processing further asynchronous requests on encountering a request that should trigger the callback. This resulted in erasing valid requests but not triggering them. Fix this by stopping the loop once we have a request that can trigger the callback. Once triggered, we go back to scanning the request queue until there are no more callbacks to trigger. Fixes: f05e26051c15 ("eal: add IPC asynchronous request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-17 01:27:20 +02:00
Anatoly Burakov	6e8a721044	vfio: export functions even when disabled Previously, VFIO functions were not compiled in and exported if VFIO compilation was disabled. Fix this by actually compiling all of the functions unconditionally, and provide missing prototypes on Linux. Fixes: 279b581c897d ("vfio: expose functions") Fixes: 73a639085938 ("vfio: allow to map other memory regions") Fixes: 964b2f3bfb07 ("vfio: export some internal functions") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-16 19:33:46 +02:00
Jeff Guo	0d0f478d04	eal/linux: add uevent parse and process In order to handle uevents detected from the kernel side, add uevent parse and process functions to translate a uevent into a device event that the user has subscribed to monitor. Signed-off-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-13 12:00:31 +02:00
Jeff Guo	a753e53d51	eal: add device event monitor framework This patch aims to add a general device event monitor framework at the EAL device layer, for device hotplug awareness and the actions taken accordingly. It could also be extended to other types of device event monitoring, but that is out of scope at this stage. To get started, users first call the newly added APIs below to enable/disable the device event monitor mechanism: - rte_dev_event_monitor_start - rte_dev_event_monitor_stop Then users shall register or unregister callbacks through the newly added APIs. Callbacks can be device-specific, or for all devices. - rte_dev_event_callback_register - rte_dev_event_callback_unregister Taking hotplug as an example, when a device is hotplug-inserted or removed, we get notified by the kernel and then call the user's callbacks to handle it, such as detaching or attaching the device from the bus, which could further benefit fail-safe or live-migration. Signed-off-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-13 12:00:31 +02:00
Jeff Guo	493b8e173f	eal: add device event handle in interrupt thread Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for device event interrupt monitor. Signed-off-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>	2018-04-13 10:49:26 +02:00
Anatoly Burakov	08a20b3d37	vfio: fix device hotplug when several devices per group We only need to perform DMA mapping for first device in first group. At the time of mapping, we haven't yet added the device into the group, so the count is expected to be zero. Fixes: 810bfa64c673 ("vfio: fix index for tracking devices in a group") Fixes: a9c349e3a100 ("vfio: fix device unplug when several devices per group") Fixes: 94c0776b1bad ("vfio: support hotplug") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-13 01:17:55 +02:00
Hemant Agrawal	964b2f3bfb	vfio: export some internal functions This patch moves some of the internal vfio functions from eal_vfio.h to rte_vfio.h for common use, with the "rte_" prefix. This patch also changes the FSLMC bus to use the external VFIO functions with the "rte_" prefix instead of the internal ones. Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-13 01:06:57 +02:00
Hemant Agrawal	c94eb6db0a	doc: add VFIO API in doxygen Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2018-04-13 01:06:12 +02:00
Neil Horman	34fbfa585c	mem: set fd to -1 for anonymous mmap https://dpdk.org/tracker/show_bug.cgi?id=18 indicated that several mmap call sites in the [linux|bsd]app eal code passed an fd that was not -1 while using MAP_ANONYMOUS. While probably not a huge deal, the man page does say the fd should be -1 for portability, as some implementations don't ignore fd as they should for MAP_ANONYMOUS. Suggested-by: Solal Pirelli <solal.pirelli@gmail.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-04-12 14:44:24 +02:00
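For reference, a portable anonymous mapping passes `-1` as the fd and `0` as the offset. A minimal sketch with a hypothetical `alloc_anon()` helper:

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* Anonymous mappings have no backing file, so pass fd = -1 (and
 * offset 0) as the man page recommends for portability. Returns NULL
 * on failure instead of MAP_FAILED. */
void *alloc_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```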
Pavan Nikhilesh	7bdccb9307	eal: fix ARM build with clang Use __atomic_exchange_n instead of __atomic_exchange_(2/4/8). The error was: include/generic/rte_atomic.h:215:9: error: implicit declaration of function '__atomic_exchange_2' is invalid in C99 include/generic/rte_atomic.h:494:9: error: implicit declaration of function '__atomic_exchange_4' is invalid in C99 include/generic/rte_atomic.h:772:9: error: implicit declaration of function '__atomic_exchange_8' is invalid in C99 Fixes: ff2863570fcc ("eal: introduce atomic exchange operation") Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>	2018-04-11 22:39:50 +02:00
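The generic `__atomic_exchange_n` builtin is resolved by the compiler for the operand's size, so there is no need to call the size-suffixed `__atomic_exchange_2/4/8` entry points directly (which Clang does not declare, per the errors above). A sketch with a hypothetical `atomic16_xchg()` wrapper:

```c
#include <stdint.h>

/* Atomically store val into *dst and return the previous value,
 * using the generic GCC/Clang builtin with sequential consistency. */
static inline uint16_t
atomic16_xchg(volatile uint16_t *dst, uint16_t val)
{
	return __atomic_exchange_n(dst, val, __ATOMIC_SEQ_CST);
}
```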
Anatoly Burakov	6f63858e55	mem: prevent preallocated pages from being freed It is reasonable to expect a DPDK process not to deallocate any pages that were preallocated by the "-m" or "--socket-mem" flags; yet, currently, the DPDK memory subsystem will do exactly that once it finds that the pages are unused. Fix this by marking pages as unfreeable, and preventing malloc from ever trying to free them. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:56 +02:00
Anatoly Burakov	93723dd917	malloc: enable validation before new page allocation Before allocating a new page, give a chance to the user to allow or deny allocation via callbacks. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:56 +02:00
Anatoly Burakov	2e378ff297	mem: add validator callback This API will enable an application to register for notifications on page allocations that are about to happen, giving the application a chance to allow or deny the allocation when total memory utilization as a result would be above a specified limit on a specified socket. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:56 +02:00
Anatoly Burakov	6b42f75632	eal: enable non-legacy memory mode Now that every other piece of the puzzle is in place, enable non-legacy init mode. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:56 +02:00
Anatoly Burakov	43e4631371	vfio: support memory event callbacks Enable callbacks on first device attach, disable callbacks on last device detach. PPC64 IOMMU does a memseg walk, which would cause a deadlock if done inside a callback, so provide a local, thread-unsafe copy of memseg walk. PPC64 IOMMU also may remap the entire memory map for DMA while adding new elements to it, so change the user map list lock to a recursive lock. That way, we can safely enter rte_vfio_dma_map(), lock the user map list, enter the DMA mapping function and lock the list again (for reading previously existing maps). Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	76b15480d6	malloc: enable callbacks on alloc/free and mp sync Callbacks will be triggered just after allocation and just before deallocation, to ensure that memory address space referenced in the callback is always valid by the time callback is called. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	56efb4c117	malloc: support callbacks on memory events Each process will have its own callbacks. Callbacks will indicate whether an allocation or a deallocation has happened, and will also provide the start VA address and length of the affected block. Since memory hotplug isn't supported on FreeBSD or in legacy mem mode, it will not be possible to register them in either. Callbacks are called whenever something happens to the memory map of the current process; at those times the memory hotplug subsystem is write-locked, which leads to deadlocks on attempts to use these functions. Document the limitation. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	07dcbfe010	malloc: support multiprocess memory hotplug This enables multiprocess synchronization for memory hotplug requests at runtime (as opposed to initialization). Basic workflow is the following. Primary process always does initial mapping and unmapping, and secondary processes always follow primary page map. Only one allocation request can be active at any one time. When primary allocates memory, it ensures that all other processes have allocated the same set of hugepages successfully; otherwise any allocations made are rolled back, and the heap is freed back. Heap is locked throughout the process, and there is also a global memory hotplug lock, so no race conditions can happen. When primary frees memory, it frees the heap, deallocates affected pages, and notifies other processes of deallocations. Since heap is freed from that memory chunk, the area basically becomes invisible to other processes even if they happen to fail to unmap that specific set of pages, so it's completely safe to ignore results of sync requests. When secondary allocates memory, it does not do so by itself. Instead, it sends a request to primary process to try and allocate pages of specified size and on specified socket, such that a specified heap allocation request could complete. Primary process then sends all secondaries (including the requestor) a separate notification of allocated pages, and expects all secondary processes to report success before considering pages as "allocated". Only after primary process ensures that all memory has been successfully allocated in all secondary processes will it respond positively to the initial request and let secondary proceed with the allocation. Since the heap now has memory that can satisfy allocation request, and it was locked all this time (so no other allocations could take place), secondary process will be able to allocate memory from the heap. When secondary frees memory, it hides pages to be deallocated from the heap. 
Then, it sends a deallocation request to primary process, so that it deallocates pages itself, and then sends a separate sync request to all other processes (including the requestor) to unmap the same pages. This way, even if secondary fails to notify other processes of this deallocation, that memory will become invisible to other processes, and will not be allocated from again. So, to summarize: address space will only become part of the heap if primary process can ensure that all other processes have allocated this memory successfully. If anything goes wrong, the worst thing that could happen is that a page will "leak" and will not be available to either DPDK or the system, as some process will still hold onto it. It's not an actual leak, as we can account for the page; it's just that none of the processes will be able to use this page for anything useful, until it gets allocated from by the primary. Due to underlying DPDK IPC implementation being single-threaded, some asynchronous magic had to be done, as we need to complete several requests before we can definitively allow secondary process to use allocated memory (namely, it has to be present in all other secondary processes before it can be used). Additionally, only one allocation request is allowed to be submitted at once. Memory allocation requests are only allowed when there are no secondary processes currently initializing. To enforce that, a shared rwlock is used, which is read-locked on init (so that several secondaries could initialize concurrently), and write-locked when making allocation requests (so that either secondary init will have to wait, or allocation request will have to wait until all processes have initialized). Any other function that wishes to iterate over memory or prevent allocations should be using memory hotplug lock. 
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	1403f87d4f	malloc: enable memory hotplug support This set of changes enables rte_malloc to allocate and free memory as needed. Currently, it is disabled because legacy mem mode is enabled unconditionally. The way it works is, first malloc checks if there is enough memory already allocated to satisfy user's request. If there isn't, we try and allocate more memory. The reverse happens with free - we free an element, check its size (including free element merging due to adjacency) and see if it's bigger than hugepage size and that its start and end span a hugepage or more. Then we remove the area from malloc heap (adjusting element lengths where appropriate), and deallocate the page. For legacy mode, runtime alloc/free of pages is disabled. It is worth noting that memseg lists are being sorted by page size, and that we try our best to satisfy user's request. That is, if the user requests an element from a 2MB page memory, we will check if we can satisfy that request from existing memory, if not we try and allocate more 2MB pages. If that fails and user also specified a "size is hint" flag, we then check other page sizes and try to allocate from there. If that fails too, then, depending on flags, we may try allocating from other sockets. In other words, we try our best to give the user what they asked for, but going to other sockets is last resort - first we try to allocate more memory on the same socket. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
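The free path only returns memory when a free element covers one or more whole hugepages. The arithmetic for finding those pages can be sketched as follows. `whole_pages_in()` is a hypothetical helper, not DPDK's implementation, and it assumes the page size is a power of two: round the element start up and its end down to page boundaries, then count the pages in between.

```c
#include <stddef.h>
#include <stdint.h>

/* Given a free element [start, start + len), count the whole pages that
 * lie entirely inside it; only those could be released to the system.
 * pg_sz must be a power of two. Stores the first page address in
 * *first when the count is non-zero. */
size_t whole_pages_in(uintptr_t start, size_t len, size_t pg_sz,
		      uintptr_t *first)
{
	uintptr_t mask = (uintptr_t)pg_sz - 1;
	uintptr_t lo = (start + mask) & ~mask; /* round start up */
	uintptr_t hi = (start + len) & ~mask;  /* round end down */

	if (hi <= lo)
		return 0;
	*first = lo;
	return (size_t)((hi - lo) / pg_sz);
}
```

With 2 MB pages, a 4 MB element starting 4 KB past a page boundary contains only one whole page, because its first and last partial pages may still hold other data.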
Anatoly Burakov	6167d81488	mem: add secondary process init with memory hotplug Secondary initialization will just sync memory map with primary process. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	cb97d93e9d	mem: share hugepage info primary and secondary Since we are going to need to map hugepages in both primary and secondary processes, we need to know where we should look for hugetlbfs mountpoints. So, share those with secondary processes, and map them on init. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	41519b9006	mem: make use of memory hotplug for init Add a new (non-legacy) memory init path for EAL. It uses the new memory hotplug facilities. If no -m or --socket-mem switches were specified, the new init will not allocate anything, whereas if those switches were passed, appropriate amounts of pages would be requested, just like for legacy init. Allocated pages will be physically discontiguous (or rather, they're not guaranteed to be physically contiguous - they may still be so by accident) unless RTE_IOVA_VA mode is used. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	b666f17858	mem: read hugepage counts from node-specific sysfs path For non-legacy memory init mode, instead of looking at generic sysfs path, look at sysfs paths pertaining to each NUMA node for hugepage counts. Note that per-NUMA node path does not provide information regarding reserved pages, so we might not get the best info from these paths, but this saves us from the whole mapping/remapping business before we're actually able to tell which page is on which socket, because we no longer require our memory to be physically contiguous. Legacy memory init will not use this. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	524e43c2ad	mem: prepare memseg lists for multiprocess sync In preparation for implementing multiprocess support, we are adding a version number to memseg lists. We will not need any locks, because memory hotplug will have a global lock (so any time memory map and thus version number might change, we will already be holding a lock). There are two ways of implementing multiprocess support for memory hotplug: either all information about mapped memory is shared between processes, and secondary processes simply attempt to map/unmap memory based on requests from the primary, or secondary processes store their own maps and only check if they are in sync with the primary process' maps. This implementation will opt for the latter option: primary process shared mappings will be authoritative, and each secondary process will use its own internal view of mapped memory, and will attempt to synchronize on these mappings using versioning. Under this model, only primary process will decide which pages get mapped, and secondary processes will only copy primary's page maps and get notified of the changes via IPC mechanism (coming in later commits). Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	c8f73de36e	mem: add function to check if memory is contiguous For now, memory is always contiguous because legacy mem mode is enabled unconditionally, but this function will be helpful down the line when we implement support for allocating physically non-contiguous memory. We can no longer guarantee physically contiguous memory unless we're in legacy or IOVA_AS_VA mode, but we can certainly try and see if we succeed. In addition, this would be useful for e.g. PMDs who may allocate chunks that are smaller than the page size, but they must not cross the page boundary, in which case we will be able to accommodate that request. This function will also support non-hugepage memory. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
Anatoly Burakov	2a04139f66	eal: add single file segments option Currently, DPDK stores all pages as separate files in hugetlbfs. This option will allow storing all pages in one file (one file per memseg list). We do this by using fallocate() calls on Linux, however this is only supported on fairly recent (4.3+) kernels, so ftruncate() fallback is provided to grow (but not shrink) hugepage files. Naming scheme is deterministic, so both primary and secondary processes will be able to easily map needed files and offsets. For multi-file segments, we can close fds right away. For single-file segments, we can reuse the same fd and reduce the number of fds needed to map/use hugepages. However, we need to store the fds somewhere, so we add a tailq. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>	2018-04-11 21:45:55 +02:00
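Conceptually, a run of pages is contiguous in IOVA space when each page's IOVA equals the previous page's IOVA plus the page size. A sketch over a plain array of per-page IOVAs, with `iova_contig()` as a hypothetical helper (the actual function works on memseg lists, not raw arrays):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every page's IOVA directly follows the previous one,
 * 0 otherwise. A single page (or none) is trivially contiguous. */
int iova_contig(const uint64_t *iovas, size_t n_pages, uint64_t pg_sz)
{
	for (size_t i = 1; i < n_pages; i++)
		if (iovas[i] != iovas[i - 1] + pg_sz)
			return 0;
	return 1;
}
```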


1662 Commits