numam-dpdk

Author	SHA1	Message	Date
Anand Rawat	82ba4416dd	build: add module definition files for Windows Updated lib/meson.build to create shared libraries on Windows. Added DEF files to list the exports for the eal and kvargs libraries. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Anand Rawat <anand.rawat@intel.com> Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com> Reviewed-by: Ranjit Menon <ranjit.menon@intel.com> Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>	2019-04-03 01:21:31 +02:00
Anand Rawat	58836e93f5	eal/windows: add wrappers for string functions Updated rte_common.h to include rte_os.h to contain OS specific macros and functions. Updated rte_string_fns.h to include rte_common.h for rte_os.h Signed-off-by: Anand Rawat <anand.rawat@intel.com> Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com> Reviewed-by: Ranjit Menon <ranjit.menon@intel.com> Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>	2019-04-03 01:21:15 +02:00
Anand Rawat	428eb983f5	eal: add OS specific header file Added rte_os.h files to support OS specific functionality. Updated build system to contain OS headers in the include path. Signed-off-by: Anand Rawat <anand.rawat@intel.com> Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com> Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2019-04-03 01:11:56 +02:00
Anand Rawat	98edcbb5ab	eal/windows: introduce Windows support Added initial stub source files and required meson changes for Windows support. kernel/windows/meson is a stub file added to support Windows specific source in future releases. Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com> Signed-off-by: Anand Rawat <anand.rawat@intel.com> Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Reviewed-by: Ranjit Menon <ranjit.menon@intel.com> Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>	2019-04-03 01:06:01 +02:00
Thomas Monjalon	3c45889189	eal: remove exec-env directory Only one header file (rte_kni_common.h) was in the sub-directory include/exec-env/ This file was installed in a sub-directory of the same name in the makefile-based build. Source and install directories are moved as below: lib/librte_eal/linux/eal/include/exec-env/ -> lib/librte_eal/linux/eal/include/ build/include/exec-env/ -> build/include/ The consequence is to have a file hierarchy a bit more flat. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: David Marchand <david.marchand@redhat.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-04-02 21:49:35 +02:00
Anatoly Burakov	1e3380a2f4	mem: do not use lockfiles for single file segments mode Due to internal glibc limitations [1], DPDK may exhaust internal file descriptor limits when using smaller page sizes, which results in inability to use system calls such as select() by user applications. Single file segments option stores lock files per page to ensure that pages are deleted when there are no more users, however this is not necessary because the processes will be holding onto the pages anyway because of mmap(). Thus, removing pages from the filesystem is safe even though they may be used by some other secondary process. As a result, single file segments mode no longer stores inordinate amounts of segment fd's, and the above issue with fd limits is solved. However, this will not work for legacy mem mode. For that, simply document that using bigger page sizes is the only option. [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-04-02 16:07:25 +02:00
Anatoly Burakov	848cbff836	mem: refactor segment resizing function Currently, segment resizing code sits in one giant function which handles both in-memory and regular modes. Split them up into individual functions. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-04-02 16:07:13 +02:00
Darek Stojaczyk	ea4e3ab7bd	eal: initialize alarms early On Linux, we currently initialize rte_alarms after starting to listen for IPC hotplug requests, which gives us a data race window. Upon receiving such hotplug request we always try to set an alarm and this obviously doesn't work if the alarms weren't initialized yet. To fix it, we initialize alarms before starting to listen for IPC hotplug messages. Specifically, we move rte_eal_alarm_init() right after rte_eal_intr_init() as it makes some sense to keep those two close to each other. We update the BSD code as well to keep the initialization order the same in both EAL implementations. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2019-04-02 15:00:26 +02:00
Pavan Nikhilesh	e840cb3c2a	eal: increase max number of interrupt vectors MSI-X permits a device to allocate up to 2048 interrupts as per PCIe spec. Increase the max number of vectors to a reasonable value of 512. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2019-04-02 02:59:04 +02:00
Natanael Copa	c2d82896ac	eal/linux: remove thread ID from debug message There is no guarantee that pthread_self() returns the thread ID or that pthread_t is an integer. The thread ID is not that useful so simply remove it. This fixes the following warning when building with musl libc: lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler': lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] (int)pthread_self(), info->si_addr); ^ Fixes: `0fc54536b1` ("eal: add failure handling for hot-unplug") Cc: stable@dpdk.org Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>	2019-03-31 01:01:28 +01:00
Shahaf Shuler	c33a675b62	bus: introduce device level DMA memory mapping The DPDK APIs expose 3 different modes to work with memory used for DMA: 1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers. 2. Use memory allocated by the user and registered with the DPDK memory system. Upon registration of memory, the DPDK layers will DMA map it to all needed devices. After registration, allocation of this memory will be done with rte_malloc APIs. 3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who want to have tight control on this memory (e.g. avoid the rte_malloc header). The user should allocate the memory, register it through the rte_extmem_register API, and call the DMA map function in order to register such memory to the different devices. The scope of the patch focuses on #3 above. Currently the only way to map external memory is through VFIO (rte_vfio_dma_map). While VFIO is common, there are other vendors which use different ways to map memory (e.g. Mellanox and NXP). The work in this patch moves the DMA mapping to vendor agnostic APIs. Device level DMA map and unmap APIs were added. Implementation of those APIs was done currently only for PCI devices. For PCI bus devices, the pci driver can expose its own map and unmap functions to be used for the mapping. In case the driver doesn't provide any, the memory will be mapped, if possible, to IOMMU through VFIO APIs. Application usage with those APIs is quite simple: allocate memory; call rte_extmem_register on the memory chunk; take a device, and query its rte_device; call the device specific mapping function for this device. Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap APIs, leaving the rte device APIs as the preferred option for the user. 
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-03-30 16:48:56 +01:00
Shahaf Shuler	0cbce3a167	vfio: skip DMA map failure if already mapped Currently the vfio DMA map function will fail in case the same memory segment is mapped twice. This is too strict, as it is not an error to map the same memory twice. Instead, use the kernel return value to detect such a state and have the DMA function return success. For type1 mapping the kernel driver returns EEXIST. For spapr mapping EBUSY is returned since kernel 4.10. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-03-30 16:48:55 +01:00
Shahaf Shuler	4106d89a18	vfio: allow DMA map to the default container Enable users the option to call rte_vfio_dma_map with request to map to the default vfio fd. Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-03-30 16:47:54 +01:00
Anatoly Burakov	23d5455517	mem: warn user when running without NUMA support Running in non-legacy mode on a NUMA-enabled system without libnuma is unsupported, so explicitly print out a warning when trying to do so. Running in legacy mode without libnuma is still supported whether or not we are running with libnuma support enabled, so also fix init to allow that scenario. Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-30 00:13:04 +01:00
Anatoly Burakov	3660216ef1	malloc: fix IPC message initialization The memset size for an IPC message is set incorrectly. Fix it to cover the entire IPC message. Fixes: `07dcbfe010` ("malloc: support multiprocess memory hotplug") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:55:07 +01:00
Anatoly Burakov	b8a86c83e0	fbarray: fix init unlock without lock Certain failure paths of rte_fbarray_init() will unlock the mem area lock without locking it first. Fix this by properly handling the failures. Fixes: `5b61c62cfd` ("fbarray: add internal tailq for mapped areas") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:49:35 +01:00
Darek Stojaczyk	5a98bc5e83	fbarray: fix attach deadlock rte_fbarray_attach() currently locks its internal spinlock, but never releases it. Secondary processes won't even start if there is more than one fbarray to be attached to - the second rte_fbarray_attach() would be just stuck. Fix it by releasing the lock at the end of rte_fbarray_attach(). I believe this was the original intention. Fixes: `5b61c62cfd` ("fbarray: add internal tailq for mapped areas") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:49:35 +01:00
Anatoly Burakov	1fd3bcf3f9	vfio: document multiprocess limitation for container API Currently, there is no support for sharing custom VFIO containers between multiple processes, but it is not documented. Document this limitation. Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 00:07:16 +01:00
Thomas Monjalon	3a1a885e03	eal: remove redundant atomic API description Atomic functions are described in doxygen of the file lib/librte_eal/common/include/generic/rte_atomic.h The copies in arch-specific files are redundant and confuse readers about the genericity of the API. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-28 23:52:53 +01:00
Dekel Peled	8015c5593a	eal/ppc: fix global memory barrier From previous patch description: "to improve performance on PPC64, use light weight sync instruction instead of sync instruction." Excerpt from IBM doc [1], section "Memory barrier instructions": "The second form of the sync instruction is light-weight sync, or lwsync. This form is used to control ordering for storage accesses to system memory only. It does not create a memory barrier for accesses to device memory." This patch removes the use of lwsync, so calls to rte_wmb() and rte_rmb() will provide correct memory barrier to ensure order of accesses to system memory and device memory. [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html Fixes: `d23a6bd04d` ("eal/ppc: fix memory barrier for IBM POWER") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com>	2019-03-28 23:48:28 +01:00
Michał Mirosław	a1c6b70786	mem: count overcommit hugepages as available With nr_overcommit_hugepages > 0 application may be able to allocate hugepages even when free_hugepages == 0. Take this into account when counting available hugepages. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:33:50 +01:00
Anatoly Burakov	034f1fb616	mem: attempt multiple hugepage allocations at init When requesting memory with ``-m`` or ``--socket-mem`` flags, currently the init will fail if the requested memory amount was bigger than any one memseg list, even if total amount of available memory was sufficient. Fix this by making EAL attempt to allocate pages multiple times, until we either fulfill our memory requirements, or run out of hugepages to allocate. Bugzilla ID: 95 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:58 +01:00
Anatoly Burakov	bec5625588	mem: improve best-effort allocation Previously, when using non-exact allocation, we were requesting N pages to be allocated, but allowed the memory subsystem to allocate less than requested. However, we were still expecting to see N contiguous free pages in the memseg list. This presents a problem because there is no way to try and allocate as many pages as possible, even if there aren't enough contiguous free entries in the list. To address this, use the new "find biggest" fbarray API's when allocating non-exact number of pages. This way, we will first check how many entries in the list are actually available, and then try to allocate up to that number. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:54 +01:00
Anatoly Burakov	7353ee7344	fbarray: add API to find biggest used or free chunks Currently, while there is a way to find total amount of used/free space in an fbarray, there is no way to find biggest contiguous chunk. Add such API, as well as unit tests to test this API. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:52 +01:00
Anatoly Burakov	5b61c62cfd	fbarray: add internal tailq for mapped areas Currently, there are numerous reliability issues with fbarray, such as: - There is no way to prevent attaching to overlapping memory areas - There is no way to prevent double-detach - Failed destroy leaves fbarray in an invalid state (fbarray itself is valid, but its backing memory area is already detached) In addition, on FreeBSD, doing mmap() on a file descriptor does not keep the lock, so we also need to store the fd in order to keep the lock. This patch improves upon fbarray to address both of these issues by adding an internal tailq to track allocated areas and their respective file descriptors. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:50 +01:00
Nikhil Rao	db9f4430c2	service: fix parameter type for attribute The type of value parameter to rte_service_attr_get should be uint64_t *, since the attributes are of type uint64_t. Fixes: `4d55194d76` ("service: add attribute get function") Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2019-03-28 21:07:48 +01:00
Joyce Kong	ca49b92079	ticketlock: enable generic ticketlock on all arch Let all architectures use generic ticketlock implementation. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 15:00:11 +01:00
Joyce Kong	184104fc61	ticketlock: introduce fair ticket based locking The spinlock implementation is unfair: some threads may take locks aggressively while leaving the other threads starving for a long time. This patch introduces ticketlock, which gives each waiting thread a ticket so that they can take the lock one by one. First come, first served. This avoids prolonged starvation and is more predictable. Suggested-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 14:58:49 +01:00
Joyce Kong	e8af2f1f11	rwlock: reimplement with atomic builtins The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. Here is the assembly code of __sync_compare_and_swap builtin. __sync_bool_compare_and_swap(dst, exp, src); 0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8] 0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6] 0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4] 0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff 0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0] 0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1 0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 <rte_atomic16_cmpset+52> // b.any 0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0] 0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 <rte_atomic16_cmpset+32> 0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish 0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Tested-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 11:47:05 +01:00
Gavin Hu	453d8f7366	spinlock: reimplement with atomic one-way barrier The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. Here is the assembly code of __sync_compare_and_swap builtin. __sync_bool_compare_and_swap(dst, exp, src); 0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8] 0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6] 0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4] 0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff 0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0] 0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1 0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 <rte_atomic16_cmpset+52> // b.any 0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0] 0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 <rte_atomic16_cmpset+32> 0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish 0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none The benchmarking results showed constant improvements on all available platforms: 1. Cavium ThunderX2: 126% performance; 2. Hisilicon 1616: 30%; 3. Qualcomm Falkor: 13%; 4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7% Here is the example test result on TX2: $sudo ./build/app/test -l 16-27 -- i RTE>>spinlock_autotest * spinlock_autotest without this patch * Test with lock on 12 cores... Core [16] Cost Time = 53886 us Core [17] Cost Time = 53605 us Core [18] Cost Time = 53163 us Core [19] Cost Time = 49419 us Core [20] Cost Time = 34317 us Core [21] Cost Time = 53408 us Core [22] Cost Time = 53970 us Core [23] Cost Time = 53930 us Core [24] Cost Time = 53283 us Core [25] Cost Time = 51504 us Core [26] Cost Time = 50718 us Core [27] Cost Time = 51730 us Total Cost Time = 612933 us * spinlock_autotest with this patch * Test with lock on 12 cores... 
Core [16] Cost Time = 18808 us Core [17] Cost Time = 29497 us Core [18] Cost Time = 29132 us Core [19] Cost Time = 26150 us Core [20] Cost Time = 21892 us Core [21] Cost Time = 24377 us Core [22] Cost Time = 27211 us Core [23] Cost Time = 11070 us Core [24] Cost Time = 29802 us Core [25] Cost Time = 15793 us Core [26] Cost Time = 7474 us Core [27] Cost Time = 29550 us Total Cost Time = 270756 us In the tests on ThunderX2, with more cores contending, the performance gain was even higher, indicating the __atomic implementation scales up better than __sync. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 09:19:39 +01:00
Pavan Nikhilesh	5cbd14b3e5	eal: roundup TSC frequency when estimating When estimating the TSC frequency using sleep/gettime, round it up to the nearest multiple of 10 MHz for more accuracy. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Reviewed-by: Keith Wiles <keith.wiles@intel.com>	2019-03-28 00:45:16 +01:00
Pavan Nikhilesh	f56e551485	eal: add macro to align value to the nearest multiple Add macro to align value to the nearest multiple of the given value, resultant value might be greater than or less than the first parameter whichever difference is the lowest. Update unit test to include the new macro. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2019-03-28 00:45:00 +01:00
Jakub Grajciar	0c7ce182a7	eal: add pending interrupt callback unregister Use case: if a callback is used to receive messages from a socket, and the message received is disconnect/error, this callback needs to be unregistered, but cannot be because it is still active. With this patch it is possible to mark the callback to be unregistered once the interrupt process is done with this interrupt source. Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>	2019-03-27 18:53:47 +01:00
Kevin Traynor	c0d9052afb	eal/linux: fix log levels for pagemap reading failure Commit `cdc242f260` says: For Linux kernel 4.0 and newer, the ability to obtain physical page frame numbers for unprivileged users from /proc/self/pagemap was removed. Instead, when an IOMMU is present, simply choose our own DMA addresses instead. In this case the user still sees error messages, so adjust the log levels. Later, other checks will ensure that errors are logged in the appropriate cases. Fixes: `cdc242f260` ("eal/linux: support running as unprivileged user") Cc: stable@dpdk.org Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2019-03-27 14:54:40 +01:00
Anatoly Burakov	929a91e99c	malloc: fix documentation of realloc function The documentation for rte_realloc claims that the resized area will always reside on the same NUMA node. This is not actually the case - while resized area will be on the same NUMA node, if resizing the area is not possible, then the memory will be reallocated using rte_malloc(), which can allocate memory on another NUMA node, depending on which lcore rte_realloc() was called from and which NUMA nodes have memory available. Fix the API doc to match the actual code of rte_realloc(). Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-27 12:15:04 +01:00
Stephen Hemminger	24aa4f0fba	mem: poison memory when freed DPDK malloc library allows broken programs to work because the semantics of zmalloc and malloc are the same. This patch enables a more secure model which will catch (and crash) programs that reuse memory already freed if RTE_MALLOC_DEBUG is enabled. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-27 10:53:41 +01:00
Bruce Richardson	88f591d1db	eal: remove unneeded version logic The version number in the DPDK_VERSION file will never have an offset that needs to be subtracted, so remove that logic from the version string generation. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Luca Boccassi <bluca@debian.org>	2019-03-27 09:43:54 +01:00
Bruce Richardson	d320fe56bd	build: use version number from config file Since we have the version number in a separate file at the root level, we should not need to duplicate this in rte_version.h too. Best approach here is to move the macros for specifying the year/month/etc. parts from the version header file to the build config file - leaving the other utility macros for e.g. printing the version string, where they are. For "make", this is done by having a little bit of awk parse the version file and pass the results through to the preprocessor for the config generation stage. For "meson", this is done by parsing the version and adding it to the standard dpdk_conf object. In both cases, we need to append a large number - in this case "99", previously 16 in original code - to the version number when we want to do version number comparisons. Without this, the release version e.g. 19.05.0 will compare as less than its RCs e.g. 19.05.0-rc4. With it, the comparison is correct as "19.05.0.99 > 19.05.0-rc4.99". Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Luca Boccassi <bluca@debian.org> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2019-03-27 09:43:47 +01:00
Tomasz Jozwiak	a7cece2ead	malloc: add NUMA-aware realloc function Currently, rte_realloc will not respect original allocation's NUMA node when memory cannot be resized, and there is no NUMA-aware equivalent of rte_realloc. This patch adds such a function. The new API will ensure that reallocated memory stays on requested NUMA node, as well as allow moving allocated memory to a different NUMA node. Signed-off-by: Tomasz Jozwiak <tomaszx.jozwiak@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-23 16:54:50 +01:00
Bruce Richardson	218c4e68c1	mk: use linux and freebsd in config names Rather than using linuxapp and bsdapp everywhere, we can change things to use the, more readable, terms "linux" and "freebsd" in our build configs. Rather than renaming the configs we can just duplicate the existing ones with the new names using symlinks, and use the new names exclusively internally. ["make showconfigs" also only shows the new names to keep the list short] The result is that backward compatibility is kept fully but any new builds or development can be done using the newer names, i.e. both "make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc" work. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 23:05:06 +01:00
Bruce Richardson	5fbc1d498f	build/freebsd: rename macro BSDAPP to FREEBSD Rename the macro and all instances in DPDK code, but keep a copy of the old macro defined for legacy code linking against DPDK. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 23:01:14 +01:00
Bruce Richardson	742bde12f3	build/linux: rename macro from LINUXAPP to LINUX Rename the macro to make things shorter and more comprehensible. For both meson and make builds, keep the old macro around for backward compatibility. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 17:31:22 +01:00
Bruce Richardson	91d7846ce6	eal/linux: rename linuxapp to linux The term "linuxapp" is a legacy one, but just calling the subdirectory "linux" is just clearer for all concerned. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 17:31:13 +01:00
Bruce Richardson	25c99fbd68	eal/bsd: rename bsdapp to freebsd The term "bsdapp" is a legacy one, but just calling the subdirectory "freebsd" is just clearer for all concerned. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2019-03-12 17:30:20 +01:00
David Marchand	1598c72959	eal: fix core list validation with disabled cores -l and -c options are two ways to select the cores used by DPDK. Their format differs, but the checks on the selected cores are the same. Use an intermediate array to separate the specific parsing checks from the common consistency checks. The parsing functions now concentrate on validating the passed string and do nothing more. We can report all invalid core indexes rather than only the first error. In the error log message, reporting [0, cfg->lcore_count - 1] as a valid range is then wrong when the core list is not continuous. Example on my 8 cpus laptop with core 2 and 6 disabled. echo 0 > /sys/devices/system/cpu/cpu2/online echo 0 > /sys/devices/system/cpu/cpu6/online Before: ./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048 EAL: Detected 6 lcore(s) EAL: Detected 1 NUMA nodes EAL: invalid core list, please check core numbers are in [0, 5] range ... After: ./master/app/testpmd -l 0-7 --no-huge -m 512 -- --total-num-mbufs 2048 EAL: Detected 6 lcore(s) EAL: Detected 1 NUMA nodes EAL: lcore 2 unavailable EAL: lcore 6 unavailable EAL: invalid core list, please check specified cores are part of 0-1,3-5,7 ... Fixes: `d888cb8b96` ("eal: add core list input format") Fixes: `b38693b612` ("eal: fix core number validation") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com>	2019-03-07 21:22:53 +01:00
David Marchand	33df941d79	eal: remove dead code in core list parsing We don't need to look for trailing spaces. This is a copy/paste block from eal_parse_coremask(). Remove it and the associated comment. Fixes: `d888cb8b96` ("eal: add core list input format") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com>	2019-03-07 21:22:48 +01:00
David Marchand	c3568ea376	eal: restrict control threads to startup CPU affinity Spawning the ctrl threads on anything that is not part of the eal coremask is not that polite to the rest of the system, especially when you took good care to pin your processes on cpu resources with tools like taskset (linux) / cpuset (freebsd). Rather than introduce yet another eal options to control on which cpu those ctrl threads are created, let's take the startup cpu affinity as a reference and remove the eal coremask from it. If no cpu is left, then we default to the master core. The cpuset is computed once at init before the original cpu affinity is lost. Introduced a RTE_CPU_AND macro to abstract the differences between linux and freebsd respective macros. Examples in a 4 cores FreeBSD vm: $ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1057 PID TID COMM TDNAME CPU CSID CPU MASK 1057 100131 testpmd - 2 1 2 1057 100140 testpmd eal-intr-thread 1 1 0-1 1057 100141 testpmd rte_mp_handle 1 1 0-1 1057 100142 testpmd lcore-slave-3 3 1 3 $ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1061 PID TID COMM TDNAME CPU CSID CPU MASK 1061 100131 testpmd - 2 2 2 1061 100144 testpmd eal-intr-thread 1 2 1 1061 100145 testpmd rte_mp_handle 1 2 1 1061 100147 testpmd lcore-slave-3 3 2 3 $ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1065 PID TID COMM TDNAME CPU CSID CPU MASK 1065 100131 testpmd - 2 2 2 1065 100148 testpmd eal-intr-thread 2 2 2 1065 100149 testpmd rte_mp_handle 2 2 2 1065 100150 testpmd lcore-slave-3 3 2 3 Fixes: `d651ee4919` ("eal: set affinity for control threads") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2019-03-07 19:21:28 +01:00
David Marchand	759b9be661	eal: fix control threads pinning pthread_setaffinity_np returns a >0 value on error. We could end up leaving the ctrl threads on the current process cpu affinity. Fixes: `d651ee4919` ("eal: set affinity for control threads") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>	2019-03-07 19:13:48 +01:00
David Marchand	b206376438	eal: fix check when retrieving current CPU affinity pthread_getaffinity_np returns a >0 value when failing. This is mainly for the sake of correctness. The only case where it could fail is when passing an incorrect cpuset size wrt to the kernel. Fixes: `2eba8d21f3` ("eal: restrict cores auto detection") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com>	2019-03-07 16:37:14 +01:00
Stephen Hemminger	e7d798172f	eal: remove legacy PMD log macro The RTE_PMD_DEBUG_TRACE was only enabled for EVENTDEV_DEBUG and that configuration is now handled by RTE_EDEV_LOG macros. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-01 18:17:36 +01:00
Stephen Hemminger	e37aad5ed3	eal: drop unused macros for primary process check No usage in current DPDK code base. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-01 18:17:36 +01:00
Luca Boccassi	a9933bb1de	build: improve libbsd dependency handling Use dependency() instead of manual append to ldflags. Move libbsd inclusion to librte_eal, so that all other libraries and PMDs will inherit it. Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2019-02-27 12:28:03 +01:00
Bruce Richardson	b543d1a715	compat: merge compat library into EAL Since compat library is only a single header, we can easily move it into the EAL common headers instead of tracking it separately. The downside of this is that it becomes a little more difficult to have any libs that are built before EAL depend on it. Thankfully, this is not a major problem as the only library which uses rte_compat.h and is built before EAL (kvargs) already has the path to the compat.h header file explicitly called out as an include path. However, to ensure that we don't hit problems later with this, we can add EAL common headers folder to the global include list in the meson build which means that all common headers can be safely used by all libraries, no matter what their build order. As a side-effect, this patch also fixes an issue with building on BSD using meson, due to compat lib no longer needing to be listed as a dependency. Fixes: `a8499f65a1` ("log: add missing experimental tag") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: David Marchand <david.marchand@redhat.com> Tested-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2019-02-25 16:03:31 +01:00
Bruce Richardson	146e57627f	eal: support strlcat function Add the strlcat function to DPDK to exist alongside the strlcpy one. While strncat is generally safe for use for concatenation, the API for the strlcat function is perhaps a little nicer to use, and supports truncation detection. See commit `5364de644a` ("eal: support strlcpy function") for more details on the function selection logic, since we only should be using the DPDK-provided version when no system-provided version is present. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-02-12 10:04:28 +01:00
Thomas Monjalon	cae0d722d6	version: 19.05-rc0 Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: John McNamara <john.mcnamara@intel.com>	2019-02-06 11:20:06 +01:00
Thomas Monjalon	8b937bae24	version: 19.02.0 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2019-02-01 15:25:17 +01:00
Thomas Monjalon	a2f9c0d417	version: 19.02-rc4 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2019-01-28 02:53:53 +01:00
Ilya Maximets	0a703f0f36	eal/linux: fix parsing zero socket memory and limits Modern memory mode allows not reserving any memory via the '--socket-mem' option, i.e. it should be possible to specify zero preallocated memory like '--socket-mem 0'. Also, it should be possible to configure unlimited memory allocations with '--socket-limit 0'. Both cases are currently impossible and block starting the DPDK application: ./dpdk-app --socket-limit 0 <...> EAL: invalid parameters for --socket-limit EAL: Invalid 'command line' arguments. Unable to initialize DPDK: Invalid argument Fixes: `6b42f75632` ("eal: enable non-legacy memory mode") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-23 23:02:07 +01:00
Anatoly Burakov	47f4fe0595	vfio: allow secondary process to query IOMMU type It is only possible to know IOMMU type of a given VFIO container by attempting to initialize it. Since secondary process never attempts to set up VFIO container itself (because they're shared between primary and secondary), it never knows which IOMMU type the container is using, and never sets up the appropriate config structures. This results in inability to perform DMA mappings in secondary process. Fix this by allowing secondary process to query IOMMU type of primary's default container at device initialization. Note that this fix is assuming we're only interested in default container. Bugzilla ID: 174 Fixes: `6bcb7c95fe` ("vfio: share default container in multi-process") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2019-01-21 16:13:59 +01:00
Thomas Monjalon	84a1d4a873	version: 19.02-rc3 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2019-01-20 22:39:20 +01:00
Ilya Maximets	6406d70561	eal: fix clang build with intrinsics forced This fixes x86_64-native-linuxapp-clang build with CONFIG_RTE_FORCE_INTRINSICS=y: include/generic/rte_atomic.h:218:9: error: implicit declaration of function '__atomic_exchange_2' is invalid in C99 [-Werror,-Wimplicit-function-declaration] include/generic/rte_atomic.h:501:9: error: implicit declaration of function '__atomic_exchange_4' is invalid in C99 [-Werror,-Wimplicit-function-declaration] include/generic/rte_atomic.h:783:9: error: implicit declaration of function '__atomic_exchange_8' is invalid in C99 [-Werror,-Wimplicit-function-declaration] We didn't catch this issue previously on other platforms because CONFIG_RTE_FORCE_INTRINSICS is enabled by default only for armv8. Fixes: `7bdccb9307` ("eal: fix ARM build with clang") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-01-17 18:39:55 +01:00
Anatoly Burakov	2383d8e909	eal: check string parameter lengths When specifying parameters such as hugefile prefix from the command-line, it is possible to supply an empty string. This may lead to various problems: for example, if hugefile prefix is empty, the runtime config path construction may end up looking like "/var/run/dpdk//_config", which will technically work, but is wrong and places files in the wrong place. To fix it, check lengths of such user-specified parameters for hugefile prefix, as well as hugepage dir and user-specified mbuf pool ops string. Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-17 18:39:55 +01:00
David Marchand	7b55015e14	eal: fix out of bound access when no CPU available In the unlikely case when the dpdk application is started with no cpu available in the [0, RTE_MAX_LCORE - 1] range, the master_lcore is automatically chosen as RTE_MAX_LCORE which triggers an out of bound access. Either you have a crash then, or the initialisation fails later when trying to pin the master thread on it. In my test, with RTE_MAX_LCORE == 2: $ taskset -c 2 ./master/app/testpmd --no-huge -m 512 --log-level *:debug [...] EAL: pthread_setaffinity_np failed PANIC in eal_thread_init_master(): cannot set affinity 7: [./master/app/testpmd() [0x47f629]] Bugzilla ID: 19 Fixes: `2eba8d21f3` ("eal: restrict cores auto detection") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com>	2019-01-17 18:39:55 +01:00
Hari Kumar Vemula	b38693b612	eal: fix core number validation When incorrect core value or range provided, as part of -l command line option, a crash occurs. Added valid range checks to fix the crash. Added ut check for negative core values. Added unit test case for invalid core number range. Fixes: `d888cb8b96` ("eal: add core list input format") Cc: stable@dpdk.org Signed-off-by: Hari Kumar Vemula <hari.kumarx.vemula@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2019-01-17 17:22:04 +01:00
Thomas Monjalon	05853e1784	version: 19.02-rc2 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2019-01-15 03:08:43 +01:00
Gaetan Rivet	68c4768d36	eal: return error when option register fails Make rte_option_register return a negative value when an error occurs. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	e48839afff	eal: improve option API documentation Use doxygen to describe the main structure and describe a little more why it exists. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	d3bdefef22	eal: fix log level of error in option register INFO is not correct when logging an error. Fixes: `2395332798` ("eal: add option register infrastructure") Cc: stable@dpdk.org Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	f87471c3f1	eal: check against common option on register Not only check against other registered options, but also common EAL options. This will mitigate user confusion. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	42f6dbda09	eal: rename option name field option->opt_* is redundant. The field should also be constant. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	b8fe14b7cf	eal: add option usage string Add a usage string field in rte_option, allowing to display help to the user and describe which options are currently available. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	ce6448fa01	eal: do not use static option iterator This is rather weird. Someone should have caught that during review. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Gaetan Rivet	4c3bf26c19	eal: use bare option string as name Current options name can be passed with arbitrary format. Force the use of "--" prefix and thus POSIX long options format. This restricts the ability to introduce surprising options and will help future additional checks. Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2019-01-15 02:40:40 +01:00
Ilya Maximets	9726aa9907	eal: fix build of external app with clang on armv8 If DPDK is built using GCC, RTE_TOOLCHAIN_CLANG is not defined. But 'rte_atomic.h' is a generic header that is included in external apps like OVS when building with DPDK. As a result, a clang build of OVS fails on armv8 if DPDK was built using gcc: include/generic/rte_atomic.h:215:9: error: implicit declaration of function '__atomic_exchange_2' is invalid in C99 include/generic/rte_atomic.h:494:9: error: implicit declaration of function '__atomic_exchange_4' is invalid in C99 include/generic/rte_atomic.h:772:9: error: implicit declaration of function '__atomic_exchange_8' is invalid in C99 We need to check for the current compiler, not the compiler used for the DPDK build. Fixes: `7bdccb9307` ("eal: fix ARM build with clang") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2019-01-14 19:49:48 +01:00
Anatoly Burakov	ba07193e03	mem: fix storing old policy The original code was supposed to overwrite the value pointed to by the pointer, but the new one is instead overwriting the pointer value itself, which has no effect outside that function. Fix it by adding a pointer dereference. Fixes: `582bed1e1d` ("mem: support mapping hugepages at runtime") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:50:52 +01:00
Anatoly Burakov	199629022c	mem: fix variable shadowing A local variable ``flags`` was shadowing another variable from outer scope. Fix this by renaming the variable and make it const. Fixes: `c127be93f6` ("mem: support using memfd segments for in-memory mode") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:42:40 +01:00
Anatoly Burakov	c0f8d50d1c	vfio: do not unregister callback in secondary process Callbacks are only registered in the primary, so do not attempt to unregister callbacks in secondary processes. Fixes: `43e4631371` ("vfio: support memory event callbacks") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:31:51 +01:00
Anatoly Burakov	97257eee2d	eal/bsd: remove clean up of files at startup On FreeBSD, closing the file descriptor drops the lock even if the file descriptor was mmap'ed. This leads to the cleanup at the end of EAL init to remove fbarray files that are still in use by the process itself. However, instead of working around this issue, we can take advantage of the fact that FreeBSD doesn't really create any per-process files in the first place, so no cleanup is actually needed. Fixes: `0a529578f1` ("eal: clean up unused files on initialization") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:23:12 +01:00
Anatoly Burakov	66d9f61de0	eal: fix strdup usages in internal config Currently, we use strdup in a few places to store command-line parameter values for certain internal config values. There are several issues with that. First of all, they're never freed, so memory ends up leaking either after EAL exit, or when these command-line options are supplied multiple times. Second of all, they're defined as `const char *`, so they cannot be freed even if we wanted to. Finally, strdup may return NULL, which will be stored in the config. For most fields, NULL is a valid value, but for the default prefix, the value is always expected to be valid. To fix all of this, three things are done. First, we change the definitions of these values to `char *` as opposed to `const char *`. This does not break the ABI, and previous code assumes constness (which is more restrictive), so it's safe to do so. Then, fix all usages of strdup to check return value, and add a cleanup function that will free the memory occupied by these strings, as well as freeing them before assigning a new value to prevent leaks when parameter is specified multiple times. And finally, add an internal API to query hugefile prefix, so that, absent of a valid value, a default value will be returned, and also fix up all usages of hugefile prefix to use this API instead of accessing hugefile prefix directly. Bugzilla ID: 108 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-01-14 15:05:19 +01:00
Thomas Monjalon	7637518249	version: 19.02-rc1 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-12-23 00:21:13 +01:00
Anatoly Burakov	ba731ea1dd	malloc: fix deadlock when reading stats Currently, malloc statistics and external heap creation code use memory hotplug lock as a way to synchronize accesses to heaps (as in, locking the hotplug lock to prevent list of heaps from changing under our feet). At the same time, malloc statistics code will also lock the heap because it needs to access heap data and does not want any other thread to allocate anything from that heap. In such scheme, it is possible to enter a deadlock with the following sequence of events: thread 1 thread 2 rte_malloc() rte_malloc_dump_stats() take heap lock take hotplug lock failed to allocate, attempt to take hotplug lock attempt to take heap lock Neither thread will be able to continue, as both of them are waiting for the other one to drop the lock. Adding an additional lock will require an ABI change, so instead of that, make malloc statistics calls thread-unsafe with respect to creating/destroying heaps. Fixes: `72cf92b318` ("malloc: index heaps using heap ID rather than NUMA node") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 15:26:43 +01:00
Qi Zhang	85d6815fa6	eal: close multi-process socket during cleanup When a secondary process quits, its mp_socket* file still exists, which causes rte_mp_request_sync to fail when it tries to send a message on a floating socket. This patch fixes the issue by introducing a function, rte_mp_channel_cleanup, which is called by rte_eal_cleanup to close the mp socket and delete the mp_socket* file. Fixes: `bacaa27540` ("eal: add channel for multi-process communication") Cc: stable@dpdk.org Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>	2018-12-21 01:15:41 +01:00
Anatoly Burakov	9d65053761	eal: add 64-bit log2 function Add missing implementation for 64-bit log2 function, and extend the unit test to test this new function. Also, remove duplicate reimplementation of this function from testpmd and memalloc. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:23:49 +01:00
Anatoly Burakov	43c9e6c205	eal: add 64-bit fls function Add missing implementation for 64-bit fls function, and extend unit test to test the new function as well. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:17:43 +01:00
Anatoly Burakov	4e261f5519	eal: add 64-bit bsf and 32-bit safe bsf functions Add an rte_bsf64 function that follows the convention of existing rte_bsf32 function. Also, add missing implementation for safe version of rte_bsf32, and implement unit tests for all recently added bsf varieties. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-21 00:00:58 +01:00
Anatoly Burakov	cc7ddb00da	bitmap: remove deprecated 64-bit bsf function The function rte_bsf64 was deprecated in a previous release, so remove the function, and the deprecation notice associated with it. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 23:44:56 +01:00
Anatoly Burakov	307315d457	eal: fix runtime directory cleanup in noshconf mode When using --no-shconf or --in-memory modes, there is no runtime directory to be created, so there is no point in attempting to clean it. Fixes: `0a529578f1` ("eal: clean up unused files on initialization") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 23:27:35 +01:00
Anatoly Burakov	c75f535ac5	mem: use memfd for no-huge mode When running in no-huge mode, we anonymously allocate our memory. While this works for regular NICs and vdev's, it's not suitable for memory sharing scenarios such as virtio with vhost_user backend. To fix this, allocate no-huge memory using memfd, and register it with memalloc just like any other memseg fd. This will enable using rte_memseg_get_fd() API with --no-huge EAL flag. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:58:25 +01:00
Anatoly Burakov	df7722c75b	mem: allow setting up segment list fd Currently, only segment fd's for multi-file segments are supported, while for memfd-backed no-huge memory we need single-file segments mode. Add support for single-file segments in the internal API. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:55:56 +01:00
Anatoly Burakov	d75eea3145	mem: check for memfd support in segment fd API If memfd support was not compiled, or hugepage memfd support is not available at runtime, the API will now return proper error code, indicating that this API is unsupported. This changes the API, so document the changes. Fixes: `41dbdb6872` ("mem: add external API to retrieve page fd") Fixes: `3a44687139` ("mem: allow querying offset into segment fd") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:54:37 +01:00
Anatoly Burakov	525670756a	mem: fix segment fd API error code for external segment Segment fd API does not support getting segment fd's from externally allocated memory, so return proper error code on any attempts to do so. This changes API behavior, so document the change as well. Fixes: `5282bb1c36` ("mem: allow memseg lists to be marked as external") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Tiwei Bie <tiwei.bie@intel.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-12-20 22:51:49 +01:00
Anatoly Burakov	bed7941886	mem: allow usage of non-heap external memory in multiprocess Add multiprocess support for externally allocated memory areas that are not added to DPDK heap (and add relevant doc sections). Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Anatoly Burakov	950e8fb4e1	mem: allow registering external memory areas The general use-case of using external memory is well covered by existing external memory API's. However, certain use cases require manual management of externally allocated memory areas, so this memory should not be added to the heap. It should, however, be added to DPDK's internal structures, so that API's like ``rte_virt2memseg`` would work on such external memory segments. This commit adds such an API to DPDK. The new functions will allow to register and unregister externally allocated memory areas, as well as documentation for them. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:14:55 +01:00
Anatoly Burakov	39ff94e71c	malloc: separate destroying memseg list and heap data Currently, destroying external heap chunk and its memseg list is part of one process. When we will gain the ability to unregister external memory from DPDK that doesn't have any heap structures associated with it, we need to be able to find and destroy memseg lists as well as heap data separately. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:10:08 +01:00
Anatoly Burakov	0f526d674f	malloc: separate creating memseg list and malloc heap Currently, creating external malloc heap involves also creating a memseg list backing that malloc heap. We need to have them as separate functions, to allow creating memseg lists without creating a malloc heap. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 18:09:55 +01:00
Anatoly Burakov	646e5260ee	malloc: make alignment requirements more stringent The external heaps API already implicitly expects start address of the external memory area to be page-aligned, but it is not enforced or documented. Fix this by implementing additional parameter checks at memory add call, and document the page alignment requirement explicitly. Fixes: `7d75c31014` ("malloc: allow adding memory to named heaps") Cc: stable@dpdk.org Suggested-by: Yongseok Koh <yskoh@mellanox.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>	2018-12-20 15:34:03 +01:00
Anatoly Burakov	b3e735e16e	malloc: fix duplicate mem event notification We already trigger a mem event notification inside the walk function, no need to do it twice. Fixes: `f32c7c9de9` ("malloc: enable event callbacks for external memory") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:28:55 +01:00
Seth Howell	fba0ca2274	malloc: notify primary process about hotplug in secondary When secondary process hotplugs memory, it sends a request to primary, which then performs the real mmap() and sends sync requests to all secondary processes. Upon receiving such sync request, each secondary process will notify the upper layers of hotplugged memory (and will call all locally registered event callbacks). In the end we'll end up with memory event callbacks fired in all the processes except the primary, which is a bug. This gets critical if memory is hotplugged while a VFIO device is attached, as the VFIO memory registration - which is done from a memory event callback present in the primary process only - is never called. After this patch, a primary process fires memory event callbacks before secondary processes start their synchronizations - both for hotplug and hotremove. Fixes: `07dcbfe010` ("malloc: support multiprocess memory hotplug") Cc: stable@dpdk.org Signed-off-by: Seth Howell <seth.howell@intel.com> Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:25:34 +01:00
Yongseok Koh	6d09256148	malloc: fix finding maximum contiguous IOVA size malloc_elem_find_max_iova_contig() could return invalid size due to a missing sanity check. The following gdb output shows how 'cur_size' can be invalid in find_biggest_element(). (gdb) p/x cur_size $4 = 0xffffffffffe42900 (gdb) p elem $1 = (struct malloc_elem *) 0x12e842000 (gdb) p *elem $2 = {heap = 0x7ffff7ff387c, prev = 0x12e831fc0, next = 0x12e842900, free_list = {le_next = 0x109538000, le_prev = 0x7ffff7ff3894}, msl = 0x7ffff7ff107c, state = ELEM_FREE, pad = 0, size = 2304} (gdb) p *elem->msl $5 = {{base_va = 0x100200000, addr_64 = 4297064448}, page_sz = 2097152, socket_id = 0, version = 790, len = 17179869184, external = 0, memseg_arr = {name = "memseg-2048k-0-0", '\000' <repeats 47 times>, count = 493, len = 8192, elt_sz = 48, data = 0x10002e000, rwlock = {cnt = 0}}} Fixes: `9fe6bceafd` ("malloc: add finding biggest free IOVA-contiguous element") Cc: stable@dpdk.org Signed-off-by: Yongseok Koh <yskoh@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 15:17:48 +01:00
Jim Harris	476c847ab6	malloc: add option --match-allocations SPDK uses the rte_mem_event_callback_register API to create RDMA memory regions (MRs) for newly allocated regions of memory. This is used in both the SPDK NVMe-oF target and the NVMe-oF host driver. DPDK creates internal malloc_elem structures for these allocated regions. As users malloc and free memory, DPDK will sometimes merge malloc_elems that originated from different allocations that were notified through the registered mem_event callback routine. This results in subsequent allocations that can span across multiple RDMA MRs. This requires SPDK to check each DPDK buffer to see if it crosses an MR boundary, and if so, would have to add considerable logic and complexity to describe that buffer before it can be accessed by the RNIC. It is somewhat analogous to rte_malloc returning a buffer that is not IOVA-contiguous. As a malloc_elem gets split and some of these elements get freed, it can also result in DPDK sending an RTE_MEM_EVENT_FREE notification for a subset of the original RTE_MEM_EVENT_ALLOC notification. This is also problematic for RDMA memory regions, since unregistering the memory region is all-or-nothing. It is not possible to unregister part of a memory region. To support these types of applications, this patch adds a new --match-allocations EAL init flag. When this flag is specified, malloc elements from different hugepage allocations will never be merged. Memory will also only be freed back to the system (with the requisite memory event callback) exactly as it was originally allocated. Since part of this patch is extending the size of struct malloc_elem, we also fix up the malloc autotests so they do not assume its size exactly fits in one cacheline. Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 13:01:08 +01:00
Gao Feng	cc80353223	memzone: fix unlock on initialization failure The RTE_PROC_PRIMARY error handler is missing the unlock statement in the current code. Fix it by unlocking and returning in one place. Fixes: `49df3db848` ("memzone: replace memzone array with fbarray") Cc: stable@dpdk.org Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 12:24:14 +01:00
Gao Feng	32fa7f8913	eal: check peer allocation in multi-process request Add the check for null peer pointer like the bundle pointer in the mp request handler. They should follow same style. And add some logs for nomem cases. Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 00:01:28 +01:00
Gao Feng	e14bc93e8f	eal: fix leak on multi-process request error When rte_eal_alarm_set failed, need to free the bundle mem in the error handler of handle_primary_request and handle_secondary_request. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Cc: stable@dpdk.org Signed-off-by: Gao Feng <davidfgao@tencent.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-20 00:01:28 +01:00
Gaetan Rivet	c9b413c3b1	eal: fix detection of duplicate option register Missing brackets around the if means that the loop will end at its first iteration. Fixes: `2395332798` ("eal: add option register infrastructure") Cc: stable@dpdk.org Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: David Marchand <david.marchand@redhat.com>	2018-12-20 00:01:28 +01:00
Keith Wiles	e3b090f3da	eal: fix missing newline in a log Add a missing newline to a RTE_LOG message. Fixes: `2395332798` ("eal: add option register infrastructure") Cc: stable@dpdk.org Signed-off-by: Keith Wiles <keith.wiles@intel.com>	2018-12-20 00:01:28 +01:00
Konstantin Ananyev	d5b46fc363	rwlock: introduce try semantics Introduce rte_rwlock_read_trylock() and rte_rwlock_write_trylock(). Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com>	2018-12-19 20:56:11 +01:00
Anatoly Burakov	0a529578f1	eal: clean up unused files on initialization When creating process data structures, EAL will create many files in EAL runtime directory. Because we allow multiple secondary processes to run, each secondary process gets their own unique file. With many secondary processes running and exiting on the system, runtime directory will, over time, create enormous amounts of sockets, fbarray files and other stuff that just sits there unused because the process that allocated it has died a long time ago. This may lead to exhaustion of disk (or RAM) space in the runtime directory. Fix this by removing every unlocked file at initialization that matches either socket or fbarray naming convention. We cannot be sure of any other files, so we'll leave them alone. Also, remove similar code from mp socket code. We do it at the end of init, rather than at the beginning, because secondary process will use primary process' data structures even if the primary itself has died, and we don't want to remove those before we lock them. Bugzilla ID: 106 Cc: stable@dpdk.org Reported-by: Vipin Varghese <vipin.varghese@intel.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-12-19 04:12:30 +01:00
David Marchand	a8499f65a1	log: add missing experimental tag When rte_log_register_type_and_pick_level() has been introduced, it has been correctly added to the EXPERIMENTAL section of the eal map and the symbol itself has been marked at its definition. However, the declaration of this symbol in rte_log.h is missing the __rte_experimental tag. Because of this, a user can try to call this symbol without being aware this is an experimental api (neither compilation nor link warning). Fixes: `b22e77c026` ("eal: register log type and pick level from args") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2018-12-19 02:30:02 +01:00
Jeff Shaw	68687daff2	eal: remove unnecessary dirent.h include Prior to this patch, the two affected .c files include <dirent.h> unnecessarily. This commit removes the include lines. Signed-off-by: Jeff Shaw <jeffrey.b.shaw@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-12-19 01:29:36 +01:00
Thomas Monjalon	37d800031d	version: 19.02-rc0 Start version numbering for a new release cycle, and introduce a template file for release notes. The release notes comments are updated to mandate a scope label for API and ABI changes. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: John McNamara <john.mcnamara@intel.com>	2018-11-30 16:20:33 +00:00
Thomas Monjalon	0da7f445df	version: 18.11.0 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-27 00:36:00 +01:00
Anatoly Burakov	e45088b1e1	mem: fix division by zero in no-NUMA mode When RTE_EAL_NUMA_AWARE_HUGEPAGES is set to "n", not all memtypes will be valid, because we skip some due to not supporting other NUMA nodes, leading to a division by zero error down the line because the necessary memtype fields weren't populated. Fix it by limiting number of memtypes to number of memtypes we have actually created. Fixes: `1dd342d0fd` ("mem: improve segment list preallocation") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: David Hunt <david.hunt@intel.com>	2018-11-26 15:35:46 +01:00
Thomas Monjalon	6cff3183c2	version: 18.11-rc5 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-25 21:19:19 +01:00
Darek Stojaczyk	161419983d	eal: fix devargs reference after probing failure Even if a device failed to plug, it's still a device object that references the devargs. Those devargs will be freed automatically together with the device, but freeing them any earlier - like it's done in the hotplug error handling path right now - will give us a dangling pointer and a segfault scenario. Consider the following case: * secondary process receives the hotplug request IPC message * devargs are either created or updated * the bus is scanned * a new device object is created with the latest devargs * the device can't be plugged for whatever reason, bus->plug returns error * the devargs are freed, even though they're still referenced by the device object on the bus For PCI devices, the generic device name comes from a buffer within the devargs. Freeing those will make EAL segfault whenever the device name is checked. This patch just prevents the hotplug error handling path from removing the devargs when there's a device that references them. This is done by simply exiting early from the hotplug function. As mentioned in the beginning, those devargs will be freed later, together with the device itself. Fixes: `7e8b266501` ("eal: fix hotplug add / remove") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-25 13:45:35 +01:00
Darek Stojaczyk	29bf7e93ba	eal: fix devargs leak on multi-process detach request Device detach triggered through IPC leaked some memory. It allocated a devargs object just to use it for parsing the devargs string in order to retrieve the device name. Those devargs weren't passed anywhere and were never freed. First of all, let's put those devargs on the stack, so they don't need to be freed. Then free the additional arguments string as soon as it's allocated, because we won't need it. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-25 13:32:01 +01:00
Darek Stojaczyk	494db286f3	eal: fix multi-process hotplug if attached in secondary Consider the following scenario: 1) primary process (A) starts, probes the bus 2) a secondary process (B) starts, probes the bus 3) yet another secondary process (C) starts 4) (C) registers the pci driver and hotplugs the device * an IPC attach req is sent to the primary (A) * (A) ignores the -EEXIST from process-local probe * (A) propagates the request to all secondary processes * (B) responds with -EEXIST * (A) replies to the original request with the -EEXIST return code * the -EEXIST is returned back to the user, although the device was successfully attached both locally and in all other processes This patch makes the primary process reply with rc=0 even if there was another secondary process with the device already attached. The primary process already didn't reply with -EEXIST when the device was attached locally, so now this behavior is even more consistent. Looking by the code, this seems to be the originally intended behavior. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-25 13:27:17 +01:00
Darek Stojaczyk	d27eed3139	eal: fix multi-process hotplug if already probed When primary process receives an IPC attach request of a device that's already locally-attached, it doesn't setup its variables properly and is prone to segfaulting on a subsequent rollback. `ret = local_dev_probe(req->devargs, &dev)` The above function will set `dev` pointer to the proper device unless it returns with error. One of those errors is -EEXIST, which the hotplug function explicitly ignores. For -EEXIST, it proceeds with attaching the device and expects the dev pointer to be valid. This patch makes `local_dev_probe` set the dev pointer even if it returns -EEXIST. Fixes: `ac9e4a1737` ("eal: support attach/detach shared device from secondary") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2018-11-25 13:22:51 +01:00
Darek Stojaczyk	5d36bf2bcd	eal: fix multi-process hotplug rollback If a device fails to attach before it's plugged, the subsequent rollback will still try to detach it, causing a segfault. Unplugging a device that wasn't plugged isn't really supported, so this patch adds an extra error check to prevent that from happening. While here, fix this also for normal (non-rollback) detach, which could also theoretically segfault on non-plugged device. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2018-11-25 13:15:34 +01:00
Ilya Maximets	9e8b90fc6d	eal/bsd: fix possible IOPL fd leak If rte_eal_iopl_init() will be called more than once we'll leak the file descriptor. Fixes: `b46fe31862` ("eal/bsd: fix virtio on FreeBSD") Cc: stable@dpdk.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>	2018-11-25 11:44:25 +01:00
Thomas Monjalon	e357e8ebd9	eal: fix build with -O1 In case of optimized compilation, RTE_BUILD_BUG_ON use an external variable which is neither defined, nor used. It seems not optimized out in case of OPDL compiled with clang -O1: opdl_ring.c: undefined reference to `RTE_BUILD_BUG_ON_detected_error' clang-6.0: fatal error: linker command failed with exit code 1 Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:32 +01:00
Anatoly Burakov	509cc88513	eal: deprecate and rename bsf64 function Rename rte_bsf64 to rte_bsf64_safe (this is a "safe" version in that it prevents undefined behavior by checking if incoming parameter is zero) and move it to common header. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Jasvinder Singh <jasvinder.singh@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-23 01:43:31 +01:00
Anatoly Burakov	816c924e9e	eal: remove useless code in bsf64 function RTE_BITMAP_OPTIMIZATIONS was never set to 0 and makes no sense anyway, so remove all code related to it. Also, drop the "likely" for bsf64 code, because it's a generic function and we cannot make any assumptions about likely values of incoming arguments. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>	2018-11-23 01:43:26 +01:00
Anatoly Burakov	615fcf55d2	ipc: fix access after async request failure Previous fix for rte_panic has moved setting of alarm before sending the message. This means that whether we send a message, the alarm would still trigger. The comment noted that cleanup would happen in the alarm handler, but that's not what actually happened - instead, in the event of failed send we freed the memory in-place, before putting the request on the queue. This works OK when the message is sent, but when sending the message fails, the alarm would still trigger with a pointer argument that points to non-existent memory, and cause memory corruption. There probably is a "proper" fix for this issue, with correct handling of sent vs. unsent requests, however it would be simpler just to sacrifice the sent request in the (extremely unlikely) event of alarm set failing. The other process would still send a response, but it will be ignored by the sender. Fixes: `45e5f49e87` ("ipc: remove panic in async request") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-23 01:43:24 +01:00
Thomas Monjalon	d82e5db6f6	version: 18.11-rc4 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-19 01:40:54 +01:00
Jeff Guo	c48407e8af	eal: fix deadlock in hot-unplug When a device is hot-unplugged, the hot-unplug handler will be invoked by the uio remove event and the device will be detached, then the kernel will send another pci remove event. So if any unlock is missed, it will cause a deadlock. This patch adds the missing unlock to the hot-unplug handler. Fixes: `0fc54536b1` ("eal: add failure handling for hot-unplug") Signed-off-by: Jeff Guo <jia.guo@intel.com>	2018-11-18 17:16:40 +01:00
David Wilder	6b062d56bc	mem: fix anonymous mapping on Power9 Removed the use of MAP_HUGETLB for anonymous mapping on ppc64. The MAP_HUGETLB had previously been added to workaround issues on IBM Power8 systems when mapping /dev/zero. In the current code the MAP_HUGETLB flag will cause the anonymous mapping to fail on Power9. Note, Power8 is currently failing to correctly mmap Hugepages, with and without this change. Fixes: `284ae3e9ff` ("eal/ppc: fix mmap for memory initialization") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeep@us.ibm.com>	2018-11-18 14:42:18 +01:00
Anatoly Burakov	71aae4b421	malloc: fix adjacency check to also include segment list It may so happen that two memory locations may be adjacent in virtual memory, but belong to different segment lists. With current code, such segments will be concatenated. Fix the adjacency checking code to also check if the adjacent malloc elements belong to the same memseg list. Fixes: `66cc45e293` ("mem: replace memseg with memseg lists") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 14:15:04 +01:00
Anatoly Burakov	32fc0fa00e	mem: check for contiguousness in external segments For IOVA as VA mode, we assume that memory is contiguous. However, for external segments that assumption may not necessarily hold. Fix the code to not assume that external memory segments are contiguous even in IOVA as VA mode. Fixes: `5282bb1c36` ("mem: allow memseg lists to be marked as external") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 14:12:20 +01:00
Kevin Laatz	2ddd89c3c6	eal: fix duplicate function declaration The rte_eal_get_runtime_dir() function is currently being declared in two header files. This API was made public in commit `6911c9fd8f` ("eal: export function to get runtime directory"), adding it to rte_eal.h. To make it public, the 'rte' prefix was added to the function so it needed to be modified in the original location of the declaration, eal_filesystem.h. By only modifying, and not removing the declaration, it is now a duplicate. This patch removes the declaration from eal_filesystem.h. Fixes: `6911c9fd8f` ("eal: export function to get runtime directory") Reported-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-18 13:40:26 +01:00
Thomas Monjalon	3e42b6ce06	version: 18.11-rc3 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-14 05:05:29 +01:00
Bruce Richardson	f98a95102d	eal/x86: move header to standard BSD license This updates the license on the rte_rtm.h file to be the standard BSD-3-Clause license used for the rest of DPDK, thus bringing the file in compliance with the DPDK licensing policy. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-11-14 01:44:14 +01:00
Bruce Richardson	e5f9a65147	eal/x86: reduce contention when retrying TSX When TSX transactions abort, it is generally worth retrying a number of times before falling back to the traditional locking path, as the parallelism benefits from TSX can be worth it when a transaction does succeed. For cases with multiple threads and high contention rates, it can be useful to have increasing delays between retry attempts, so as to avoid having the same threads repeatedly collided. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-11-14 01:03:21 +01:00
Anatoly Burakov	45e5f49e87	ipc: remove panic in async request EAL should not crash when setting alarm fails. Also, remove the profanity in error message. Fixes: `daf9bfca71` ("ipc: remove thread for async requests") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-14 00:01:38 +01:00
Jerin Jacob	5d08fecdd3	eal: fix build Some toolchain has fls() definition in string.h as argument type int, which is conflicting uint32_t argument type. /export/dpdk.org/lib/librte_eal/common/rte_reciprocal.c:47:19: error: conflicting types for ‘fls’ static inline int fls(uint32_t x) ^~~ /opt/marvell-tools-201/aarch64-marvell-elf/include/strings.h:59:6: note: previous declaration of ‘fls’ was here int fls(int) __pure2; FreeBSD string.h also has fls() with argument as int type. https://www.freebsd.org/cgi/man.cgi?query=fls&sektion=3 Fixing the conflict by using rte version of fls. Fixes: `ffe3ec811e` ("sched: introduce reciprocal divide") Fixes: `faf2b25c9f` ("fm10k: support VMDQ in multi-queue configuration") Cc: stable@dpdk.org Suggested-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-12 13:27:02 +01:00
Jerin Jacob	3a6f2c50b9	eal: introduce rte version of fls The function returns the last (most-significant) bit set. Added unit testcase to verify rte_fls_u32(). Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>	2018-11-12 13:25:01 +01:00
Thomas Monjalon	6bdf144553	eal/x86: remove unused memcpy file The use of rte_memcpy_ptr was removed in revert below, but it was missing removing the file arch/x86/rte_memcpy.c. Fixes: `d35cc1fe6a` ("eal/x86: revert select optimized memcpy at run-time") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-12 00:11:46 +01:00
Thomas Monjalon	c7ad7754f8	devargs: do not replace already inserted device The devargs of a device can be replaced by a newly allocated one when trying to probe again the same device (multi-process or multi-ports scenarios). This is breaking some pointer references. It can be avoided by copying the new content, freeing the new devargs, and returning the already inserted pointer. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Tested-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>	2018-11-12 00:10:21 +01:00
Alejandro Lucero	ee0e074f81	mem: fix DMA mask width sanity check Current code has different max DMA mask width values for 32 and 64 bits systems. IOMMU hardware could report a higher supported width than current MAX_DMA_MASK_BITS when RTE_ARCH_64 is not defined. This is actually true with a 32 bits kernel running in a 64 bits server with IOMMU hardware. This could also be a problem with embedded systems using an IOMMU designed for 64 bits in a 32 bits system. This patch leaves a single max DMA mask width which will make sure the mask width is within the range for 64 bits variables used for DMA mask. This also will avoid wrong values because any value higher than 64 bits is likely wrong. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-07 14:42:28 +01:00
Anatoly Burakov	4531d096d1	mem: fix use after free in legacy mem init Adding an additional failure path in DMA mask check has exposed an issue where `hugepage` pointer may point to memory that has already been unmapped, but pointer value is still not NULL, so failure handler will attempt to unmap it second time if DMA mask check fails. Fix it by setting `hugepage` pointer to NULL once it is no longer needed. Coverity issue: 325730 Fixes: `165c89b845` ("mem: use DMA mask check for legacy memory") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-07 00:06:38 +01:00
Thomas Monjalon	c59b06294f	version: 18.11-rc2 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-11-06 03:27:49 +01:00
Ferruh Yigit	c8b506e4b6	service: fix possible null access Fixes: `21698354c8` ("service: introduce service cores concept") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-11-06 01:14:15 +01:00
Thomas Monjalon	d75d132c30	eal: remove experimental tag for probe/remove The functions rte_dev_probe() and rte_dev_remove() are new in DPDK 18.11 so they got the experimental tag by policy. However they are too much basic functions for being skipped by strict applications which do not use experimental functions. The alternative is to use rte_eal_hotplug_add() and rte_eal_hotplug_remove(), but their API requires the application to parse the devargs string in order to provide bus name, device name and driver arguments. The new function rte_dev_probe() is really simpler to use and more flexible by accepting any devargs string. Let's encourage applications to use it. The old functions rte_eal_hotplug_* may be deprecated later. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Kevin Traynor <ktraynor@redhat.com> Tested-by: Kevin Traynor <ktraynor@redhat.com>	2018-11-06 01:14:02 +01:00
Anatoly Burakov	1ccfeb7df7	malloc: fix invalid argument handling When adding memory to an external heap, do not go to unlock failure handler because the memory hotplug lock hasn't been taken out yet. Fixes: `7d75c31014` ("malloc: allow adding memory to named heaps") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-11-06 01:13:58 +01:00
Alejandro Lucero	84e7477e10	mem: add thread unsafe version for DMA mask check During memory initialization calling rte_mem_check_dma_mask leads to a deadlock because memory_hotplug_lock is locked by a writer, the current code in execution, and rte_memseg_walk tries to lock as a reader. This patch adds a thread_unsafe version which will call the final function specifying the memory_hotplug_lock does not need to be acquired. The patch also modified rte_mem_check_dma_mask as a intermediate step which will call the final function as before, implying memory_hotplug_lock will be acquired. PMDs should always use the version acquiring the lock with the thread_unsafe one being just for internal EAL memory code. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:14 +01:00
Alejandro Lucero	165c89b845	mem: use DMA mask check for legacy memory If a device reports addressing limitations through a dma mask, the IOVAs for mapped memory needs to be checked out for ensuring correct functionality. Previous patches introduced this DMA check for main memory code currently being used but other options like legacy memory and the no hugepages option need to be also considered. This patch adds the DMA check for those cases. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:13 +01:00
Alejandro Lucero	4374ebc24b	malloc: modify error message for DMA mask check If the DMA mask check shows mapped memory out of the supported range specified by the DMA mask, nothing can be done but return an error and report the error. This can imply the app not being executed at all or precluding dynamic memory allocation once the app is running. In any case, we can advise the user to force IOVA as PA if IOVA is currently VA and the user is root. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:11 +01:00
Alejandro Lucero	9d15773606	mem: add function for setting DMA mask This patch adds the possibility of setting a dma mask to be used once the memory initialization is done. This is currently needed when IOVA mode is set by PCI related code and an x86 IOMMU hardware unit is present. Current code calls rte_mem_check_dma_mask but it is wrong to do so at that point because the memory has not been initialized yet. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:02:04 +01:00
Alejandro Lucero	0de9eb6138	mem: rename DMA mask check with proper prefix Current name rte_eal_check_dma_mask does not follow the naming used in the rest of the file. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:01:54 +01:00
Alejandro Lucero	af0aa2357d	malloc: fix DMA mask check The param needs to be the maskbits and not the mask. Fixes: `223b7f1d5e` ("mem: add function for checking memseg IOVA") Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-05 01:01:43 +01:00
Ferruh Yigit	3370975b99	eal: fix build with gcc 9.0 build error: In function ‘eal_plugin_add’, .../lib/librte_eal/common/eal_common_options.c:225:2: error: ‘strncpy’ output may be truncated copying 4095 bytes from a string of length 4095 [-Werror=stringop-truncation] strncpy(solib->name, path, PATH_MAX-1); strncpy may result a not null-terminated string, replaced it with strlcpy Fixes: `f9a08f6502` ("eal: add support for shared object drivers") Cc: stable@dpdk.org Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-04 22:48:04 +01:00
Jerin Jacob	11b57c6980	eal: fix error string function errno_autotest testcase has failed since commit `5d7b673d5f` ("mk: build with _GNU_SOURCE defined by default") RTE>>errno_autotest rte_strerror: 'Unknown error 11', strerror: 'Resource temporarily unavailable' Test Failed There are two different versions of strerror_r() based on _GNU_SOURCE definition. /* XSI-compliant */ int strerror_r(int errnum, char *buf, size_t buflen); /* GNU-specific */ char *strerror_r(int errnum, char *buf, size_t buflen); Since the GNU-specific version returns char *, the existing "if" condition around the strerror_r fails. Switching back to XSI-compliant version to allow a) Portable strerror_r() usage as musl c library uses non GNU specific version https://git.musl-libc.org/cgit/musl/tree/src/string/strerror_r.c b) Based on strerror_r(3) man page, it is possible that GNU-specific version need not use char *buf to fill error message instead it can use the immutable static string from the library and return it. note from strerror_r(3) man page: The GNU-specific strerror_r() returns a pointer to a string containing the error message. This may be either a pointer to a string that the function stores in buf, or a pointer to some (immutable) static string (in which case buf is unused). Fixes: `5d7b673d5f` ("mk: build with _GNU_SOURCE defined by default") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-11-04 22:25:20 +01:00
Luca Boccassi	349ac52bbc	eal/linux: handle UIO read failure in interrupt handler If a device is unplugged while an interrupt is pending, the read call to the uio device to remove it from the poll wait list can fail resulting in it being continually polled forever. This change checks for the read failing and if so, unregisters the device as an interrupt source and causes the wait list to be rebuilt. This race has been reported and observed in production. Fixes: `0a45657a67` ("pci: rework interrupt handling") Cc: stable@dpdk.org Signed-off-by: Brian Russell <brussell@brocade.com> Signed-off-by: Luca Boccassi <bluca@debian.org>	2018-11-02 10:50:49 +01:00
Darek Stojaczyk	95781f4c64	eal: fix memory leak on multi-process hotplug rollback Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com>	2018-11-02 00:05:49 +01:00
Darek Stojaczyk	04854a39e6	eal: fix IPC memory leak on device hotplug rte_mp_request_sync() says that the caller is responsible for freeing one of its parameters afterwards. EAL didn't do that, causing a memory leak. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-31 19:16:42 +01:00
Thomas Monjalon	bdbe62df10	version: 18.11-rc1 Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-29 04:08:26 +01:00
Ferruh Yigit	b74fd6b842	add missing static keyword to globals Some global variables can indeed be static, add static keyword to them. Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-10-29 02:01:08 +01:00
Darek Stojaczyk	6bcb7c95fe	vfio: share default container in multi-process So far each process in MP used to have a separate container and relied on the primary process to register all memsegs. Mapping external memory via rte_vfio_container_dma_map() in secondary processes was broken, because the default (process-local) container had no groups bound. There was even no way to bind any groups to it, because the container fd was deeply encapsulated within EAL. This patch introduces a new SOCKET_REQ_DEFAULT_CONTAINER message type for MP synchronization, makes all processes within a MP party use a single default container, and hence fixes rte_vfio_container_dma_map() for secondary processes. From what I checked this behavior was always the same, but started to be invalid/insufficient once mapping external memory was allowed. While here, fix up the comment on rte_vfio_get_container_fd(). This function always opens a new container, never reuses an old one. Fixes: `73a6390859` ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:59:48 +01:00
Darek Stojaczyk	88e2d78a20	vfio: fix read of freed memory on getting container fd We were reading some memory just after freeing it. Fixes: `83a73c5fef` ("vfio: use generic multi-process channel") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:59:48 +01:00
Dariusz Stojaczyk	4f5519ed83	vfio: cleanup getting group fd Factor out duplicated code. Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:58:32 +01:00
Dariusz Stojaczyk	db9d32b8b7	vfio: check if group fd is already open Always attempt to find already opened fd for an iommu group as subsequent attempts to open it will fail. There's no public API to check if a group was already bound and has a container, so rte_vfio_container_group_bind() shouldn't fail in such case. Fixes: `ea2dc10668` ("vfio: add multi container support") Cc: stable@dpdk.org Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Xiao Wang <xiao.w.wang@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-29 01:58:31 +01:00
Eric Zhang	075b182b54	eal: force IOVA to a particular mode This patch uses EAL option "--iova-mode" to force the IOVA mode to a particular value. There exists virtual devices that are not directly attached to the PCI bus, and therefore the auto detection of the IOVA mode based on probing the PCI bus and IOMMU configuration may not report the required addressing mode. Using the EAL option permits the mode to be explicitly configured in this scenario. Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>	2018-10-29 00:01:05 +01:00
Santosh Shukla	783667c9f9	eal: add --iova-mode option In the case of user don't want to use bus iova scheme and want to override. For that, adding EAL option --iova-mode=<string> where valid input string is 'pa' or 'va'. Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 23:41:26 +01:00
Takeshi Yoshimura	998c89f148	vfio: fix sPAPR IOMMU mapping Commit `73a6390859` ("vfio: allow to map other memory regions") introduced a bug in sPAPR IOMMU mapping. The commit removed necessary ioctl with VFIO_IOMMU_SPAPR_REGISTER_MEMORY. Also, vfio_spapr_map_walk should call vfio_spapr_dma_do_map instead of vfio_spapr_dma_mem_map. Fixes: `73a6390859` ("vfio: allow to map other memory regions") Cc: stable@dpdk.org Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>	2018-10-28 22:33:27 +01:00
Alejandro Lucero	1df2170287	mem: use address hint for mapping hugepages Linux kernel uses a really high address as starting address for serving mmaps calls. If there exist addressing limitations and IOVA mode is VA, this starting address is likely too high for those devices. However, it is possible to use a lower address in the process virtual address space as with 64 bits there is a lot of available space. This patch adds an address hint as starting address for 64 bits systems and increments the hint for next invocations. If the mmap call does not use the hint address, repeat the mmap call using the hint address incremented by page size. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 22:06:05 +01:00
Alejandro Lucero	223b7f1d5e	mem: add function for checking memseg IOVA A device can suffer addressing limitations. This function checks memsegs have iovas within the supported range based on dma mask. PMDs should use this function during initialization if device suffers addressing limitations, returning an error if this function returns memsegs out of range. Another usage is for emulated IOMMU hardware with addressing limitations. It is necessary to save the most restricted dma mask for checking out memory allocated dynamically after initialization. Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 22:04:34 +01:00
Darek Stojaczyk	c7810c319d	malloc: check size hint when reserving the biggest element RTE_MEMZONE_SIZE_HINT_ONLY wasn't checked in any way, causing size hints to be parsed as hard requirements. This resulted in some allocations being failed prematurely. Fixes: `68b6092bd3` ("malloc: allow reserving biggest element") Cc: stable@dpdk.org Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-28 11:59:02 +01:00
Ziye Yang	e4f2c1421d	eal/linux: fix memory leak of logid This patch is used to fix the memory leak issue of logid. We use the ASAN test in SPDK when integrating DPDK and find this memory leak issue. Fixes: `d8a2bc71df` ("log: remove app path from syslog id") Cc: stable@dpdk.org Signed-off-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-28 11:42:18 +01:00
Kevin Laatz	6911c9fd8f	eal: export function to get runtime directory This patch makes the eal_get_runtime_dir() API public so it can be used from outside EAL. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2018-10-27 12:10:24 +02:00
Kevin Laatz	2395332798	eal: add option register infrastructure This commit adds infrastructure to EAL that allows an application to register its init function with EAL. This allows libraries to be initialized at the end of EAL init. This infrastructure allows libraries that depend on EAL to be initialized as part of EAL init, removing circular dependency issues. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2018-10-27 12:10:10 +02:00
Ilya Maximets	a51639cc72	eal: add nanosleep based delay function Add a new rte_delay_us_sleep() function that uses nanosleep(). This function can be used by applications to not implement their own nanosleep() based callback and by internal DPDK code if CPU non-blocking delay needed. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 22:14:06 +02:00
Thomas Monjalon	01e5b16c57	eal: remove deprecated attach/detach functions These hotplug functions were deprecated and have some new replacements. As announced earlier, the oldest ones are now removed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Thomas Monjalon	214ed1acd1	ethdev: add iterator to match devargs input The iterator will return the ethdev port ids matching a devargs string. It is recommended to use the macro RTE_ETH_FOREACH_MATCHING_DEV() for usage convenience. The class string is prefixed with '+' in order to skip the validation of the parameter keys. It is tolerated for the compatibility with the old (current) syntax where all parameters (bus, class and driver) are mixed in the same string without any delimiter. Thanks to this compatibility prefix, the driver parameters will be skipped during the ethdev parsing, and not considered invalid. A macro is introduced in rte_common.h to workaround a const field. This hack is needed to free const strings in the iterator. It is preferred to keep the const for these fields, because it gives a hint that they are not changed at each iteration. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-26 22:14:05 +02:00
Anatoly Burakov	5640171c52	malloc: fix external heap allocation in no-huge mode When no-huge mode is enabled, we always overwrite the socket ID to be SOCKET_ID_ANY in rte_malloc, because there is no NUMA awareness in no-huge mode. However, with external memory support, a socket ID may have other meaning, and we cannot overwrite the socket ID in those cases. Fixes: `65ff37b105` ("malloc: add function to check if socket is external") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-26 22:37:59 +02:00
Phil Yang	fd5f33323e	kni: introduce C11 atomic into FIFO synchronization Syncing the values by adding c11 atomic memory barriers to make sure the values being synced before updating fifo_write and fifo_read. Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Phil Yang	711859cd0d	kni: fix kernel FIFO synchronization Adding memory barrier to make sure the values being synced before updating fifo_write in kni_fifo_put and fifo_read in kni_fifo_get. Fixes: `3fc5ca2f63` ("kni: initial import") Cc: stable@dpdk.org Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-26 18:10:14 +02:00
Jerin Jacob	1f8494f002	eal/ppc: support pause API Add support for rte_pause() implementation for ppc64. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>	2018-10-26 14:37:56 +02:00
Paul Luse	66fd3a3b0f	bus/vdev: fix multi-process IPC buffer leak on scan This patch fixes an issue caught with ASAN where a vdev_scan() to a secondary bus was failing to free some memory. The doxygen comment in EAL is fixed at the same time. Fixes: `cdb068f031` ("bus/vdev: scan by multi-process channel") Fixes: `783b6e5497` ("eal: add synchronous multi-process communication") Cc: stable@dpdk.org Signed-off-by: Paul Luse <paul.e.luse@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-25 10:28:13 +02:00
Gaetan Rivet	97e476ad7c	devargs: fix variadic parsing memory leak rte_devargs_parsef will leak memory each time it is called. The device string must be freed. Fixes: `a23bc2c4e0` ("devargs: add non-variadic parsing function") Cc: stable@dpdk.org Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>	2018-10-25 08:54:25 +02:00
Keith Wiles	81bede55e3	eal: add macro for attribute weak eal: add shorthand __rte_weak macro qat: update code to use __rte_weak macro avf: update code to use __rte_weak macro fm10k: update code to use __rte_weak macro i40e: update code to use __rte_weak macro ixgbe: update code to use __rte_weak macro mlx5: update code to use __rte_weak macro virtio: update code to use __rte_weak macro acl: update code to use __rte_weak macro bpf: update code to use __rte_weak macro Signed-off-by: Keith Wiles <keith.wiles@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-25 02:11:23 +02:00
Stephen Hemminger	dea302eb4e	eal/arm: remove profanity in comment Update comment to describe the problem better without risk of being offensive. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2018-10-25 02:11:22 +02:00
Stephen Hemminger	4f06c51c7e	eal/linux: eliminate cast of HPET thread signature The cast of hpet_msb_inc is causing a warning in some compilations. Yet the cast is unnecessary, the function is used only one place just use the correct signature. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2018-10-25 02:11:22 +02:00
Stephen Hemminger	08e348daab	eal: remove double space in init alert messages rte_init_alert already adds a newline, don't do it twice. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-10-25 02:11:22 +02:00
Dariusz Stojaczyk	b4f62e5862	ipc: fix undefined behavior in no-shconf mode In no-shconf mode the rte_mp_request_sync() wasn't initializing the `reply` parameter, which contained e.g. a number of sent requests. Callers of rte_mp_request_sync() might check that param afterwards and might read potentially uninitialized memory. The no-shconf check that makes us return early (with rc = 0) was placed before the `reply` initialization. Fix this by making the `reply` initialization occur first. Fixes: `5848e3d281` ("ipc: support --no-shconf mode") Cc: stable@dpdk.org Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-24 21:49:57 +02:00
Anatoly Burakov	198b66b946	mem: fix resource leak Segment preallocation code allocates an array of structures on the heap but does not free the memory afterwards. Fix it by freeing it at the end of the function, and changing control flow to always go through that code path. Coverity issue: 323524 Fixes: `1dd342d0fd` ("mem: improve segment list preallocation") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-23 11:36:48 +02:00
Qi Zhang	4dc3db031d	eal: fix bus name read for removal in multi-process A crash may appear when removing some PCI devices because dev->devargs is not always initialized. So use dev->bus instead of dev->devargs->bus when building devargs string to remove a device. Fixes: `244d513071` ("eal: enable hotplug on multi-process") Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-22 12:41:28 +02:00
Anatoly Burakov	1dd342d0fd	mem: improve segment list preallocation Current code to preallocate segment lists is trying to do everything in one go, and thus ends up being convoluted, hard to understand, and, most importantly, does not scale beyond initial assumptions about number of NUMA nodes and number of page sizes, and therefore has issues on some configurations. Instead of fixing these issues in the existing code, simply rewrite it to be slightly less clever but much more logical, and provide ample comments to explain exactly what is going on. We cannot use the same approach for 32-bit code because the limitations of the target dictate current socket-centric approach rather than type-centric approach we use on 64-bit target, so 32-bit code is left unmodified. FreeBSD doesn't support NUMA so there's no complexity involved there, and thus its code is much more readable and not worth changing. Fixes: `1d406458db` ("mem: make segment preallocation OS-specific") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-22 12:40:14 +02:00
Anatoly Burakov	0042eb5646	eal: improve musl compatibility of thread log Musl complains about pthread id being of wrong size, because on musl, pthread_t is a struct pointer, not an unsigned int. Fix the printing code by casting pthread id to unsigned pointer type and adjusting the format specifier to be of appropriate size. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-22 12:40:14 +02:00
Anatoly Burakov	0601defe2e	eal: improve musl compatibility of string functions Musl wraps various string functions such as strlcpy in order to harden them. However, the fortify wrappers are included without including the actual string functions being wrapped, which throws missing definition compile errors. Fix by including string.h in string functions header. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-22 12:40:14 +02:00
Anatoly Burakov	3717943819	mem: improve musl compatibility When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 31 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-22 11:29:37 +02:00
Anatoly Burakov	1c7fd81054	eal/linux: improve musl compatibility When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 33 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-22 11:28:52 +02:00
Anatoly Burakov	997b0ef8f8	fbarray: improve musl compatibility When built against musl, fcntl.h doesn't silently get included. Fix by including it explicitly. Bugzilla ID: 34 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-22 11:28:46 +02:00
Anatoly Burakov	5d7b673d5f	mk: build with _GNU_SOURCE defined by default We use _GNU_SOURCE all over the place, but often times we miss defining it, resulting in broken builds on musl. Rather than fixing every library's and driver's and application's makefile, fix it by simply defining _GNU_SOURCE by default for all builds. Remove all usages of _GNU_SOURCE in source files and makefiles, and also fixup a couple of instances of using __USE_GNU instead of _GNU_SOURCE. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-22 11:28:27 +02:00
Thomas Monjalon	739e13bcc9	devargs: fix freeing during device removal After calling unplug function of a bus, the device is expected to be freed. It is too late for getting devargs to remove. Anyway, the buses which implement unplug are already freeing the devargs, except the PCI bus. So the call to rte_devargs_remove() is removed from EAL and added in PCI. Fixes: `2effa126fb` ("devargs: simplify parameters of removal function") Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2018-10-19 22:37:10 +02:00
Thomas Monjalon	e9d159c3d5	eal: allow probing a device again In the devargs syntax for device representors, it is possible to add several devices at once: -w dbdf,representor=[0-3] It will become a more frequent case when introducing wildcards and ranges in the new devargs syntax. If a devargs string is provided for probing, and updated with a bigger range for a new probing, then we do not want it to fail because part of this range was already probed previously. There can be new ports to create from an existing rte_device. That's why the check for an already probed device is moved as bus responsibility. In the case of vdev, a global check is kept in insert_vdev(), assuming that a vdev will always have only one port. In the case of ifpga and vmbus, already probed devices are checked. In the case of NXP buses, the probing is done only once (no hotplug), though a check is added at bus level for consistency. In the case of PCI, a driver flag is added to allow PMD probing again. Only the PMD knows the ports attached to one rte_device. As another consequence of being able to probe in several steps, the field rte_device.devargs must not be considered as a full representation of the rte_device, but only the latest probing args. Anyway, the field rte_device.devargs is used only for probing. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Tested-by: Andrew Rybchenko <arybchenko@solarflare.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2018-10-18 01:49:52 +02:00
Thomas Monjalon	52897e7e70	eal: add function to query device status The function rte_dev_is_probed() is added in order to improve semantic and enforce proper check of the probing status of a device. It will answer this rte_device query: Is it already successfully probed or not? Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Tested-by: Andrew Rybchenko <arybchenko@solarflare.com>	2018-10-18 01:49:28 +02:00
Thomas Monjalon	391797f042	drivers/bus: move driver assignment to end of probing The PCI mapping requires to know the PCI driver to use, even before the probing is done. That's why the PCI driver is referenced early inside the PCI device structure. See commit `1d20a073fa` ("bus/pci: reference driver structure before mapping") However the rte_driver does not need to be referenced in rte_device before the device probing is done. By moving back this assignment at the end of the device probing, it becomes possible to make clear the status of a rte_device. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com> Tested-by: Andrew Rybchenko <arybchenko@solarflare.com> Reviewed-by: Rosen Xu <rosen.xu@intel.com>	2018-10-17 10:26:59 +02:00
Qi Zhang	ac9e4a1737	eal: support attach/detach shared device from secondary This patch cover the multi-process hotplug case when a device attach/detach request be issued from a secondary process device attach on secondary: a) secondary send sync request to the primary. b) primary receive the request and attach the new device if failed goto i). c) primary forward attach sync request to all secondary. d) secondary receive the request and attach the device and send a reply. e) primary check the reply if all success goes to j). f) primary send attach rollback sync request to all secondary. g) secondary receive the request and detach the device and send a reply. h) primary receive the reply and detach device as rollback action. i) send attach fail to secondary as a reply of step a), goto k). j) send attach success to secondary as a reply of step a). k) secondary receive reply and return. device detach on secondary: a) secondary send sync request to the primary. b) primary send detach sync request to all secondary. c) secondary detach the device and send a reply. d) primary check the reply if all success goes to g). e) primary send detach rollback sync request to all secondary. f) secondary receive the request and attach back device. goto h). g) primary detach the device if success goto i), else goto e). h) primary send detach fail to secondary as a reply of step a), goto j). i) primary send detach success to secondary as a reply of step a). j) secondary receive reply and return. Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-17 10:16:18 +02:00
Qi Zhang	244d513071	eal: enable hotplug on multi-process We are going to introduce the solution to handle hotplug in multi-process, it includes the below scenario: 1. Attach a device from the primary 2. Detach a device from the primary 3. Attach a device from a secondary 4. Detach a device from a secondary In the primary-secondary process model, we assume devices are shared by default. that means attaches or detaches a device on any process will broadcast to all other processes through mp channel then device information will be synchronized on all processes. Any failure during attaching/detaching process will cause inconsistent status between processes, so proper rollback action should be considered. This patch covers the implementation of case 1,2. Case 3,4 will be implemented on a separate patch. IPC scenario for Case 1, 2: attach a device a) primary attach the new device if failed goto h). b) primary send attach sync request to all secondary. c) secondary receive request and attach the device and send a reply. d) primary check the reply if all success goes to i). e) primary send attach rollback sync request to all secondary. f) secondary receive the request and detach the device and send a reply. g) primary receive the reply and detach device as rollback action. h) attach fail i) attach success detach a device a) primary send detach sync request to all secondary b) secondary detach the device and send reply c) primary check the reply if all success goes to f). d) primary send detach rollback sync request to all secondary. e) secondary receive the request and attach back device. goto g) f) primary detach the device if success goto g), else goto d) g) detach fail. h) detach success. Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2018-10-17 10:16:18 +02:00
Jerin Jacob	94d7265976	vfio: fix missing header inclusion The following change set introduces HAVE_VFIO_DEV_REQ_INTERFACE and uses it in the files below: drivers/bus/pci/linux/pci_vfio.c drivers/bus/pci/pci_common.c lib/librte_eal/linuxapp/eal/eal_interrupts.c However, except for the first file, the change did not include <rte_vfio.h>, where HAVE_VFIO_DEV_REQ_INTERFACE is defined. This causes the following runtime error with the combination of vfio-pci mode and kernel >= 4.0.0: EAL: [rte_intr_enable] Unknown handle type of fd 95 EAL: [pci_vfio_enable_notifier]Fail to enable req notifier. EAL: Fail to unregister req notifier handler. EAL: Error setting up notifier! EAL: Requested device 0000:07:00.1 cannot be used Fixes: `cda9441996` ("vfio: fix build with Linux < 4.0") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>	2018-10-17 10:16:18 +02:00
Jeff Guo	c89fdd8da2	eal/bsd: fix build When compiling on FreeBSD, a warning/error is thrown for an unused parameter. This patch aims to fix the issue by deleting the useless function definition. Fixes: `89ecd11052` ("eal: modify device event process function") Signed-off-by: Jeff Guo <jia.guo@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2018-10-16 14:54:25 +02:00


2141 Commits