numam-dpdk

Author	SHA1	Message	Date
Bruce Richardson	e9c6594264	examples: detect default build directory Most examples have in their makefiles a default RTE_TARGET directory to be used in case RTE_TARGET is not set. Rather than just using a hard-coded default, we can instead detect what the build directory is relative to RTE_SDK directory. This fixes a potential issue for anyone who continues to build using "make install T=x86_64-native-linuxapp-gcc" and skips setting RTE_TARGET explicitly, instead relying on the fact that they were building in a directory which corresponded to the example default path - which was changed to "x86_64-native-linux-gcc" by commit `218c4e68c1` ("mk: use linux and freebsd in config names"). Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-30 01:12:15 +01:00
Liron Himi	ff1e35fb5f	kni: calculate MTU from mbuf size - mbuf_size and mtu are now being calculated according to the given mb-pool. - max_mtu is now being set according to the given mtu the above two changes provide the ability to work with jumbo frames Signed-off-by: Liron Himi <lironh@marvell.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-30 00:59:59 +01:00
Anatoly Burakov	23d5455517	mem: warn user when running without NUMA support Running in non-legacy mode on a NUMA-enabled system without libnuma is unsupported, so explicitly print out a warning when trying to do so. Running in legacy mode without libnuma is still supported whether or not we are running with libnuma support enabled, so also fix init to allow that scenario. Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-30 00:13:04 +01:00
David Marchand	76f8486e16	ci: fix arm64 config filename The ARM64 config file has been renamed in the commit `ae2f2fee24` ("build: rename linuxapp to linux in meson cross files"). Fixes: `99889bd852` ("ci: introduce Travis builds for GitHub repositories") Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2019-03-30 00:01:35 +01:00
Lukasz Krakowiak	d7b713d0dc	power: add some logs on requests Extend debugs on power instruction and cmd police destroy requests. Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com>	2019-03-29 23:29:21 +01:00
Lukasz Krakowiak	1b89799147	power: update error handling Update for handling negative returned status from functions call. Signed-off-by: Lukasz Krakowiak <lukaszx.krakowiak@intel.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 15:29:31 +01:00
Kevin Traynor	e1e4dafbc7	power: fix frequency list buffer validation The frequency list buffer was already validated in power_acpi_cpufreq_freqs(), so the newly added check was redundant. To keep consistency with power_pstate_cpufreq_freqs(), remove the original check and update the log message. Fixes: `2e6ccdb4e0` ("power: fix frequency list to handle null buffer") Cc: stable@dpdk.org Signed-off-by: Kevin Traynor <ktraynor@redhat.com>	2019-03-29 14:58:27 +01:00
Hemant Agrawal	6a556bd6ca	net/dpaa2: support flow table flush Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2019-03-29 14:38:06 +01:00
Hemant Agrawal	4762b3d419	bus/dpaa: delay fman device list to bus probe The fman device list need to be accessed across processes. The hw device structures should be allocated with rte_calloc instead of calloc. The rte_calloc is not available at the time of bus scan, so better prepare the device list at probe. Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2019-03-29 14:37:45 +01:00
Akhil Goyal	e1797f4b44	mempool/dpaa: allocate bp info for multiprocess rte_dpaa_bpid_info shall be allocated with the hugepage memory which can be shared across processes. Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-03-29 14:36:39 +01:00
Akhil Goyal	4bbc759f53	bus/dpaa: save fq lookup table for secondary process A reference to qman_fq_lookup_table need to be saved in each fq, so that it is retrieved while in running secondary process. Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>	2019-03-29 14:33:23 +01:00
Shreyansh Jain	6880caed6b	bus/dpaa: fix Rx discard register mask Current value of 'fmbm_rfsdm' register (0x010CE3F0) doesn't include the bit to drop colored (red) packets. New value (0x010EE3F0) fixes this. Check with 'fmbm_rffc' register of fm_port_bmi_regs. Fixes: `6d6b4f49a1` ("bus/dpaa: add FMAN hardware operations") Cc: stable@dpdk.org Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>	2019-03-29 14:33:21 +01:00
Stephen Hemminger	4764beda0d	bus/fslmc: remove unneeded strdup The fslmc bus code was duplicating the device name and doing extra initialization. The code can be simplified to just use the device name directly. Compile tested only; do not have this hardware. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>	2019-03-29 13:55:58 +01:00
Stephen Hemminger	a6ffe11b72	bus/fslmc: decrease log level for unsupported devices When fslmc is built as part of a general distribution, the bus code will log errors when other devices are present. This could confuse users it is not an error. Fixes: `50245be05d` ("bus/fslmc: support device blacklisting") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2019-03-29 13:53:10 +01:00
Stephen Hemminger	2b9cee18d6	net/netvsc: remove unnecessary format of MAC address The ethernet address was being converted to a string but the code using that is no longer present. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:44:49 +01:00
Stephen Hemminger	2528d17199	bus/vmbus: refactor secondary mapping The secondary mapping function was duplicating the code used to search the uio_resource list. Skip the unwinding since map failure already makes device unusable. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:44:36 +01:00
Stephen Hemminger	2a28a502c6	bus/vmbus: map ring in secondary process Need to remember primary channel in secondary process. Then use it to iterate over subchannels in secondary process mapping setup. Fixes: `831dba47bd` ("bus/vmbus: add Hyper-V virtual bus support") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:44:19 +01:00
Stephen Hemminger	41a7f8cbee	bus/vmbus: stop mapping if empty resource found If vmbus is run on older kernel (without all the uio mappings), then the bus driver should stop when it hits the missing mappings rather than recording the empty values. Fixes: `831dba47bd` ("bus/vmbus: add Hyper-V virtual bus support") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:44:10 +01:00
Stephen Hemminger	3f9277031a	bus/vmbus: fix check for mmap failure The code was testing the result of mmap incorrectly. I.e the test that a local pointer is not MAP_FAILED would always succeed and therefore hid any potential problems. Fixes: `831dba47bd` ("bus/vmbus: add Hyper-V virtual bus support") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:44:02 +01:00
Stephen Hemminger	4a9efcddad	net/netvsc: fix VF support with secondary process The VF device management in netvsc was using a pointer to the rte_eth_devices. But the actual rte_eth_devices array is likely to be place in the secondary process; which causes a crash. The solution is to record the port of the VF (instead of a pointer) and find the device in the per process array as needed. Fixes: `dc7680e859` ("net/netvsc: support integrated VF") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:43:55 +01:00
Stephen Hemminger	fc20b5809d	bus/vmbus: fix secondary process setup The secondary process doesn't correctly map the second and later resources because it doesn't change the offset. Fixes: `831dba47bd` ("bus/vmbus: add Hyper-V virtual bus support") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>	2019-03-29 13:43:45 +01:00
Anatoly Burakov	3660216ef1	malloc: fix IPC message initialization The memset size for an IPC message is set incorrectly. Fix it to cover the entire IPC message. Fixes: `07dcbfe010` ("malloc: support multiprocess memory hotplug") Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:55:07 +01:00
Anatoly Burakov	b8a86c83e0	fbarray: fix init unlock without lock Certain failure paths of rte_fbarray_init() will unlock the mem area lock without locking it first. Fix this by properly handling the failures. Fixes: `5b61c62cfd` ("fbarray: add internal tailq for mapped areas") Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:49:35 +01:00
Darek Stojaczyk	5a98bc5e83	fbarray: fix attach deadlock rte_fbarray_attach() currently locks its internal spinlock, but never releases it. Secondary processes won't even start if there is more than one fbarray to be attached to - the second rte_fbarray_attach() would be just stuck. Fix it by releasing the lock at the end of rte_fbarray_attach(). I believe this was the original intention. Fixes: `5b61c62cfd` ("fbarray: add internal tailq for mapped areas") Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 12:49:35 +01:00
Stephen Hemminger	c6b5715746	drivers: fix SPDX license id consistency All drivers should have SPDX on the first line of the source files in the format /* SPDX-License-Identifier: ... Several files used minor modifications which were inconsistent with the pattern. Fix it to make scanning tools easier. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2019-03-29 00:15:53 +01:00
Anatoly Burakov	1fd3bcf3f9	vfio: document multiprocess limitation for container API Currently, there is no support for sharing custom VFIO containers between multiple processes, but it is not documented. Document this limitation. Cc: stable@dpdk.org Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-29 00:07:16 +01:00
Thomas Monjalon	3a1a885e03	eal: remove redundant atomic API description Atomic functions are described in doxygen of the file lib/librte_eal/common/include/generic/rte_atomic.h The copies in arch-specific files are redundant and confuse readers about the genericity of the API. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Shahaf Shuler <shahafs@mellanox.com>	2019-03-28 23:52:53 +01:00
Dekel Peled	8015c5593a	eal/ppc: fix global memory barrier From previous patch description: "to improve performance on PPC64, use light weight sync instruction instead of sync instruction." Excerpt from IBM doc [1], section "Memory barrier instructions": "The second form of the sync instruction is light-weight sync, or lwsync. This form is used to control ordering for storage accesses to system memory only. It does not create a memory barrier for accesses to device memory." This patch removes the use of lwsync, so calls to rte_wmb() and rte_rmb() will provide correct memory barrier to ensure order of accesses to system memory and device memory. [1] https://www.ibm.com/developerworks/systems/articles/powerpc.html Fixes: `d23a6bd04d` ("eal/ppc: fix memory barrier for IBM POWER") Cc: stable@dpdk.org Signed-off-by: Dekel Peled <dekelp@mellanox.com>	2019-03-28 23:48:28 +01:00
Michał Mirosław	a1c6b70786	mem: count overcommit hugepages as available With nr_overcommit_hugepages > 0 application may be able to allocate hugepages even when free_hugepages == 0. Take this into account when counting available hugepages. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:33:50 +01:00
Anatoly Burakov	034f1fb616	mem: attempt multiple hugepage allocations at init When requesting memory with ``-m`` or ``--socket-mem`` flags, currently the init will fail if the requested memory amount was bigger than any one memseg list, even if total amount of available memory was sufficient. Fix this by making EAL to attempt to allocate pages multiple times, until we either fulfill our memory requirements, or run out of hugepages to allocate. Bugzilla ID: 95 Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:58 +01:00
Anatoly Burakov	bec5625588	mem: improve best-effort allocation Previously, when using non-exact allocation, we were requesting N pages to be allocated, but allowed the memory subsystem to allocate less than requested. However, we were still expecting to see N contiguous free pages in the memseg list. This presents a problem because there is no way to try and allocate as many pages as possible, even if there aren't enough contiguous free entries in the list. To address this, use the new "find biggest" fbarray APIs when allocating non-exact number of pages. This way, we will first check how many entries in the list are actually available, and then try to allocate up to that number. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:54 +01:00
Anatoly Burakov	7353ee7344	fbarray: add API to find biggest used or free chunks Currently, while there is a way to find total amount of used/free space in an fbarray, there is no way to find biggest contiguous chunk. Add such API, as well as unit tests to test this API. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:52 +01:00
Anatoly Burakov	5b61c62cfd	fbarray: add internal tailq for mapped areas Currently, there are numerous reliability issues with fbarray, such as: - There is no way to prevent attaching to overlapping memory areas - There is no way to prevent double-detach - Failed destroy leaves fbarray in an invalid state (fbarray itself is valid, but its backing memory area is already detached) In addition, on FreeBSD, doing mmap() on a file descriptor does not keep the lock, so we also need to store the fd in order to keep the lock. This patch improves upon fbarray to address both of these issues by adding an internal tailq to track allocated areas and their respective file descriptors. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>	2019-03-28 23:28:50 +01:00
Nikhil Rao	db9f4430c2	service: fix parameter type for attribute The type of value parameter to rte_service_attr_get should be uint64_t *, since the attributes are of type uint64_t. Fixes: `4d55194d76` ("service: add attribute get function") Signed-off-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Rami Rosen <ramirose@gmail.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>	2019-03-28 21:07:48 +01:00
Ruifeng Wang	90fefe78bf	hash: optimize signature compare for Arm NEON Implemented signature compare function based on neon intrinsic. Hash bulk lookup had 3% - 6% performance gain after optimization. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2019-03-28 19:54:21 +01:00
Dharmik Thakkar	1ae40fdb8a	test/timer: replace config macro with runtime log level This patch replaces macro with log-level based approach to print debug information. Need to set timer log type to debug using the following eal parameter: --log-level=test.timer:debug Suggested-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-28 19:39:13 +01:00
Dharmik Thakkar	9038ea4674	test/efd: replace config macro with runtime log level This patch enables compilation of print_key_info() always using log-level based approach instead of a macro. Need to set efd log type to debug to print debug information, using the following eal parameter: --log-level=test.efd:debug Suggested-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-28 19:38:52 +01:00
Dharmik Thakkar	54e5545d33	test/hash: replace config macro with runtime log level Need to set hash log type to debug to print debug information, using following eal parameter: --log-level=test.hash:debug Suggested-by: Thomas Monjalon <thomas@monjalon.net> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>	2019-03-28 19:37:41 +01:00
Joyce Kong	efbcdaa55b	test/ticketlock: add test cases Add test cases for ticket lock, recursive ticket lock, and ticket lock performance. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 15:02:10 +01:00
Joyce Kong	ca49b92079	ticketlock: enable generic ticketlock on all arch Let all architectures use generic ticketlock implementation. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 15:00:11 +01:00
Joyce Kong	184104fc61	ticketlock: introduce fair ticket based locking The spinlock implementation is unfair, some threads may take locks aggressively while leaving the other threads starving for long time. This patch introduces ticketlock which gives each waiting thread a ticket and they can take the lock one by one. First come, first serviced. This avoids starvation for too long time and is more predictable. Suggested-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 14:58:49 +01:00
Joyce Kong	6fef1ae4fc	test/rwlock: amortize the cost of getting time Instead of getting timestamp per iteration, amortize its overhead can help to get more precise benchmarking results. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 11:49:36 +01:00
Joyce Kong	fe252fb695	test/rwlock: benchmark on all available cores Add performance test on all available cores to benchmark the scaling up performance of rw_lock. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Suggested-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 11:47:19 +01:00
Joyce Kong	e8af2f1f11	rwlock: reimplement with atomic builtins The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. Here is the assembly code of __sync_compare_and_swap builtin. __sync_bool_compare_and_swap(dst, exp, src); 0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8] 0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6] 0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4] 0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff 0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0] 0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1 0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 <rte_atomic16_cmpset+52> // b.any 0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0] 0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 <rte_atomic16_cmpset+32> 0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish 0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Tested-by: Joyce Kong <joyce.kong@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 11:47:05 +01:00
Gavin Hu	453d8f7366	spinlock: reimplement with atomic one-way barrier The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. Here is the assembly code of __sync_compare_and_swap builtin. __sync_bool_compare_and_swap(dst, exp, src); 0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8] 0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6] 0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4] 0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff 0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0] 0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1 0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 <rte_atomic16_cmpset+52> // b.any 0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0] 0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 <rte_atomic16_cmpset+32> 0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish 0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none The benchmarking results showed constant improvements on all available platforms: 1. Cavium ThunderX2: 126% performance; 2. Hisilicon 1616: 30%; 3. Qualcomm Falkor: 13%; 4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7% Here is the example test result on TX2: $sudo ./build/app/test -l 16-27 -- i RTE>>spinlock_autotest * spinlock_autotest without this patch * Test with lock on 12 cores... Core [16] Cost Time = 53886 us Core [17] Cost Time = 53605 us Core [18] Cost Time = 53163 us Core [19] Cost Time = 49419 us Core [20] Cost Time = 34317 us Core [21] Cost Time = 53408 us Core [22] Cost Time = 53970 us Core [23] Cost Time = 53930 us Core [24] Cost Time = 53283 us Core [25] Cost Time = 51504 us Core [26] Cost Time = 50718 us Core [27] Cost Time = 51730 us Total Cost Time = 612933 us * spinlock_autotest with this patch * Test with lock on 12 cores... Core [16] Cost Time = 18808 us Core [17] Cost Time = 29497 us Core [18] Cost Time = 29132 us Core [19] Cost Time = 26150 us Core [20] Cost Time = 21892 us Core [21] Cost Time = 24377 us Core [22] Cost Time = 27211 us Core [23] Cost Time = 11070 us Core [24] Cost Time = 29802 us Core [25] Cost Time = 15793 us Core [26] Cost Time = 7474 us Core [27] Cost Time = 29550 us Total Cost Time = 270756 us In the tests on ThunderX2, with more cores contending, the performance gain was even higher, indicating the __atomic implementation scales up better than __sync. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 09:19:39 +01:00
Gavin Hu	a52c5530d8	test/spinlock: amortize the cost of getting time Instead of getting timestamps per iteration, amortize its overhead can help getting more precise benchmarking results. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 09:18:59 +01:00
Gavin Hu	9119ad305d	test/spinlock: remove delay for correct benchmarking The test is to benchmark the performance of spinlock by counting the number of spinlock acquire and release operations within the specified time. A typical pair of lock and unlock operations costs tens or hundreds of nano seconds, in comparison to this, delaying 1 us outside of the locked region is too much, compromising the goal of benchmarking the lock and unlock performance. Fixes: `af75078fec` ("first public release") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 09:17:49 +01:00
Gavin Hu	85cffb2ecc	ring: enforce reading tail before slots In weak memory models, like arm64, reading the prod.tail may get reordered after reading the ring slots, which corrupts the ring and stale data is observed. This issue was reported by NXP on 8-A72 DPAA2 board. The problem is most likely caused by missing the acquire semantics when reading prod.tail (in SC dequeue) which makes it possible to read a stale value from the ring slots. For MP (and MC) case, rte_atomic32_cmpset() already provides the required ordering. For SP case, the control dependency between if-statement (which depends on the read of r->cons.tail) and the later stores to the ring slots make RMB unnecessary. About the control dependency, read more at: https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf This patch is adding the required read barrier to prevent reading the ring slots get reordered before reading prod.tail for SC case. Fixes: `c9fb3c6289` ("ring: move code in a new header file") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Tested-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Nipun Gupta <nipun.gupta@nxp.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2019-03-28 01:22:04 +01:00
Pavan Nikhilesh	5cbd14b3e5	eal: roundup TSC frequency when estimating When estimating tsc frequency using sleep/gettime round it up to the nearest multiple of 10Mhz for more accuracy. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com> Reviewed-by: Keith Wiles <keith.wiles@intel.com>	2019-03-28 00:45:16 +01:00
Pavan Nikhilesh	f56e551485	eal: add macro to align value to the nearest multiple Add macro to align value to the nearest multiple of the given value, resultant value might be greater than or less than the first parameter whichever difference is the lowest. Update unit test to include the new macro. Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>	2019-03-28 00:45:00 +01:00
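
The last two entries above (f56e551485 and 5cbd14b3e5) describe aligning a value to the nearest multiple of another value and using that to snap an estimated TSC frequency to a multiple of 10 MHz. A minimal sketch of that arithmetic follows. It is not the macro added by the commit: the function name `align_mul_near` is hypothetical, and rounding up on an exact tie is an assumption, since the commit message does not say how ties are resolved.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the DPDK macro itself): return the multiple of
 * `mul` that is closest to `v`. The result may be above or below `v`,
 * whichever is nearer.
 * Assumptions: mul != 0, lo + mul does not overflow, and an exact tie
 * rounds up.
 */
static inline uint64_t align_mul_near(uint64_t v, uint64_t mul)
{
	uint64_t lo = (v / mul) * mul;            /* largest multiple <= v */
	uint64_t hi = (v % mul) ? lo + mul : lo;  /* smallest multiple >= v */

	/* Pick the closer candidate; on a tie the upper one is chosen. */
	return (hi - v) > (v - lo) ? lo : hi;
}
```

With a 10 MHz step (10000000), an estimate of 2593990000 Hz would snap down to 2590000000 Hz, while 2597000000 Hz would snap up to 2600000000 Hz.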


16925 Commits