Commit Graph

2004 Commits

Author SHA1 Message Date
Bruce Richardson
f9acaf84e9 replace snprintf with strlcpy without adding extra include
For files that already have rte_string_fns.h included in them, we can
do a straight replacement of snprintf(..."%s",...) with strlcpy. The
changes in this patch were auto-generated via command:

spatch --sp-file devtools/cocci/strlcpy-with-header.cocci --dir . --in-place

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:45:54 +02:00
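To illustrate the replacement described in f9acaf84e9, here is a minimal sketch of the before and after forms of a string copy. The helper sketch_strlcpy is a hypothetical stand-in for strlcpy (which not every libc provides); it is not the implementation from rte_string_fns.h, and the function names are assumptions made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(); not the DPDK implementation. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    assert(dst != NULL && src != NULL);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    /* Like snprintf(), report the full source length so that
     * callers can detect truncation. */
    return len;
}

/* Before: copy a name using snprintf(..."%s",...). */
static int copy_name_old(char *dst, size_t size, const char *name)
{
    return snprintf(dst, size, "%s", name);
}

/* After: the same copy expressed as a strlcpy-style call. */
static size_t copy_name_new(char *dst, size_t size, const char *name)
{
    return sketch_strlcpy(dst, name, size);
}
```

Both forms NUL-terminate the destination and return the length of the source string, which is why the substitution can be done mechanically.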
Bruce Richardson
70d284ab82 eal: tighten permissions on shared memory files
When creating files on disk, e.g. for EAL configuration or shared memory
locks, etc., there is no need to grant any permissions on those files to
other users. All directories are already created with 0700 permissions, so
we should create all files with 0600 permissions.

Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:06:16 +02:00
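The permission change in 70d284ab82 can be sketched with plain POSIX calls. This is an illustration only: open_private_file and file_mode_bits are hypothetical names, and the sketch assumes the process umask does not clear the owner read/write bits.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or open) a file that only its owner may read and write. */
static int open_private_file(const char *path)
{
    assert(path != NULL);
    return open(path, O_CREAT | O_RDWR, 0600);
}

/* Return the permission bits of an open file, or -1 on error. */
static int file_mode_bits(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    return (int)(st.st_mode & 0777);
}
```

Passing 0600 at creation time means group and other users never get access, even briefly, which a later chmod() could not guarantee.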
Thomas Monjalon
721ac9f9e0 eal/x86: fix pedantic build
When enabling pedantic compilation with CONFIG_RTE_LIBRTE_MLX5_DEBUG,
the compiler complains about non standard 128-bit integer type:

include/rte_atomic_64.h:223:3: error:
ISO C does not support ‘__int128’ types [-Werror=pedantic]

It must be marked as an extension of the standard C language
to be accepted in pedantic compilation.

Fixes: 640c5f09ef ("eal/x86: add 128-bit atomic compare exchange")
Cc: gage.eads@intel.com

Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gage Eads <gage.eads@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-04-04 17:22:06 +02:00
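The pedantic-build fix in 721ac9f9e0 relies on marking the 128-bit type as a language extension. A minimal sketch of that idea follows, assuming GCC or Clang on a 64-bit target; the type and function names are hypothetical and do not reproduce the declarations in rte_atomic_64.h.

```c
#include <assert.h>
#include <stdint.h>

/* __extension__ tells the compiler this non-ISO type is intentional,
 * so -pedantic does not diagnose it. */
__extension__ typedef unsigned __int128 sketch_u128;

/* Build a 128-bit value from two 64-bit halves. */
static sketch_u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((sketch_u128)hi << 64) | lo;
}

/* Extract the upper 64 bits. */
static uint64_t u128_hi(sketch_u128 v)
{
    return (uint64_t)(v >> 64);
}

/* Extract the lower 64 bits. */
static uint64_t u128_lo(sketch_u128 v)
{
    return (uint64_t)v;
}
```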
Jerin Jacob
4b3997680a eal: allow to override init macros per OS
Baremetal execution environments may have a different
method to enable RTE_INIT instead of using a compiler
constructor and/or an OS-specific linker scheme.
Allow an option to override RTE_INIT* macros using
rte_os.h or appropriate header file.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-04-03 23:52:00 +02:00
Gage Eads
640c5f09ef eal/x86: add 128-bit atomic compare exchange
This operation can be used for non-blocking algorithms, such as a
non-blocking stack or ring.

It is available only for x86_64.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2019-04-03 21:59:46 +02:00
Shahaf Shuler
237060c4ad mem: limit use of address hint
The commit below added an address hint as starting address for 64-bit
systems in case an explicit base virtual address was not set by the user.

The justification for such a hint was to help devices that work in VA
mode and have an address range limitation to work smoothly with the EAL
memory subsystem.

While the base address value selected may work fine for the eal
initialization, it easily breaks when trying to register external memory
using rte_extmem_register API.

Trying to register anonymous memory on a RH x86_64 machine took several
minutes, during which the function eal_get_virtual_area repeatedly
scanned for a good VA candidate.

The attempt to guess which VA address will be free for mapping will
always result in non-portable, error-prone code:
* different application may use different libraries along w/ DPDK. One
  can never guess which library was called first and how much virtual
  memory it consumed.
* external memory can be registered at any time in the application run
  time.

In order not to break the existing secondary process design, this patch
only limits the max number of tries that will be done with the
address hint.
When the number of tries exceeds the threshold the code
will use the suggested address from kernel.

Fixes: 1df2170287 ("mem: use address hint for mapping hugepages")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
2019-04-03 19:10:47 +02:00
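The bounded retry described in 237060c4ad can be sketched as follows. This is not the code of eal_get_virtual_area: reserve_va, SKETCH_MAX_HINT_TRIES and the step parameter are hypothetical, and the threshold value is an assumption chosen for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SKETCH_MAX_HINT_TRIES 5 /* hypothetical threshold */

/* Reserve len bytes of address space. Try the caller's hint a bounded
 * number of times, then accept whatever address the kernel suggests. */
static void *reserve_va(void *hint, size_t len, size_t step)
{
    int tries = 0;

    assert(len > 0);
    for (;;) {
        void *addr = mmap(hint, len, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (addr == MAP_FAILED)
            return NULL;
        /* Done if no hint was requested or the hint was honoured. */
        if (hint == NULL || addr == hint)
            return addr;
        munmap(addr, len);
        if (++tries >= SKETCH_MAX_HINT_TRIES)
            hint = NULL; /* stop guessing, use the kernel's address */
        else
            hint = (char *)hint + step;
    }
}
```

Capping the number of attempts bounds the worst-case time, which is the point of the fix: the unbounded scan is what took several minutes in the reported case.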
Stephen Hemminger
d0885cb781 eal: align hexdump output
This fixes the issue where if the length of the output is not
a multiple of 16 the formatting was off.

Before:
00000000: 45 00 00 1C 12 34 2C E0 40 06 B8 2E C0 A8 01 12 | E....4,.@.......
00000010: C0 A8 01 37 |  |  |  |  |  |  |  |  |  |  |  |  | ...7

After:
00000000: 45 00 00 1C 12 34 2C E0 40 06 B8 2E C0 A8 01 12 | E....4,.@.......
00000010: C0 A8 01 37                                     | ...7

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-03 18:34:59 +02:00
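The alignment fix in d0885cb781 amounts to padding a short final line with blanks instead of separators. A minimal sketch of formatting one line this way is shown below; format_hex_line is a hypothetical helper, not the rte_hexdump code, and it assumes the output buffer is large enough for a full line (at least 80 bytes).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BYTES_PER_LINE 16

/* Format up to 16 bytes as "OFFSET: HH HH ... | ascii" into out.
 * Missing bytes on a short line are padded with three spaces each,
 * so the '|' column lines up with full lines. */
static void format_hex_line(char *out, size_t outsz, unsigned int ofs,
                            const unsigned char *data, size_t len)
{
    size_t pos = 0;
    size_t i;

    assert(outsz >= 80 && len <= BYTES_PER_LINE);
    pos += (size_t)snprintf(out + pos, outsz - pos, "%08X:", ofs);
    for (i = 0; i < BYTES_PER_LINE; i++) {
        if (i < len)
            pos += (size_t)snprintf(out + pos, outsz - pos,
                                    " %02X", data[i]);
        else
            pos += (size_t)snprintf(out + pos, outsz - pos, "   ");
    }
    pos += (size_t)snprintf(out + pos, outsz - pos, " | ");
    for (i = 0; i < len; i++)
        pos += (size_t)snprintf(out + pos, outsz - pos, "%c",
                                isprint(data[i]) ? data[i] : '.');
}
```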
Stephen Hemminger
779d9d0986 eal: clean formatting of hexdump functions
The hexdump code obviously came from somewhere else originally.
It is not formatted according to DPDK coding style.

Also, drop the comment, which is not useful: the docblock comment
is already in rte_hexdump.h.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-03 18:32:42 +02:00
Stephen Hemminger
6d96b48af8 eal: make u64 reciprocal divisor const
The divisor is not modified here. Doesn't really matter for optimization
since the function is inline already; but helps with expressing
intent.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-04-03 18:32:41 +02:00
Anand Rawat
fa647c5722 build: add workarounds for Windows helloworld
Added meson workarounds to build helloworld on Windows.
Windows currently only supports kvargs and eal libraries.
This change restricts the build flow to supported libraries
only.

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:31 +02:00
Anand Rawat
53ffd9f080 eal/windows: add minimum viable code
Add Windows specific logic for eal.c, eal_lcore.c,
eal_debug.c and eal_thread.c. Updated header files to
contain suitable function declarations.

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:31 +02:00
Anand Rawat
4dc2b4d2a4 eal/windows: add headers for compatibility
Added headers to support Windows environment for common source.
These headers will have Windows specific implementations of the
system library APIs provided in Linux and FreeBSD.

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:31 +02:00
Anand Rawat
846ff907ee eal/windows: add sys/queue.h implementation copy
Adding sys/queue.h on Windows for supporting common code.
This implementation has BSD-3-Clause licensing.

Signed-off-by: Ranjit Menon <ranjit.menon@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:31 +02:00
Anand Rawat
82ba4416dd build: add module definition files for Windows
Updated lib/meson.build to create shared libraries on Windows.
Added DEF files to list the exports for the eal and kvargs libraries.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:31 +02:00
Anand Rawat
58836e93f5 eal/windows: add wrappers for string functions
Updated rte_common.h to include rte_os.h to contain
OS specific macros and functions. Updated rte_string_fns.h
to include rte_common.h for rte_os.h

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:15 +02:00
Anand Rawat
428eb983f5 eal: add OS specific header file
Added rte_os.h files to support OS specific functionality.
Updated build system to contain OS headers in the include
path.

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-04-03 01:11:56 +02:00
Anand Rawat
98edcbb5ab eal/windows: introduce Windows support
Added initial stub source files and required meson changes
for Windows support.

kernel/windows/meson is a stub file added to support
Windows specific source in future releases.

Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Jeff Shaw <jeffrey.b.shaw@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:06:01 +02:00
Thomas Monjalon
3c45889189 eal: remove exec-env directory
Only one header file (rte_kni_common.h) was in the sub-directory
	include/exec-env/
This file was installed in a sub-directory of the same name
in the makefile-based build.
Source and install directories are moved as below:

   lib/librte_eal/linux/eal/include/exec-env/
-> lib/librte_eal/linux/eal/include/

   build/include/exec-env/
-> build/include/

The consequence is to have a file hierarchy a bit more flat.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-04-02 21:49:35 +02:00
Anatoly Burakov
1e3380a2f4 mem: do not use lockfiles for single file segments mode
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which results
in inability to use system calls such as select() by user
applications.

Single file segments option stores lock files per page to ensure
that pages are deleted when there are no more users, however this
is not necessary because the processes will be holding onto the
pages anyway because of mmap(). Thus, removing pages from the
filesystem is safe even though they may be used by some other
secondary process. As a result, single file segments mode no
longer stores inordinate amounts of segment fd's, and the above
issue with fd limits is solved.

However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.

[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-04-02 16:07:25 +02:00
Anatoly Burakov
848cbff836 mem: refactor segment resizing function
Currently, segment resizing code sits in one giant function which
handles both in-memory and regular modes. Split them up into
individual functions.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-04-02 16:07:13 +02:00
Darek Stojaczyk
ea4e3ab7bd eal: initialize alarms early
On Linux, we currently initialize rte_alarms after
starting to listen for IPC hotplug requests, which gives
us a data race window. Upon receiving such hotplug
request we always try to set an alarm and this obviously
doesn't work if the alarms weren't initialized yet.

To fix it, we initialize alarms before starting to
listen for IPC hotplug messages. Specifically, we move
rte_eal_alarm_init() right after rte_eal_intr_init() as
it makes some sense to keep those two close to each other.

We update the BSD code as well to keep the initialization
order the same in both EAL implementations.

Fixes: 244d513071 ("eal: enable hotplug on multi-process")
Cc: stable@dpdk.org

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-04-02 15:00:26 +02:00
Pavan Nikhilesh
e840cb3c2a eal: increase max number of interrupt vectors
MSI-X permits a device to allocate up to 2048 interrupts as per PCIe
spec.
Increase the max number of vectors to a reasonable value of 512.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2019-04-02 02:59:04 +02:00
Natanael Copa
c2d82896ac eal/linux: remove thread ID from debug message
There is no guarantee that pthread_self() returns the thread ID or that
pthread_t is an integer. The thread ID is not that useful so simply
remove it.

This fixes the following warning when building with musl libc:

lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'sigbus_handler':
lib/librte_eal/linuxapp/eal/eal_dev.c:70:3: warning:
cast from pointer to integer of different size [-Wpointer-to-int-cast]
   (int)pthread_self(), info->si_addr);
   ^

Fixes: 0fc54536b1 ("eal: add failure handling for hot-unplug")
Cc: stable@dpdk.org

Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
2019-03-31 01:01:28 +01:00
Shahaf Shuler
c33a675b62 bus: introduce device level DMA memory mapping
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and register to the DPDK memory
systems. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory to the different devices.

The scope of the patch focuses on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:56 +01:00
Shahaf Shuler
0cbce3a167 vfio: skip DMA map failure if already mapped
Currently vfio DMA map function will fail in case the same memory
segment is mapped twice.

This is too strict, as it is not an error to map the same memory
twice.

Instead, use the kernel return value to detect such state and have the
DMA function return success.

For type1 mapping the kernel driver returns EEXIST.
For spapr mapping EBUSY is returned since kernel 4.10.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:55 +01:00
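The decision described in 0cbce3a167 can be sketched as a small helper that inspects the result of the mapping call. The function name and parameters are hypothetical; the real code issues VFIO ioctls, which are not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Decide the outcome of a DMA map request.
 * call_ret: return value of the mapping call (0 on success)
 * err: errno observed after a failed call
 * already_mapped_errno: the errno the IOMMU type uses for "already
 *   mapped" (EEXIST for type1, EBUSY for spapr, per the commit) */
static int dma_map_outcome(int call_ret, int err, int already_mapped_errno)
{
    if (call_ret == 0)
        return 0;
    if (err == already_mapped_errno)
        return 0;  /* region is already mapped: treat as success */
    return -1;     /* genuine failure */
}
```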
Shahaf Shuler
4106d89a18 vfio: allow DMA map to the default container
Allow users to call rte_vfio_dma_map with a request to map
to the default vfio fd.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:47:54 +01:00
Anatoly Burakov
23d5455517 mem: warn user when running without NUMA support
Running in non-legacy mode on a NUMA-enabled system without libnuma
is unsupported, so explicitly print out a warning when trying to
do so.

Running in legacy mode without libnuma is still supported whether or
not we are running with libnuma support enabled, so also fix init to
allow that scenario.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-30 00:13:04 +01:00
Anatoly Burakov
3660216ef1 malloc: fix IPC message initialization
The memset size for an IPC message is set incorrectly. Fix it to
cover the entire IPC message.

Fixes: 07dcbfe010 ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:55:07 +01:00
Anatoly Burakov
b8a86c83e0 fbarray: fix init unlock without lock
Certain failure paths of rte_fbarray_init() will unlock the
mem area lock without locking it first. Fix this by properly
handling the failures.

Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Darek Stojaczyk
5a98bc5e83 fbarray: fix attach deadlock
rte_fbarray_attach() currently locks its internal
spinlock, but never releases it. Secondary processes
won't even start if there is more than one fbarray
to be attached to - the second rte_fbarray_attach()
would be just stuck.

Fix it by releasing the lock at the end of
rte_fbarray_attach(). I believe this was the original
intention.

Fixes: 5b61c62cfd ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Anatoly Burakov
1fd3bcf3f9 vfio: document multiprocess limitation for container API
Currently, there is no support for sharing custom VFIO containers
between multiple processes, but it is not documented.

Document this limitation.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 00:07:16 +01:00
Thomas Monjalon
3a1a885e03 eal: remove redundant atomic API description
Atomic functions are described in doxygen of the file
lib/librte_eal/common/include/generic/rte_atomic.h
The copies in arch-specific files are redundant
and confuse readers about the genericity of the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-28 23:52:53 +01:00
Dekel Peled
8015c5593a eal/ppc: fix global memory barrier
From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."

Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."

This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.

[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html

Fixes: d23a6bd04d ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
2019-03-28 23:48:28 +01:00
Michał Mirosław
a1c6b70786 mem: count overcommit hugepages as available
With nr_overcommit_hugepages > 0 an application may be able to allocate
hugepages even when free_hugepages == 0. Take this into account when
counting available hugepages.

Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:33:50 +01:00
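The counting rule from a1c6b70786 can be sketched as simple arithmetic. This sketch makes an assumption that is not stated in the commit message: pages already drawn from the overcommit pool (reported by the kernel as surplus_hugepages) are subtracted from the overcommit limit. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate how many hugepages an application could still obtain.
 * Assumption: surplus pages already consume part of the overcommit
 * limit, so only the remainder counts as extra headroom. */
static uint64_t available_hugepages(uint64_t free_pages,
                                    uint64_t overcommit_pages,
                                    uint64_t surplus_pages)
{
    uint64_t headroom = 0;

    if (overcommit_pages > surplus_pages)
        headroom = overcommit_pages - surplus_pages;
    return free_pages + headroom;
}
```

With free_hugepages == 0 and nr_overcommit_hugepages == 8, this reports 8 pages as available instead of none.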
Anatoly Burakov
034f1fb616 mem: attempt multiple hugepage allocations at init
When requesting memory with ``-m`` or ``--socket-mem`` flags,
currently the init will fail if the requested memory amount was
bigger than any one memseg list, even if total amount of
available memory was sufficient.

Fix this by making EAL attempt to allocate pages multiple
times, until we either fulfill our memory requirements, or run
out of hugepages to allocate.

Bugzilla ID: 95

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:58 +01:00
Anatoly Burakov
bec5625588 mem: improve best-effort allocation
Previously, when using non-exact allocation, we were requesting
N pages to be allocated, but allowed the memory subsystem to
allocate less than requested. However, we were still expecting
to see N contiguous free pages in the memseg list.

This presents a problem because there is no way to try and
allocate as many pages as possible, even if there aren't
enough contiguous free entries in the list.

To address this, use the new "find biggest" fbarray API's when
allocating non-exact number of pages. This way, we will first
check how many entries in the list are actually available, and
then try to allocate up to that number.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:54 +01:00
Anatoly Burakov
7353ee7344 fbarray: add API to find biggest used or free chunks
Currently, while there is a way to find total amount of used/free
space in an fbarray, there is no way to find biggest contiguous
chunk. Add such API, as well as unit tests to test this API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:52 +01:00
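The "find biggest" lookup added in 7353ee7344 and used by bec5625588 can be illustrated on a plain boolean occupancy array. This is a sketch of the idea only: find_biggest_free is a hypothetical function and does not reflect the fbarray data layout or the signatures of the rte_fbarray API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the length of the longest run of free (false) entries and
 * store the index where that run starts in *start (if non-NULL). */
static size_t find_biggest_free(const bool *used, size_t n, size_t *start)
{
    size_t best_len = 0, best_start = 0;
    size_t run_len = 0, run_start = 0;
    size_t i;

    assert(used != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (used[i]) {
            run_len = 0; /* run is broken by a used entry */
            continue;
        }
        if (run_len == 0)
            run_start = i;
        run_len++;
        if (run_len > best_len) {
            best_len = run_len;
            best_start = run_start;
        }
    }
    if (start != NULL)
        *start = best_start;
    return best_len;
}
```

A best-effort allocator can call this first and then request at most that many contiguous pages, rather than failing because N contiguous entries are not available.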
Anatoly Burakov
5b61c62cfd fbarray: add internal tailq for mapped areas
Currently, there are numerous reliability issues with fbarray,
such as:
- There is no way to prevent attaching to overlapping memory
  areas
- There is no way to prevent double-detach
- Failed destroy leaves fbarray in an invalid state (fbarray
  itself is valid, but its backing memory area is already
  detached)

In addition, on FreeBSD, doing mmap() on a file descriptor
does not keep the lock, so we also need to store the fd
in order to keep the lock.

This patch improves upon fbarray to address both of these
issues by adding an internal tailq to track allocated areas
and their respective file descriptors.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:50 +01:00
Nikhil Rao
db9f4430c2 service: fix parameter type for attribute
The type of value parameter to rte_service_attr_get
should be uint64_t *, since the attributes
are of type uint64_t.

Fixes: 4d55194d76 ("service: add attribute get function")

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2019-03-28 21:07:48 +01:00
Joyce Kong
ca49b92079 ticketlock: enable generic ticketlock on all arch
Let all architectures use generic ticketlock implementation.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 15:00:11 +01:00
Joyce Kong
184104fc61 ticketlock: introduce fair ticket based locking
The spinlock implementation is unfair: some threads may take locks
aggressively while leaving the other threads starving for a long time.

This patch introduces ticketlock, which gives each waiting thread a
ticket so that threads take the lock one by one, first come, first served.
This avoids prolonged starvation and is more predictable.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 14:58:49 +01:00
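The ticket scheme introduced in 184104fc61 can be sketched with the GCC/Clang __atomic builtins. The type and function names below are hypothetical and the layout is an assumption; this is not the rte_ticketlock implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t next;    /* next ticket to hand out */
    uint16_t current; /* ticket currently being served */
} sketch_ticketlock_t;

/* Take a ticket, then wait until that ticket is being served. */
static void sketch_ticketlock_lock(sketch_ticketlock_t *tl)
{
    uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);

    while (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) != me)
        ; /* spin until our turn */
}

/* Serve the next ticket. Only the lock holder writes 'current'. */
static void sketch_ticketlock_unlock(sketch_ticketlock_t *tl)
{
    uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);

    __atomic_store_n(&tl->current, (uint16_t)(cur + 1), __ATOMIC_RELEASE);
}
```

Because tickets are handed out in arrival order and served in the same order, no waiter can be overtaken indefinitely, which is the fairness property the commit describes.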
Joyce Kong
e8af2f1f11 rwlock: reimplement with atomic builtins
The __sync builtin based implementation generates full memory
barriers ('dmb ish') on Arm platforms. Use C11 atomic builtins
instead to generate one-way barriers.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Tested-by: Joyce Kong <joyce.kong@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 11:47:05 +01:00
Gavin Hu
453d8f7366 spinlock: reimplement with atomic one-way barrier
The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Use C11 atomic builtins instead to generate
one-way barriers.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);
   0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
   0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
   0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
   0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
   0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
   0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
   0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4 <rte_atomic16_cmpset+52>  // b.any
   0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
   0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0 <rte_atomic16_cmpset+32>
   0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
   0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq = none

The benchmarking results showed consistent improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us

In the tests on ThunderX2, with more cores contending, the performance gain
was even higher, indicating the __atomic implementation scales up better
than __sync.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-03-28 09:19:39 +01:00
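The one-way barrier approach from 453d8f7366 can be sketched with the GCC/Clang __atomic builtins: acquire ordering when taking the lock and release ordering when dropping it, instead of the full barrier implied by __sync builtins. The names below are hypothetical; this is a sketch, not the rte_spinlock source.

```c
#include <assert.h>

typedef struct {
    int locked; /* 0 = free, 1 = held */
} sketch_spinlock_t;

/* Acquire: later accesses cannot be reordered before the lock. */
static void sketch_spinlock_lock(sketch_spinlock_t *sl)
{
    int expected = 0;

    while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE,
                                        __ATOMIC_RELAXED)) {
        /* Spin on a plain load until the lock looks free. */
        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
            ;
        expected = 0; /* the failed exchange overwrote it */
    }
}

/* Try once; returns 1 if the lock was taken, 0 otherwise. */
static int sketch_spinlock_trylock(sketch_spinlock_t *sl)
{
    int expected = 0;

    return __atomic_compare_exchange_n(&sl->locked, &expected, 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release: earlier accesses cannot be reordered after the unlock. */
static void sketch_spinlock_unlock(sketch_spinlock_t *sl)
{
    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```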
Pavan Nikhilesh
5cbd14b3e5 eal: roundup TSC frequency when estimating
When estimating the TSC frequency using sleep/gettime, round it to the
nearest multiple of 10 MHz for more accuracy.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
2019-03-28 00:45:16 +01:00
Pavan Nikhilesh
f56e551485 eal: add macro to align value to the nearest multiple
Add a macro to align a value to the nearest multiple of the given value.
The resultant value may be greater than or less than the first parameter,
whichever difference is the lowest.
Update unit test to include the new macro.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2019-03-28 00:45:00 +01:00
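The nearest-multiple alignment from f56e551485, and its use for the TSC estimate in 5cbd14b3e5, can be sketched as a function. The name align_mul_near is hypothetical, and resolving exact ties upward is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Align v to the nearest multiple of mul. The result may be above or
 * below v, whichever is closer; exact ties go to the upper multiple. */
static uint64_t align_mul_near(uint64_t v, uint64_t mul)
{
    uint64_t floor_val, ceil_val;

    assert(mul != 0);
    floor_val = (v / mul) * mul;
    ceil_val = floor_val + ((v % mul) ? mul : 0);
    return (ceil_val - v) > (v - floor_val) ? floor_val : ceil_val;
}
```

For example, a measured TSC frequency of 2399987654 Hz aligned to a multiple of 10 MHz becomes 2400000000 Hz.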
Jakub Grajciar
0c7ce182a7 eal: add pending interrupt callback unregister
Use case: if a callback is used to receive a message from a socket,
and the message received is disconnect/error, this callback needs
to be unregistered, but cannot be because it is still active.

With this patch it is possible to mark the callback to be
unregistered once the interrupt process is done with this
interrupt source.

Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
2019-03-27 18:53:47 +01:00
Kevin Traynor
c0d9052afb eal/linux: fix log levels for pagemap reading failure
Commit cdc242f260 says:
    For Linux kernel 4.0 and newer, the ability to obtain
    physical page frame numbers for unprivileged users from
    /proc/self/pagemap was removed. Instead, when an IOMMU
    is present, simply choose our own DMA addresses instead.

In this case the user still sees error messages, so adjust
the log levels. Later, other checks will ensure that errors
are logged in the appropriate cases.

Fixes: cdc242f260 ("eal/linux: support running as unprivileged user")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
2019-03-27 14:54:40 +01:00
Anatoly Burakov
929a91e99c malloc: fix documentation of realloc function
The documentation for rte_realloc claims that the resized area
will always reside on the same NUMA node. This is not actually
the case - while *resized* area will be on the same NUMA node,
if resizing the area is not possible, then the memory will be
reallocated using rte_malloc(), which can allocate memory on
another NUMA node, depending on which lcore rte_realloc() was
called from and which NUMA nodes have memory available.

Fix the API doc to match the actual code of rte_realloc().

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-27 12:15:04 +01:00
Stephen Hemminger
24aa4f0fba mem: poison memory when freed
DPDK malloc library allows broken programs to work because
the semantics of zmalloc and malloc are the same.

This patch enables a more secure model which will catch
(and crash) programs that reuse memory already freed if
RTE_MALLOC_DEBUG is enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-27 10:53:41 +01:00
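The poisoning idea in 24aa4f0fba can be sketched as overwriting a region with a recognizable byte pattern when it is released. The poison value, macro and function names here are hypothetical choices for illustration and are not taken from the DPDK malloc code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_POISON_BYTE 0x6b /* hypothetical poison pattern */

/* Overwrite a region being released so that stale users read garbage
 * instead of seemingly valid (often zeroed) data. */
static void sketch_poison(void *ptr, size_t len)
{
    if (ptr != NULL)
        memset(ptr, SKETCH_POISON_BYTE, len);
}

/* Return 1 if every byte of the region still holds the poison value. */
static int sketch_is_poisoned(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;
    size_t i;

    for (i = 0; i < len; i++)
        if (p[i] != SKETCH_POISON_BYTE)
            return 0;
    return 1;
}
```

A pointer read from poisoned memory is very unlikely to be valid, so a program that reuses freed memory tends to fail immediately rather than silently continue.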
Bruce Richardson
88f591d1db eal: remove unneeded version logic
The version number in the DPDK_VERSION file will never have an offset
that needs to be subtracted, so remove that logic from the version
string generation.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-03-27 09:43:54 +01:00