1348 Commits

Author SHA1 Message Date
Mattias Rönnblom
faf8fd2527 eal: improve entropy for initial PRNG seed
Replace the use of rte_get_timer_cycles() with getentropy() for
seeding the pseudo-random number generator. getentropy() provides a
more truly random value.

getentropy() requires glibc 2.25 and Linux kernel 3.17. In case
getentropy() is not found at compile time, or the relevant syscall
fails in runtime, the rdseed machine instruction will be used as a
fallback.

rdseed is only available on x86 (Broadwell or later). In case it is
not present, rte_get_timer_cycles() will be used as a second fallback.

On non-Meson builds, getentropy() will not be used.

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-06-28 15:23:52 +02:00
Mattias Rönnblom
3f002f0696 eal: replace libc-based random generation with LFSR
This commit replaces rte_rand()'s use of lrand48() with a DPDK-native
combined Linear Feedback Shift Register (LFSR) (also known as
Tausworthe) pseudo-random number generator.

This generator is faster and produces better-quality random numbers
than the linear congruential generator (LCG) of lib's lrand48(). The
implementation, as opposed to lrand48(), is multi-thread safe in
regards to concurrent rte_rand() calls from different lcore threads.
A LCG is still used, but only to seed the five per-lcore LFSR
sequences.

In addition, this patch also addresses the issue of the legacy
implementation only producing 62 bits of pseudo randomness, while the
API requires all 64 bits to be random.

This pseudo-random number generator is not cryptographically secure -
just like lrand48().

Bugzilla ID: 114
Bugzilla ID: 276

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-06-28 15:23:38 +02:00
Maxime Coquelin
1f4d55be43 eal/x86: force inlining of all memcpy and mov helpers
Some helpers in the header file are forced inlined other are
only inlined, this patch forces inline for all.

It will avoid it to be embedded as functions when called multiple
times in the same object file. For example, when we added packed
ring support in vhost-user library, rte_memcpy_generic got no
more inlined.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-06-13 23:54:29 +09:00
Anatoly Burakov
67dd4d77e0 ipc: handle unsupported IPC in async request
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call to explicitly handle unsupported IPC cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:28:10 +02:00
Anatoly Burakov
c7ef989970 ipc: handle unsupported IPC in sync request
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:28:05 +02:00
Anatoly Burakov
144767514e ipc: handle unsupported IPC in sendmsg
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:28:00 +02:00
Anatoly Burakov
9a4274be9e ipc: do not unregister action if IPC unsupported
Currently, unregister will be attempted even if IPC wasn't
supported in the first place. It is harmless, but for
consistency reasons, update the unregister API call to
exit early when IPC is not supported.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:43 +02:00
Anatoly Burakov
edf73dd330 ipc: handle unsupported IPC in action register
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.

For secondary processes, IPC will be crucial, so leave all
of the error handling as is.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:36 +02:00
Anatoly Burakov
c830900d75 ipc: handle unsupported IPC in init
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:17 +02:00
Yongseok Koh
0cb86518db bus/pci: add Mellanox kernel driver type
When checking RTE_PCI_DRV_IOVA_AS_VA flag to determine IOVA mode,
pci_one_device_has_iova_va() returns true only if kernel driver of the
device is vfio. However, Mellanox mlx4/5 PMD doesn't need to be detached
from kernel driver and attached to VFIO/UIO. Control path still goes
through the existing kernel driver, which is mlx4_core/mlx5_core. In order
to make RTE_PCI_DRV_IOVA_AS_VA effective for mlx4/mlx5 PMD, a new kernel
driver type has to be introduced.

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
2019-06-04 00:33:06 +02:00
Bruce Richardson
ffadd933ac eal/x86: check rdrand and rdseed
The meson build never checked for the presence of rdrand and rdseed
instructions, while make build never checked for rdseed. Ensure builds
always have the appropriate checks - and therefore defines - for these
instructions. For runtime, we also add in rdseed to the list of known
bits returned from cpuid() instruction, so we can confirm its presence at
application init time.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
2019-06-04 00:23:04 +02:00
Stephen Hemminger
26cc3bbe4d eal: add lcore accessors
The fields of the internal EAL core configuration are currently
laid bare as part of the API. This is not good practice and limits
fixing issues with layout and sizes.

Make new accessor functions for the fields used by current drivers
and examples.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-06-03 12:29:54 +02:00
Stephen Hemminger
3a1bb50734 eal: use unsigned int in lcore API prototypes
Purely cosmetic change, use unsigned int instead of unsigned alone.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-06-03 12:29:23 +02:00
Michael Santana
f4be6a9a29 fix off-by-one errors in snprintf
snprintf guarantees to always correctly place a null terminator
in the buffer string. So manually placing a null terminator
in a buffer right after a call to snprintf is redundant code.

Additionally, there is no need to use 'sizeof(buffer) - 1' in snprintf as this
means we are not using the last character in the buffer. 'sizeof(buffer)' is
enough.

Cc: stable@dpdk.org

Signed-off-by: Michael Santana <msantana@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-29 13:02:53 +02:00
Anatoly Burakov
bfbc3a5041 ipc: add warnings about correct API usage
When handling synchronous or asynchronous requests, the reply
must be sent explicitly even if the result of the operation is
an error, to avoid the other side timing out. Make note of this
in documentation explicitly.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-09 17:50:59 +02:00
Anatoly Burakov
3855b41500 ipc: add warnings about not using IPC with memory API
IPC and memory-related API's should not be mixed because memory
relies on IPC internally. Add explicit warnings to IPC API and
to the documentation about this.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-09 17:49:32 +02:00
Aaron Conole
e377285aba ipc: unlock on failure
Coverity issue: 340076
Fixes: a2a06860b8c4 ("ipc: fix memory leak on request failure")
Cc: stable@dpdk.org

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-09 16:30:03 +02:00
Thomas Monjalon
bb8b33b9a6 ipc: replace bool checks with explicit non-zero
The function check_input() was returning a bool as error code.
It is changed to return an int, semantically more correct.
While at it, make checks of validate_action_name() return
explicit as described in the coding guidelines.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-03 22:02:57 +02:00
Anatoly Burakov
6e5d779ecb ipc: handle more invalid parameter cases
Length of buffer and number of fd's to send are signed values, so
they can be negative, but the API doesn't check for that. Fix it
by checking for negative values as well.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-03 14:52:49 +02:00
Anatoly Burakov
7b51d1b162 ipc: harden message receive
Currently, IPC does not check received messages for invalid data
and passes them to user code unchanged. This may result in buffer
overruns on reading message data. Fix this by checking the message
length and fd number on receive, and discard any messages that
are not valid.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-03 14:30:49 +02:00
Anatoly Burakov
535113907f ipc: fix send error handling
According to manpage, ENOBUFS error indicates that either the
input or the output queue is full. This should be considered
an error, but it is treated as an "ignore" condition. Fix the
code to report an error instead.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Rami Rosen <ramirose@gmail.com>
2019-05-03 14:25:55 +02:00
Herakliusz Lipiec
a2a06860b8 ipc: fix memory leak on request failure
When sending multiple requests, rte_mp_request_sync
can succeed sending a few of those requests, but then
fail on a later one and in the end return with rc=-1.
The upper layers - e.g. device hotplug - currently
handles this case as if no messages were sent and no
memory for response buffers was allocated, which is
not true. Fixed by always freeing memory buffers on
failure.

Bugzilla ID: 228
Fixes: 783b6e54971d ("eal: add synchronous multi-process communication")
Cc: stable@dpdk.org

Signed-off-by: Herakliusz Lipiec <herakliusz.lipiec@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-05-03 12:51:28 +02:00
Stephen Hemminger
4e2fdf3760 eal: fix formatting of hotplug error message
This message was missing newline, and should capitalize
"Cannot" like all the others in this area.

Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Rami Rosen <ramirose@gmail.com>
2019-05-03 12:51:20 +02:00
John McNamara
8bd5f07c7a doc: fix spelling reported by aspell in comments
Fix spelling errors in the doxygen docs.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
2019-05-03 00:38:14 +02:00
Thomas Monjalon
d711bea6fe eal: promote some experimental functions as stable
The function rte_eal_cleanup() was introduced more than one year ago,
in DPDK 18.02. It is no longer experimental, allowing
pdump, proc-info and hotplug_mp apps to not need any experimental API.

The function rte_ctrl_thread_create() was introduced one year ago
in DPDK 18.05. It is no longer experimental, allowing
KNI PMD and TEP example to not need any experimental API.

The functions rte_socket_count() and rte_socket_id_by_idx() were
introduced one year ago in DPDK 18.05. They are no longer experimental.

The function rte_dev_is_probed() was introduced half a year ago
in DPDK 18.11. It is no longer experimental.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-04-21 19:11:37 +02:00
Hemant Agrawal
73eca2f77f devargs: promote experimental API as stable
These APIs are available in DPDK for last 4 releases
and used by multiple drivers.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-04-17 18:30:04 +02:00
Dekel Peled
f129008f5a eal: fix typo in comment of vector function
Remove redundant item 'a4' in comment.

Fixes: 86c743cf9140 ("eal: define generic vector types")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
2019-04-05 10:40:56 +02:00
Bruce Richardson
6723c0fc72 replace snprintf with strlcpy
Do a global replace of snprintf(..."%s",...) with strlcpy, adding in the
rte_string_fns.h header if needed.  The function changes in this patch were
auto-generated via command:

  spatch --sp-file devtools/cocci/strlcpy.cocci --dir . --in-place

and then the files edited using awk to add in the missing header:

  gawk -i inplace '/include <rte_/ && ! seen { \
  	print "#include <rte_string_fns.h>"; seen=1} {print}'

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:46:05 +02:00
Bruce Richardson
f9acaf84e9 replace snprintf with strlcpy without adding extra include
For files that already have rte_string_fns.h included in them, we can
do a straight replacement of snprintf(..."%s",...) with strlcpy. The
changes in this patch were auto-generated via command:

spatch --sp-file devtools/cocci/strlcpy-with-header.cocci --dir . --in-place

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2019-04-04 22:45:54 +02:00
Thomas Monjalon
721ac9f9e0 eal/x86: fix pedantic build
When enabling pedantic compilation with CONFIG_RTE_LIBRTE_MLX5_DEBUG,
the compiler complains about non standard 128-bit integer type:

include/rte_atomic_64.h:223:3: error:
ISO C does not support ‘__int128’ types [-Werror=pedantic]

It must be marked as an extension of the standard C language
to be accepted in pedantic compilation.

Fixes: 640c5f09ef2c ("eal/x86: add 128-bit atomic compare exchange")
Cc: gage.eads@intel.com

Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gage Eads <gage.eads@intel.com>
Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-04-04 17:22:06 +02:00
Jerin Jacob
4b3997680a eal: allow to override init macros per OS
baremetal execution environments may have a different
method to enable RTE_INIT instead of using compiler
constructor and/or OS specific linker scheme.
Allow an option to override RTE_INIT* macros using
rte_os.h or appropriate header file.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-04-03 23:52:00 +02:00
Gage Eads
640c5f09ef eal/x86: add 128-bit atomic compare exchange
This operation can be used for non-blocking algorithms, such as a
non-blocking stack or ring.

It is available only for x86_64.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2019-04-03 21:59:46 +02:00
Shahaf Shuler
237060c4ad mem: limit use of address hint
The commit below added an address hint as starting address for 64-bit
systems in case an explicit base virtual address was not set by the user.

The justification for such hint was to help devices that work in VA
mode and has a address range limitation to work smoothly with the eal
memory subsystem.

While the base address value selected may work fine for the eal
initialization, it easily breaks when trying to register external memory
using rte_extmem_register API.

Trying to register anonymous memory on RH x86_64 machine took several
minutes, during them the function eal_get_virtual_area repeatedly
scanned for a good VA candidate.

The attempt to guess which VA address will be free for mapping will
always result in not portable, error prone code:
* different application may use different libraries along w/ DPDK. One
  can never guess which library was called first and how much virtual
  memory it consumed.
* external memory can be registered at any time in the application run
  time.

In order not to break the existing secondary process design, this patch
only limits the max number of tries that will be done with the
address hint.
When the number of tries exceeds the threshold the code
will use the suggested address from kernel.

Fixes: 1df21702873d ("mem: use address hint for mapping hugepages")
Cc: stable@dpdk.org

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>
2019-04-03 19:10:47 +02:00
Stephen Hemminger
d0885cb781 eal: align hexdump output
This fixes the issue where if the length of the output is not
a multiple of 16 the formatting was off.

Before:
00000000: 45 00 00 1C 12 34 2C E0 40 06 B8 2E C0 A8 01 12 | E....4,.@.......
00000010: C0 A8 01 37 |  |  |  |  |  |  |  |  |  |  |  |  | ...7

After:
00000000: 45 00 00 1C 12 34 2C E0 40 06 B8 2E C0 A8 01 12 | E....4,.@.......
00000010: C0 A8 01 37                                     | ...7

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-03 18:34:59 +02:00
Stephen Hemminger
779d9d0986 eal: clean formatting of hexdump functions
The hexdump code obviously came from somewhere else originally.
It is not formatted according to DPDK coding style.

Also, drop the comment which is not useful the docbock comment
is already in the rte_hexdump.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
2019-04-03 18:32:42 +02:00
Stephen Hemminger
6d96b48af8 eal: make u64 reciprocal divisor const
The divisor is not modified here. Doesn't really matter for optimizaton
since the function is inline already; but helps with expressing
intent.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-04-03 18:32:41 +02:00
Anand Rawat
58836e93f5 eal/windows: add wrappers for string functions
Updated rte_common.h to include rte_os.h to contain
OS specific macros and functions. Updated rte_string_fns.h
to include rte_common.h for rte_os.h

Signed-off-by: Anand Rawat <anand.rawat@intel.com>
Reviewed-by: Pallavi Kadam <pallavi.kadam@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
2019-04-03 01:21:15 +02:00
Anatoly Burakov
1e3380a2f4 mem: do not use lockfiles for single file segments mode
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which results
in inability to use system calls such as select() by user
applications.

Single file segments option stores lock files per page to ensure
that pages are deleted when there are no more users, however this
is not necessary because the processes will be holding onto the
pages anyway because of mmap(). Thus, removing pages from the
filesystem is safe even though they may be used by some other
secondary process. As a result, single file segments mode no
longer stores inordinate amounts of segment fd's, and the above
issue with fd limits is solved.

However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.

[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-04-02 16:07:25 +02:00
Pavan Nikhilesh
e840cb3c2a eal: increase max number of interrupt vectors
MSI-X permits a device to allocate up to 2048 interrupts as per PCIe
spec.
Increase the max number of vectors to a reasonable value of 512.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
2019-04-02 02:59:04 +02:00
Shahaf Shuler
c33a675b62 bus: introduce device level DMA memory mapping
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.

2. Use memory allocated by the user and register to the DPDK memory
systems. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who wants to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should create a memory, register it through rte_extmem_register
API, and call DMA map function in order to register such memory to
the different devices.

The scope of the patch focus on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).

The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. Implementation of those
APIs was done currently only for PCI devices.

For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:48:56 +01:00
Shahaf Shuler
4106d89a18 vfio: allow DMA map to the default container
Enable users the option to call rte_vfio_dma_map with request to map
to the default vfio fd.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2019-03-30 16:47:54 +01:00
Anatoly Burakov
23d5455517 mem: warn user when running without NUMA support
Running in non-legacy mode on a NUMA-enabled system without libnuma
is unsupported, so explicitly print out a warning when trying to
do so.

Running in legacy mode without libnuma is still supported whether or
not we are running with libnuma support enabled, so also fix init to
allow that scenario.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-30 00:13:04 +01:00
Anatoly Burakov
3660216ef1 malloc: fix IPC message initialization
The memset size for an IPC message is set incorrectly. Fix it to
cover the entire IPC message.

Fixes: 07dcbfe0101f ("malloc: support multiprocess memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:55:07 +01:00
Anatoly Burakov
b8a86c83e0 fbarray: fix init unlock without lock
Certain failure paths of rte_fbarray_init() will unlock the
mem area lock without locking it first. Fix this by properly
handling the failures.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Darek Stojaczyk
5a98bc5e83 fbarray: fix attach deadlock
rte_fbarray_attach() currently locks its internal
spinlock, but never releases it. Secondary processes
won't even start if there is more than one fbarray
to be attached to - the second rte_fbarray_attach()
would be just stuck.

Fix it by releasing the lock at the end of
rte_fbarray_attach(). I believe this was the original
intention.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 12:49:35 +01:00
Anatoly Burakov
1fd3bcf3f9 vfio: document multiprocess limitation for container API
Currently, there is no support for sharing custom VFIO containers
between multiple processes, but it is not documented.

Document this limitation.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-29 00:07:16 +01:00
Thomas Monjalon
3a1a885e03 eal: remove redundant atomic API description
Atomic functions are described in doxygen of the file
lib/librte_eal/common/include/generic/rte_atomic.h
The copies in arch-specific files are redundant
and confuse readers about the genericity of the API.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-03-28 23:52:53 +01:00
Dekel Peled
8015c5593a eal/ppc: fix global memory barrier
From previous patch description: "to improve performance on PPC64,
use light weight sync instruction instead of sync instruction."

Excerpt from IBM doc [1], section "Memory barrier instructions":
"The second form of the sync instruction is light-weight sync,
or lwsync.
This form is used to control ordering for storage accesses to system
memory only. It does not create a memory barrier for accesses to
device memory."

This patch removes the use of lwsync, so calls to rte_wmb() and
rte_rmb() will provide correct memory barrier to ensure order of
accesses to system memory and device memory.

[1] https://www.ibm.com/developerworks/systems/articles/powerpc.html

Fixes: d23a6bd04d72 ("eal/ppc: fix memory barrier for IBM POWER")
Cc: stable@dpdk.org

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
2019-03-28 23:48:28 +01:00
Anatoly Burakov
7353ee7344 fbarray: add API to find biggest used or free chunks
Currently, while there is a way to find total amount of used/free
space in an fbarray, there is no way to find biggest contiguous
chunk. Add such API, as well as unit tests to test this API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:52 +01:00
Anatoly Burakov
5b61c62cfd fbarray: add internal tailq for mapped areas
Currently, there are numerous reliability issues with fbarray,
such as:
- There is no way to prevent attaching to overlapping memory
  areas
- There is no way to prevent double-detach
- Failed destroy leaves fbarray in an invalid state (fbarray
  itself is valid, but its backing memory area is already
  detached)

In addition, on FreeBSD, doing mmap() on a file descriptor
does not keep the lock, so we also need to store the fd
in order to keep the lock.

This patch improves upon fbarray to address both of these
issues by adding an internal tailq to track allocated areas
and their respective file descriptors.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-03-28 23:28:50 +01:00