70 Commits

Author SHA1 Message Date
Anatoly Burakov
47c45a4df6 vfio: fix DMA mapping of external heaps
Currently, externally created heaps are supposed to be automatically
mapped for VFIO DMA by EAL, however they only do so if, at the time of
heap creation, VFIO is initialized and has at least one device
available. If no devices are available at the time of heap creation (or
if devices were available, but were since hot-unplugged, thus dropping
all VFIO container mappings), then VFIO mapping code would have skipped
over externally allocated heaps.

The fix is two-fold. First, we allow externally allocated memory
segments to be marked as "heap" segments. This allows us to distinguish
between external memory segments that were created via heap API, from
those that were created via rte_extmem_register() API.

Then, we fix the VFIO code to only skip non-heap external segments.
Also, since external heaps are not guaranteed to have valid IOVA
addresses, we will skip those which have invalid IOVA addresses as well.

Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc heap")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
b14d192ca1 vfio: remove deprecated DMA mapping functions
The rte_vfio_dma_map/unmap API's have been marked as deprecated in
release 19.05. Remove them.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-07 17:46:43 +01:00
Anatoly Burakov
9362945d7e vfio: fix DMA mapping with default container
When requesting DMA mapping to default container, we are meant to
supply the RTE_VFIO_DEFAULT_CONTAINER_FD value, however this is
not handled correctly by get_vfio_cfg_by_container_fd(), because
it only looks at actual fd values and does not check for this
special case.

Fix it to return default container if the fd requested is the
special RTE_VFIO_DEFAULT_CONTAINER_FD value.

Fixes: 4106d89a18f8 ("vfio: allow DMA map to the default container")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
2019-11-07 17:46:43 +01:00
Igor Ryzhov
49e7e2dee3 kni: add ability to set min/max MTU
Starting with kernel version 4.10, there are new min/max MTU values in
net_device structure, which are set to ETH_MIN_MTU and ETH_DATA_LEN by
default. We should be able to change these values to allow MTU more than
1500 to be set on KNI.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-27 11:07:43 +01:00
David Marchand
6614072791 eal: factorize lcore role code
This code belongs to the lcore API, move the prototype to the right
header, then factorize the code into the common code.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-10-27 10:41:08 +01:00
Stephen Hemminger
65661351ca eal: make lcore config private
The internal structure of lcore_config does not need to be part of
visible API/ABI. Make it private to EAL.

Rearrange the structure so it takes less memory (and cache footprint).

Since we change the ABI, bump the library version.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-27 10:35:11 +01:00
Anatoly Burakov
6d3f9917ff eal: fix memory config allocation for multi-process
Currently, mem config will be mapped without using the virtual
area reservation infrastructure, which means it will be mapped
at an arbitrary location. This may cause failures to map the
shared config in secondary process due to things like PCI
whitelist arguments allocating memory in a space where the
primary has allocated the shared mem config.

Fix this by using virtual area reservation to reserve space for
the mem config, thereby avoiding the problem and reserving the
shared config (hopefully) far away from any normal memory
allocations.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-26 18:03:26 +02:00
Anatoly Burakov
6080796f65 mem: make base address hint OS specific
Not all OS's follow Linux's memory layout, which may lead to
problems following the suggested common address hint absent
of a base-virtaddr flag. Make this address hint OS-specific.

Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-26 18:03:24 +02:00
Pallavi Kadam
7e708cd8c6 eal: move CPU operations to OS specific headers
Moving RTE_CPU* definitions from the common code to the Linux and
FreeBSD rte_os.h file to avoid #ifdef clutter.

Signed-off-by: Pallavi Kadam <pallavi.kadam@intel.com>
Signed-off-by: Antara Ganesh Kolar <antara.ganesh.kolar@intel.com>
Reviewed-by: Ranjit Menon <ranjit.menon@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-10-26 17:06:41 +02:00
Anatoly Burakov
3fe4bced1b eal: use define instead of raw option name
We are using '--base-virtaddr' in a few places. We have a define for that,
so use it instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-25 11:35:10 +02:00
Anatoly Burakov
8f29a60764 eal/freebsd: support option --base-virtaddr
According to our docs, only Linuxapp supports base-virtaddr option.
That is, strictly speaking, not true because most of the things
that are attempting to respect base-virtaddr are in common files,
so FreeBSD already *mostly* supports this option in practice.

This commit fixes the remaining bits to explicitly support
base-virtaddr option, and moves the arg parsing from EAL to common
options parsing code. Documentation is also updated to reflect
that all platforms now support base-virtaddr.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-25 11:17:29 +02:00
David Christensen
ed5d3d5cdb eal/linux: restore specific hugepage ordering for ppc
An ifdef present in eal_memory.c references "RTE_ARCH_PPC64" when
it should actually use "RTE_ARCH_PPC_64".  Simple testing revealed
that both the PPC_64 and non-PPC_64 versions of the code involved
work, but the PPC_64 version of the code is retained to be
consistent with other instances in the same file where mmapped
memory is accessed in reverse order on Power platforms.

Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-24 14:15:10 +02:00
Jim Harris
c1077933d4 timer: remove useless check on x86 TSC reliability
This code was added 7+ years ago in
commit fb022b85bae4 ("timer: check TSC reliability")
presumably when variant TSCs were still somewhat common.

But this code doesn't do anything except print a warning,
and the warning doesn't give any kind of advice to the user,
so let's just remove it.

While the warning has no functional meaning, the /proc/cpuinfo
parsing consumes a non-trivial amount of time which is especially
noticeable in secondary processes.
On my test system, it consumes 21ms out of the 66ms total execution
time for rte_eal_init() in a secondary process.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-10-17 09:47:42 +02:00
Xiaolong Ye
b34801d1aa kni: support allmulticast mode set
This patch adds support to allow users enable/disable allmulticast mode for
kni interface.

This requirement comes from bugzilla 312, more details can refer to:
https://bugs.dpdk.org/show_bug.cgi?id=312

Bugzilla ID: 312

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-15 21:16:32 +02:00
Arnon Warshavsky
75dbb45f28 eal: fix mapping leak in secondary process
Have rte_eal_config_reattach clean up the mapped address which is a valid
address but not the one intended.

Coverity issue: 343439
Fixes: 4e8854ae89fa ("eal: do not panic on shared memory init")
Fixes: b149a7064261 ("eal/freebsd: add config reattach in secondary process")
Cc: stable@dpdk.org

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-10-15 20:37:11 +02:00
Jim Harris
773a860aef vfio: fix leak with multiprocess
The code checks both rte_mp_request_sync() return code and that the number
of messages in the reply equals 1.  If rte_mp_request_sync() succeeds but
there was more than one message, those messages would get leaked.

Found via code review by Anatoly Burakov of patches that used the vhost
code as a template for using rte_mp_request_sync().

Fixes: 83a73c5fef66 ("vfio: use generic multi-process channel")
Cc: stable@dpdk.org

Reported-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-15 20:36:58 +02:00
David Marchand
8ac3591694 remove useless include of EAL memory config header
Restrict this header inclusion to its real users.

Fixes: 028669bc9f0d ("eal: hide shared memory config")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-10-09 10:22:24 +02:00
Anatoly Burakov
79a0bbe5b6 eal: pick IOVA as PA if IOMMU is not available
When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This is happening since at least 3.6 when VFIO support
was added. If the directory is empty, EAL should not pick IOVA as
VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-07-30 10:09:13 +02:00
Anatoly Burakov
78a6d7ed19 vfio: use contiguous mapping for IOVA as VA mode
When using IOVA as VA mode, there is no need to map segments
page by page. This normally isn't a problem, but it becomes one
when attempting to use DPDK in no-huge mode, where VFIO subsystem
simply runs out of space to store mappings.

Fix this for x86 by triggering different callbacks based on whether
IOVA as VA mode is enabled.

Fixes: 73a639085938 ("vfio: allow to map other memory regions")
Cc: stable@dpdk.org

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Andrius Sirvys <andrius.sirvys@intel.com>
2019-07-23 20:47:14 +02:00
Nithin Dabilpuram
a159730c2f eal: add ack interrupt API
Add new ack interrupt API to avoid using
VFIO_IRQ_SET_ACTION_TRIGGER(rte_intr_enable()) for
acking interrupt purpose for VFIO based interrupt handlers.
This implementation is specific to Linux.

Using rte_intr_enable() for acking interrupt has below issues

 * Time consuming to do for every interrupt received as it will
   free_irq() followed by request_irq() and all other initializations
 * A race condition because of a window between free_irq() and
   request_irq() with packet reception still on and device still
   enabled and would throw warning messages like below.
   [158764.159833] do_IRQ: 9.34 No irq handler for vector

In this patch, rte_intr_ack() is a no-op for VFIO_MSIX/VFIO_MSI interrupts
as they are edge triggered and kernel would not mask the interrupt before
delivering the event to userspace and we don't need to ack.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-07-23 12:00:22 +02:00
Nithin Dabilpuram
33543fb3b6 vfio: revert interrupt eventfd setup at probe
This reverts commit 89aac60e0be9ed95a87b16e3595f102f9faaffb4.
"vfio: fix interrupts race condition"

The above mentioned commit moves the interrupt's eventfd setup
to probe time but only enables one interrupt for all types of
interrupt handles i.e VFIO_MSI, VFIO_LEGACY, VFIO_MSIX, UIO.
It works fine with default case but breaks below cases specifically
for MSIX based interrupt handles.

* Applications like l3fwd-power that request rxq interrupts
  while ethdev setup.
* Drivers that need > 1 MSIx interrupts to be configured for
  functionality to work.

VFIO PCI for MSIx expects all the possible vectors to be setup up
when using VFIO_IRQ_SET_ACTION_TRIGGER so that they can be
allocated from kernel pci subsystem. Only way to increase the number
of vectors later is first free all by using VFIO_IRQ_SET_DATA_NONE
with action trigger and then enable new vector count.

Above commit changes the behavior of rte_intr_[enable|disable] to
only mask and unmask unlike earlier behavior and thereby
breaking above two scenarios.

Fixes: 89aac60e0be9 ("vfio: fix interrupts race condition")
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Lei Yao <lei.a.yao@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-23 12:00:14 +02:00
Jerin Jacob
bbe29a9bd7 eal/linux: select IOVA as VA mode for default case
When bus layer reports the preferred mode as RTE_IOVA_DC then
select the RTE_IOVA_VA mode:

- All drivers work in RTE_IOVA_VA mode, irrespective of physical
address availability.

- By default, a mempool asks for IOVA-contiguous memory using
RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
may affect the application boot time.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-07-22 17:47:27 +02:00
Takeshi Yoshimura
e072d16f89 vfio: fix expanding DMA area in ppc64le
In ppc64le, expanding DMA areas always fail because we cannot remove
a DMA window. As a result, we cannot allocate more than one memseg in
ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
the mapped DMA before removing the window. This patch fixes this
incorrect behavior.

I also fixed the order of ioctl for unregister and unmap. The ioctl
for unregister sometimes report device busy errors due to the
existence of mapped area.

Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
2019-07-16 12:56:03 +02:00
Yangchao Zhou
5eb1708ec1 kni: fix kernel crash with multi-segments
va2pa depends on the physical address and virtual address offset of
current mbuf. It may get the wrong physical address of next mbuf which
allocated in another hugepage segment.

In rte_mempool_populate_default(), trying to allocate whole block of
contiguous memory could be failed. Then, it would reserve memory in
several memzones that have different physical address and virtual address
offsets. The rte_mempool_populate_default() is used by
rte_pktmbuf_pool_create().

Fixes: 8451269e6d7b ("kni: remove continuous memory restriction")
Cc: stable@dpdk.org

Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-07-15 22:48:20 +02:00
Takeshi Yoshimura
22a55d2eb6 vfio: fix build on Linux < 4.2
The commit db90b4969e2e ("vfio: retry creating sPAPR DMA window")
introduced a build breakage on old Linux. Linux <4.2 does not define ddw in
struct vfio_iommu_spapr_tce_info. Without ddw, we cannot change window size
and so should give up the creation. I just exculuded the retrying code if
ddw is not supported.

Fixes: db90b4969e2e ("vfio: retry creating sPAPR DMA window")

Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Tested-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-11 11:28:20 +02:00
David Marchand
89aac60e0b vfio: fix interrupts race condition
Populating the eventfd in rte_intr_enable in each request to vfio
triggers a reconfiguration of the interrupt handler on the kernel side.
The problem is that rte_intr_enable is often used to re-enable masked
interrupts from drivers interrupt handlers.

This reconfiguration leaves a window during which a device could send
an interrupt and then the kernel logs this (unsolicited from the kernel
point of view) interrupt:
[158764.159833] do_IRQ: 9.34 No irq handler for vector

VFIO api makes it possible to set the fd at setup time.
Make use of this and then we only need to ask for masking/unmasking
legacy interrupts and we have nothing to do for MSI/MSIX.

"rxtx" interrupts are left untouched but are most likely subject to the
same issue.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1654824
Fixes: 5c782b3928b8 ("vfio: interrupts")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Shahed Shaikh <shshaikh@marvell.com>
2019-07-10 18:53:47 +02:00
Takeshi Yoshimura
db90b4969e vfio: retry creating sPAPR DMA window
sPAPR allows only page_shift from VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl.
However, Linux 4.17 or before returns incorrect page_shift for Power9.
I added the code for retrying creation of sPAPR DMA window.

Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-07 23:20:23 +02:00
Anatoly Burakov
ae3b4bc4fb eal: prevent different primary/secondary process versions
Currently, nothing stops DPDK to attempt to run primary and
secondary processes while having different versions. This
can lead to all sorts of weird behavior and makes it harder
to maintain compatibility without breaking ABI every once
in a while.

Fix it by explicitly disallowing running different DPDK
versions as primary and secondary processes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:40 +02:00
Anatoly Burakov
b39fcd9569 eal: unify internal config init
Currently, each EAL will update internal/shared config in their
own way at init, resulting in needless duplication of code and
OS-dependent behavior. Move the functions to a common file and
add missing FreeBSD steps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:40 +02:00
Anatoly Burakov
00299d3960 eal: unify wait for complete init
Currently, mcfg completion function exists in two independent
implementations doing the same thing, which is bug prone.
Unify the two functions and move them into one place.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:40 +02:00
Anatoly Burakov
a08a5dd20e eal: uninline wait for complete init
Currently, the function to wait until config completion is
static inline for no reason. Move its implementation to
an EAL common file.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:40 +02:00
Anatoly Burakov
028669bc9f eal: hide shared memory config
Now that everything that has ever accessed the shared memory
config is doing so through the public API's, we can make it
internal. Since we're removing quite a few headers from
rte_eal_memconfig.h, we need to add them back in places
where this header is used.

This bumps the ABI, so also change all build files and make
update documentation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-06 10:32:34 +02:00
Anatoly Burakov
76f80881ef mem: add API to lock/unlock memory hotplug
Currently, the memory hotplug is locked automatically by all
memory-related _walk() functions, but sometimes locking the
memory subsystem outside of them is needed. There is no
public API to do that, so it creates a dependency on shared
memory config to be public. Fix this by introducing a new
API to lock/unlock the memory hotplug subsystem.

Create a new common file for all things mem config, and a
new API namespace rte_mcfg_*, and search-and-replace all
usages of the locks with the new API.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
2019-07-05 22:12:40 +02:00
Ben Walker
c2361bab70 eal: compute IOVA mode based on PA availability
Currently, if the bus selects IOVA as PA, the memory init can fail when
lacking access to physical addresses.
This can be quite hard for normal users to understand what is wrong
since this is the default behavior.

Catch this situation earlier in eal init by validating physical addresses
availability, or select IOVA when no clear preferrence had been expressed.

The bus code is changed so that it reports when it does not care about
the IOVA mode and let the eal init decide.

In Linux implementation, rework rte_eal_using_phys_addrs() so that it can
be called earlier but still avoid a circular dependency with
rte_mem_virt2phys().
In FreeBSD implementation, rte_eal_using_phys_addrs() always returns
false, so the detection part is left as is.

If librte_kni is compiled in and the KNI kmod is loaded,
- if the buses requested VA, force to PA if physical addresses are
  available as it was done before,
- else, keep iova as VA, KNI init will fail later.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-07-05 16:55:44 +02:00
David Marchand
cfe3aeb170 remove experimental tags from all symbol definitions
We had some inconsistencies between functions prototypes and actual
definitions.
Let's avoid this by only adding the experimental tag to the prototypes.
Tests with gcc and clang show it is enough.

git grep -l __rte_experimental |grep \.c$ |while read file; do
	sed -i -e '/^__rte_experimental$/d' $file;
	sed -i -e 's/  *__rte_experimental//' $file;
	sed -i -e 's/__rte_experimental  *//' $file;
done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-06-29 19:04:43 +02:00
David Marchand
55cf111b6e vfio: remove incorrect experimental tag
The incriminated commit promoted this symbol as stable but the
definition still has the tag.

Fixes: 787ae736a3d9 ("vfio: remove experimental tag")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-06-29 19:04:25 +02:00
David Marchand
f319d99379 eal: hide internal hotplug function
This API was experimental and not properly marked in the map file.
But looking more closely, this is just an internal wrapper for EAL init.
Hide it in the hotplug code.

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-06-29 19:04:07 +02:00
Mattias Rönnblom
3f002f0696 eal: replace libc-based random generation with LFSR
This commit replaces rte_rand()'s use of lrand48() with a DPDK-native
combined Linear Feedback Shift Register (LFSR) (also known as
Tausworthe) pseudo-random number generator.

This generator is faster and produces better-quality random numbers
than the linear congruential generator (LCG) of lib's lrand48(). The
implementation, as opposed to lrand48(), is multi-thread safe in
regards to concurrent rte_rand() calls from different lcore threads.
A LCG is still used, but only to seed the five per-lcore LFSR
sequences.

In addition, this patch also addresses the issue of the legacy
implementation only producing 62 bits of pseudo randomness, while the
API requires all 64 bits to be random.

This pseudo-random number generator is not cryptographically secure -
just like lrand48().

Bugzilla ID: 114
Bugzilla ID: 276

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2019-06-28 15:23:38 +02:00
Thomas Monjalon
75683290e5 eal/linux: fix return after alarm registration failure
When adding an alarm, if an error happen when registering
the common alarm callback, it is not considered as a major failure.
The alarm is then inserted in the list.
However it was returning an error code after inserting the alarm.

The error code is not set anymore to be consistent with the behaviour.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2019-06-27 17:25:05 +02:00
Xiaolong Ye
4142b06e2d eal: correct log for alarm error
Fixes: af75078fece3 ("first public release")
Fixes: 764bf26873b9 ("add FreeBSD support")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-06-27 10:33:06 +02:00
Arnon Warshavsky
4e8854ae89 eal: do not panic on shared memory init
This patch changes some void functions to return a value,
so that the init sequence may tear down orderly
instead of calling panic.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-06-26 16:50:33 +02:00
Igor Ryzhov
ee3cac929e kni: remove PCI related information
As there is no ethtool support in KNI anymore,
PCI related information is no longer needed.

Fixes: ea6b39b5b847 ("kni: remove ethtool support")

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-06-17 14:22:22 +03:00
Anatoly Burakov
edf73dd330 ipc: handle unsupported IPC in action register
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

For primary processes, it is OK to not have IPC because
there may not be any secondary processes in the first place,
and there are valid use cases that disable IPC support, so
all primary process usages are fixed up to ignore IPC
failures.

For secondary processes, IPC will be crucial, so leave all
of the error handling as is.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:36 +02:00
Anatoly Burakov
c830900d75 ipc: handle unsupported IPC in init
Currently, IPC API will silently ignore unsupported IPC.
Fix the API call and its callers to explicitly handle
unsupported IPC cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-05 11:27:17 +02:00
Nicolas Dichtel
2a96c88be8 mem: ease init in a docker container
move_pages() is only used to get the numa node id, but this function
is not allowed by default in docker (it needs CAP_SYS_NICE and an update of
the seccomp profile).
get_mempolicy() also requires CAP_SYS_NICE but doesn't need any change in
the default seccomp profile.

Note that the returned value of move_pages() was not checked, thus some
errors could be hidden (if the requested id was 0).

Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Cc: stable@dpdk.org

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Didier Pallard <didier.pallard@6wind.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2019-06-04 13:03:24 +02:00
Bruce Richardson
be9d80d4df mem: mark unused function in 32-bit builds
The get_socket_mem_size() function is only used in 64-bit builds,
causing clang to warn about it for 32-bit builds. Add the __rte_unused
attribute to the function to silence the warning.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-06-03 23:56:35 +02:00
Bruce Richardson
f5256f8dfd build: remove unnecessary large file support defines
Since we now always use _FILE_OFFSET_BITS=64 flag when building
DPDK, we can remove the Makefile and C-file #defines setting it
individually for parts of the build.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
2019-06-03 23:55:50 +02:00
Kevin Traynor
da2d1d42f9 vfio: expand non-viable group error message
"VFIO group is not viable" error message is correct
but not very user friendly for something which can
usually be easily rectified.

Add some additional text to give more of a hint.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Rami Rosen <ramirose@gmail.com>
2019-05-03 22:42:30 +02:00
John McNamara
8bd5f07c7a doc: fix spelling reported by aspell in comments
Fix spelling errors in the doxygen docs.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
2019-05-03 00:38:14 +02:00
Thomas Monjalon
d711bea6fe eal: promote some experimental functions as stable
The function rte_eal_cleanup() was introduced more than one year ago,
in DPDK 18.02. It is no longer experimental, allowing
pdump, proc-info and hotplug_mp apps to not need any experimental API.

The function rte_ctrl_thread_create() was introduced one year ago
in DPDK 18.05. It is no longer experimental, allowing
KNI PMD and TEP example to not need any experimental API.

The functions rte_socket_count() and rte_socket_id_by_idx() were
introduced one year ago in DPDK 18.05. They are no longer experimental.

The function rte_dev_is_probed() was introduced half a year ago
in DPDK 18.11. It is no longer experimental.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-04-21 19:11:37 +02:00