Enable both the C11 atomic and non-C11 atomic lock-free stack
implementations for aarch64.
Introduced a new header to reduce the ifdef clutter across generic and C11
files. The rte_stack_lf_stubs.h contains stub implementations of
__rte_stack_lf_count, __rte_stack_lf_push_elems and
__rte_stack_lf_pop_elems.
Suggested-by: Gage Eads <gage.eads@intel.com>
Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least kernel 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.
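For illustration only, the emptiness check could look roughly like the
sketch below. The helper names are hypothetical and are not taken from the
EAL sources; a missing or unreadable directory is treated the same as an
empty one, which is an assumption of this sketch.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the directory holds at least one entry besides "." and
 * "..". A missing or unreadable directory is reported as having none. */
static bool dir_has_entries(const char *path)
{
	struct dirent *ent;
	bool found = false;
	DIR *dir;

	assert(path != NULL);
	dir = opendir(path);
	if (dir == NULL)
		return false;
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") != 0 &&
		    strcmp(ent->d_name, "..") != 0) {
			found = true;
			break;
		}
	}
	closedir(dir);
	return found;
}

/* Hypothetical wrapper: IOVA as VA is only a sensible default when the
 * kernel has populated at least one IOMMU group. */
static bool iommu_groups_present(void)
{
	return dir_has_entries("/sys/kernel/iommu_groups");
}
```

A caller would fall back to another IOVA mode when iommu_groups_present()
returns false.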
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
When bus layer reports the preferred mode as RTE_IOVA_DC then
select the RTE_IOVA_VA mode:
- All drivers work in RTE_IOVA_VA mode, irrespective of physical
address availability.
- By default, a mempool asks for IOVA-contiguous memory using
RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
may affect the application boot time.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which
was intended to mean "driver only supports VA" but had been understood
as "driver supports both PA and VA" by most net drivers and used to let
DPDK processes run as non-root (such processes do not have access to
physical addresses on recent kernels).
The check on physical addresses actually closed the gap for those
drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
flag can retain its intended meaning.
Document explicitly its meaning.
We can check that a driver's requirement with respect to IOVA mode is
fulfilled before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope
that we won't break it again.
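A minimal sketch of the pre-probe check, assuming a single "VA only"
driver flag. The flag name, enum and function below are hypothetical
stand-ins, not the bus/pci code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the selected IOVA mode. */
enum iova_mode { IOVA_PA, IOVA_VA };

/* Hypothetical flag with the documented meaning "driver only supports VA". */
#define DRV_VA_ONLY 0x1u

/* Return true if a driver with these flags may be probed in this mode. */
static bool driver_can_probe(unsigned int drv_flags, enum iova_mode mode)
{
	assert(mode == IOVA_PA || mode == IOVA_VA);
	if ((drv_flags & DRV_VA_ONLY) != 0 && mode != IOVA_VA)
		return false; /* requirement not fulfilled: skip the device */
	return true;
}
```

Drivers that work in both modes simply leave the flag unset and are
probed regardless of the selected mode.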
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
IPC and memory-related APIs should not be mixed because memory
relies on IPC internally. Add explicit warnings to IPC API and
to the documentation about this.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This commit adds support for lock-free (linked list based) stack mempool
handler.
In mempool_perf_autotest the lock-based stack outperforms the
lock-free handler for certain lcore/alloc count/free count
combinations*, however:
- For applications with preemptible pthreads, a standard (lock-based)
stack's worst-case performance (i.e. one thread being preempted while
holding the spinlock) is much worse than the lock-free stack's.
- Using per-thread mempool caches will largely mitigate the performance
difference.
*Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
running on isolcpus cores with a tickless scheduler. The lock-based stack's
rate_persec was 0.6x-3.5x the lock-free stack's.
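To make "lock-free (linked list based) stack" concrete, here is a generic
textbook sketch using C11 atomics. It is not the handler added by this
commit: all names are hypothetical, and the sketch ignores the ABA
problem, which a production implementation has to handle (for example
with a modification counter updated together with the head pointer).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	void *data;
};

struct lf_stack {
	_Atomic(struct node *) head;
};

static void lf_init(struct lf_stack *s)
{
	atomic_init(&s->head, (struct node *)NULL);
}

/* Push: link the node to the current head, then swing head with a CAS.
 * On failure the CAS reloads the current head into 'old' and we retry. */
static void lf_push(struct lf_stack *s, struct node *n)
{
	struct node *old;

	assert(n != NULL);
	old = atomic_load_explicit(&s->head, memory_order_relaxed);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
			memory_order_release, memory_order_relaxed));
}

/* Pop: swing head to head->next with a CAS. Returns NULL when empty.
 * NOT ABA-safe if nodes can be freed or reused concurrently. */
static struct node *lf_pop(struct lf_stack *s)
{
	struct node *old;

	old = atomic_load_explicit(&s->head, memory_order_acquire);
	while (old != NULL &&
	       !atomic_compare_exchange_weak_explicit(&s->head, &old,
			old->next, memory_order_acquire, memory_order_acquire))
		;
	return old;
}
```

No thread ever holds a lock here, so a preempted thread cannot block the
others; it can only cause their CAS to retry.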
Signed-off-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which results
in inability to use system calls such as select() by user
applications.
The single file segments option stores lock files per page to ensure
that pages are deleted when there are no more users. However, this
is not necessary because the processes will be holding onto the
pages anyway because of mmap(). Thus, removing pages from the
filesystem is safe even though they may be used by some other
secondary process. As a result, single file segments mode no
longer stores inordinate amounts of segment fds, and the above
issue with fd limits is solved.
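The mmap() argument can be checked with plain POSIX calls. The sketch
below uses an ordinary temporary file under /tmp rather than a hugepage
file, and the function name is hypothetical; it only demonstrates that a
shared mapping stays usable after the file is unlinked and its fd closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a temporary file, unlink it, close the fd, then keep using the
 * mapping. Returns 0 if data written after the unlink reads back intact,
 * -1 on any failure. */
static int mapping_survives_unlink(size_t len)
{
	char path[] = "/tmp/seg-sketch-XXXXXX";
	const char msg[] = "still mapped";
	char *va;
	int fd, ok;

	assert(len >= sizeof(msg));
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		unlink(path);
		return -1;
	}
	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* Neither the directory entry nor the fd is needed from here on. */
	unlink(path);
	close(fd);
	if (va == MAP_FAILED)
		return -1;
	memcpy(va, msg, sizeof(msg));
	ok = (strcmp(va, msg) == 0);
	munmap(va, len);
	return ok ? 0 : -1;
}
```

The kernel releases the underlying storage only once the last mapping is
gone, which is why no per-page lock file is needed to track users.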
However, this will not work for legacy mem mode. For that, simply
document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA mapped by the DPDK
layers.
2. Use memory allocated by the user and registered with the DPDK memory
system. Upon registration of memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation of this memory
will be done with rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want to have tight control on this
memory (e.g. avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call the DMA map function in order to
register such memory to the different devices.
This patch focuses on #3 above.
Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs.
Device level DMA map and unmap APIs were added. These APIs are
currently implemented only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
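The driver-or-fallback dispatch described above can be sketched as
follows. All types and functions here are hypothetical mock-ups with
counters instead of real mappings; they are not the rte_device or PCI
bus structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct device;

typedef int (*dma_map_fn)(struct device *dev, void *addr,
			  uint64_t iova, size_t len);

struct driver {
	dma_map_fn dma_map; /* NULL if the driver has no mapping of its own */
};

struct device {
	const struct driver *drv;
	int driver_maps;  /* mappings done by the driver callback */
	int generic_maps; /* mappings done by the generic IOMMU path */
};

/* Mock of a vendor-specific mapping function. */
static int vendor_dma_map(struct device *dev, void *addr,
			  uint64_t iova, size_t len)
{
	(void)addr; (void)iova; (void)len;
	dev->driver_maps++;
	return 0;
}

/* Mock of the generic path (VFIO in the text above). */
static int generic_iommu_map(struct device *dev, void *addr,
			     uint64_t iova, size_t len)
{
	(void)addr; (void)iova; (void)len;
	dev->generic_maps++;
	return 0;
}

/* Prefer the driver's own map function, else use the generic path. */
static int dev_dma_map(struct device *dev, void *addr,
		       uint64_t iova, size_t len)
{
	assert(dev != NULL && dev->drv != NULL);
	if (dev->drv->dma_map != NULL)
		return dev->drv->dma_map(dev, addr, iova, len);
	return generic_iommu_map(dev, addr, iova, len);
}
```

The application calls one entry point per device and does not need to
know which of the two paths performs the mapping.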
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Rather than using linuxapp and bsdapp everywhere, we can change things to
use the more readable terms "linux" and "freebsd" in our build configs.
Rather than renaming the configs we can just duplicate the existing ones
with the new names using symlinks, and use the new names exclusively
internally. ["make showconfigs" also only shows the new names to keep the
list short] The result is that backward compatibility is kept fully but any
new builds or development can be done using the newer names, i.e. both
"make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc"
work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The term "linuxapp" is a legacy one; calling the subdirectory
"linux" is clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The term "bsdapp" is a legacy one; calling the subdirectory
"freebsd" is clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Spawning the ctrl threads on anything that is not part of the eal
coremask is not that polite to the rest of the system, especially
when you took good care to pin your processes on cpu resources with
tools like taskset (linux) / cpuset (freebsd).
Rather than introduce yet another EAL option to control on which cpu
those ctrl threads are created, let's take the startup cpu affinity
as a reference and remove the EAL coremask from it.
If no cpu is left, then we default to the master core.
The cpuset is computed once at init before the original cpu affinity
is lost.
Introduced an RTE_CPU_AND macro to abstract the differences between the
respective Linux and FreeBSD macros.
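As a rough model of the computation, with CPU sets reduced to 64-bit
masks (an assumption of this sketch; the function name is hypothetical
and the real code works on cpuset types):

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU: bit n set means CPU n is in the set.
 * Control threads get the startup affinity minus the EAL cores;
 * if nothing is left, they fall back to the master core. */
static uint64_t ctrl_thread_cpus(uint64_t startup_affinity,
				 uint64_t eal_coremask,
				 unsigned int master_core)
{
	uint64_t cpus;

	assert(master_core < 64);
	cpus = startup_affinity & ~eal_coremask;
	if (cpus == 0)
		cpus = UINT64_C(1) << master_core;
	return cpus;
}
```

With EAL cores 2,3 and master core 2, a startup affinity of 0-3 gives
0-1, an affinity of 1-3 gives 1, and an affinity of 2-3 leaves nothing
and falls back to core 2. These are the three cases in the FreeBSD
examples of this message.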
Examples on a 4-core FreeBSD VM:
$ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
    -- -i --total-num-mbufs=2048
$ procstat -S 1057
  PID    TID COMM     TDNAME           CPU CSID CPU MASK
 1057 100131 testpmd  -                  2    1 2
 1057 100140 testpmd  eal-intr-thread    1    1 0-1
 1057 100141 testpmd  rte_mp_handle      1    1 0-1
 1057 100142 testpmd  lcore-slave-3      3    1 3
$ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
    -- -i --total-num-mbufs=2048
$ procstat -S 1061
  PID    TID COMM     TDNAME           CPU CSID CPU MASK
 1061 100131 testpmd  -                  2    2 2
 1061 100144 testpmd  eal-intr-thread    1    2 1
 1061 100145 testpmd  rte_mp_handle      1    2 1
 1061 100147 testpmd  lcore-slave-3      3    2 3
$ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
    -- -i --total-num-mbufs=2048
$ procstat -S 1065
  PID    TID COMM     TDNAME           CPU CSID CPU MASK
 1065 100131 testpmd  -                  2    2 2
 1065 100148 testpmd  eal-intr-thread    2    2 2
 1065 100149 testpmd  rte_mp_handle      2    2 2
 1065 100150 testpmd  lcore-slave-3      3    2 3
Fixes: d651ee4919cd ("eal: set affinity for control threads")
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
Memory mode flags are now shared between primary and secondary
processes, so the documentation about these limitations is no longer
necessary.
Fixes: 64cdfc35aaad ("mem: store memory mode flags in shared config")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
The general use case of external memory is well covered by the
existing external memory APIs. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that APIs like
``rte_virt2memseg`` would work on such external memory segments.
This commit adds such an API to DPDK. The new functions allow
registering and unregistering externally allocated memory areas.
Documentation for them is added as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
SPDK uses the rte_mem_event_callback_register API to
create RDMA memory regions (MRs) for newly allocated regions
of memory. This is used in both the SPDK NVMe-oF target
and the NVMe-oF host driver.
DPDK creates internal malloc_elem structures for these
allocated regions. As users malloc and free memory, DPDK
will sometimes merge malloc_elems that originated from
different allocations that were notified through the
registered mem_event callback routine. This results
in subsequent allocations that can span across multiple
RDMA MRs. This would require SPDK to check each DPDK buffer to
see if it crosses an MR boundary and, if so, to add
considerable logic and complexity to describe that
buffer before it can be accessed by the RNIC. It is somewhat
analogous to rte_malloc returning a buffer that is not
IOVA-contiguous.
As a malloc_elem gets split and some of these elements
get freed, it can also result in DPDK sending an
RTE_MEM_EVENT_FREE notification for a subset of the
original RTE_MEM_EVENT_ALLOC notification. This is also
problematic for RDMA memory regions, since unregistering
the memory region is all-or-nothing. It is not possible
to unregister part of a memory region.
To support these types of applications, this patch adds
a new --match-allocations EAL init flag. When this
flag is specified, malloc elements from different
hugepage allocations will never be merged. Memory will
also only be freed back to the system (with the requisite
memory event callback) exactly as it was originally
allocated.
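The merge rule can be pictured with the simplified model below. The
structure and its fields are hypothetical (the real malloc_elem is more
involved); each element just remembers which hugepage allocation it
originally came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified heap element. */
struct elem {
	size_t start; /* offset of the element within the heap */
	size_t len;   /* length of the element in bytes */
	int alloc_id; /* which hugepage allocation it originated from */
};

/* Decide whether element b, following a, may be merged into a. */
static bool can_merge(const struct elem *a, const struct elem *b,
		      bool match_allocations)
{
	assert(a != NULL && b != NULL);
	if (a->start + a->len != b->start)
		return false; /* not adjacent in memory */
	if (match_allocations && a->alloc_id != b->alloc_id)
		return false; /* would span two original allocations */
	return true;
}
```

With the flag set, a merged element never crosses the boundary of an
original allocation, so it never spans two of the memory regions
registered from the allocation callbacks.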
Since part of this patch is extending the size of struct
malloc_elem, we also fix up the malloc autotests so they
do not assume its size exactly fits in one cacheline.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
The PCI bus is an independent driver and not part of EAL
as it was in the early days.
EAL must be understood as a generic layer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch uses the EAL option "--iova-mode" to force the IOVA mode to a
particular value. There exist virtual devices that are not directly
attached to the PCI bus, and therefore the auto detection of the IOVA
mode based on probing the PCI bus and IOMMU configuration may not
report the required addressing mode. Using the EAL option permits the
mode to be explicitly configured in this scenario.
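A sketch of parsing the option value, assuming the accepted values are
the strings "pa" and "va". The enum and function names are hypothetical
and this is not the EAL option parser itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: DC means "not forced, auto-detect". */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Parse the value given to --iova-mode.
 * Returns 0 and sets *mode on success, -1 on an unknown value
 * (in which case *mode is left untouched). */
static int parse_iova_mode(const char *arg, enum iova_mode *mode)
{
	assert(arg != NULL && mode != NULL);
	if (strcmp(arg, "pa") == 0)
		*mode = IOVA_PA;
	else if (strcmp(arg, "va") == 0)
		*mode = IOVA_VA;
	else
		return -1;
	return 0;
}
```

When the option is absent the mode stays at its "don't care" value and
auto detection proceeds as before.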
Signed-off-by: Eric Zhang <eric.zhang@windriver.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>
The rte_ring implementation is non-preemptible only under certain
circumstances. This clarification is helpful for data plane and
control plane communication using rte_ring.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Previously, it was possible to limit the maximum amount of memory
allowed for allocation by creating validator callbacks. Although a
powerful tool, it's a bit of a hassle and requires modifying the
application, DPDK example applications included.
Fix this by adding a new parameter "--socket-limit", with syntax
similar to "--socket-mem", which would set per-socket memory
allocation limits, and set up a default validator callback to deny
all allocations above the limit.
This option is incompatible with legacy mode, as validator callbacks
are not supported there.
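The default validator can be thought of along these lines. This is a
sketch under assumptions: the limit table, its size, the callback
signature and the "0 allows, -1 denies" convention are all chosen for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKETS 8 /* hypothetical upper bound for this sketch */

/* Per-socket limits in bytes; 0 means "no limit on this socket". */
static size_t socket_limit[MAX_SOCKETS];

/* Return 0 to allow growing the socket's memory by new_len bytes on top
 * of the cur_total bytes already allocated, -1 to deny the request. */
static int limit_validator(int socket_id, size_t cur_total, size_t new_len)
{
	size_t limit;

	assert(socket_id >= 0 && socket_id < MAX_SOCKETS);
	limit = socket_limit[socket_id];
	if (limit == 0)
		return 0;
	/* Written to avoid overflow in cur_total + new_len. */
	if (cur_total > limit || new_len > limit - cur_total)
		return -1;
	return 0;
}
```

A request that would take a socket exactly up to its limit is allowed;
anything beyond it is denied.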
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Document new command-line switches and the principles behind the
new memory subsystem. Also, replace outdated malloc heap picture.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
This patch fixes a trivial typo in the programmer's guide.
Fixes: 1733be6d3147 ("doc: new eal multi-pthread feature")
Cc: stable@dpdk.org
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
This commit adds a new function rte_eal_cleanup().
The function serves as a hook to allow DPDK to release
internal resources (e.g.: hugepage allocations).
This function allows DPDK to become more like an ordinary
library, where the library context itself can be initialized
and cleaned up by the application.
The rte_exit() and rte_panic() functions must be considered,
particularly if they should call rte_eal_cleanup() to release any
resources or not. This patch adds the cleanup to rte_exit(),
but does not clean up on rte_panic(). The reason to not clean
up on panicking is that the developer may wish to inspect the
exact internal state of EAL and hugepages.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Vipin Varghese <vipin.varghese@intel.com>
Fix an error in DPDK programmer's guide (EAL section):
it should be rte_thread_get_affinity() instead of
rte_pthread_get_affinity().
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
This patch fixes a trivial typo in DPDK programmer's guide:
it should be rte_cpu_get_features() instead of rte_cpu_get_feature().
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Remove Xen-specific code in EAL, including the --xen-dom0 option,
memory initialization code, build dependencies, etc.
Related documents are removed or updated, and the EAL library
version is bumped.
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
This new API allows reacting to a device removal.
A device removal is the sudden disappearance of a device from its
bus.
PMDs implementing support for this notification guarantee that the removal
of the underlying device does not incur a risk to the application.
In particular, Rx/Tx bursts and all other functions can still be called
(albeit likely returning errors) without triggering a crash, irrespective
of an application handling this event.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Signed-off-by: Elad Persiko <eladpe@mellanox.com>
There was a compile time setting to enable a ring to yield when
it entered a loop in mp or mc rings waiting for the tail pointer update.
Build time settings are not recommended for enabling/disabling features,
and since this was off by default, remove it completely. If needed, a
runtime enabled equivalent can be used.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The mempool cache is only available to EAL threads as a per-lcore
resource. Change this so that the user can create and provide their own
cache on mempool get and put operations. This works with non-EAL threads
too. This commit introduces the new API calls:
rte_mempool_cache_create(size, socket_id)
rte_mempool_cache_free(cache)
rte_mempool_cache_flush(cache, mp)
rte_mempool_default_cache(mp, lcore_id)
It also changes the API calls:
rte_mempool_generic_put(mp, obj_table, n, cache, flags)
rte_mempool_generic_get(mp, obj_table, n, cache, flags)
The cache-oblivious API calls use the per-lcore default local cache.
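The idea of a caller-owned cache can be modelled with a toy pool. The
types, capacities and function names below are hypothetical and much
simpler than the mempool library; the sketch only shows objects moving
between a common pool and a cache passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP  64 /* hypothetical capacities for this sketch */
#define CACHE_CAP 8

struct pool  { void *objs[POOL_CAP];  unsigned int len; };
struct cache { void *objs[CACHE_CAP]; unsigned int len; };

/* Move every cached object back to the common pool. */
static void cache_flush(struct cache *c, struct pool *p)
{
	while (c->len > 0) {
		assert(p->len < POOL_CAP);
		p->objs[p->len++] = c->objs[--c->len];
	}
}

/* Get one object, preferring the caller's cache (which may be NULL).
 * Returns NULL when both the cache and the pool are empty. */
static void *pool_get(struct pool *p, struct cache *c)
{
	if (c != NULL && c->len > 0)
		return c->objs[--c->len];
	if (p->len > 0)
		return p->objs[--p->len];
	return NULL;
}

/* Put one object back, into the caller's cache when one is given.
 * A full cache is flushed to the pool first. */
static void pool_put(struct pool *p, struct cache *c, void *obj)
{
	if (c != NULL) {
		if (c->len == CACHE_CAP)
			cache_flush(c, p);
		c->objs[c->len++] = obj;
		return;
	}
	assert(p->len < POOL_CAP);
	p->objs[p->len++] = obj;
}
```

Because the cache is an explicit argument, any thread that owns one can
use it, whether or not it is an EAL thread.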
Signed-off-by: Lazaros Koromilas <l@nofutznetworks.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
* remove outdated chapter reference to Multi-process support.
* HTML output converts "--" to "-", which is wrong when explaining
command arguments; use fixed-width quotes for them.
Fixes: fc1f2750a3ec ("doc: programmers guide")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
The malloc library is now part of the EAL.
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
The patch updates the env_abstraction_layer.rst part in prog_guide.
It adds the RX interrupt event declaration and revises the others in
interrupt event section.
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Acked-by: Danny Zhou <danny.zhou@intel.com>
This change adds automatic figure references to the docs. Figures
in the generated HTML and PDF docs are now automatically numbered
based on section.
Requires Sphinx >= 1.3.1.
The patch makes the following changes.
* Changes image:: tag to figure:: and moves image caption
to the figure.
* Adds captions to figures that didn't previously have any.
* Un-templates the |image-name| substitution definitions
into explicit figure:: tags. They weren't used more
than once anyway and Sphinx doesn't support them
for figure.
* Adds a target to each image that didn't previously
have one so that they can be cross-referenced.
* Renames existing image targets to match the image
name for consistency.
* Replaces the Figures lists with automatic :numref:
:ref: entries to generate automatic numbering
and captions.
* Replaces "Figure" references with automatic :numref:
references.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Some blocks are not visible with some Sphinx versions because
they are using the wrong keyword for code.
Tested with Sphinx v1.1.3.
Fixes: 1733be6d3147 ("doc: new eal multi-pthread feature")
Fixes: ccefe752cab0 ("doc: add jobstats sample guide")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Changed all image.svg and image.png extensions to image.*
This allows Sphinx to decide the appropriate image type
from the available image options.
In case of PDF, SVG images are converted and Sphinx must pick
the converted version.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
Since DPDK now has support for the in-tree uio_pci_generic driver,
update the programmers guide document to reference this module, and to use it
in preference to the igb_uio driver, which is DPDK-specific.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The 1.7 DPDK_Prog_Guide document in MSWord has been converted to rst format for
use with Sphinx. There is an rst file for each chapter and an index.rst file
which contains the table of contents.
The top level index file has been modified to include this guide.
This document contains some png image files. If any of these png files are modified
they should be replaced with an svg file.
This is the sixth document from a set of 6 documents.
Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>