Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
An API is needed to check whether a particular socket ID belongs
to an internal or external heap. Prime user of this would be
mempool allocator, because normal assumptions of IOVA
contiguousness in IOVA as VA mode do not hold in case of
externally allocated memory.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we will be creating external heaps, they will have their own
"fake" socket ID, so add a function that will map the heap name
to its socket ID.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation.
This breaks the ABI, so document the changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if MSI-X table happens to be in them but is smaller than page size.
Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].
This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.
Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.
This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.
[1] http://patches.dpdk.org/patch/43502/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.
We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
The strncpy function has long been deemed unsafe for use,
in favor of strlcpy or snprintf.
While snprintf is standard and strlcpy is still largely available,
they both have issues regarding error checking and performance.
Both will force reading the source buffer past the requested size
if the input is not a proper c-string, and will return the expected
number of bytes copied, meaning that error checking needs to verify
that the number of bytes copied is not superior to the destination
size.
This contributes to awkward code flow, unclear error checking and
potential issues with malformed input.
The function strscpy has been discussed for some time already and
has been made available in the linux kernel[1].
Propose this new function as a safe alternative.
[1]: http://git.kernel.org/linus/30c44659f4a3
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Juhamatti Kuusisaari <juhamatti.kuusisaari@coriant.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
This has been only build-tested for now, on a native ppc64el POWER8E
machine running Debian sid.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Start version numbering for a new release cycle,
and introduce a template file for release notes.
The release notes comments have a new block to suggest
the order of items, inspired by Ferruh's proposal.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
Hotplug functions should be used directly to add and remove devices.
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
n_bits comes as first argument, align doxygen comment.
n_bit need to not be multiple of 512 as n_bits
are rounding to RTE_BITMAP_CL_BIT_SIZE.
Fixes: 14456f59e9f7 ("doc: fix doxygen warnings in QoS API")
Fixes: de3cfa2c9823 ("sched: initial import")
Cc: stable@dpdk.org
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Use the iteration hooks in the abstraction layers to perform the
requested filtering on the internal device lists.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Parse a device description.
Split this description in their relevant part for each layers.
No dynamic allocation is performed.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
A device iterator allows iterating over a set of devices.
This set is defined by the two descriptions offered,
* rte_bus
* rte_class
Only one description can be provided, or both. It is not allowed to
provide no description at all.
Each layer of abstraction then performs a filter based on the
description provided. This filtering allows iterating on their internal
set of devices, stopping when a match is valid and returning the current
iteration context.
This context allows starting the next iteration from the same point and
going forward.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This function is private to the EAL.
It is used to parse each layers in a device description string,
and store the result in an rte_devargs structure.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
This abstraction exists since the infancy of DPDK.
It needs to be fleshed out however, to allow a generic
description of devices properties and capabilities.
A device class is the northbound interface of the device, intended
for applications to know what it can be used for.
It is conceptually just above buses.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
This macro adds symbols to the .fini section using the global
RTE priorities, to ensure consistency.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
rte_devargs_parse becomes non-variadic,
rte_devargs_parsef becomes the variadic version, to be used to compose
device strings.
Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Since uuid functions may not be available everywhere, implement
uuid functions in DPDK. These are based off the BSD licensed
libuuid in util-link.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.
First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.
Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.
Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.
Also, fixup unit tests to account for the new behavior.
Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_list_walk() inside callbacks.
Also, remove existing reimplementation from memalloc code and use
the new API instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_walk() inside callbacks.
Also, remove existing reimplementation from sPAPR VFIO code and use
the new API instead.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_contig_walk() inside callbacks.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a function to return starting point of current contiguous
block, going backwards. All semantics are kept the same as the
existing function, with the only difference being that given the
same input, results will be returned in reverse order.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add a function to look for N used/free slots, but going backwards
instead of forwards. All semantics are kept similar to the existing
function, with the difference being that given the same input, the
same results will be returned in reverse order.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Add function to look up used/free indexes starting from specified
index, but going backwards instead of forward. Semantics are kept
similar to the existing function, except for the fact that, given
the same input, the results returned will be in reverse order.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Existing service functions allow us to stop a service, but doing so doesn't
guarantee that the service has finished running on a service core. This
commit introduces rte_service_may_be_active(), which returns whether the
service may be executing on one or more lcores currently, or definitely is
not.
The service core layer supports this function by setting a flag when
a service core is going to execute a service, and unsetting the flag when
the core is no longer able to run the service (its runstate becomes stopped
or the lcore is no longer mapped).
With this new function, applications can set a service's runstate to
stopped, then poll rte_service_may_be_active() until it returns false. At
that point, the service is quiesced.
Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Add APIs that allow an application to query and reset the attributes of
a service lcore. Add one such new attribute, "loops", which is a
counter that tracks the number of times the service core has looped in
the service runner function. This is useful to applications that desire
a "liveness" check to make sure a service core is not stuck.
Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
For cryptodev dynamic logging, conditional compilation of
debug logs is not actually required.
Signed-off-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
static logging macro RTE_PMD_DEBUG_TRACE is enabled with a few DEBUG
config options, including RTE_LIBRTE_ETHDEV_DEBUG
RTE_LIBRTE_ETHDEV_DEBUG is still used for data path logging, but all
ethdev logging switched to dynamic logging, so no need to enable static
logging macro for ethdev.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>