811 Commits

Author SHA1 Message Date
Anatoly Burakov
75185aa5fe malloc: allow removing memory from named heaps
Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
7d75c31014 malloc: allow adding memory to named heaps
Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
15d6dd023c malloc: allow destroying heaps
Add an API to destroy specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:55 +02:00
Anatoly Burakov
02e323a8a8 malloc: allow creating malloc heaps
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:56:51 +02:00
Anatoly Burakov
65ff37b105 malloc: add function to check if socket is external
An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The prime user of this would be
the mempool allocator, because the normal assumptions of IOVA
contiguousness in IOVA as VA mode do not hold for externally
allocated memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
e1fe3c2fab malloc: add function to query socket ID of named heap
When we create external heaps, they will have their own
"fake" socket ID, so add a function that will map the heap name
to its socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:25 +02:00
Anatoly Burakov
d14c148e79 malloc: add name to malloc heaps
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 11:11:23 +02:00
Anatoly Burakov
72cf92b318 malloc: index heaps using heap ID rather than NUMA node
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation.

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-11 10:37:39 +02:00
Anatoly Burakov
5282bb1c36 mem: allow memseg lists to be marked as external
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
2018-10-11 10:24:29 +02:00
Anatoly Burakov
4104b2a485 mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-10-11 10:24:16 +02:00
Anatoly Burakov
03ba15ca65 vfio: allow mapping MSI-X BARs if kernel allows it
Currently, DPDK will skip mapping some areas (or even an entire BAR)
if MSI-X table happens to be in them but is smaller than page size.

Kernels 4.16+ will allow mapping MSI-X BARs [1], and will report this
as a capability flag. Capability flags themselves are also only
supported since kernel 4.6 [2].

This commit will introduce support for checking VFIO capabilities,
and will use it to check if we are allowed to map BARs with MSI-X
tables in them, along with backwards compatibility for older
kernels, including a workaround for a variable rename in VFIO
region info structure [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=c84982adb23bcf3b99b79ca33527cd2625fbe279

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ff63eb638d63b95e489f976428f1df01391e15e4

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:45:50 +02:00
Anatoly Burakov
64cdfc35aa mem: store memory mode flags in shared config
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.

Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.

This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.

[1] http://patches.dpdk.org/patch/43502/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-10-04 00:09:47 +02:00
Anatoly Burakov
3a44687139 mem: allow querying offset into segment fd
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 15:01:58 +02:00
Anatoly Burakov
41dbdb6872 mem: add external API to retrieve page fd
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.

We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2018-09-19 14:48:04 +02:00
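The duplicate-before-return rule in the entry above can be sketched in a few lines. This is an illustration only: sketch_export_fd is a hypothetical name, not the EAL function, and the -errno return convention is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical export helper: hands the caller a duplicate so that
 * closing it cannot invalidate the descriptor kept internally.
 * Returns the new fd, or -errno on failure. The caller owns the result
 * and is responsible for closing it.
 */
static int
sketch_export_fd(int internal_fd)
{
	int copy = dup(internal_fd);

	return copy < 0 ? -errno : copy;
}
```

Because the caller only ever sees the copy, a stray close() on the returned value leaves the internal descriptor usable.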
Gaetan Rivet
b0236c7cf7 eal: add strscpy function
The strncpy function has long been deemed unsafe for use,
in favor of strlcpy or snprintf.

While snprintf is standard and strlcpy is still largely available,
they both have issues regarding error checking and performance.

Both will force reading the source buffer past the requested size
if the input is not a proper C string, and will return the expected
number of bytes copied, meaning that error checking needs to verify
that the number of bytes copied does not exceed the destination
size.

This contributes to awkward code flow, unclear error checking and
potential issues with malformed input.

The function strscpy has been discussed for some time already and
has been made available in the Linux kernel[1].

Propose this new function as a safe alternative.

[1]: http://git.kernel.org/linus/30c44659f4a3

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Juhamatti Kuusisaari <juhamatti.kuusisaari@coriant.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2018-09-19 11:38:19 +02:00
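A minimal sketch of the copy semantics argued for in the entry above. It is not the EAL implementation: sketch_strscpy is a hypothetical name, and the -E2BIG return on truncation is modeled on the Linux kernel function the message cites.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/*
 * Hypothetical helper, not the EAL implementation.
 * Copies at most dsize - 1 bytes of src into dst, always NUL-terminates
 * when dsize > 0, and never reads more than dsize bytes of src.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG
 * if src did not fit.
 */
static ssize_t
sketch_strscpy(char *dst, const char *src, size_t dsize)
{
	size_t i;

	assert(src != NULL);
	for (i = 0; i < dsize; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (ssize_t)i;
	}
	/* Ran out of room: terminate what was copied and report truncation. */
	if (dsize > 0)
		dst[dsize - 1] = '\0';
	return -E2BIG;
}
```

With this shape, a single "result < 0" test covers the error path, and the source is never scanned past the destination size.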
Luca Boccassi
54d609a138 build: add ppc64 meson build
This has been only build-tested for now, on a native ppc64el POWER8E
machine running Debian sid.

Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-09-17 12:21:17 +02:00
Thomas Monjalon
76b9d9de5c version: 18.11-rc0
Start version numbering for a new release cycle,
and introduce a template file for release notes.

The release notes comments have a new block to suggest
the order of items, inspired by Ferruh's proposal.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: John McNamara <john.mcnamara@intel.com>
2018-08-13 12:42:46 +02:00
Thomas Monjalon
11a1f847d2 version: 18.08.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-08-09 23:11:26 +02:00
Olivier Matz
6b867cc113 eal: remove experimental tag for user mbuf pool ops
Remove experimental tag from rte_eal_mbuf_user_pool_ops().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-08-09 01:03:14 +02:00
Olivier Matz
83a8a143bb eal: remove deprecated function for mbuf pool ops
rte_eal_mbuf_default_mempool_ops() is replaced by
rte_mbuf_best_mempool_ops().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2018-08-09 01:03:14 +02:00
Thomas Monjalon
686a41ac97 version: 18.08-rc3
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-08-06 01:45:20 +02:00
Andrew Rybchenko
aed1a766ed eal: deprecate device attach and detach functions
Hotplug functions should be used directly to add and remove devices.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
2018-08-06 00:52:25 +02:00
Jerin Jacob
ebaa25f070 eal: fix bitmap documentation
n_bits comes as first argument, align doxygen comment.

n_bits need not be a multiple of 512, as n_bits is rounded up
to a multiple of RTE_BITMAP_CL_BIT_SIZE.

Fixes: 14456f59e9f7 ("doc: fix doxygen warnings in QoS API")
Fixes: de3cfa2c9823 ("sched: initial import")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
2018-08-05 20:10:12 +02:00
Thomas Monjalon
23888166d9 version: 18.08-rc2
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-26 23:59:06 +02:00
Hemant Agrawal
787ae736a3 vfio: remove experimental tag
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-26 23:46:18 +02:00
Thomas Monjalon
c27dbc300e version: 18.08-rc1
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-16 01:17:18 +02:00
Gaetan Rivet
ac1a511eff eal: implement device iteration
Use the iteration hooks in the abstraction layers to perform the
requested filtering on the internal device lists.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-07-15 23:44:17 +02:00
Gaetan Rivet
c99a2d4c6b eal: implement device iteration initialization
Parse a device description.
Split this description into the relevant part for each layer.
No dynamic allocation is performed.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-07-15 23:43:53 +02:00
Gaetan Rivet
670658b7a9 eal: add device iterator interface
A device iterator allows iterating over a set of devices.
This set is defined by the two descriptions offered:

  * rte_bus
  * rte_class

Either one description or both can be provided. It is not allowed to
provide no description at all.

Each layer of abstraction then performs a filter based on the
description provided. This filtering allows iterating on their internal
set of devices, stopping when a match is valid and returning the current
iteration context.

This context allows starting the next iteration from the same point and
going forward.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-07-15 23:43:40 +02:00
Gaetan Rivet
338327d731 devargs: add function to parse device layers
This function is private to the EAL.
It is used to parse each layer in a device description string,
and store the result in an rte_devargs structure.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-07-15 23:43:34 +02:00
Gaetan Rivet
d70f8448d0 eal: introduce device class abstraction
This abstraction has existed since the infancy of DPDK.
It needs to be fleshed out however, to allow a generic
description of device properties and capabilities.

A device class is the northbound interface of the device, intended
for applications to know what it can be used for.

It is conceptually just above buses.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
2018-07-15 23:42:53 +02:00
Gaetan Rivet
a671f01fcc eal: introduce destructor macros
This macro adds symbols to the .fini section using the global
RTE priorities, to ensure consistency.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
2018-07-15 23:42:27 +02:00
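The idea in the entry above can be sketched with the GCC/Clang destructor attribute. The macro name and the priority values below are hypothetical and do not reproduce the EAL macros or the RTE priority table.

```c
#include <assert.h>

/* Hypothetical macro: defines a static function run at program exit. */
#define SKETCH_FINI_PRIO(func, prio) \
	static void __attribute__((destructor(prio), used)) func(void)

static int sketch_cleanups;

/* Priorities up to 100 are reserved, so the sketch starts above them. */
SKETCH_FINI_PRIO(sketch_release_early, 120)
{
	sketch_cleanups++;
}

/*
 * For destructors, a larger priority number runs first, so by the time
 * this one runs the function above has already been called.
 */
SKETCH_FINI_PRIO(sketch_release_late, 110)
{
	assert(sketch_cleanups >= 1);
	sketch_cleanups++;
}
```

Routing every destructor through one macro keeps the ordering in a single place, in the same way the message describes for the global RTE priorities.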
Gaetan Rivet
a23bc2c4e0 devargs: add non-variadic parsing function
rte_devargs_parse becomes non-variadic,
rte_devargs_parsef becomes the variadic version, to be used to compose
device strings.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-15 23:42:10 +02:00
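The split described in the entry above, a plain-string parser plus a variadic wrapper that composes the string first, might look like the sketch below. All names and the 64-byte limit are hypothetical; the real rte_devargs structure and parser do considerably more.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DEV_LEN 64

/* Hypothetical result type; the real rte_devargs has more fields. */
struct sketch_devargs {
	char text[SKETCH_DEV_LEN];
};

/* Non-variadic form: takes an already-composed device string. */
static int
sketch_parse(struct sketch_devargs *da, const char *dev)
{
	assert(da != NULL && dev != NULL);
	if (strlen(dev) >= sizeof(da->text))
		return -1; /* would not fit */
	strcpy(da->text, dev);
	return 0;
}

/* Variadic form: composes the string, then defers to sketch_parse(). */
static int
sketch_parsef(struct sketch_devargs *da, const char *fmt, ...)
{
	char dev[SKETCH_DEV_LEN];
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(dev, sizeof(dev), fmt, ap);
	va_end(ap);
	if (len < 0 || (size_t)len >= sizeof(dev))
		return -1; /* formatting error or truncation */
	return sketch_parse(da, dev);
}
```

One practical benefit of the non-variadic form is that a device string taken verbatim from user input is never interpreted as a format string.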
Stephen Hemminger
6bc67c497a eal: add uuid API
Since uuid functions may not be available everywhere, implement
uuid functions in DPDK. These are based off the BSD licensed
libuuid in util-linux.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
2018-07-13 23:42:08 +02:00
Anatoly Burakov
0b82bd7b24 memzone: improve zero-length reserve
Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.

Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.

Also, fixup unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:27:30 +02:00
Anatoly Burakov
e26415428f mem: provide thread-unsafe memseg list walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_list_walk() inside callbacks.

Also, remove existing reimplementation from memalloc code and use
the new API instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:25 +02:00
Anatoly Burakov
7c790af08f mem: provide thread-unsafe memseg walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_walk() inside callbacks.

Also, remove existing reimplementation from sPAPR VFIO code and use
the new API instead.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:21:15 +02:00
Anatoly Burakov
b917147601 mem: provide thread-unsafe contig walk variant
Sometimes, user code needs to walk memseg list while being inside
a memory-related callback. Rather than making everyone copy around
the same iteration code and depending on DPDK internals, provide an
official way to do memseg_contig_walk() inside callbacks.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:20:06 +02:00
Anatoly Burakov
4d2dde26aa fbarray: add reverse finding of contiguous
Add a function to return starting point of current contiguous
block, going backwards. All semantics are kept the same as the
existing function, with the only difference being that given the
same input, results will be returned in reverse order.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:03:44 +02:00
Anatoly Burakov
e1ca5dc862 fbarray: add reverse finding of chunk
Add a function to look for N used/free slots, but going backwards
instead of forwards. All semantics are kept similar to the existing
function, with the difference being that given the same input, the
same results will be returned in reverse order.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:03:16 +02:00
Anatoly Burakov
b8d07c5252 fbarray: add reverse finding
Add function to look up used/free indexes starting from specified
index, but going backwards instead of forward. Semantics are kept
similar to the existing function, except for the fact that, given
the same input, the results returned will be in reverse order.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2018-07-13 11:02:39 +02:00
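The reverse lookups added in the three fbarray entries above share one idea: scan from a starting index toward zero. A rough sketch over a plain bool array follows; sketch_find_prev is a hypothetical name and this is not the fbarray bitmask implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical reverse lookup over a plain bool array.
 * Scans from 'start' down to 0 and returns the first index whose
 * state equals 'used', or -1 if none is found.
 */
static int
sketch_find_prev(const bool *slots, size_t len, size_t start, bool used)
{
	size_t i;

	assert(slots != NULL);
	if (len == 0 || start >= len)
		return -1;
	/* i takes the values start, start - 1, ..., 0 inside the body. */
	for (i = start + 1; i-- > 0; )
		if (slots[i] == used)
			return (int)i;
	return -1;
}
```

Calling it repeatedly with start set to the previous result minus one visits the same indexes a forward scan would find, in reverse order.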
Honnappa Nagarahalli
7c872b9698 hash: validate hash bucket entries while compiling
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 12:43:10 +02:00
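Moving a check from run time to compile time, as the entry above does, can be sketched with a C11 static assertion. The constant below is a hypothetical stand-in for the configuration value, and the power-of-two property is an assumption made for the example; the message does not say which property is validated.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical stand-in for a build-time configuration value. */
#define SKETCH_BUCKET_ENTRIES 8

/* Constant expression: true if x is a non-zero power of two. */
#define SKETCH_IS_POWER_OF_2(x) ((x) != 0 && (((x) - 1) & (x)) == 0)

/* The build fails here if the constant is invalid; nothing runs later. */
static_assert(SKETCH_IS_POWER_OF_2(SKETCH_BUCKET_ENTRIES),
	      "SKETCH_BUCKET_ENTRIES must be a power of two");

/* Because the check above passed, masking can replace the modulo. */
static unsigned int
sketch_slot(unsigned int hash)
{
	return hash & (SKETCH_BUCKET_ENTRIES - 1);
}
```

A misconfigured value then stops the build instead of surfacing as a failure when the table is first created.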
Gage Eads
e30dd31847 service: add mechanism for quiescing
Existing service functions allow us to stop a service, but doing so doesn't
guarantee that the service has finished running on a service core. This
commit introduces rte_service_may_be_active(), which returns whether the
service may be executing on one or more lcores currently, or definitely is
not.

The service core layer supports this function by setting a flag when
a service core is going to execute a service, and unsetting the flag when
the core is no longer able to run the service (its runstate becomes stopped
or the lcore is no longer mapped).

With this new function, applications can set a service's runstate to
stopped, then poll rte_service_may_be_active() until it returns false. At
that point, the service is quiesced.

Signed-off-by: Gage Eads <gage.eads@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-07-06 06:54:49 +02:00
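The flag mechanism in the entry above can be modeled in a single thread as follows. Every name here is hypothetical and the struct is not the rte_service layout; a real implementation also needs memory ordering between the service core and the polling thread, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_LCORES 4

/* Hypothetical model of one service. */
struct sketch_service {
	bool running;                      /* runstate: true = running */
	bool mapped[SKETCH_MAX_LCORES];    /* which lcores may run it */
	bool active_on[SKETCH_MAX_LCORES]; /* per-lcore "may be active" flag */
};

/* One pass of a service core's loop over a single service. */
static void
sketch_lcore_iteration(struct sketch_service *s, unsigned int lcore)
{
	assert(lcore < SKETCH_MAX_LCORES);
	if (!s->running || !s->mapped[lcore]) {
		/* Core can no longer run the service: clear the flag. */
		s->active_on[lcore] = false;
		return;
	}
	/* Set the flag before executing the service callback. */
	s->active_on[lcore] = true;
	/* ... the service callback would run here ... */
}

/* True if any lcore may still be executing the service. */
static bool
sketch_may_be_active(const struct sketch_service *s)
{
	unsigned int i;

	for (i = 0; i < SKETCH_MAX_LCORES; i++)
		if (s->active_on[i])
			return true;
	return false;
}
```

In the model, setting running to false does not clear the flag by itself; it clears only once the core next passes through its loop, which is why the caller has to poll until the query returns false.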
Thomas Monjalon
f8e9989606 remove useless constructor headers
A constructor is usually declared with RTE_INIT* macros.
As it is a static function, no need to declare before its definition.
The macro is used directly in the function definition.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-12 00:00:35 +02:00
Erik Gabriel Carrillo
f28f3594de service: add attribute API
Add APIs that allow an application to query and reset the attributes of
a service lcore.  Add one such new attribute, "loops", which is a
counter that tracks the number of times the service core has looped in
the service runner function.  This is useful to applications that desire
a "liveness" check to make sure a service core is not stuck.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
2018-07-11 23:43:23 +02:00
Jananee Parthasarathy
b40ae2be73 cryptodev: remove debug compilation option
For cryptodev dynamic logging, conditional compilation of
debug logs is not actually required.

Signed-off-by: Jananee Parthasarathy <jananeex.m.parthasarathy@intel.com>
Reviewed-by: Reshma Pattan <reshma.pattan@intel.com>
Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-11 00:57:51 +02:00
Ferruh Yigit
64bd384be4 eal: do not enable static log macro for ethdev
The static logging macro RTE_PMD_DEBUG_TRACE is enabled with a few DEBUG
config options, including RTE_LIBRTE_ETHDEV_DEBUG.

RTE_LIBRTE_ETHDEV_DEBUG is still used for data path logging, but all
ethdev logging has switched to dynamic logging, so there is no need to
enable the static logging macro for ethdev.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2018-07-03 01:35:58 +02:00
Thomas Monjalon
b259da7b24 version: 18.08-rc0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-06-01 12:58:36 +02:00
Thomas Monjalon
a5dce55556 version: 18.05.0
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-05-30 22:55:57 +02:00
Thomas Monjalon
830410b265 version: 18.05-rc6
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2018-05-28 03:29:40 +02:00