Introduce a new function rte_mempool_create_empty()
that allocates a mempool that is not populated.
The functions rte_mempool_create() and rte_mempool_xmem_create()
now make use of it, making their code much easier to read.
Currently, they are the only users of rte_mempool_create_empty(),
but the function will be made public in later commits.
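A sketch of the intended usage (the prototype below matches how the
function is used internally here; it becomes public later in the
series):

  /* allocate the mempool, but do not populate it with objects yet */
  struct rte_mempool *mp;

  mp = rte_mempool_create_empty("test_pool", n, elt_size,
          cache_size, private_data_size, socket_id, flags);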
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce rte_mempool_free(), which:
- unlinks the mempool from the global list if it is found
- frees all the memory chunks using their free callbacks
- frees the internal ring
- frees the memzone containing the mempool
Currently this function is only used in error cases when
creating a new mempool, but it will be made public later
in the patch series.
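Its prototype is simply (a sketch; the function becomes public later
in the series):

  /* free the mempool; handles a partially populated mempool,
   * e.g. on the error path of mempool creation */
  void rte_mempool_free(struct rte_mempool *mp);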
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Storing the pointer to the memzone instead of the physical address
provides more information than the physical address alone: for
instance, the memzone flags.
Moreover, keeping the memzone pointer will allow us to free the mempool
(this is done later in the series).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce a new function rte_mempool_populate_virt() that is now called
by default when hugepages are not supported. This function populates
the mempool with several physically contiguous chunks whose minimum
size is the page size of the system.
Thanks to this, rte_mempool_create() will work properly without
hugepages (if the object size is smaller than a page size), and two
specific workarounds can be removed:
- trailer_size was artificially extended to a page size
- rte_mempool_virt2phy() did not rely on the object's physical address
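A sketch of the prototype, as used by this series (free_cb and opaque
follow the memory chunk free callback introduced earlier in the
series):

  int rte_mempool_populate_virt(struct rte_mempool *mp, char *addr,
          size_t len, size_t pg_sz,
          rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);

The function walks [addr, addr + len), groups physically contiguous
pages, and adds one memory chunk per contiguous run.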
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce rte_mempool_populate_default() which allocates
mempool objects in several memzones.
The mempool header is now always allocated in a specific memzone
(not with its objects). Thanks to this modification, we can remove
much of the specific behavior that was required when hugepages are
not enabled and rte_mempool_xmem_create() is used.
This change requires updating how the kni and mellanox drivers look up
mbuf memory. For now, this will only work if there is only one memory
chunk (like today), but we could make use of rte_mempool_mem_iter() to
support more memory chunks.
We can also remove RTE_MEMPOOL_OBJ_NAME that is not required anymore for
the lookup, as memory chunks are referenced by the mempool.
Note that rte_mempool_create() is still broken (as it was before) when
there is no hugepage support (rte_mempool_xmem_create() has to be
used). This is fixed in the next commit.
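A sketch of the new function (its prototype as it appears later in
the series):

  /* add memory for objects in the mempool, allocating one or more
   * memzones; returns the number of objects added, or a negative
   * errno on error */
  int rte_mempool_populate_default(struct rte_mempool *mp);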
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Update rte_mempool_xmem_size() so that when the page_shift argument is
set to 0, it assumes that memory is physically contiguous, allowing
page boundaries to be ignored. This will be used in the next commits.
Also rename the variable 'n' to 'obj_per_page' and avoid the
assignment inside the if().
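For instance (figures are illustrative only):

  /* 1000 objects of 2176 bytes each (header and trailer included) */
  size_t contig = rte_mempool_xmem_size(1000, 2176, 0);
  /* pg_shift == 0: contiguous, 1000 * 2176 = 2176000 bytes */
  size_t paged = rte_mempool_xmem_size(1000, 2176, 12);
  /* 4 KB pages hold one object each: 1000 * 4096 = 4096000 bytes */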
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce a free callback that is passed to the populate* functions,
which is used when freeing a mempool. It is unused for now, but as the next
commits will populate the mempool with several chunks of memory, we
need a way to free them properly on error.
Later in the series, we will also introduce a public rte_mempool_free()
and the ability for the user to populate a mempool with its own memory.
For that, we also need a free callback.
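The callback type is sketched below (matching its use by the populate
functions in this series):

  /* called once per memory chunk when the mempool is freed */
  typedef void (rte_mempool_memchunk_free_cb_t)(
          struct rte_mempool_memhdr *memhdr, void *opaque);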
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit simplifies rte_mempool_xmem_usage().
Since the previous commit, rte_mempool_xmem_usage() is the last user
of rte_mempool_obj_mem_iter(). This complex code can now be moved
inside the function, so we can get rid of the callback and do some
simplification to make the code more readable.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Following the same model as rte_mempool_obj_iter(), introduce
rte_mempool_mem_iter() to iterate over the memory chunks attached
to the mempool.
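A usage sketch (memchunk_dump() is a hypothetical callback):

  static void
  memchunk_dump(struct rte_mempool *mp, void *opaque,
                struct rte_mempool_memhdr *memhdr, unsigned mem_idx)
  {
          printf("%s chunk %u: addr=%p len=%zu\n",
                 mp->name, mem_idx, memhdr->addr, memhdr->len);
  }

  rte_mempool_mem_iter(mp, memchunk_dump, NULL);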
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Do not use the paddr table to store the mempool memory chunks.
This will allow several chunks with different virtual addresses.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Store the physical address of the object in its header. It simplifies
rte_mempool_virt2phy() and prepares for the removal of the paddr[]
table in the mempool header.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This makes the code of rte_mempool_create() clearer, and it will make
the introduction of an external mempool handler easier (in another
patch series). Indeed, this function contains the ring-specific part,
which could be replaced by something else in the future.
This commit also adds a socket_id field in the mempool structure that
is used by this new function.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Before this patch, the mempool elements were initialized at the time
they were added to the mempool. This patch changes this to do the
initialization of all objects once the mempool is populated, using
rte_mempool_obj_iter() introduced in previous commits.
Thanks to this modification, we are getting closer to a new API
that would allow us to do:
  mempool_init()
  mempool_populate(mem1)
  mempool_populate(mem2)
  mempool_populate(mem3)
  mempool_init_obj()
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Use the new rte_mempool_obj_iter() instead of the old implementation
to iterate over objects and audit them (check for cookies).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Now that the mempool objects are chained into a list, we can use it to
browse them. This implies a rework of the rte_mempool_obj_iter() API,
which no longer needs to take as many arguments as before. The previous
function is kept as a private function and renamed in this commit; it
will be removed in a later commit of the patch series.
The only internal users of this function are the mellanox drivers. The
code is updated accordingly.
Preserving API compatibility for this function was considered, but it
is not easy to do without keeping the old code, as the previous
function could also be used to browse elements that were not added to
a mempool. Moreover, the API is already broken by other patches in
this version.
The library version was already updated in
commit 213af31e09 ("mempool: reduce structure size if no cache needed")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
In the next commits, we will use an iterator to walk through the
objects in a mempool in rte_mempool_audit(). This iterator takes a "struct
rte_mempool *" as a parameter because it is assumed that the callback
function can modify the mempool.
The previous approach was to introduce a RTE_DECONST() macro, but
after discussion it seems that removing the const qualifier is better
to avoid fooling the compiler, and also because these functions are
not used in the datapath (possible compiler optimizations due to const
are not critical there).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit removes the const qualifier for the mempool in
rte_mempool_walk() callback prototype.
Indeed, most operations that can be performed on a mempool require a
non-const mempool pointer, except the dump and the audit. Therefore,
rte_mempool_walk() is more useful if the mempool pointer is not const.
This is required by the next commit, where the mellanox drivers use
rte_mempool_walk() to iterate over the mempools, then
rte_mempool_obj_iter() to iterate over the objects in each mempool.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Introduce a list entry in the object header so that objects can be
listed and browsed. The objective is to provide a simpler way to
browse the elements of a mempool.
The next commits will update rte_mempool_obj_iter() to use this list,
and remove the previous complex implementation.
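In sketch form, the object header gains a chaining entry (the cookie
is present only in debug mode, as before):

  struct rte_mempool_objhdr {
          STAILQ_ENTRY(rte_mempool_objhdr) next; /* chained into the
                                                  * mempool's list */
          struct rte_mempool *mp;                /* owning mempool */
  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
          uint64_t cookie;
  #endif
  };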
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit renames rte_mempool_obj_ctor_t as rte_mempool_obj_cb_t.
In the next commits, we will add the ability to populate the
mempool and iterate through objects using the same function.
We will use the same callback type for that. As the callback is
no longer a constructor, rename it to rte_mempool_obj_cb_t.
The rte_mempool_obj_iter_t that was used to iterate over objects
will be removed in the next commits.
No functional change.
In this commit, the API is preserved through a compat typedef.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Since commits d2e0ca22f and 97e7e685b, the object headers and trailers
of the mempool are defined as structures. We can get their
size using sizeof() instead of doing a calculation that will
become wrong at the first structure update.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
There's no reason to keep this function inlined. Move it to
rte_mempool.c. We need to export the function for when compiling
with shared libraries + debug. We also need to keep the macro,
because we don't want to call an empty function when debug is
disabled.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
This commit replaces elt_size by total_elt_size when appropriate.
In some mempool functions, we use the size of the elements as arguments
or variables. There is confusion between sizes that include the
header and trailer and sizes that do not.
To avoid this confusion:
- update the API documentation
- rename variables and arguments to "elt_size" when the size does not
include the header and trailer, and to "total_elt_size" otherwise.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
No functional change, just fix some comments and styling issues.
Also avoid duplicating comments between rte_mempool_create()
and rte_mempool_xmem_create().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
The rte_mempool structure is changed, which will cause an ABI change
for this structure. Providing backward compat is not reasonable
here as this structure is used in multiple defines/inlines.
Allow mempool cache support to be dynamic, depending on whether the
mempool being created needs cache support. This saves about 1.5 MB of
memory used by the rte_mempool structure.
Allocating small mempools which do not require a cache can consume
large amounts of memory if you have many of these mempools.
Signed-off-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The macro RTE_VERIFY always checks a condition.
It is optimized with "unlikely" hint.
While this macro is well suited for test applications, libraries and
examples prefer to enable such checks only in debug mode.
That's why the macro RTE_ASSERT is introduced: it calls RTE_VERIFY only
if built with debug logs enabled.
A lot of assert macros were duplicated and enabled with specific
flags. Removing these #ifdefs allows these code branches to be tested
more easily and avoids dead-code pitfalls.
The ENA_ASSERT is kept (in debug mode only) because it has more
parameters to log.
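A sketch of the resulting macro (assuming the check is gated by the
debug log level):

  #if RTE_LOG_LEVEL >= RTE_LOG_DEBUG
  #define RTE_ASSERT(exp) RTE_VERIFY(exp)
  #else
  #define RTE_ASSERT(exp) do {} while (0)
  #endif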
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Since commits ff909fe21f and 4e32101f9b, it is now possible to free
memzones and rings.
rte_mempool_create() should be modified to take advantage of this
and not leak memory when an allocation fails.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
There is a new function in the EAL API for internal use.
It has neither a proper prefix nor a .map export:
libethdev.so: undefined reference to `is_xen_dom0_supported'
Fixes: 719dbebceb ("xen: allow determining DOM0 at runtime")
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Optimize for quad-channel by default; this should work well for
all cases, and better than the previous value of one anyway.
Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: David Marchand <david.marchand@6wind.com>
On TILE-Gx and TILE-Mx platforms, the buffers fed into the hardware
buffer manager require a 128-byte alignment. With this change, we
allow configuration based override of the element alignment, and
default to RTE_CACHE_LINE_SIZE if left unspecified.
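In sketch form (RTE_MEMPOOL_ALIGN as the config override, falling back
to the cache line size when unset):

  #ifndef RTE_MEMPOOL_ALIGN
  #define RTE_MEMPOOL_ALIGN RTE_CACHE_LINE_SIZE
  #endif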
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Signed-off-by: Zhigang Lu <zlu@ezchip.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Translating from a mempool object to the mempool pointer does not break
alignment constraints. However, the compiler is unaware of this fact and
complains on -Wcast-align. This patch modifies the code to use RTE_PTR_SUB(),
thereby silencing the compiler by casting through (void *).
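For instance (a sketch; off is the offset back to the mempool pointer):

  /* before: the cast to a more aligned type trips -Wcast-align */
  mpp = (struct rte_mempool **)((char *)obj - off);
  /* after: RTE_PTR_SUB() goes through (void *), silencing it */
  mpp = RTE_PTR_SUB(obj, off);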
Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Each object stored in a mempool is suffixed by a trailer, storing
a cookie in debug mode which helps to detect memory corruption.
Like for headers, introduce a structure that materializes the content of
this trailer.
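A sketch of the structure (it only has content in debug mode):

  struct rte_mempool_objtlr {
  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
          uint64_t cookie;
  #endif
  };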
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Each object stored in a mempool is prefixed by a header, allowing for
instance to retrieve the mempool pointer from the object. When debug is
enabled, a cookie is also added in this header; it helps to detect
corruption and double frees.
Introduce a structure that materializes the content of this header,
and will simplify future patches adding things in this header.
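A sketch of the structure (the cookie field exists only in debug mode):

  struct rte_mempool_objhdr {
          struct rte_mempool *mp;  /* owning mempool */
  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
          uint64_t cookie;
  #endif
  };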
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
In rte_mempool_obj_iter(), when an element boundary coincides with a page boundary,
even if a single page is required per object, a loop checks that the next page
is contiguous and drops the first one otherwise.
This commit checks subsequent pages only when several are required per object.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
rte_mempool_xmem_usage()'s return type is ssize_t which has the same
architecture-dependent width as size_t but is signed.
On 64-bit architectures, returning a negative uint32_t value without casting
to ssize_t first does not work as intended: the sign bit is lost and the
returned value is garbage.
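A minimal illustration of the pitfall on a 64-bit architecture:

  uint32_t n = 42;
  ssize_t bad = -n;            /* -n wraps to 4294967254 before the
                                * conversion, so bad is positive */
  ssize_t good = -(ssize_t)n;  /* cast first: good is -42 */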
This commit fixes an assertion failure in testpmd on 64 bit architectures
when combining --no-huge and --mp-anon outside of Xen Dom0:
PANIC in mempool_anon_create():
line 170 assert "elt_num == mp->size" failed
Fixes: 148f963fb5 ("xen: core library changes")
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Otherwise cache_flushthresh can be bigger than n, and
a consumer can starve others by keeping every element
either in use or in the cache.
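A sketch of the guard this implies at creation time
(CALC_CACHE_FLUSHTHRESH() stands for the existing threshold helper):

  if (cache_size > RTE_MEMPOOL_CACHE_MAX_SIZE ||
      CALC_CACHE_FLUSHTHRESH(cache_size) > n) {
          rte_errno = EINVAL;
          return NULL;
  }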
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
A lot of places just protect against concurrent access, and I cannot
see the gain of having those macros.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Check the FILE *f and rte_mempool *mp pointers for NULL.
Signed-off-by: Keith Wiles <keith.wiles@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
CACHE_LINE_SIZE is a macro defined in machine/param.h in FreeBSD and
conflicts with DPDK macro version.
Adding RTE_ prefix to avoid conflicts.
CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed.
Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
[Thomas: updated on HEAD, including PPC]
Since the data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. Meaning that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any successive ring create/lookup on the other side of IVSHMEM will
result in trying to dereference an invalid pointer.
This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of a tailq_next/tailq_prev
pointers, and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.
Technically, only the rte_ring structure requires modification, because
IVSHMEM only uses memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
The function rte_snprintf serves no useful purpose. It is the
same as snprintf() for all valid inputs. Deprecate it and
replace all uses in current code.
Leave the tests for the deprecated function in place.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
This commit removes trailing whitespace from lines in files. Almost all
files are affected, as the BSD license copyright header had trailing
whitespace on 4 lines in it [hence the number of files reporting 8 lines
changed in the diffstat].
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
[Thomas: remove spaces before tabs in libs]
[Thomas: remove more trailing spaces in non-C files]
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Add a function to iterate over the mempools.
Useful for diagnostic code that wants to look at mempool usage patterns.
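A usage sketch (count_pool() is a hypothetical callback):

  static void
  count_pool(const struct rte_mempool *mp, void *arg)
  {
          (*(unsigned *)arg)++;
  }

  unsigned count = 0;
  rte_mempool_walk(count_pool, &count);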
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The DPDK dump functions are useful for remote debugging of an
application. But when the application runs as a daemon, stdout
is typically routed to /dev/null.
Change all these functions to take a stdio FILE * handle instead.
An application can then use open_memstream() to capture
the output.
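For example (POSIX open_memstream() backing the FILE * handle):

  char *buf = NULL;
  size_t len = 0;
  FILE *f = open_memstream(&buf, &len);

  rte_mempool_dump(f, mp);
  fclose(f);
  /* buf now holds the dump text; send it to the remote debugger,
   * then free(buf) */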
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[Thomas: fix quota_watermark example]
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
The second condition of this logical OR:
(get_gcd(new_obj_size, nrank * nchan) != 1 ||
get_gcd(nchan, new_obj_size) != 1)
is redundant with the first condition.
We can show that the first condition is equivalent to its disjunction
with the second condition using these two results:
- R1: For all conditions A and B, if B implies A, then (A || B) is
equivalent to A.
- R2: (get_gcd(nchan, new_obj_size) != 1) implies
(get_gcd(new_obj_size, nrank * nchan) != 1)
We can show R1 with the following truth table (0 is false, 1 is true):
+-----+-----++----------+-----+-------------+
| A | B || (A || B) | A | B implies A |
+-----+-----++----------+-----+-------------+
| 0 | 0 || 0 | 0 | 1 |
| 0 | 1 || 1 | 0 | 0 |
| 1 | 0 || 1 | 1 | 1 |
| 1 | 1 || 1 | 1 | 1 |
+-----+-----++----------+-----+-------------+
Truth table of (A || B) and A
We can show R2 by looking at the code of optimize_object_size and
get_gcd.
We see that:
- S1: (nchan >= 1) and (nrank >= 1).
- S2: get_gcd returns 0 only when both arguments are 0.
Let:
- X be get_gcd(new_obj_size, nrank * nchan).
- Y be get_gcd(nchan, new_obj_size).
Suppose:
- H1: get_gcd returns the greatest common divisor of its arguments.
- H2: (nrank * nchan) does not exceed UINT_MAX.
We prove (Y != 1) implies (X != 1) with the following steps:
- Suppose L0: (Y != 1). We have to show (X != 1).
- By H1, Y is the greatest common divisor of nchan and new_obj_size.
In particular, we have L1: Y divides nchan and new_obj_size.
- By H2, we have L2: nchan divides (nrank * nchan)
- By L1 and L2, we have L3: Y divides (nrank * nchan) and
new_obj_size.
- By H1 and L3, we have L4: (Y <= X).
- By S1 and S2, we have L5: (Y != 0).
- By L0 and L5, we have L6: (Y > 1).
- By L4 and L6, we have (X > 1) and thus (X != 1), which concludes.
R2 was also tested for all values of new_obj_size, nrank, and nchan
between 0 and 2000.
This redundant condition was found using TrustInSoft Analyzer.
Signed-off-by: Julien Cretin <julien.cretin@trust-in-soft.com>
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>