Add a new function, rte_hash_count, to return the number of keys that
are currently stored in the hash table. Corresponding test functions are
added into hash_test and hash_multiwriter test.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
The existing implementation of librte_hash does not support read-write
concurrency. This commit implements read-write safety using rte_rwlock
and rte_rwlock TM version if hardware transactional memory is available.
Both multi-writer and read-write concurrency is protected by rte_rwlock
now. The x86 specific header file is removed since the x86 specific RTM
function is not called directly by rte hash now.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This commit refactors the hash table lookup/add/del code
to remove some code duplication. Processing on primary bucket can
also apply to secondary bucket with same code.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This commit calculates the needed key slot size more
accurately. The previous local cache fix requires
the free slot ring to be larger than actually needed.
The calculation of the value is inaccurate.
Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Current multi-writer implementation uses Intel TSX to
protect the cuckoo path moving but not the cuckoo
path searching. After searching, we need to verify again if
the same empty slot still exists at the beginning of the TSX
region. Otherwise another writer could occupy the empty slot
before the TSX region. Current code does not verify.
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
When malloc for multiwriter_lock, the align should be
RTE_CACHE_LINE_SIZE rather than LCORE_CACHE_SIZE.
Also there should be check to verify the success of
rte_malloc.
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
In function 'crc32c_sse42_u64_mimic':
rte_hash_crc.h:402:40:
warning: conversion from 'uint64_t' {aka 'long unsigned int'}
to 'uint32_t' {aka 'unsigned int'} may change value [-Wconversion]
init_val = crc32c_sse42_u32(d.u32[0], init_val);
Fixes: 00bf774bab0b ("hash: add assembly implementation of CRC32 intrinsics")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
In function 'crc32c_2words':
rte_hash_crc.h:347:2:
warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
uint32_t crc, term1, term2;
Fixes: d983cf41698f ("hash: add software CRC32 implementation")
Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
rte_hash_lookup_with_hash() has wrong comment for its 'sig' param.
Fixes: 1a9f648be291 ("hash: fix for multi-process apps")
Cc: stable@dpdk.org
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
By making "compat" lib (which consists of a header only) a dependency of
the EAL, we make the header file available to all other libs, drivers and
apps, and thereby make it less work to do ABI versioning.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Add non-EAL libraries to DPDK build. The compat lib is a special case,
along with the previously-added EAL, but all other libs can be build using
the same set of commands, where the individual meson.build files only need
to specify their dependencies, source files, header files and ABI versions.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Compile-time function selection can potentially lead to
lower performance on generic builds done by distros.
Replaced compile time flag checks with run-time function
selection.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Compile-time function selection can potentially lead to
lower performance on generic builds done by distros.
Replaced compile time flag checks with run-time function
selection.
Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Many exported headers rely on definitions found in rte_config.h without
including it, as shown by the following command:
grep -L '^#include <rte_config.h>' -- \
$(grep -Rl \
$(sed -n '/^#define \([^ ]\+\).*$/{s//\1/;H;};${x;s/\n//;s/\n/\\|/g;p;}' \
build/include/rte_config.h) \
-- build/include/)
We cannot assume external applications will include rte_config.h on their
own, neither directly nor through a -include parameter like DPDK does
internally.
This not only causes obvious compilation failures that can be reproduced
with check-includes.sh such as:
[...]/rte_memory.h:88:43: error: ‘RTE_CACHE_LINE_SIZE’ was not declared in
this scope
#define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
^
It also results in less visible issues, for instance rte_hash_crc.h relying
on RTE_ARCH_X86_64's presence to provide dedicated inline functions.
This patch partially reverts the commit below and adds missing include
lines to the remaining files.
Fixes: f1a7a5c5f404 ("remove include of generated config header")
Cc: stable@dpdk.org
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
void * pointer can be assigned to any data type pointer.
Unnecessary cast can be removed in order to keep code clearer.
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
The memzone header is often included without good reason.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
It is easier to find all constructor functions when they use
the same macros RTE_INIT or RTE_INIT_PRIO.
The macro definitions are moved from rte_eal.h to rte_common.h.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The list of libraries in LDLIBS was generated from the DEPDIRS-xyz
variable. This is valid when the subdirectory name match the library
name, but it's not always the case, especially for PMDs.
The patches removes this feature and explicitly adds the proper
libraries in LDLIBS.
Some DEPDIRS-xyz variables become useless, remove them.
Reported-by: Gage Eads <gage.eads@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Use rte_bsf32 and fast bit unset operation to optimize the
softrss computation.
The following measurements shows improvement over the default
softrss computation function.
tuple lens old(cycles) new(cycles)
3 1225 337
9 3743 992
Signed-off-by: Yangchao Zhou <zhouyates@gmail.com>
Reviewed-by: Vladimir Medvedkin <medvedkinv@gmail.com>
When adding a new entry in a hash table, there is
a maximum number of evictions that can be
performed. When the counter of these evictions reaches
this maximum, the entry cannot be added, as it is considered
that the algorithm has encountered an infinite loop.
The problem with the current implementation, is that this
counter was declared as a static variable.
If there are multiple threads adding entries in the same table
or in different tables, they should access different counters,
one per core and per table.
Therefore, the variable has been modified to be non-static.
Fixes: 243e93a5046f ("hash: fix unlimited cuckoo path")
Cc: stable@dpdk.org
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Due to the uint32_t accesses in the hash computation, keys that aren't
aligned to a uint32_t boundary or multiples of uint32_t in length, may
see accesses beyond the end of the key. This may cross a page boundary.
Signed-off-by: Chas Williams <chas3@att.com>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: John McNamara <john.mcnamara@intel.com>
When adding items to a hash table with multiple threads,
there is an spinlock used to prevent data corruption
(unless Transactional Memory is supported).
If there is a failure, the spinlock should be released,
but there were cases where that was not happening.
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: stable@dpdk.org
Signed-off-by: Mike Stolarchuk <mike.stolarchuk@bigswitch.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Replace the incorrect reference to "Cavium Networks", "Cavium Ltd"
company name with correct the "Cavium, Inc" company name in
copyright headers.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Since SSE4 is now part of the minimum requirements for DPDK, we don't need
a fallback case to handle selection of algorithm when SSE4 is unavailable.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Remove rte_pause() definition from rte_common.h and
switchover to architecture specific rte_pause.h
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Compile the armv8a CRC32 support only if the machine
has the CRC extensions i.e if RTE_MACHINE_CPUFLAG_CRC32
is defined.
Removed the .arch assembly directives as these are no
more necessary.
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Reviewed-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Verified the changes with thash_autotest unit test case
Signed-off-by: Ashwin Sekhar T K <ashwin.sekhar@caviumnetworks.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Fixing typos across dpdk source code using codespell utility.
Skipped the ethdev driver's base code fixes to keep the base
code intact.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
build error with icc version 17.0.4 (gcc version 7.0.0 compatibility):
In file included from .../dpdk/lib/librte_hash/rte_fbk_hash.h(59),
from .../dpdk/lib/librte_hash/rte_fbk_hash.c(54):
.../dpdk/x86_64-native-linuxapp-icc/include/rte_hash_crc.h(480):
error #1292: unknown attribute "fallthrough"
__attribute__ ((fallthrough));
^
In file included from .../dpdk/lib/librte_hash/rte_fbk_hash.h(59),
from .../dpdk/lib/librte_hash/rte_fbk_hash.c(54):
.../dpdk/x86_64-native-linuxapp-icc/include/rte_hash_crc.h(486):
error #1292: unknown attribute "fallthrough"
__attribute__ ((fallthrough));
^
This code patch hit when gcc > 7 installed and ICC doesn't recognize
fallthrough attribute.
Fixed by disabling code when compiled with ICC.
Fixes: 3dfb9facb055 ("lib: add switch fall-through comments")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
This fixes compiler warnings with GCC 7 for arm64 build.
Fixes: da8dcc27f644 ("hash: use armv8-a CRC32 instructions")
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
With GCC 7 we need to explicitly document when we are falling through from
one switch case to another.
Fixes: af75078fece3 ("first public release")
Fixes: 8bae1da2afe0 ("hash: fallback to software CRC32 implementation")
Fixes: 9ec201f5d6e7 ("mbuf: provide bulk allocation")
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the amount of remaining objs in the
ring. This information can be used by applications in a number of ways,
for instance, with single-consumer queues, it provides a max
dequeue size which is guaranteed to work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an extra parameter to the ring enqueue burst/bulk functions so that
those functions can optionally return the amount of free space in the
ring. This information can be used by applications in a number of ways,
for instance, with single-producer queues, it provides a max
enqueue size which is guaranteed to work. It can also be used to
implement watermark functionality in apps, replacing the older
functionality with a more flexible version, which enables apps to
implement multiple watermark thresholds, rather than just one.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Before this patch, the management of dependencies between directories
had several issues:
- the generation of .depdirs, done at configuration is slow: it can take
more than one minute on some slow targets (usually ~10s on a standard
PC without -j).
- for instance, it is possible to express a dependency like:
- app/foo depends on lib/librte_foo
- and lib/librte_foo depends on app/bar
But this won't work because the directories are traversed with a
depth-first algorithm, so we have to choose between doing 'app' before
or after 'lib'.
- the script depdirs-rule.sh is too complex.
- we cannot use "make -d" for debug, because the output of make is used for
the generation of .depdirs.
This patch moves the DEPDIRS-* variables in the upper Makefile, making
the dependencies much easier to calculate. A DEPDIRS variable is still
used to process library dependencies in LDLIBS.
After this commit, "make config" is almost immediate.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Robin Jarry <robin.jarry@6wind.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Multiwriter insert function was using a fixed value for
the bucket size, instead of using the
RTE_HASH_BUCKET_ENTRIES macro, which value was changed
recently (making it inconsistent in this case).
Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
When trying to insert a new entry, if its target bucket is full,
the alternative location (bucket) of one of the entries is checked,
to try to find an empty slot, with make_space_bucket.
This function is called every time a new bucket is checked, recursively.
To avoid having a very long insert operation (and to avoid filling up
the stack), a limit in the number of pushes is introduced.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
This patch replaces the pipelined rte_hash lookup mechanism with a
loop-and-jump model, which performs significantly better,
especially for smaller table sizes and smaller table occupancies.
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
In lookup bulk function, the signatures of all entries
are compared against the signature of the key that is being looked up.
Now that all the signatures are together, they can be compared
with vector instructions (SSE, AVX2), achieving higher lookup performance.
Also, entries per bucket are increased to 8 when using processors
with AVX2, as 256 bits can be compared at once, which is the size of
8x32-bit signatures.
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
Move current signatures of all entries together in the bucket
and same with all alternative signatures, instead of having
current and alternative signatures together per entry in the bucket.
This will be benefitial in the next commits, where a vectorized
comparison will be performed, achieving better performance.
The alternative signatures have been moved away from
the current signatures, to make the key indices be consecutive
to the current signatures, as these two fields are used by lookup,
so they are in the same cache line.
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
In order to optimize lookup performance, hash structure
is reordered, so all fields used for lookup will be
in the first cache line.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
In function rte_hash_cuckoo_insert_mw_tm, while looking for
an empty slot, only the first entry in the bucket was being checked,
as key_idx array was not being iterated.
Fixes: 5fc74c2e146d ("hash: check if slot is empty with key index")
Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Instead of checking if the current and alternative signatures are 0,
it is faster to check if the key index associated to an entry
is 0, meaning that the slot is empty.
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
This commit fixes a corner case scenario. When a key is deleted,
its signature in the hash table gets clear, which should prevent
a lookup of that same key, unless the signature of the key is all zeroes.
In that case, there will be a match, and key would be compared against
the key that is in the table (which does not get cleared,
as the performance penalty would be high), resulting in a wrong hit.
To prevent this from happening, the key index associated to that entry
should be set to zero when deleting it, so in case that same key
is looked up just after a deletion, it will point to the dummy key slot,
which guarantees a miss.
Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Ring stores the free slots available to be used in the key table.
The ring size was being increased by 1, because of the dummy slot,
used for key misses, but this is not actually stored in the ring,
so there is no need to increase it.
Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>