24 Commits

Author SHA1 Message Date
Dharmik Thakkar
769b2de7fb hash: implement RCU resources reclamation
Currently, users have to use external RCU mechanisms to free resources
when using lock free hash algorithm.

Integrate RCU QSBR process to make it easier for the applications to use
lock free algorithm.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-10-24 09:25:13 +02:00
Honnappa Nagarahalli
fbfe568103 hash: use 32-bit elements rings to save memory
The freelist and external bucket indices are 32b. Using rings
that use 32b element sizes will save memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-01-19 19:32:50 +01:00
Dharmik Thakkar
f401363d98 hash: support lock-free extendable bucket
This patch enables lock-free read-write concurrency support for
extendable bucket feature.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2019-04-03 20:52:35 +02:00
Ruifeng Wang
90fefe78bf hash: optimize signature compare for Arm NEON
Implemented signature compare function based on neon intrinsic.
Hash bulk lookup had 3% - 6% performance gain after optimization.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2019-03-28 19:54:21 +01:00
Honnappa Nagarahalli
d5c677db89 hash: fix out-of-bound write while freeing key slot
Add a debug check for out-of-bound write while freeing the key slot.

Coverity issue: 325733
Fixes: e605a1d36ca7 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-12-21 01:53:33 +01:00
Honnappa Nagarahalli
e605a1d36c hash: add lock-free r/w concurrency
Add lock-free read-write concurrency. This is achieved by the
following changes.

1) Add memory ordering to avoid race conditions. The only race
condition that can occur is -  using the key store element
before the key write is completed. Hence, while inserting the element
the release memory order is used. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For ex: reads in the writer's context do not need memory ordering
as there is a single writer.

key_idx in the bucket entry and pdata in the key store element are
used for synchronisation. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronisation
is required due to updation of an existing entry where-in only
the pdata is updated without updating key_idx.

2) Reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter(tbl_chng_cnt) indicating a
change in table.

3) Add the flag to enable reader-writer concurrency during
run time.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:50:43 +02:00
Honnappa Nagarahalli
dbdbc4a2e9 hash: fix key store element alignment
Fix the key store array element alignment such that every array
element is aligned on KEY_ALIGNMENT boundary. This is required to
make 'pdata' in 'struct rte_hash_key' align on its natural boundary
for atomic load/store.

Fixes: 473d1bebce43 ("hash: allow to store data in hash table")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:45:40 +02:00
Honnappa Nagarahalli
9d033dac7d hash: support no free on delete
rte_hash_lookup_xxx APIs return the index of slot in
the key store. Application(reader) can use that index to reference
other data structures in its scope. Because of this, the
index should not be freed till the application completes
using the index.
RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this.
When this flag is enabled rte_hash_del_xxx APIs do not free the
key-store index/internal memory associated with the deleted
entry. The new API rte_hash_free_key_with_position should be called
to free the key-store index/internal memory after calling
rte_hash_del_xxx APIs.

Suggested-by: Yipeng Wang <yipeng1.wang@intel.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:44:52 +02:00
Honnappa Nagarahalli
40f8e9c28c hash: separate multi-writer from r/w concurrency
RW concurrency is required with single writer and multiple reader
usecase as well. Hence, multi-writer should not be enabled by default when
RW concurrency is enabled.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 12:43:52 +02:00
Yipeng Wang
c7d93df552 hash: use partial-key hashing
This commit changes the hashing mechanism to "partial-key
hashing" to calculate bucket index and signature of key.

This is  proposed in Bin Fan, et al's paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically the idea is to use "xor" to
derive alternative bucket from current bucket index and
signature.

With "partial-key hashing", it reduces the bucket memory
requirement from two cache lines to one cache line, which
improves the memory efficiency and thus the lookup speed.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 01:04:33 +02:00
Yipeng Wang
75706568a7 hash: add extendable bucket feature
In use cases that hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to contain extra
keys in linked lists when conflict happens. This is similar
concept to the extendable bucket hash table in packet
framework.

This commit adds the extendable bucket feature. User can turn
it on or off through the extra flag field during table
creation time.

Extendable bucket table composes of buckets that can be
linked list to current main table. When extendable bucket
is enabled, the hash table load can always achieve 100%.
In other words, the table can always accommodate the same
number of keys as the specified table size. This provides
100% table capacity guarantee.

Although keys ending up in the ext buckets may have longer
look up time, they should be rare due to the cuckoo
algorithm.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 01:04:33 +02:00
Yipeng Wang
86c1ef2090 hash: remove unused constant
Since the depth-first search of cuckoo path is removed, we do not
need the macro anymore which specifies the depth of the cuckoo
search.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-10-26 00:00:16 +02:00
Yipeng Wang
f2e3001b53 hash: support read/write concurrency
The existing implementation of librte_hash does not support read-write
concurrency. This commit implements read-write safety using rte_rwlock
and rte_rwlock TM version if hardware transactional memory is available.

Both multi-writer and read-write concurrency is protected by rte_rwlock
now. The x86 specific header file is removed since the x86 specific RTM
function is not called directly by rte hash now.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 23:03:50 +02:00
Honnappa Nagarahalli
7c872b9698 hash: validate hash bucket entries while compiling
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2018-07-12 12:43:10 +02:00
Elza Mathew
3a47be9abb hash: select cuckoo function at run-time
Compile-time function selection can potentially lead to
lower performance on generic builds done by distros.
Replaced compile time flag checks with run-time function
selection.

Signed-off-by: Elza Mathew <elza.mathew@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-20 15:34:50 +01:00
Bruce Richardson
369991d997 lib: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-04 22:41:39 +01:00
Bruce Richardson
4f4cd8717e hash: remove checks for SSE
Since SSE4 is now part of the minimum requirements for DPDK, we don't need
a fallback case to handle selection of algorithm when SSE4 is unavailable.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2017-07-04 14:35:41 +02:00
Pablo de Lara
243e93a504 hash: fix unlimited cuckoo path
When trying to insert a new entry, if its target bucket is full,
the alternative location (bucket) of one of the entries is checked,
to try to find an empty slot, with make_space_bucket.
This function is called every time a new bucket is checked, recursively.
To avoid having a very long insert operation (and to avoid filling up
the stack), a limit in the number of pushes is introduced.

Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2016-10-12 18:40:51 +02:00
Byron Marohn
ff15d9c0ba hash: modify lookup bulk pipeline
This patch replaces the pipelined rte_hash lookup mechanism with a
loop-and-jump model, which performs significantly better,
especially for smaller table sizes and smaller table occupancies.

Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
2016-10-05 12:10:49 +02:00
Byron Marohn
58017c98ed hash: add vectorized comparison
In lookup bulk function, the signatures of all entries
are compared against the signature of the key that is being looked up.
Now that all the signatures are together, they can be compared
with vector instructions (SSE, AVX2), achieving higher lookup performance.

Also, entries per bucket are increased to 8 when using processors
with AVX2, as 256 bits can be compared at once, which is the size of
8x32-bit signatures.

Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
2016-10-05 12:09:50 +02:00
Byron Marohn
8a9f542f32 hash: reorganize bucket structure
Move current signatures of all entries together in the bucket
and same with all alternative signatures, instead of having
current and alternative signatures together per entry in the bucket.
This will be benefitial in the next commits, where a vectorized
comparison will be performed, achieving better performance.

The alternative signatures have been moved away from
the current signatures, to make the key indices be consecutive
to the current signatures, as these two fields are used by lookup,
so they are in the same cache line.

Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
2016-10-05 12:08:56 +02:00
Pablo de Lara
02a08eb355 hash: reorder hash structure
In order to optimize lookup performance, hash structure
is reordered, so all fields used for lookup will be
in the first cache line.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>
2016-10-05 12:08:04 +02:00
Pablo de Lara
5fc74c2e14 hash: check if slot is empty with key index
Instead of checking if the current and alternative signatures are 0,
it is faster to check if the key index associated to an entry
is 0, meaning that the slot is empty.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
2016-09-29 21:51:27 +02:00
Wei Shen
be856325cb hash: add scalable multi-writer insertion with Intel TSX
This patch introduced scalable multi-writer Cuckoo Hash insertion
based on a split Cuckoo Search and Move operation using Intel
TSX. It can do scalable hash insertion with 22 cores with little
performance loss and negligible TSX abortion rate.

* Added an extra rte_hash flag definition to switch default single writer
  Cuckoo Hash behavior to multiwriter.
    - If HTM is available, it would use hardware feature for concurrency.
    - If HTM is not available, it would fall back to spinlock.

* Created a rte_cuckoo_hash_x86.h file to hold all x86-arch related
  cuckoo_hash functions. And rte_cuckoo_hash.c uses compile time flag to
  select x86 file or other platform-specific implementations. While HTM check
  is still done at runtime (same idea with
  RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)

* Moved rte_hash private struct definitions to rte_cuckoo_hash.h, to allow
  rte_cuckoo_hash_x86.h or future platform dependent functions to include.

* Following new functions are created for consistent names when new platform
  TM support are added.
    - rte_hash_cuckoo_move_insert_mw_tm: do insertion with bucket movement.
    - rte_hash_cuckoo_insert_mw_tm: do insertion without bucket movement.

* One extra multi-writer test case is added.

Signed-off-by: Wei Shen <wei1.shen@intel.com>
Signed-off-by: Sameh Gobriel <sameh.gobriel@intel.com>
Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
2016-06-24 16:25:07 +02:00