numam-dpdk

Author	SHA1	Message	Date
Dharmik Thakkar	769b2de7fb	hash: implement RCU resources reclamation Currently, users have to use external RCU mechanisms to free resources when using lock free hash algorithm. Integrate RCU QSBR process to make it easier for the applications to use lock free algorithm. Refer to RCU documentation to understand various aspects of integrating RCU library into other libraries. Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Ray Kinsella <mdr@ashroe.eu> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2020-10-24 09:25:13 +02:00
Honnappa Nagarahalli	fbfe568103	hash: use 32-bit elements rings to save memory The freelist and external bucket indices are 32b. Using rings that use 32b element sizes will save memory. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2020-01-19 19:32:50 +01:00
Dharmik Thakkar	f401363d98	hash: support lock-free extendable bucket This patch enables lock-free read-write concurrency support for extendable bucket feature. Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com>	2019-04-03 20:52:35 +02:00
Ruifeng Wang	90fefe78bf	hash: optimize signature compare for Arm NEON Implemented signature compare function based on neon intrinsic. Hash bulk lookup had 3% - 6% performance gain after optimization. Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Jerin Jacob <jerinj@marvell.com>	2019-03-28 19:54:21 +01:00
Honnappa Nagarahalli	d5c677db89	hash: fix out-of-bound write while freeing key slot Add a debug check for out-of-bound write while freeing the key slot. Coverity issue: 325733 Fixes: e605a1d36ca7 ("hash: add lock-free r/w concurrency") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-12-21 01:53:33 +01:00
Honnappa Nagarahalli	e605a1d36c	hash: add lock-free r/w concurrency Add lock-free read-write concurrency. This is achieved by the following changes. 1) Add memory ordering to avoid race conditions. The only race condition that can occur is using the key store element before the key write is completed. Hence, while inserting the element, the release memory order is used. Any other race condition is caught by the key comparison. Memory orderings are added only where needed. For example, reads in the writer's context do not need memory ordering as there is a single writer. key_idx in the bucket entry and pdata in the key store element are used for synchronisation. key_idx is used to release an inserted entry in the bucket to the reader. Use of pdata for synchronisation is required because updating an existing entry changes only pdata, without updating key_idx. 2) The reader-writer concurrency issue caused by moving keys to their alternative locations during key insert is solved by introducing a global counter (tbl_chng_cnt) indicating a change in the table. 3) Add a flag to enable reader-writer concurrency at run time. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:50:43 +02:00
Honnappa Nagarahalli	dbdbc4a2e9	hash: fix key store element alignment Fix the key store array element alignment such that every array element is aligned on KEY_ALIGNMENT boundary. This is required to make 'pdata' in 'struct rte_hash_key' align on its natural boundary for atomic load/store. Fixes: 473d1bebce43 ("hash: allow to store data in hash table") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:45:40 +02:00
Honnappa Nagarahalli	9d033dac7d	hash: support no free on delete rte_hash_lookup_xxx APIs return the index of slot in the key store. Application(reader) can use that index to reference other data structures in its scope. Because of this, the index should not be freed till the application completes using the index. RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL is introduced to support this. When this flag is enabled rte_hash_del_xxx APIs do not free the key-store index/internal memory associated with the deleted entry. The new API rte_hash_free_key_with_position should be called to free the key-store index/internal memory after calling rte_hash_del_xxx APIs. Suggested-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:44:52 +02:00
Honnappa Nagarahalli	40f8e9c28c	hash: separate multi-writer from r/w concurrency RW concurrency is required with single writer and multiple reader usecase as well. Hence, multi-writer should not be enabled by default when RW concurrency is enabled. Fixes: f2e3001b53ec ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 12:43:52 +02:00
Yipeng Wang	c7d93df552	hash: use partial-key hashing This commit changes the hashing mechanism to "partial-key hashing" to calculate bucket index and signature of key. This is proposed in Bin Fan, et al's paper "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing". Basically the idea is to use "xor" to derive alternative bucket from current bucket index and signature. With "partial-key hashing", it reduces the bucket memory requirement from two cache lines to one cache line, which improves the memory efficiency and thus the lookup speed. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 01:04:33 +02:00
Yipeng Wang	75706568a7	hash: add extendable bucket feature In use cases where hash table capacity needs to be guaranteed, the extendable bucket feature can be used to hold extra keys in linked lists when a conflict happens. This is a similar concept to the extendable bucket hash table in the packet framework. This commit adds the extendable bucket feature. The user can turn it on or off through the extra flag field at table creation time. An extendable bucket table is composed of buckets that can be linked as a list to the current main table. When extendable bucket is enabled, the hash table load can always reach 100%. In other words, the table can always accommodate the same number of keys as the specified table size. This provides a 100% table capacity guarantee. Although keys ending up in the ext buckets may have a longer lookup time, they should be rare due to the cuckoo algorithm. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 01:04:33 +02:00
Yipeng Wang	86c1ef2090	hash: remove unused constant Since the depth-first search of cuckoo path is removed, we do not need the macro anymore which specifies the depth of the cuckoo search. Fixes: f2e3001b53ec ("hash: support read/write concurrency") Cc: stable@dpdk.org Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-10-26 00:00:16 +02:00
Yipeng Wang	f2e3001b53	hash: support read/write concurrency The existing implementation of librte_hash does not support read-write concurrency. This commit implements read-write safety using rte_rwlock, or the TM version of rte_rwlock if hardware transactional memory is available. Both multi-writer and read-write concurrency are protected by rte_rwlock now. The x86 specific header file is removed since the x86 specific RTM function is not called directly by rte hash now. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2018-07-12 23:03:50 +02:00
Honnappa Nagarahalli	7c872b9698	hash: validate hash bucket entries while compiling Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of run time. Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2018-07-12 12:43:10 +02:00
Elza Mathew	3a47be9abb	hash: select cuckoo function at run-time Compile-time function selection can potentially lead to lower performance on generic builds done by distros. Replaced compile time flag checks with run-time function selection. Signed-off-by: Elza Mathew <elza.mathew@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>	2018-01-20 15:34:50 +01:00
Bruce Richardson	369991d997	lib: use SPDX tag for Intel copyright files Replace the BSD license header with the SPDX tag for files with only an Intel copyright on them. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>	2018-01-04 22:41:39 +01:00
Bruce Richardson	4f4cd8717e	hash: remove checks for SSE Since SSE4 is now part of the minimum requirements for DPDK, we don't need a fallback case to handle selection of algorithm when SSE4 is unavailable. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2017-07-04 14:35:41 +02:00
Pablo de Lara	243e93a504	hash: fix unlimited cuckoo path When trying to insert a new entry, if its target bucket is full, the alternative location (bucket) of one of the entries is checked, to try to find an empty slot, with make_space_bucket. This function is called every time a new bucket is checked, recursively. To avoid having a very long insert operation (and to avoid filling up the stack), a limit in the number of pushes is introduced. Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation") Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2016-10-12 18:40:51 +02:00
Byron Marohn	ff15d9c0ba	hash: modify lookup bulk pipeline This patch replaces the pipelined rte_hash lookup mechanism with a loop-and-jump model, which performs significantly better, especially for smaller table sizes and smaller table occupancies. Signed-off-by: Byron Marohn <byron.marohn@intel.com> Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>	2016-10-05 12:10:49 +02:00
Byron Marohn	58017c98ed	hash: add vectorized comparison In lookup bulk function, the signatures of all entries are compared against the signature of the key that is being looked up. Now that all the signatures are together, they can be compared with vector instructions (SSE, AVX2), achieving higher lookup performance. Also, entries per bucket are increased to 8 when using processors with AVX2, as 256 bits can be compared at once, which is the size of 8x32-bit signatures. Signed-off-by: Byron Marohn <byron.marohn@intel.com> Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>	2016-10-05 12:09:50 +02:00
Byron Marohn	8a9f542f32	hash: reorganize bucket structure Move current signatures of all entries together in the bucket and same with all alternative signatures, instead of having current and alternative signatures together per entry in the bucket. This will be beneficial in the next commits, where a vectorized comparison will be performed, achieving better performance. The alternative signatures have been moved away from the current signatures, to make the key indices be consecutive to the current signatures, as these two fields are used by lookup, so they are in the same cache line. Signed-off-by: Byron Marohn <byron.marohn@intel.com> Signed-off-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>	2016-10-05 12:08:56 +02:00
Pablo de Lara	02a08eb355	hash: reorder hash structure In order to optimize lookup performance, hash structure is reordered, so all fields used for lookup will be in the first cache line. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Sameh Gobriel <sameh.gobriel@intel.com>	2016-10-05 12:08:04 +02:00
Pablo de Lara	5fc74c2e14	hash: check if slot is empty with key index Instead of checking if the current and alternative signatures are 0, it is faster to check if the key index associated to an entry is 0, meaning that the slot is empty. Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com> Acked-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>	2016-09-29 21:51:27 +02:00
Wei Shen	be856325cb	hash: add scalable multi-writer insertion with Intel TSX This patch introduces scalable multi-writer Cuckoo Hash insertion based on a split Cuckoo Search and Move operation using Intel TSX. It can do scalable hash insertion with 22 cores with little performance loss and a negligible TSX abort rate. * Added an extra rte_hash flag definition to switch the default single-writer Cuckoo Hash behavior to multi-writer. - If HTM is available, it uses the hardware feature for concurrency. - If HTM is not available, it falls back to a spinlock. * Created a rte_cuckoo_hash_x86.h file to hold all x86-arch related cuckoo_hash functions. rte_cuckoo_hash.c uses a compile time flag to select the x86 file or other platform-specific implementations, while the HTM check is still done at runtime (same idea as RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT). * Moved rte_hash private struct definitions to rte_cuckoo_hash.h, so that rte_cuckoo_hash_x86.h or future platform dependent functions can include them. * The following new functions are created for consistent names when TM support for new platforms is added. - rte_hash_cuckoo_move_insert_mw_tm: do insertion with bucket movement. - rte_hash_cuckoo_insert_mw_tm: do insertion without bucket movement. * One extra multi-writer test case is added. Signed-off-by: Wei Shen <wei1.shen@intel.com> Signed-off-by: Sameh Gobriel <sameh.gobriel@intel.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>	2016-06-24 16:25:07 +02:00

24 Commits
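
The "partial-key hashing" idea described in commit c7d93df552 (derive the alternative bucket by XOR-ing the current bucket index with the key's signature) can be illustrated with a minimal standalone sketch. This is not the DPDK source: the `pk_` names, the `struct pk_table` type, and the choice of the upper 16 bits of a 32-bit hash as the signature are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table descriptor holding only what this sketch needs. */
struct pk_table {
	uint32_t bucket_bitmask; /* num_buckets - 1, num_buckets a power of two */
};

/* Initialise the mask; the XOR trick relies on a power-of-two bucket count. */
static inline void pk_table_init(struct pk_table *t, uint32_t num_buckets)
{
	assert(num_buckets != 0 && (num_buckets & (num_buckets - 1)) == 0);
	t->bucket_bitmask = num_buckets - 1;
}

/* Short signature: upper 16 bits of the hash (an assumed split). */
static inline uint16_t pk_short_sig(uint32_t hash)
{
	return (uint16_t)(hash >> 16);
}

/* Primary bucket: low bits of the hash. */
static inline uint32_t pk_prim_bucket(const struct pk_table *t, uint32_t hash)
{
	return hash & t->bucket_bitmask;
}

/*
 * Alternative bucket: XOR of the current bucket index and the signature.
 * Applying the function to the result returns the original index, so an
 * entry can be moved between its two buckets without rehashing the full key.
 */
static inline uint32_t pk_alt_bucket(const struct pk_table *t,
				     uint32_t cur_idx, uint16_t sig)
{
	return (cur_idx ^ sig) & t->bucket_bitmask;
}
```

Because the mapping is its own inverse, a bucket entry only needs to store the signature to locate its other candidate bucket, which is consistent with the commit's note that bucket memory drops from two cache lines to one. Note that in this sketch a signature whose masked bits are all zero maps a bucket to itself.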