Commit Graph

16 Commits

Author SHA1 Message Date
Konstantin Ananyev
6fba1c8ba0 acl: optimize AVX512 classify with 4 bytes loads
With the current ACL implementation, the first field in the rule
definition always has to be one byte long, though for an optimised
classify implementation it can be useful to do 4B reads for it
(as we already do for the rest of the fields).
So at build phase, check the user-provided field definitions to
determine whether it is safe to do 4B loads for the first ACL field.
At run-time this information can then be used to choose the classify
behavior.
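
For context, a typical field layout starts with a one-byte field, as in
the illustrative sketch below (types, offsets and indices are
placeholders, in the spirit of the IPv4 5-tuple used by the ACL sample
code); the build-phase check described above inspects such definitions
to decide whether a full 4B load at the first field's offset is safe:

    struct rte_acl_field_def defs[] = {
            /* first field: 1 byte (the field this commit examines) */
            { .type = RTE_ACL_FIELD_TYPE_BITMASK, .size = sizeof(uint8_t),
              .field_index = 0, .input_index = 0, .offset = 0 },
            /* subsequent fields are already read 4 bytes at a time */
            { .type = RTE_ACL_FIELD_TYPE_MASK, .size = sizeof(uint32_t),
              .field_index = 1, .input_index = 1, .offset = 4 },
    };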

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:01 +02:00
Konstantin Ananyev
b64c2295f7 acl: add 256-bit AVX512 classify method
Introduce a classify implementation that uses the AVX512-specific ISA.
rte_acl_classify_avx512x16() is able to process up to 16 flows in parallel.
It uses 256-bit wide registers/instructions only
(to avoid a frequency level change).
Note that for now only the 64-bit version is supported.
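
A minimal usage sketch (assuming the matching RTE_ACL_CLASSIFY_AVX512X16
algorithm id is exposed through the public enum):

    /* after rte_acl_build(): explicitly request the 256-bit AVX512
     * method; a non-zero return means it is not available on this
     * build/CPU and the previously selected method stays in place. */
    if (rte_acl_set_ctx_classify(ctx, RTE_ACL_CLASSIFY_AVX512X16) != 0)
            printf("AVX512 classify method not available\n");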

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-10-14 14:23:00 +02:00
Konstantin Ananyev
7c6cca6b60 acl: add infrastructure for AVX512 classify methods
Add the necessary changes to support the new AVX512-specific ACL
classify algorithm:
 - changes in meson.build to check that the build tools
   (compiler, assembler, etc.) properly support AVX512.
 - run-time checks to make sure the target platform does support AVX512
   (a minimal sketch of such a check follows below).
 - a dummy rte_acl_classify_avx512() for targets where the AVX512
   implementation cannot be properly supported.
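
The exact list of CPU flags the library tests is not spelled out here;
the sketch below only illustrates the kind of run-time gate meant
above, using the generic rte_cpuflags API:

    #include <rte_cpuflags.h>

    /* run-time gate before picking the AVX512 classify method
     * (the flag set shown is an assumption, not the library's list) */
    int avx512_ok = rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
                    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512CD) &&
                    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW);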

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-10-14 14:23:00 +02:00
Bruce Richardson
369991d997 lib: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files
with only an Intel copyright on them.
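
For illustration, the resulting header takes the usual DPDK form
(the copyright line below is a placeholder):

    /* SPDX-License-Identifier: BSD-3-Clause
     * Copyright(c) 2010-2017 Intel Corporation
     */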

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
2018-01-04 22:41:39 +01:00
Gowrishankar Muthukrishnan
1d73135f9f acl: add AltiVec for ppc64
This patch adds a port of the ACL library to ppc64le.

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Acked-by: Chao Zhu <chaozhu@linux.vnet.ibm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2016-09-09 17:56:14 +02:00
Jerin Jacob
34fa6c27c1 acl: add NEON optimization for ARMv8
The implementation uses NEON gcc intrinsics.
Verified with the testacl and acl_autotest applications on the arm64
architecture.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2015-11-18 22:44:01 +01:00
Konstantin Ananyev
229ea9a71c acl: remove subtree calculations at build stage
As subtree_id is not used by acl_merge_trie() any more,
there is no point in calculating and maintaining that information.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2015-06-04 11:14:45 +02:00
Konstantin Ananyev
17f520d2cf acl: add comments about internal layout
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:12:16 +01:00
Konstantin Ananyev
f3d24368ef acl: remove unused constant
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:11:26 +01:00
Konstantin Ananyev
62945e029e acl: introduce config parameter for performance/space trade-off
If at build phase we don't do any trie splitting,
then the temporary build structures and the resulting RT structure
might be much bigger than they are now.
On the other hand, having just one trie instead of several can speed up
search quite significantly.
From my measurements on rule-sets with ~10K rules:
the RT table is up to 8 times bigger, while classify() is up to 80%
faster than the current implementation.
To let the user decide about this performance/space trade-off,
a new parameter (max_size) is introduced in the build config structure.
Setting it to a value greater than zero instructs rte_acl_build() to:
- make sure that the size of the RT table doesn't exceed the given value.
- attempt to minimise the number of tries in the table.
Setting it to zero maintains the current behaviour.
That introduces a minor change in the public API, but I think the
possible performance gain is too big to ignore.
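
A minimal sketch of how an application might use the new field
(the size value is illustrative):

    struct rte_acl_config cfg;

    memset(&cfg, 0, sizeof(cfg));
    /* ... fill num_categories, num_fields and defs as usual ... */

    /* cap the run-time table at ~4 MB; leaving max_size at 0 keeps
     * the previous multi-trie behaviour */
    cfg.max_size = 4 * 1024 * 1024;

    ret = rte_acl_build(ctx, &cfg);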

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:11:26 +01:00
Konstantin Ananyev
5dd71363bf acl: add AVX2 classify method
Introduce a new classify() method that uses AVX2 instructions.

From my measurements:
on HSW boards, when processing >= 16 packets per call,
the AVX2 method outperforms its SSE counterpart by 10-25%
(depending on the rule-set).

When built with compilers that don't support AVX2 instructions,
make rte_acl_classify_avx2() do nothing and return an error.
At runtime, if librte_acl was built with a compiler that supports AVX2,
this method is selected as the default one on HW that supports AVX2.
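
A usage sketch for requesting the new method explicitly (enum value
name as in rte_acl.h); error handling is shown because, as described
above, the AVX2 entry point returns an error on unsupported builds/HW:

    ret = rte_acl_classify_alg(ctx, data, results, num, categories,
                               RTE_ACL_CLASSIFY_AVX2);
    if (ret != 0)
            /* fall back when AVX2 is unavailable in this build or CPU */
            ret = rte_acl_classify_alg(ctx, data, results, num,
                                       categories, RTE_ACL_CLASSIFY_SCALAR);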

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:11:25 +01:00
Konstantin Ananyev
a726650857 acl: simplify match nodes allocation
Right now we allocate indexes for all types of nodes, except MATCH,
at the 'gen final RT table' stage.
For MATCH type nodes we do it while building the temporary tree.
This is totally unnecessary and makes the code more complex and error
prone.
Rework the code so that MATCH indexes are allocated at the same stage
as all the others.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:11:25 +01:00
Konstantin Ananyev
ec51901a0b acl: introduce DFA nodes compression (group64) for identical entries
Introduced division of whole 256 child transition enties
into 4 sub-groups (64 kids per group).
So 2 groups within the same node with identical children,
can use one set of transition entries.
That allows to compact some DFA nodes and get space savings in the RT table,
without any negative performance impact.
>From what I've seen an average space savings: ~20%.
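
Conceptually (an illustration of the idea only, not the library's
actual internal layout), a node then keeps four group pointers and two
groups with identical children may alias the same 64-entry block:

    #include <stdint.h>

    struct dfa_node_sketch {
            const uint32_t *group[4];   /* 4 x 64 transition entries */
    };

    static inline uint32_t
    dfa_next(const struct dfa_node_sketch *n, uint8_t in)
    {
            /* upper 2 bits pick the group, lower 6 bits the entry */
            return n->group[in >> 6][in & 0x3f];
    }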

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2015-01-28 17:11:25 +01:00
Konstantin Ananyev
074f54ad03 acl: fix build and runtime for default target
Make the ACL library build/work on the 'default' architecture:
- make rte_acl_classify_scalar really scalar
 (make sure it doesn't use sse4 intrinsics through resolve_priority()).
- Provide two versions of the rte_acl_classify code path:
  rte_acl_classify_sse() - can be built and used only on systems with sse4.2
  and above, returns -ENOTSUP on lower archs.
  rte_acl_classify_scalar() - a slower version, but one that can be built
  and used on all systems.
- Add a new function, rte_acl_classify_alg() (see the prototype sketch
  below).  This function lets you specify an enum value to override the
  ACL context's default algorithm when doing a classification.  This
  allows an application to specify a classification algorithm without
  needing to publicize each method.  I know there was concern over
  keeping those methods public, but we don't have a static ABI at the
  moment, so this seems to me a reasonable thing to do, as it gives us
  less of an ABI surface to worry about.
- keep the common code shared between these two code paths.
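
For reference, the new entry point has the following shape (signature
as found in current rte_acl.h; shown here only for illustration):

    int
    rte_acl_classify_alg(const struct rte_acl_ctx *ctx,
            const uint8_t **data, uint32_t *results,
            uint32_t num, uint32_t categories,
            enum rte_acl_classify_alg alg);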

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2014-09-03 03:26:50 +02:00
Anatoly Burakov
8d8d88cbd9 acl: make tailq fully local
Since data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. This means that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any subsequent ring create/lookup on the other side of IVSHMEM will
then result in an attempt to dereference an invalid pointer.

This patchset fixes this problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers and an opaque pointer to arbitrary data. All TAILQ
pointers from data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM only uses memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.
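
A sketch of the generic entry described above (assumed to match the
shape of DPDK's struct rte_tailq_entry):

    #include <sys/queue.h>

    struct rte_tailq_entry {
            TAILQ_ENTRY(rte_tailq_entry) next; /* tailq_next/tailq_prev */
            void *data;          /* opaque pointer to the stored object */
    };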

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2014-07-22 19:42:23 +02:00
Konstantin Ananyev
dc276b5780 acl: new library
The ACL library is used to perform an N-tuple search over a set of rules with
multiple categories and find the best match for each category.
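
A minimal end-to-end sketch of the public API (all parameter values
are illustrative):

    struct rte_acl_param prm = {
            .name = "acl_ctx",
            .socket_id = SOCKET_ID_ANY,
            .rule_size = RTE_ACL_RULE_SZ(num_fields),
            .max_rule_num = 1024,
    };
    struct rte_acl_ctx *ctx = rte_acl_create(&prm);

    rte_acl_add_rules(ctx, rules, num_rules);  /* add N-tuple rules */
    rte_acl_build(ctx, &cfg);                  /* build run-time tries */

    /* data: array of pointers to input keys; one result per category */
    rte_acl_classify(ctx, data, results, num_bufs, num_categories);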

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Waterman Cao <waterman.cao@intel.com>
Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com>
[Thomas: some code-style changes]
2014-06-14 01:29:45 +02:00