numam-dpdk

Author	SHA1	Message	Date
Konstantin Ananyev	4269eae463	acl: use scalar method fastest for some cases Previous improvements made the scalar method the fastest one for a tiny bunch of packets (< 4). That allows us to remove the specific vector code path for a small number of packets (search_sse_2) and always use the scalar method for such cases. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	5dd71363bf	acl: add AVX2 classify method Introduce new classify() method that uses AVX2 instructions. From my measurements: On HSW boards when processing >= 16 packets per call, the AVX2 method outperforms its SSE counterpart by 10-25% (depending on the ruleset). When built with compilers that don't support AVX2 instructions, make rte_acl_classify_avx2() do nothing and return an error. At runtime, if librte_acl was built with a compiler that supports AVX2, this method is selected as the default one on HW that supports AVX2. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	da826b7135	eal: introduce ymm type for AVX 256-bit New data type to manipulate 256 bit AVX values. Rename field in the rte_xmm to keep common naming across SSE/AVX fields. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	3858b90d82	acl: deduplicate a bit of RT code Move common check for input parameters up into rte_acl_classify_alg(). Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	ec51901a0b	acl: introduce DFA nodes compression (group64) for identical entries Introduced division of the whole 256 child transition entries into 4 sub-groups (64 kids per group). So 2 groups within the same node with identical children can use one set of transition entries. That allows compacting some DFA nodes and getting space savings in the RT table, without any negative performance impact. From what I've seen, the average space savings are ~20%. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Thomas Monjalon	4b9bb6b71a	acl: fix code typos Replace indicies by indices. Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-11-14 17:23:50 +01:00
Konstantin Ananyev	074f54ad03	acl: fix build and runtime for default target Make the ACL library build/work on the 'default' architecture: - make rte_acl_classify_scalar really scalar (make sure it wouldn't use sse4 intrinsics through resolve_priority()). - Provide two versions of the rte_acl_classify code path: rte_acl_classify_sse() - can be built and used only on systems with sse4.2 and upper, returns -ENOTSUP on lower arch. rte_acl_classify_scalar() - a slower version, but can be built and used on all systems. - Addition of a new function rte_acl_classify_alg. This function lets you specify an enum value to override the ACL context's default algorithm when doing a classification. This allows an application to specify a classification algorithm without needing to publicize each method. I know there was concern over keeping those methods public, but we don't have a static ABI at the moment, so this seems to me a reasonable thing to do, as it gives us less of an ABI surface to worry about. - keep common code shared between these two code paths. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2014-09-03 03:26:50 +02:00

7 Commits
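
The group64 idea described in commit ec51901a0b (split a node's 256 child transitions into 4 groups of 64 and let identical groups share one stored set) can be illustrated with a minimal sketch. This is not the librte_acl data layout or code: struct compact_node, compact_node_build() and compact_node_next() are hypothetical names made up for this example, and the transition values are plain 32-bit integers chosen for simplicity.

```c
#include <stdint.h>
#include <string.h>

#define GROUP_SIZE 64   /* children per group, as in the commit message */
#define NUM_GROUPS 4    /* 4 * 64 = 256 possible input bytes */

/* Hypothetical compacted node; NOT the real librte_acl runtime layout. */
struct compact_node {
	uint8_t  group_slot[NUM_GROUPS];           /* stored set used by each group */
	uint32_t n_stored;                         /* number of distinct sets kept */
	uint32_t entries[NUM_GROUPS * GROUP_SIZE]; /* first n_stored * 64 are valid */
};

/* Build a compacted node from a full 256-entry transition array. */
static void
compact_node_build(struct compact_node *cn, const uint32_t full[256])
{
	cn->n_stored = 0;
	for (int g = 0; g < NUM_GROUPS; g++) {
		const uint32_t *src = full + g * GROUP_SIZE;
		uint32_t s;

		/* Look for an already stored set identical to this group. */
		for (s = 0; s < cn->n_stored; s++) {
			if (memcmp(cn->entries + s * GROUP_SIZE, src,
					GROUP_SIZE * sizeof(*src)) == 0)
				break;
		}
		/* No match: store this group as a new set. */
		if (s == cn->n_stored) {
			memcpy(cn->entries + s * GROUP_SIZE, src,
				GROUP_SIZE * sizeof(*src));
			cn->n_stored++;
		}
		cn->group_slot[g] = (uint8_t)s;
	}
}

/* Look up the transition for one input byte in the compacted node. */
static uint32_t
compact_node_next(const struct compact_node *cn, uint8_t input)
{
	uint32_t slot = cn->group_slot[input / GROUP_SIZE];

	return cn->entries[slot * GROUP_SIZE + input % GROUP_SIZE];
}
```

In this sketch a node whose first two groups are identical stores 3 sets of 64 entries instead of 4, while a lookup still costs one extra small-array index; the space savings in the real library depend on the ruleset.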