Previous improvements made scalar method the fastest one
for tiny bunch of packets (< 4).
That allows us to remove specific vector code-path for small number of packets
(search_sse_2) and always use scalar method for such cases.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Introduce new classify() method that uses AVX2 instructions.
>From my measurements:
On HSW boards when processing >= 16 packets per call,
AVX2 method outperforms it's SSE counterpart by 10-25%,
(depending on the ruleset).
When build with the compilers that don't support AVX2 instructions,
make rte_acl_classify_avx2() do nothing and return an error.
At runtime, if librte_acl was build with the compiler that supports AVX2,
this method is selected as default one on HW that supports AVX2.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
New data type to manipulate 256 bit AVX values.
Rename field in the rte_xmm to keep common naming across SSE/AVX fields.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Move common check for input parameters up into rte_acl_classify_alg().
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Introduced division of whole 256 child transition enties
into 4 sub-groups (64 kids per group).
So 2 groups within the same node with identical children,
can use one set of transition entries.
That allows to compact some DFA nodes and get space savings in the RT table,
without any negative performance impact.
>From what I've seen an average space savings: ~20%.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Make ACL library to build/work on 'default' architecture:
- make rte_acl_classify_scalar really scalar
(make sure it wouldn't use sse4 instrincts through resolve_priority()).
- Provide two versions of rte_acl_classify code path:
rte_acl_classify_sse() - could be build and used only on systems with sse4.2
and upper, return -ENOTSUP on lower arch.
rte_acl_classify_scalar() - a slower version, but could be build and used
on all systems.
- Addition of a new function rte_acl_classify_alg. This function lets you
specify an enum value to override the acl contexts default algorithm when doing
a classification. This allows an application to specify a classification
algorithm without needing to publicize each method. I know there was concern
over keeping those methods public, but we don't have a static ABI at the moment,
so this seems to me a reasonable thing to do, as it gives us less of an ABI
surface to worry about.
- keep common code shared between these two codepaths.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>