The current rte_acl_classify_avx512x32() and rte_acl_classify_avx512x16()
code paths are very similar. The only differences are due to the
256-bit/512-bit register and intrinsic naming conventions.
To deduplicate the code:
- Move the common code into "acl_run_avx512_common.h"
- Use macros to hide the differences in naming conventions
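
The two steps above can be illustrated with a minimal, portable sketch.
It is not the code from this patch: the vector types, the v256_/v512_
helpers, the OP256/OP512 naming macros and DEFINE_SUM_LANES are all
hypothetical stand-ins, and plain structs are used instead of real AVX512
registers so that the example builds on any compiler. The idea it shows is
that the common body is written once and the per-width names are supplied
through macros.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for 256-bit and 512-bit registers. */
typedef struct { uint32_t lane[8]; } vec256_t;
typedef struct { uint32_t lane[16]; } vec512_t;

/* Width-specific helpers: same operations, different name prefixes. */
static inline vec256_t v256_set1(uint32_t x)
{
	vec256_t r;
	for (unsigned i = 0; i < 8; i++)
		r.lane[i] = x;
	return r;
}

static inline vec256_t v256_add(vec256_t a, vec256_t b)
{
	for (unsigned i = 0; i < 8; i++)
		a.lane[i] += b.lane[i];
	return a;
}

static inline vec512_t v512_set1(uint32_t x)
{
	vec512_t r;
	for (unsigned i = 0; i < 16; i++)
		r.lane[i] = x;
	return r;
}

static inline vec512_t v512_add(vec512_t a, vec512_t b)
{
	for (unsigned i = 0; i < 16; i++)
		a.lane[i] += b.lane[i];
	return a;
}

/* Per-width naming macros: map a generic operation name to a real one. */
#define OP256(x) v256_##x
#define OP512(x) v512_##x

/*
 * Common body, written once: broadcast x and y, add them lane by lane,
 * then sum all lanes. T is the register type, OP the naming macro and
 * N the number of 32-bit lanes.
 */
#define DEFINE_SUM_LANES(FN, T, OP, N)                     \
static uint32_t FN(uint32_t x, uint32_t y)                 \
{                                                          \
	T v = OP(add)(OP(set1)(x), OP(set1)(y));               \
	uint32_t sum = 0;                                      \
	for (unsigned i = 0; i < (N); i++)                     \
		sum += v.lane[i];                                  \
	return sum;                                            \
}

/* Instantiate the shared body for both register widths. */
DEFINE_SUM_LANES(sum_lanes_x8, vec256_t, OP256, 8)
DEFINE_SUM_LANES(sum_lanes_x16, vec512_t, OP512, 16)
```

In this sketch sum_lanes_x8(1, 2) sums eight lanes of value 3 and
sum_lanes_x16(1, 2) sums sixteen. In a real tree the macro body would more
likely live in a shared header that each width-specific file includes after
defining its own naming macros.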
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
With the current ACL implementation, the first field in the rule definition
must always be one byte long. However, to optimise the classify
implementation it might be useful to read that field with 4B loads
(as we do for the rest of the fields).
So at build phase, check the user-provided field definitions to determine
whether it is safe to do 4B loads for the first ACL field.
Then at run-time this information can be used to choose the classify
behavior.
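
One way such a build-time check could look is sketched below. This is an
illustration, not the code from this patch: struct field_def,
first_load_size() and the two sample layouts are hypothetical, and the
sketch assumes that every non-first field is already read with a 4B load
from its own offset. Under that assumption, a 4B load at the first field's
offset stays within bytes that are read anyway whenever some other field
starts at a higher offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified field definition (not the librte_acl type). */
struct field_def {
	uint8_t field_index; /* 0 marks the first field of the rule */
	uint32_t offset;     /* byte offset of the field in the input data */
};

/*
 * Return the load size (in bytes) that is safe for the first field.
 * Assumption: all other fields are read as 4 bytes from their offset,
 * so if any of them starts after the first field, widening the first
 * load to 4 bytes cannot touch memory beyond what is read already.
 */
static uint32_t
first_load_size(const struct field_def *defs, size_t num)
{
	uint32_t first_ofs = 0;
	uint32_t max_ofs = 0;

	for (size_t i = 0; i != num; i++) {
		if (defs[i].field_index == 0)
			first_ofs = defs[i].offset;
		else if (defs[i].offset > max_ofs)
			max_ofs = defs[i].offset;
	}

	return (first_ofs < max_ofs) ? sizeof(uint32_t) : sizeof(uint8_t);
}

/* Sample layout: the first field precedes the other fields. */
static const struct field_def first_field_leads[] = {
	{ 0, 0 }, { 1, 4 }, { 2, 8 },
};

/* Sample layout: the first field is the last one in the input data. */
static const struct field_def first_field_trails[] = {
	{ 0, 12 }, { 1, 0 }, { 2, 4 },
};
```

With these layouts, first_load_size() allows a 4B load for
first_field_leads and falls back to a 1B load for first_field_trails. The
result would be stored once at build time and consulted by the run-time
classify code.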
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce a classify implementation that uses AVX512-specific ISA.
rte_acl_classify_avx512x32() is able to process up to 32 flows in parallel.
It uses 512-bit wide registers/instructions and provides higher
performance than rte_acl_classify_avx512x16(), but can cause
a frequency level change.
Note that for now only the 64-bit version is supported.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>