numam-dpdk

Author	SHA1	Message	Date
Konstantin Ananyev	229ea9a71c	acl: remove subtree calculations at build stage As now subtree_id is not used acl_merge_trie() any more, there is no point to calculate and maintain that information. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-06-04 11:14:45 +02:00
Konstantin Ananyev	2f372ab5c9	acl: fix matching rule Reported by Zi Hu: " cat test_data/rule1 @192.168.0.0/24 192.168.0.0/24 400 : 500 0 : 52 6/0xff @192.168.0.0/24 192.168.0.0/24 400 : 500 54 : 65280 6/0xff @192.168.0.0/24 192.168.0.0/24 400 : 500 0 : 65535 6/0xff cat test_data/trace1 0xc0a80005 0xc0a80009 450 53 0x06 I run the test by: sudo ./testacl -n 2 -c 4 -- --rulesf=./test_data/rule1 --tracef=./test_data/trace1 The result shows that the packet matches the second rule, which is wrong. The dest port of the pkt is 53, so it should match the third rule. " Indeed there is problem at ACL build stage. Sometimes acl_merge_trie() is too aggressive in trying to conserve space at build time. So it takes a wrong assumptions and didn't duplicate a node, even when it should. The easiest and safest fix seems to always duplicate a left non-root/non-leaf node first, and let the further code to destroy the node, if it is not needed. Reported-by: Zi Hu <huzilucky@gmail.com> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-06-04 11:14:45 +02:00
Konstantin Ananyev	afd7f2d86a	acl: use setjmp/longjmp to handle alloc failures at build phase During build phase ACL doing quite a lot of memory allocations for relatively small temporary structures. In theory each of such allocation can fail, so we need to handle all these possible failures. That adds a lot of extra checks and makes the code harder to read and follow. To simplify the process, made changes to handle all such failures in one place. Note, that all that memory for temporary structures is freed at one go at the end of build phase. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-04-28 11:55:03 +02:00
Konstantin Ananyev	1e496d6fdf	eal/x86: move header file for vector instructions lib/librte_eal/common/include/rte_common_vect.h -> lib/librte_eal/common/include/arch/x86/rte_vect.h Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2015-03-20 19:24:38 +01:00
David Marchand	a2348166ea	tailq: move to dynamic tailq Use dynamic tailq rather than static entries. Signed-off-by: David Marchand <david.marchand@6wind.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-03-10 12:06:08 +01:00
David Marchand	ff708facfc	tailq: remove unneeded inclusions Only keep inclusion where really needed. Signed-off-by: David Marchand <david.marchand@6wind.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-03-10 11:47:46 +01:00
Neil Horman	133b75923b	mk: add library version extension To differentiate libraries that break ABI, we add a library version number suffix to the library, which must be incremented when a given libraries ABI is broken. This patch enforces that addition, sets the initial abi soname extension to 1 for each library and creates a symlink to the base SONAME so that the test applications will link properly. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>	2015-02-03 16:56:58 +01:00
Neil Horman	9d41beed24	lib: provide initial versioning Add linker version script files to each DPDK library to put a stake in the ground from which we can start cleaning up API's Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>	2015-02-03 16:56:58 +01:00
Thomas Monjalon	7e60e08397	acl: remove standalone header This is a duplication of some EAL parts for a standalone packaging which is not documented. Packaging should be done outside of DPDK. Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2015-02-02 12:30:33 +01:00
Konstantin Ananyev	17f520d2cf	acl: add comments about internal layout Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:12:16 +01:00
Konstantin Ananyev	f3d24368ef	acl: remove unused constant Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:26 +01:00
Konstantin Ananyev	62945e029e	acl: introduce config parameter for performance/space trade-off If at build phase we don't make any trie splitting, then temporary build structures and resulting RT structure might be much bigger than current. >From other side - having just one trie instead of multiple can speedup search quite significantly. >From my measurements on rule-sets with ~10K rules: RT table up to 8 times bigger, classify() up to 80% faster than current implementation. To make it possible for the user to decide about performance/space trade-off - new parameter for build config structure (max_size) is introduced. Setting it to the value greater than zero, instructs rte_acl_build() to: - make sure that size of RT table wouldn't exceed given value. - attempt to minimise number of tries in the table. Setting it to zero maintains current behaviour. That introduces a minor change in the public API, but I think the possible performance gain is too big to ignore it. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:26 +01:00
Konstantin Ananyev	a0e3310e7a	acl: deduplicate some SSE and AVX2 code Vector code reorganisation/deduplication: To avoid maintaining two nearly identical implementations of calc_addr() (one for SSE, another for AVX2), replace it with a new macro that suits both SSE and AVX2 code-paths. Also remove no needed any more MM_* macros. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	cf59b29bb9	acl: move SSE dwords shuffle Reorganise SSE code-path a bit by moving lo/hi dwords shuffle out from calc_addr(). That allows to make calc_addr() for SSE and AVX2 practically identical and opens opportunity for further code deduplication. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	4269eae463	acl: use scalar method fastest for some cases Previous improvements made scalar method the fastest one for tiny bunch of packets (< 4). That allows us to remove specific vector code-path for small number of packets (search_sse_2) and always use scalar method for such cases. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	5dd71363bf	acl: add AVX2 classify method Introduce new classify() method that uses AVX2 instructions. >From my measurements: On HSW boards when processing >= 16 packets per call, AVX2 method outperforms it's SSE counterpart by 10-25%, (depending on the ruleset). When build with the compilers that don't support AVX2 instructions, make rte_acl_classify_avx2() do nothing and return an error. At runtime, if librte_acl was build with the compiler that supports AVX2, this method is selected as default one on HW that supports AVX2. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	da826b7135	eal: introduce ymm type for AVX 256-bit New data type to manipulate 256 bit AVX values. Rename field in the rte_xmm to keep common naming across SSE/AVX fields. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	3858b90d82	acl: deduplicate a bit of RT code Move common check for input parameters up into rte_acl_classify_alg(). Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	5c0e6c3de5	acl: make scalar RT code more similar to vector one Make classify_scalar to behave in the same way as it's vector counterpart: move match check out of the inner loop, etc. That makes scalar and vector code look more identical. Plus it improves scalar code performance. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	a726650857	acl: simplify match nodes allocation Right now we allocate indexes for all types of nodes, except MATCH, at 'gen final RT table' stage. For MATCH type nodes we are doing it at building temporary tree stage. This is totally unnecessary and makes code more complex and error prone. Rework the code and make MATCH indexes being allocated at the same stage as all others. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	ec51901a0b	acl: introduce DFA nodes compression (group64) for identical entries Introduced division of whole 256 child transition enties into 4 sub-groups (64 kids per group). So 2 groups within the same node with identical children, can use one set of transition entries. That allows to compact some DFA nodes and get space savings in the RT table, without any negative performance impact. >From what I've seen an average space savings: ~20%. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	d4132664d8	acl: fix overwritten matches There was a bug at build phase that can cause matches beeing overwritten. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	c11f17d7f9	acl: remove build phase heuristic with negative performance effect Current rule-wildness based heuristics can cause unnecessary splits of the ruleset. That might have negative performance effect: more tries to traverse, bigger RT tables. After removing it, on some test-cases with big rulesets (~10K) observed ~50% speedup. No difference for smaller rulesets. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	efb2529385	acl: make data indexes long enough to survive idle transitions Make data_indexes long enough to survive idle transitions. That allows to simplify match processing code. Also fix incorrect size calculations for data indexes. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Konstantin Ananyev	47b1402088	acl: fix build in standalone mode Fix compilation with RTE_LIBRTE_ACL_STANDALONE=y Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2015-01-28 17:11:25 +01:00
Sergio Gonzalez Monroy	fdf20fa7be	add prefix to cache line macros CACHE_LINE_SIZE is a macro defined in machine/param.h in FreeBSD and conflicts with DPDK macro version. Adding RTE_ prefix to avoid conflicts. CACHE_LINE_MASK and CACHE_LINE_ROUNDUP are also prefixed. Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com> [Thomas: updated on HEAD, including PPC]	2014-11-27 16:21:11 +01:00
Thomas Monjalon	4b9bb6b71a	acl: fix code typos Replace indicies by indices. Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-11-14 17:23:50 +01:00
Thomas Monjalon	7eef9194ab	acl: fix comments typos Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-11-14 17:23:50 +01:00
Konstantin Ananyev	074f54ad03	acl: fix build and runtime for default target Make ACL library to build/work on 'default' architecture: - make rte_acl_classify_scalar really scalar (make sure it wouldn't use sse4 instrincts through resolve_priority()). - Provide two versions of rte_acl_classify code path: rte_acl_classify_sse() - could be build and used only on systems with sse4.2 and upper, return -ENOTSUP on lower arch. rte_acl_classify_scalar() - a slower version, but could be build and used on all systems. - Addition of a new function rte_acl_classify_alg. This function lets you specify an enum value to override the acl contexts default algorithm when doing a classification. This allows an application to specify a classification algorithm without needing to publicize each method. I know there was concern over keeping those methods public, but we don't have a static ABI at the moment, so this seems to me a reasonable thing to do, as it gives us less of an ABI surface to worry about. - keep common code shared between these two codepaths. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2014-09-03 03:26:50 +02:00
Anatoly Burakov	8d8d88cbd9	acl: make tailq fully local Since the data structures such as rings are shared in their entirety, those TAILQ pointers are shared as well. Meaning that, after a successful rte_ring creation, the tailq_next pointer of the last ring in the TAILQ will be updated with a pointer to a ring which may not be present in the address space of another process (i.e. a ring that may be host-local or guest-local, and not shared over IVSHMEM). Any successive ring create/lookup on the other side of IVSHMEM will result in trying to dereference an invalid pointer. This patchset fixes this problem by creating a default tailq entry that may be used by any data structure that chooses to use TAILQs. This default TAILQ entry will consist of a tailq_next/tailq_prev pointers, and an opaque pointer to arbitrary data. All TAILQ pointers from data structures themselves will be removed and replaced by those generic TAILQ entries, thus fixing the problem of potentially exposing local address space to shared structures. Technically, only rte_ring structure require modification, because IVSHMEM is only using memzones (which aren't in TAILQs) and rings, but for consistency's sake other TAILQ-based data structures were adapted as well. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>	2014-07-22 19:42:23 +02:00
Bruce Richardson	10c066d9b3	acl: fix header include for intrinsics Clang compile fails without nmmintrin.h being explicitly included. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Tested-by: Zhaochen Zhan <zhaochen.zhan@intel.com> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-07-19 01:54:01 +02:00
Stephen Hemminger	6f41fe75e2	eal: deprecate rte_snprintf The function rte_snprintf serves no useful purpose. It is the same as snprintf() for all valid inputs. Deprecate it and replace all uses in current code. Leave the tests for the deprecated function in place. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>	2014-06-27 02:31:24 +02:00
Konstantin Ananyev	dc276b5780	acl: new library The ACL library is used to perform an N-tuple search over a set of rules with multiple categories and find the best match for each category. Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Tested-by: Waterman Cao <waterman.cao@intel.com> Acked-by: Pablo de Lara Guarch <pablo.de.lara.guarch@intel.com> [Thomas: some code-style changes]	2014-06-14 01:29:45 +02:00

33 Commits