This small optimization uses the static Virtio-net header length
in the packed datapath, since the Virtio-net header cannot be the
legacy one when the packed ring is used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
When the packed ring layout has been negotiated but neither
Virtio version 1 nor mergeable buffers, the Virtio-net header length
is assigned the legacy devices' value, which is wrong.
This patch fixes it by using the proper length, since devices
using the packed ring are not legacy devices.
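For illustration, a hedged sketch of the corrected assignment; the field and
macro names (dev->vhost_hlen, VIRTIO_F_RING_PACKED, etc.) follow the usual
vhost library naming and may differ from the actual patch:
if (dev->features & ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
		     (1ULL << VIRTIO_F_VERSION_1) |
		     (1ULL << VIRTIO_F_RING_PACKED)))
	dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
	dev->vhost_hlen = sizeof(struct virtio_net_hdr);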
Fixes: a922401f35 ("vhost: add Rx support for packed ring")
Fixes: ae999ce49d ("vhost: add Tx support for packed ring")
Cc: stable@dpdk.org
Reported-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
In virtio_dev_extbuf_alloc(), the shinfo structure used to store
the reference counter and the free callback of the external buffer
is by default stored inside the mbuf data.
This is wrong because the mbuf (and its data) can be freed before
the external buffer, for instance in the following situation:
pkt2 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_attach(pkt2, pkt);
rte_pktmbuf_free(pkt);
After this, pkt is freed, but it still contains shinfo, which is
referenced by pkt2.
Fix this by always storing the shinfo beside the external buffer.
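For illustration, a hedged sketch of the approach, using the existing mbuf
helper to place the shared info at the tail of the external buffer;
total_len, pkt and virtio_extbuf_free_cb are placeholders:
struct rte_mbuf_ext_shared_info *shinfo;
uint16_t buf_len = total_len;	/* assumed size of the external buffer */
void *buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
if (buf == NULL)
	return NULL;
/* reserves room for shinfo at the tail of buf and returns its address */
shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
		virtio_extbuf_free_cb, buf);
if (shinfo == NULL) {
	rte_free(buf);
	return NULL;
}
rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len, shinfo);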
Fixes: c3ff0ac70a ("vhost: improve performance by supporting large buffer")
Cc: stable@dpdk.org
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This patch fixes the feature negotiation for vhost crypto during
initialization. The patch uses the newly created driver start
function to declare the driver type with its fixed vhost features.
In addition, the patch provides a new API specifically used by
the application to start a vhost-crypto driver.
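A hedged application-side usage sketch; the API name
rte_vhost_crypto_driver_start() and whether it also starts the driver are
assumptions based on this description, and socket_path is a placeholder:
if (rte_vhost_driver_register(socket_path, 0) < 0)
	return -1;
/* assumed to set the fixed vhost-crypto features and start the driver */
if (rte_vhost_crypto_driver_start(socket_path) < 0)
	return -1;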
Fixes: 939066d965 ("vhost/crypto: add public function implementation")
Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
When testing on some x86 platforms, code compiled with meson was observed
running at a different power license level than that compiled with make.
This is because meson auto-detects the instruction sets available on the
build system and enabled the AVX512 rte_memcpy code path whenever AVX512
was available, while with make a build-time AVX-512 flag had to be
explicitly set to enable that AVX512 rte_memcpy code path.
In the absence of runtime path selection for rte_memcpy - which is
complicated by it being a static inline function in a header file - we can
fix this behaviour regression by similarly having a build-time option which
must be set to enable the AVX-512 memcpy path.
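A hedged sketch of the resulting compile-time gate in rte_memcpy; the exact
macro name set by the new build option is an assumption here:
#if defined(RTE_MEMCPY_AVX512)	/* assumed macro set by the build-time option */
/* AVX512 (64-byte register) copy implementation */
#else
/* default AVX2/SSE copy implementation */
#endif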
Fixes: a25a650be5 ("build: add infrastructure for meson and ninja builds")
Fixes: 3e1bb55fd6 ("build/x86: add SSE flags")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Yingya Han <yingyax.han@intel.com>
rte_cldemote is similar to a prefetch hint, but in reverse.
On x86, cldemote(addr) enables software to hint to the hardware that the
cache line is likely to be shared. This is quite useful in core-to-core
communications where a cache line is likely to be shared.
The Arm and PPC implementations are provided as NOPs; real ones can be
added if equivalent instructions are available on those architectures.
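A minimal usage sketch for the core-to-core case described above; ring, head
and msg are placeholders:
ring[head] = msg;		/* producer publishes the entry */
rte_cldemote(&ring[head]);	/* hint: this line will soon be read by another core */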
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
The incriminated commit forgot to clean the Windows export file.
Fixes: 3cd73a1a1c ("eal: simplify exit functions")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Add a new internal wrapper function for use by PCI drivers as a
.probe function to attach to an event interface. It is the same as
rte_event_pmd_pci_probe, except that the caller can specify the name.
rte_event_pmd_pci_probe has been updated so as not to duplicate code.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
This commit implements the eventdev ABI changes required by
the DLB/DLB2 PMDs. Several data structures and constants are modified
or added in this patch, thereby requiring modifications to the
dependent apps and examples.
The DLB/DLB2 hardware does not conform exactly to the eventdev interface.
1) It has a limit on the number of queues that may be linked to a port.
2) Some ports are further restricted to a maximum of one linked queue.
3) DLB does not have the ability to carry the flow_id as part
of the event (QE) payload. Note that the DLB2 hardware is capable of
carrying the flow_id.
Following is a detailed description of the changes that have been made.
1) Add new fields to the rte_event_dev_info struct. These fields allow
the device to advertise its capabilities so that applications can take
the appropriate actions based on those capabilities.
struct rte_event_dev_info {
	uint32_t max_event_port_links;
	/**< Maximum number of queues that can be linked to a single event
	 * port by this device.
	 */
	uint8_t max_single_link_event_port_queue_pairs;
	/**< Maximum number of event ports and queues that are optimized for
	 * (and only capable of) single-link configurations supported by this
	 * device. These ports and queues are not accounted for in
	 * max_event_ports or max_event_queues.
	 */
}
2) Add a new field to the rte_event_dev_config struct. This field allows
the application to specify how many of its ports are limited to a single
link, or will be used in single link mode.
/** Event device configuration structure */
struct rte_event_dev_config {
	uint8_t nb_single_link_event_port_queues;
	/**< Number of event ports and queues that will be singly-linked to
	 * each other. These are a subset of the overall event ports and
	 * queues; this value cannot exceed *nb_event_ports* or
	 * *nb_event_queues*. If the device has ports and queues that are
	 * optimized for single-link usage, this field is a hint for how many
	 * to allocate; otherwise, regular event ports and queues can be used.
	 */
}
3) Replace the dedicated implicit_release_disabled field with a bit field
of explicit port capabilities. The implicit_release_disable functionality
is assigned to one bit, and a port-is-single-link-only attribute is
assigned to another, with the remaining bits available for future assignment.
/** Event port configuration bitmap flags */
#define RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL (1ULL << 0)
/**< Configure the port not to release outstanding events in
* rte_event_dev_dequeue_burst(). If set, all events received through
* the port must be explicitly released with RTE_EVENT_OP_RELEASE or
* RTE_EVENT_OP_FORWARD. Must be unset if the device is not
* RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
*/
#define RTE_EVENT_PORT_CFG_SINGLE_LINK (1ULL << 1)
/**< This event port links only to a single event queue.
*
* @see rte_event_port_setup(), rte_event_port_link()
*/
#define RTE_EVENT_PORT_ATTR_IMPLICIT_RELEASE_DISABLE 3
/**
* The implicit release disable attribute of the port
*/
struct rte_event_port_conf {
	uint32_t event_port_cfg;
	/**< Port cfg flags(EVENT_PORT_CFG_) */
}
This patch also removes the deprecation notice and announces
the new eventdev ABI changes in the release notes.
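A hedged application-side sketch tying the new fields together; dev_id,
port_id and the omitted mandatory configuration fields are illustrative only:
struct rte_event_dev_info info;
struct rte_event_dev_config cfg = { 0 };
struct rte_event_port_conf pconf;
rte_event_dev_info_get(dev_id, &info);
/* other mandatory config fields omitted for brevity */
cfg.nb_single_link_event_port_queues =
	info.max_single_link_event_port_queue_pairs ? 1 : 0;
rte_event_dev_configure(dev_id, &cfg);
rte_event_port_default_conf_get(dev_id, port_id, &pconf);
pconf.event_port_cfg |= RTE_EVENT_PORT_CFG_SINGLE_LINK;
rte_event_port_setup(dev_id, port_id, &pconf);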
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
rte_event_crypto_adapter_create_ext() allocates memory for the
adapter; we should free it when an error happens, otherwise it
leads to a memory leak.
Fixes: 7901eac340 ("eventdev: add crypto adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
The function rte_zmalloc() can return NULL, so the return value
needs to be checked.
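A minimal sketch of the missing check (variable name hypothetical):
txa = rte_zmalloc(NULL, sizeof(*txa), 0);
if (txa == NULL)
	return -ENOMEM;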
Fixes: a3bbf2e097 ("eventdev: add eth Tx adapter implementation")
Cc: stable@dpdk.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Nikhil Rao <nikhil.rao@intel.com>
The telemetry library is connected with eventdev xstats and
port link info. The following new telemetry commands are added:
/eventdev/dev_list
/eventdev/port_list,DevID
/eventdev/queue_list,DevID
/eventdev/dev_xstats,DevID
/eventdev/port_xstats,DevID,PortID
/eventdev/queue_xstats,DevID,QueueID
/eventdev/queue_links,DevID,PortID
The queue_links command displays the list of queues linked to a specified
eventdev port and the service priority associated with each link.
Signed-off-by: Mike Ximing Chen <mike.ximing.chen@intel.com>
Reviewed-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Gage Eads <gage.eads@intel.com>
Currently, the rte_flow functions are not defined as thread-safe.
DPDK applications either call the functions from a single thread or
protect any concurrent calls to the rte_flow operations with
a lock.
For PMDs that support thread-safe flow operations natively, the
redundant protection in the application hurts the performance of the
rte_flow operation functions.
The fact that thread safety is not guaranteed for the rte_flow
functions also limits the applications' expectations.
This feature changes the rte_flow functions to be thread-safe.
As different PMDs have different flow operations, some may already
support thread safety and others may not. For PMDs that do not
support thread-safe flow operations, a new lock is defined in ethdev
to protect thread-unsafe PMDs at the rte_flow level.
A new RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE device flag is added to
indicate whether the PMD supports thread-safe flow operations.
PMDs that support thread-safe flow operations set the
RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE flag, and the rte_flow level functions
then skip the thread-safe helper lock for these PMDs. Again, the rte_flow
level thread-safe lock is only taken when the PMD operation functions are
not thread-safe.
PMDs which do not want the default mutex lock can simply set the
flag and add their preferred type of lock internally. The default
mutex lock is then easily replaced by the PMD-level lock.
The change has no effect on current DPDK applications, and no change
is required from them. With the standard POSIX pthread_mutex, if there
is no contention on the added rte_flow level mutex, the mutex only
performs an atomic increment in pthread_mutex_lock() and an atomic
decrement in pthread_mutex_unlock(); no futex() syscall is involved.
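A hedged sketch of the helper lock at the rte_flow level; the helper and
field names (fts_enter, fts_exit, flow_ops_mutex) are assumptions based on
this description:
static void
fts_enter(struct rte_eth_dev *dev)
{
	if (!(dev->data->dev_flags & RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE))
		pthread_mutex_lock(&dev->data->flow_ops_mutex);
}
static void
fts_exit(struct rte_eth_dev *dev)
{
	if (!(dev->data->dev_flags & RTE_ETH_DEV_FLOW_OPS_THREAD_SAFE))
		pthread_mutex_unlock(&dev->data->flow_ops_mutex);
}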
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Add pthread mutex lock as it is needed for the thread-safe rte_flow
functions.
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Tested-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Clarify the documentation of the QinQ flags, and extend the meaning of the
flag: if PKT_RX_QINQ_STRIPPED is set and PKT_RX_VLAN_STRIPPED is unset,
only the outer VLAN is removed from packet data, but both TCIs are saved,
in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
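A hedged sketch of how an application reads the flags under the clarified
semantics; process_tcis() is a placeholder:
if (m->ol_flags & PKT_RX_QINQ_STRIPPED) {
	uint16_t inner_tci = m->vlan_tci;	/* inner TCI always saved */
	uint16_t outer_tci = m->vlan_tci_outer;	/* outer TCI always saved */
	if (!(m->ol_flags & PKT_RX_VLAN_STRIPPED)) {
		/* only the outer VLAN header was removed from packet data;
		 * the inner VLAN header is still present in the packet */
	}
	process_tcis(inner_tci, outer_tci);	/* placeholder */
}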
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
SDAP is a protocol in the LTE stack, on top of PDCP, used for
QoS. A particular PDCP session may or may not have
SDAP enabled. But if it is enabled, the SDAP header should be
authenticated but not encrypted when both confidentiality and
integrity are enabled. Hence, the driver should be informed
via the xform so that it skips the SDAP header during encryption.
A new field is added in the PDCP xform to specify whether SDAP is enabled.
The overall size of the xform is not changed, as hfn_ovrd is just
a flag and does not need a uint32_t. Hence, it is converted to uint8_t
and a 16-bit reserved field is added for future use.
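A hedged sketch of the resulting tail of the PDCP xform; field order and the
sdap_enabled name are assumptions based on the description above:
struct rte_security_pdcp_xform {
	/* ... existing fields unchanged ... */
	uint8_t hfn_ovrd;	/* was uint32_t; only a flag is needed */
	uint8_t sdap_enabled;	/* new: non-zero when an SDAP header is present */
	uint16_t reserved;	/* keeps the overall xform size unchanged */
};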
Signed-off-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
This patch removes enumerators RTE_CRYPTO_CIPHER_LIST_END,
RTE_CRYPTO_AUTH_LIST_END, RTE_CRYPTO_AEAD_LIST_END to prevent
ABI breakage that may arise when adding new crypto algorithms.
Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch adds raw data-path APIs for enqueue and dequeue
operations to cryptodev. The APIs support flexible user-defined
enqueue and dequeue behaviors.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Signed-off-by: Piotr Bronowski <piotrx.bronowski@intel.com>
Acked-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This patch updates ``rte_crypto_sym_vec`` structure to add
support for both cpu_crypto synchronous operation and
asynchronous raw data-path APIs. The patch also includes
AESNI-MB and AESNI-GCM PMD changes, unit test changes and
documentation updates.
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
The rte_cryptodev_pmd_parse_input_args function crashes with a
segmentation fault when passing a non-empty argument string.
The function passes cryptodev_pmd_valid_params to rte_kvargs_parse,
which accepts a NULL-terminated list of valid keys, yet
cryptodev_pmd_valid_params does not end with NULL. The patch adds the
missing NULL pointer.
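A hedged sketch of the fix; the key macro names are assumptions, the point
being the NULL terminator that rte_kvargs_parse() expects:
static const char * const cryptodev_pmd_valid_params[] = {
	RTE_CRYPTODEV_PMD_NAME_ARG,
	RTE_CRYPTODEV_PMD_MAX_NB_QP_ARG,
	RTE_CRYPTODEV_PMD_SOCKET_ID_ARG,
	NULL	/* missing terminator added by this patch */
};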
Fixes: 9e6edea418 ("cryptodev: add APIs to assist PMD initialisation")
Cc: stable@dpdk.org
Signed-off-by: Haggai Eran <haggaie@nvidia.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
This reverts commit a0f0de06d4 as the
rte_cryptodev_info_get function versioning was a temporary solution
to maintain ABI compatibility for ChaCha20-Poly1305 and is not
needed in 20.11.
Fixes: a0f0de06d4 ("cryptodev: fix ABI compatibility for ChaCha20-Poly1305")
Signed-off-by: Adam Dybkowski <adamx.dybkowski@intel.com>
Reviewed-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Since librte_ipsec was first introduced in 19.02 and there have been no
changes in its public API since 19.11, it should be considered mature
enough to remove the 'experimental' tag from it.
The RTE_SATP_LOG2_NUM enum is also being dropped from rte_ipsec_sa.h to
avoid possible ABI problems in the future.
Signed-off-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
and not customizable with meson. It is completely removed.
The function rte_dump_registers is a trace of the bare metal support
era, and was not supported in userland. It is completely removed.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
This commit adds new rte_prefetchX_write() variants, which suggest to the
compiler to use a prefetch instruction with the intention to write. As
compiler builtins, the compiler can choose based on the compilation target
what the best implementation for this instruction is.
Three versions are provided, targeting the different levels of cache.
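A minimal usage sketch; ring and next are placeholders:
/* fetch the descriptor into L1 with write intent before updating it */
rte_prefetch0_write(&ring[next]);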
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
The cited commit introduced functions with an 'int memory_order' argument.
The C11 standard section 7.17.1.4 defines 'memory_order' as the
"enumerated type whose enumerators identify memory ordering constraints".
A compilation error occurs:
error: declaration of 'memory_order' shadows a global declaration
[-Werror=shadow]
rte_atomic_thread_fence(int memory_order)
This issue was hit when trying to compile OVS with gcc 4.8.5. This
compiler version does not provide stdatomic.h, so enum memory_order is
redefined in OVS code.
In another case, if the compiler does provide stdatomic.h header,
passing -Wsystem-headers in the CFLAGS will also cause that failure.
Fix it by changing the argument name 'memory_order' to 'memorder'.
Fixes: 672a150563 ("eal: add wrapper for C11 atomic thread fence")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Asaf Penso <asafp@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
gcc 5.4 fails with:
../lib/librte_acl/acl_run_avx512x8.h: In function 'match_process_avx512x8':
../lib/librte_acl/acl_run_avx512x8.h:382:31: error:
pointer targets in passing argument 1 of '_mm256_mask_i32scatter_epi32'
differ in signedness [-Werror=pointer-sign]
Later gcc versions work fine, as for them the parameter type was
changed to 'void *'.
Fixed by applying an explicit cast to the offending argument.
Bugzilla ID: 556
Fixes: b64c2295f7 ("acl: add 256-bit AVX512 classify method")
Fixes: 45da22e42e ("acl: add 512-bit AVX512 classify method")
Reported-by: Ali Alnubani <alialnu@nvidia.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Only marking the doxygen declarations is not enough.
Arch specific implementations must be tagged as well since there is no
common declaration of those inlines.
Fixes: 8a00dfc738 ("eal: add write combining store")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Radu Nicolau <radu.nicolau@intel.com>
Add a subport profile table to the internal port data structure
and update the port config function.
Signed-off-by: Savinay Dharmappa <savinay.dharmappa@intel.com>
Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Implement terminal handling, input polling, and vdprintf() for Windows.
Because the Windows I/O model differs fundamentally from the Unix one
and there is no concept of a character device, polling is simulated
depending on the underlying input device. Supporting non-terminal input
is useful for automated testing.
Windows emulation of VT100 uses "ESC [ E" for newline instead of
standard "ESC E", so add a workaround.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Extend the compatibility header system to support librte_cmdline.
pthread.h has to include windows.h, which exposes struct in_addr, etc.,
conflicting with the compatibility headers. The WIN32_LEAN_AND_MEAN macro
is required to disable this behavior. Use rte_windows.h to define
WIN32_LEAN_AND_MEAN for the pthread library.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add an internal wrapper for vdprintf(3), which is only available on Unix.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
poll(3) is a purely Unix facility, so it cannot be directly used by
common code. read(2) is limited in device support outside of Unix.
Create wrapper functions and implement them for Unix.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Add functions that set up, save, and restore terminal parameters.
Use the existing code as the Unix implementation.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
struct cmdline exposes the platform-specific members it contains, most
notably struct termios, which is only available on Unix. While ABI
considerations prevent hiding the definition on already supported
platforms, struct cmdline is considered logically opaque from now on.
Add a deprecation notice targeted at 20.11.
* Remove tests checking struct cmdline content as meaningless.
* Fix missing cmdline_free() in unit test.
* Add cmdline_get_rdline() to access history buffer indirectly.
The new function is currently used only in tests.
Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
The implementation is based on the Win32 waitable timers API. When a timer
is set, a callback and its argument are supplied to the OS, while the timer
handle is stored in the EAL alarm list. When the timer expires, the OS wakes
up the interrupt thread and runs the callback. Upon completion, it removes
the alarm.
Waitable timers must be set from the thread their callback will run in;
eal_intr_thread_schedule() provides a way to schedule asynchronous code
execution in the interrupt thread. The alarm module builds synchronous timer
setup on top of it.
Windows alarms are not a type of DPDK interrupt handle and do not
interact with the interrupt module beyond executing in the same thread.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
Windows interrupt support is based on I/O completion ports (IOCP).
The interrupt thread sends the devices requests to be notified about
interrupts and then waits for any request completion. Add skeleton code
of this model without any hardware support.
Another way to wake up the interrupt thread is an APC (asynchronous procedure
call), scheduled by any other thread via eal_intr_thread_schedule().
This internal API is intended for the alarm implementation.
Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Narcisa Vasile <navasile@linux.microsoft.com>
When creating a softnic hash table with 16-byte keys, it failed in a 32-bit
environment, because the pointer field in structure rte_bucket_4_16
is only 32 bits. Add a padding field in the 32-bit environment to keep
the structure size a multiple of 64 bytes. Apply this to the 8-byte and
32-byte key hash functions as well.
Fixes: 8aa327214c ("table: hash")
Cc: stable@dpdk.org
Signed-off-by: Ting Xu <ting.xu@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
The current rte_acl_classify_avx512x32() and rte_acl_classify_avx512x16()
code paths are very similar. The only differences are due to
the 256/512-bit register/intrinsics naming conventions.
So to deduplicate the code:
- Move common code into "acl_run_avx512_common.h"
- Use macros to hide the differences in naming conventions
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
With the current ACL implementation, the first field in the rule definition
always has to be one byte long. Though for optimising the classify
implementation it might be useful to do 4B reads
(as we do for the rest of the fields).
So at build phase, check the user-provided field definitions to determine
whether it is safe to do 4B loads for the first ACL field.
Then at run-time this information can be used to choose the classify
behavior.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x32() is able to process up to 32 flows in parallel.
It uses 512-bit width registers/instructions and provides higher
performance than rte_acl_classify_avx512x16(), but can cause
a frequency level change.
Note that for now only 64-bit version is supported.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
On supported platforms, set RTE_ACL_CLASSIFY_AVX512X16 as the
default ACL classify algorithm.
Note that the AVX512X16 implementation uses 256-bit registers/intrinsics
only, to avoid the possibility of a frequency drop.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Introduce classify implementation that uses AVX512 specific ISA.
rte_acl_classify_avx512x16() is able to process up to 16 flows in parallel.
It uses 256-bit width registers/instructions only
(to avoid a frequency level change).
Note that for now only 64-bit version is supported.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Add the necessary changes to support the new AVX512-specific ACL classify
algorithm:
- changes in meson.build to check that the build tools
(compiler, assembler, etc.) properly support AVX512.
- run-time checks to make sure the target platform does support AVX512.
- a dummy rte_acl_classify_avx512() for targets where the AVX512
implementation couldn't be properly supported.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Right now the ACL library determines the best possible (default) classify
method on a given platform with the special constructor function rte_acl_init().
This patch makes the following changes:
- Move selection of the default classify method into a separate private
function and call it for each ACL context creation (rte_acl_create()).
- Remove the library constructor function.
- Make rte_acl_set_ctx_classify() check that the requested algorithm
is supported on the given platform.
The purpose of these changes is to improve and simplify the algorithm
selection process and prepare the ACL library to be integrated with the
max SIMD bitwidth series in discussion.
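A hedged usage sketch of the stricter rte_acl_set_ctx_classify() behaviour;
the fallback choice is illustrative only:
if (rte_acl_set_ctx_classify(ctx, RTE_ACL_CLASSIFY_AVX512X32) != 0)
	/* not supported on this platform: fall back to the scalar method */
	rte_acl_set_ctx_classify(ctx, RTE_ACL_CLASSIFY_SCALAR);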
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Removal of unused enum value (RTE_ACL_CLASSIFY_NUM).
This enum value is not used inside DPDK, while it prevents
adding new classify algorithms without causing an ABI breakage.
Note that this change introduces a formal ABI incompatibility
with previous versions of the ACL library.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Right now we define a dummy version of rte_acl_classify_avx2()
when both x86 and AVX2 are not detected, though it should be done
for the non-AVX2 case only.
Fixes: e53ce4e413 ("acl: remove use of weak functions")
Cc: stable@dpdk.org
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
New data type to manipulate 512-bit AVX values.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>