The queue-based flow rules management mechanism is suitable
not only for flow rule creation/destruction, but also
for speeding up other types of Flow API management.
Indirect action object operations may be executed
asynchronously as well. Provide async versions for all
indirect action operations, namely:
rte_flow_async_action_handle_create,
rte_flow_async_action_handle_destroy and
rte_flow_async_action_handle_update.
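A minimal usage sketch, assuming the async Flow API helper types from this
series (struct rte_flow_op_attr, struct rte_flow_op_result,
struct rte_flow_indir_action_conf) and a counter indirect action; variables
such as port_id and queue_id are illustrative:

    const struct rte_flow_op_attr op_attr = { .postpone = 0 };
    const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
    const struct rte_flow_action action = {
        .type = RTE_FLOW_ACTION_TYPE_COUNT,
    };
    struct rte_flow_op_result res[8];
    struct rte_flow_error error;
    struct rte_flow_action_handle *handle;

    /* enqueue the creation; the call returns before the HW is updated */
    handle = rte_flow_async_action_handle_create(port_id, queue_id, &op_attr,
            &conf, &action, NULL /* user_data */, &error);

    /* completions are delivered through the same queue */
    while (rte_flow_pull(port_id, queue_id, res, 8, &error) == 0)
        ;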
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
A new, faster, queue-based flow rules management mechanism is needed for
applications offloading rules inside the datapath. This asynchronous
and lockless mechanism frees the CPU for further packet processing and
reduces the performance impact of the flow rules creation/destruction
on the datapath. Note that queues are not thread-safe: all operations
on a given queue should be issued from the same thread.
It is the application's responsibility to synchronize the queue functions
in case of multi-threaded access to the same queue.
The rte_flow_async_create() function enqueues a flow creation to the
requested queue. It benefits from already configured resources and sets
unique values on top of item and action templates. A flow rule is enqueued
on the specified flow queue and offloaded asynchronously to the hardware.
The function returns immediately to spare the CPU for further packet
processing. The application must invoke the rte_flow_pull() function
to complete the flow rule operation offloading, to clear the queue, and to
receive the operation status. The rte_flow_async_destroy() function
enqueues a flow destruction to the requested queue.
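A rough usage sketch, assuming a port already configured with one flow queue
and a template table prepared in advance; struct rte_flow_op_attr,
struct rte_flow_op_result (with status and user_data fields) and the
RTE_FLOW_OP_SUCCESS status value are the helper types added by this series,
and variables such as table, pattern and actions are illustrative:

    const struct rte_flow_op_attr op_attr = { .postpone = 0 };
    struct rte_flow_op_result results[32];
    struct rte_flow_error error;
    struct rte_flow *flow;

    /* enqueue a rule creation; unique item/action values are given here */
    flow = rte_flow_async_create(port_id, queue_id, &op_attr,
            table, pattern, pattern_template_index,
            actions, actions_template_index,
            NULL /* user_data */, &error);

    /* ... keep processing packets ... */

    /* drain completions and check each operation status */
    int done = rte_flow_pull(port_id, queue_id, results, 32, &error);
    for (int i = 0; i < done; i++) {
        if (results[i].status != RTE_FLOW_OP_SUCCESS) {
            /* handle the failed operation, identified by user_data */
        }
    }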
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Treating every single flow rule as a completely independent and separate
entity negatively impacts the flow rules insertion rate. Oftentimes in an
application, many flow rules share a common structure (the same item mask
and/or action list) so they can be grouped and classified together.
This knowledge may be used as a source of optimization by a PMD/HW.
The pattern template defines common matching fields (the item mask) without
values. The actions template holds a list of action types that will be used
together in the same rule. The specific values for items and actions will
be given only during the rule creation.
A table combines pattern and actions templates along with shared flow rule
attributes (group ID, priority and traffic direction). This way a PMD/HW
can prepare all the resources needed for efficient flow rules creation in
the datapath. To avoid any hiccups due to memory reallocation, the maximum
number of flow rules is defined at the table creation time.
The flow rule creation is done by selecting a table, a pattern template
and an actions template (which are bound to the table), and setting unique
values for the items and actions.
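A condensed setup sketch, assuming the attribute structures added by this
series (struct rte_flow_pattern_template_attr,
struct rte_flow_actions_template_attr, struct rte_flow_template_table_attr
with flow_attr and nb_flows fields); port_id is illustrative:

    struct rte_flow_error err;
    struct rte_flow_item_ipv4 ipv4_mask = { .hdr.src_addr = RTE_BE32(0xffffffff) };

    /* pattern template: the item mask only, values come at rule creation */
    const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
    const struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4, .mask = &ipv4_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_pattern_template *pt =
        rte_flow_pattern_template_create(port_id, &pt_attr, pattern, &err);

    /* actions template: the list of action types used together in a rule */
    const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_actions_template *at =
        rte_flow_actions_template_create(port_id, &at_attr,
                actions, actions /* masks */, &err);

    /* table: shared rule attributes plus the maximum number of rules */
    const struct rte_flow_template_table_attr tbl_attr = {
        .flow_attr = { .group = 1, .ingress = 1 },
        .nb_flows = 1 << 20,
    };
    struct rte_flow_template_table *tbl =
        rte_flow_template_table_create(port_id, &tbl_attr,
                &pt, 1, &at, 1, &err);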
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Flow rule creation/destruction at a large scale incurs a performance
penalty and may negatively impact the packet processing when used
as part of the datapath logic. This is mainly because software/hardware
resources are allocated and prepared during the flow rule creation.
In order to optimize the insertion rate, a PMD may use hints provided
by the application at the initialization phase. The rte_flow_configure()
function allows pre-allocating all the needed resources beforehand.
These resources can be used at a later stage without costly allocations.
Every PMD may use only a subset of the hints, ignore unused ones, or
fail in case the requested configuration is not supported.
The rte_flow_info_get() function is available to retrieve information about
supported pre-configurable resources. Both these functions must be called
before any other usage of the flow API engine.
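An illustrative call sequence; the hint fields shown (e.g. nb_counters in
struct rte_flow_port_attr, size in struct rte_flow_queue_attr) follow this
series and may vary between releases and PMDs:

    struct rte_flow_port_info port_info;
    struct rte_flow_queue_info queue_info;
    struct rte_flow_error error;

    /* discover what the PMD can pre-allocate */
    rte_flow_info_get(port_id, &port_info, &queue_info, &error);

    /* pre-allocate resources before any other Flow API usage */
    const struct rte_flow_port_attr port_attr = {
        .nb_counters = 1 << 16,        /* hint: expected number of counters */
    };
    const struct rte_flow_queue_attr queue_attr = { .size = 1024 };
    const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };

    rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);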
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
rte_gpu_mem_cpu_map() exposes a GPU memory area to the CPU.
In the gpudev communication list this is useful to store the
status flag.
A communication list status flag allocated in GPU memory
and mapped for CPU visibility can be updated by the CPU and polled
by a GPU workload.
The polling operation is more frequent than the CPU update operation.
Having the status flag in GPU memory reduces the GPU workload polling
latency.
If the CPU mapping feature is not enabled, the status flag resides in
CPU memory registered so that it is visible from the GPU.
To facilitate the interaction with the status flag, this patch
also provides set/get functions for it.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
A user data field is added to the asymmetric session structure.
Relevant APIs are added to get/set the field.
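A small usage sketch; the accessor names below mirror the symmetric session
API and should be checked against rte_cryptodev.h:

    struct app_ctx { uint64_t request_id; };    /* hypothetical per-session data */
    struct app_ctx ctx = { .request_id = 42 };

    /* the user data size is reserved when the session mempool is created */
    rte_cryptodev_asym_session_set_user_data(sess, &ctx, sizeof(ctx));

    /* later, e.g. on operation completion */
    struct app_ctx *p = rte_cryptodev_asym_session_get_user_data(sess);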
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The rte_cryptodev_asym_session structure is now moved to an internal
header. It will no longer be used directly by applications;
private session data can be accessed via a get API.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Rather than using a session buffer that contains pointers to private
session data elsewhere, have a single session buffer.
This session is created for a driver ID, and the mempool element
contains space for the max session private data needed for any driver.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The programmer's guide for cryptodev included sample code for using
asymmetric crypto. This is now replaced with direct code from the test
application, using literal includes. It is broken into snippets as the
test application didn't have all of the required code in one function.
Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Add flow pattern items and header format for matching the optional fields
(checksum/key/sequence) in the GRE header. The flags in the GRE item should
be set correspondingly with the newly added items.
Signed-off-by: Sean Zhang <xiazhang@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Since dmadev was introduced in 21.11, this patch integrates dmadev into
the asynchronous data path to avoid the overhead of the vhost DMA
abstraction layer and to simplify application logic.
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Enable exposing a GPU memory area and making it
accessible from the CPU.
GPU memory has to be allocated via rte_gpu_mem_alloc().
This patch allows the gpudev library to map (and unmap),
through the GPU driver, a chunk of GPU memory and to return
a memory pointer usable by the CPU to access the GPU memory area.
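A usage sketch, assuming the mapping call takes the device, the size and the
GPU pointer and returns a CPU-visible address; the exact prototypes
(including the unmap argument) are in rte_gpudev.h:

    void *gpu_buf = rte_gpu_mem_alloc(gpu_id, buf_size);        /* GPU memory */
    void *cpu_ptr = rte_gpu_mem_cpu_map(gpu_id, buf_size, gpu_buf);
    if (cpu_ptr != NULL) {
        /* the CPU can now read/write the GPU memory area directly */
        ((volatile uint32_t *)cpu_ptr)[0] = 1;
        rte_gpu_mem_cpu_unmap(gpu_id, gpu_buf);
    }
    rte_gpu_mem_free(gpu_id, gpu_buf);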
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
DPDK 21.11 adds VFIO support for DMA devices in vhost. This patch
updates the recommended IOVA mode in the async data path.
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Expose the Linux EAL ability to reuse existing hugepage files
via the --huge-unlink=never switch.
The default behavior is unchanged; it can also be specified
explicitly using --huge-unlink=existing for consistency.
The old --huge-unlink switch is kept
as an alias for --huge-unlink=always.
Add a test case for the --huge-unlink=never mode.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The EAL malloc layer assumed that all free elements' content
is filled with zeros ("clean"), as opposed to uninitialized ("dirty").
This assumption was ensured in two ways:
1. EAL memalloc layer always returned clean memory.
2. Freed memory was cleared before returning into the heap.
Clearing the memory can be as slow as around 14 GiB/s.
To avoid this cost, the memalloc layer is now allowed to return dirty memory;
such segments are marked with RTE_MEMSEG_FLAG_DIRTY.
The allocator tracks elements that contain dirty memory
using the new flag in the element header.
When clean memory is requested via rte_zmalloc*()
and the suitable element is dirty, it is cleared on allocation.
When memory is deallocated, the freed element is joined
with adjacent free elements, and the dirty flag is updated:
a) If the joint element contains dirty parts, it is dirty:

     dirty + freed + dirty = dirty  => no need to clean
             freed + dirty = dirty     the freed memory

   Dirty parts may be large (e.g. initial allocation),
   so clearing them could create unpredictable slowdown.

b) If the only dirty part of the joint element
   is the freed memory, the joint element can be made clean:

     clean + freed + clean = clean  => freed memory
     clean + freed         = clean     must be cleared
             freed + clean = clean
             freed         = clean
This logic naturally reproduces the old behavior
and always applies in modes when EAL memalloc layer
returns only clean segments.
As a result, memory is either cleared on free, as before,
or it will be cleared on allocation if need be, but never twice.
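The merge logic can be summarized with the following illustrative pseudo-code
(not the actual EAL implementation):

    /* on free: the freed element is merged with adjacent free neighbours */
    joint_dirty = prev_free_is_dirty || next_free_is_dirty;
    if (!joint_dirty)
        memset(freed_start, 0, freed_len);   /* case b): element stays clean */
    /* case a): the joint element is marked dirty and nothing is cleared here;
     * rte_zmalloc*() will clear it at allocation time if it picks this element */
    elem_set_dirty(joint_elem, joint_dirty);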
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Hugepage mapping is a layer that EAL malloc builds upon.
There were implicit references to its details,
like mentions of segment file descriptors,
but no explicit description of its modes and operation.
Add an overview of the mechanics used on each supported OS.
Convert memory management subsections from list items
to level 4 headers: they are big and important enough.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
The KNI kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.
Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().
In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.
Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays
seems unfeasible with a call to schedule_timeout_interruptible().
#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().
We attempt to measure the CPU usage, and RTT between two Kni interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.
insmod rte_kni.ko kthread_mode=single carrier=on
schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and CPU usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and CPU usage, and allows users to tweak the limits for
improved precision if they have such use cases.
Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
softlockup on my kernel.
Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
<IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
[<ffffffff814f7891>] panic+0xcd/0x1e0
[<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
[<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
[<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
[<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
[<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
This patch also attempts to remove this option.
References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The generic RTE_FLOW_ACTION_TYPE_MODIFY_FIELD action was
introduced by [1]. This action provides a unified way
to perform various arithmetic and transfer operations over
packet network header fields and packet metadata.
[1] 73b68f4c54 ("ethdev: introduce generic modify flow action")
On the other hand, there is a bunch of legacy actions
that can be superseded by the generic MODIFY_FIELD action:
RTE_FLOW_ACTION_TYPE_OF_SET_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_SET_NW_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_NW_TTL sfc
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_OUT
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_IN
RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC bnxt, cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV4_DST bnxt, cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_DST cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_TP_SRC cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_TP_DST cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_DEC_TTL mlx5, sfc
RTE_FLOW_ACTION_TYPE_SET_TTL mlx5
RTE_FLOW_ACTION_TYPE_SET_MAC_SRC cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_SET_MAC_DST cxgbe, mlx5
RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ mlx5
RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ mlx5
RTE_FLOW_ACTION_TYPE_INC_TCP_ACK mlx5
RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP mlx5
RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP mlx5
RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID bnxt, cnxk, cxgbe, enic,
mlx5, octeontx2, sfc
RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP bnxt, cnxk, cxgbe, enic,
mlx5, octeontx2, sfc
RTE_FLOW_ACTION_TYPE_SET_TAG mlx5
RTE_FLOW_ACTION_TYPE_SET_META mlx5
This note deprecates the following RTE Flow actions,
as they are not supported by any PMD:
RTE_FLOW_ACTION_TYPE_OF_SET_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_DEC_MPLS_TTL
RTE_FLOW_ACTION_TYPE_OF_SET_NW_TTL
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_OUT
RTE_FLOW_ACTION_TYPE_OF_COPY_TTL_IN
The following actions are supposed to be deprecated in 22.07
and replaced by the generic field modify action:
RTE_FLOW_ACTION_TYPE_OF_DEC_NW_TTL
RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC
RTE_FLOW_ACTION_TYPE_SET_IPV4_DST
RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC
RTE_FLOW_ACTION_TYPE_SET_IPV6_DST
RTE_FLOW_ACTION_TYPE_SET_TP_SRC
RTE_FLOW_ACTION_TYPE_SET_TP_DST
RTE_FLOW_ACTION_TYPE_DEC_TTL
RTE_FLOW_ACTION_TYPE_SET_TTL
RTE_FLOW_ACTION_TYPE_SET_MAC_SRC
RTE_FLOW_ACTION_TYPE_SET_MAC_DST
RTE_FLOW_ACTION_TYPE_INC_TCP_SEQ
RTE_FLOW_ACTION_TYPE_DEC_TCP_SEQ
RTE_FLOW_ACTION_TYPE_INC_TCP_ACK
RTE_FLOW_ACTION_TYPE_DEC_TCP_ACK
RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP
RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP
RTE_FLOW_ACTION_TYPE_SET_TAG
RTE_FLOW_ACTION_TYPE_SET_META
The VLAN set actions are interrelated with VLAN header insertion/removal,
supported by multiple PMDs, and widely used by applications; they are
not supposed to be deprecated due to the potentially large impact on
drivers and applications.
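For example, what RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC does today can be
expressed with MODIFY_FIELD roughly as below; the layout of the immediate
value inside struct rte_flow_action_modify_data differs between releases,
so treat the initializer as a sketch:

    struct rte_flow_action_modify_field conf = {
        .operation = RTE_FLOW_MODIFY_SET,
        .dst = { .field = RTE_FLOW_FIELD_IPV4_SRC },
        .src = { .field = RTE_FLOW_FIELD_VALUE /* immediate source address */ },
        .width = 32,                  /* the whole IPv4 source address */
    };
    const struct rte_flow_action action = {
        .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
        .conf = &conf,
    };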
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Currently, the programmer's guide sections for the RIB and FIB libraries
are missing. This commit adds them.
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Change from "how many segments each segment can have" to
"how many segments each segment list can have".
Fixes: b317393283 ("doc: update guides for memory subsystem")
Cc: stable@dpdk.org
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Update the docs to reflect the two new variables, cpu_instruction_set
for non-arm builds and platform for arm builds.
Fixes: bf66003b51 ("build: use platform for generic and native builds")
Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Added a diagram to document meter library components
and added text for steps performed by the application to
configure the traffic meter and policing library.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
When using PMD Power Management, scale mode reacts slower than
monitor mode and pause mode. Add a note in the user guide to this
effect.
Signed-off-by: David Hunt <david.hunt@intel.com>
The docs contain references to 'pmd' but the proper usage is 'PMD'.
Cc: stable@dpdk.org
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Remove the use of 'driver' following 'PMD' as it is unnecessary.
Cc: stable@dpdk.org
Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>
Signed-off-by: Conor Fogarty <conor.fogarty@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Some duplicate words were detected with a script.
Fixes: fdec9301f5 ("doc: add flow classify guides")
Fixes: 4dc6d8e63c ("doc: add graph library guide")
Fixes: 30d3aa861d ("doc: rework VM power manager user guide")
Fixes: 0d547ed037 ("examples/ipsec-secgw: support configuration file")
Fixes: e64833f227 ("examples/l2fwd-keepalive: add sample application")
Cc: stable@dpdk.org
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: David Marchand <david.marchand@redhat.com>
To enable bifurcated device support, rtnl_lock is released before calling
userspace callbacks and asynchronous requests are enabled.
But these changes caused more issues, like bugs #809 and #816. To reduce the
scope of the problems, the bifurcated device support related changes are
only enabled when it is requested explicitly with new 'enable_bifurcated'
module parameter.
And bifurcated device support is disabled by default.
So the bifurcated device related problems are isolated and they can be
fixed without impacting all use cases.
Bugzilla ID: 816
Fixes: 631217c761 ("kni: fix kernel deadlock with bifurcated device")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Igor Ryzhov <iryzhov@nfware.com>
This library can be made optional.
drivers/gpu and app/test-gpudev depend on this library,
so they are automatically disabled if the lib is disabled.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Add support for Address Sanitizer (ASan) for PPC/POWER architecture.
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
This patch defines ASAN_SHADOW_OFFSET for arm64 according to the ASan
documentation. This offset should cover all arm64 VMAs supported by
ASan.
Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
In a heterogeneous computing system, processing is not only done in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing, there may be a need
for the CPU to communicate with the device in order to synchronize
operations.
An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.
The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fundamental info about received packets to the GPU.
A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- in a loop:
- receive a number of packets
- provide packets info to the GPU
GPU:
- Do some pre-processing
- Wait to receive a new set of packets to be processed
Layout of a communication list would be:
     --------
    |   0    |  => pkt_list
    | status |
    | #pkts  |
     --------
    |   1    |  => pkt_list
    | status |
    | #pkts  |
     --------
    |   2    |  => pkt_list
    | status |
    | #pkts  |
     --------
    |  ....  |  => pkt_list
     --------
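On the CPU side the loop could look roughly as below, assuming the
communication list helpers added by this series (rte_gpu_comm_create_list(),
rte_gpu_comm_populate_list_pkts(), rte_gpu_comm_cleanup_list_pkts()); burst
sizes and identifiers are illustrative:

    struct rte_gpu_comm_list *comm_list =
        rte_gpu_comm_create_list(gpu_id, nb_items);
    uint32_t idx = 0;

    /* ... trigger the persistent GPU task ... */

    for (;;) {
        struct rte_mbuf *mbufs[32];
        uint16_t nb = rte_eth_rx_burst(port_id, 0, mbufs, 32);
        if (nb == 0)
            continue;
        /* hand the burst over to the GPU through the next list entry */
        rte_gpu_comm_populate_list_pkts(&comm_list[idx], mbufs, nb);
        idx = (idx + 1) % nb_items;
        /* entries the GPU has marked as done are recycled with
         * rte_gpu_comm_cleanup_list_pkts() before being reused */
    }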
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In a heterogeneous computing system, processing is not only done in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing, there may be a need
for the CPU to communicate with the device in order to synchronize
operations.
The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag
GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU
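A CPU-side sketch using the flag helpers from this series
(rte_gpu_comm_create_flag(), rte_gpu_comm_set_flag()); the enum value for a
CPU-resident flag is an assumption to be checked in rte_gpudev.h:

    struct rte_gpu_comm_flag ready_flag;

    /* flag allocated in CPU memory, visible from the GPU */
    rte_gpu_comm_create_flag(gpu_id, &ready_flag, RTE_GPU_COMM_FLAG_CPU);
    rte_gpu_comm_set_flag(&ready_flag, 0);

    /* ... trigger the persistent GPU task, prepare the data ... */

    /* signal the GPU that the data is ready */
    rte_gpu_comm_set_flag(&ready_flag, 1);

    /* the GPU workload polls the flag value from its own loop */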
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In a heterogeneous computing system, processing is not only done in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.
As a first step, the features are focused on memory management.
A function allows allocating memory inside the device,
or in the main (CPU) memory while making it visible to the device.
This memory may be used to save packets or for synchronization data.
The next step should focus on GPU processing task control.
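An illustrative allocation sequence (the initial rte_gpu_mem_alloc() form
without an alignment argument is assumed):

    /* memory resident on the GPU, usable by GPU kernels */
    void *gpu_buf = rte_gpu_mem_alloc(gpu_id, buf_size);

    /* CPU memory registered so that the GPU can also access it */
    void *cpu_buf = rte_malloc(NULL, buf_size, 0);
    rte_gpu_mem_register(gpu_id, buf_size, cpu_buf);

    /* ... use the areas for packets or synchronization data ... */

    rte_gpu_mem_unregister(gpu_id, cpu_buf);
    rte_gpu_mem_free(gpu_id, gpu_buf);
    rte_free(cpu_buf);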
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque pointer by the caller of
rte_gpu_add_child().
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
In a heterogeneous computing system, processing is not only done in the CPU.
Some tasks can be delegated to devices working in parallel.
The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.
The infrastructure is prepared to welcome drivers in drivers/gpu/.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
This patch adds a new API ``rte_event_eth_rx_adapter_queue_stats_get`` to
retrieve queue stats. The queue stats are in the format
``struct rte_event_eth_rx_adapter_queue_stats``.
For resetting the queue stats, the
``rte_event_eth_rx_adapter_queue_stats_reset`` API is added.
The adapter stats_get and stats_reset APIs are also updated to
handle the queue-level event buffer use case.
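Typical usage might look as below; the adapter/ethdev/queue identifier triple
is assumed from the API description:

    struct rte_event_eth_rx_adapter_queue_stats qstats;

    if (rte_event_eth_rx_adapter_queue_stats_get(adapter_id, eth_dev_id,
            rx_queue_id, &qstats) == 0) {
        /* inspect the per-queue counters */
    }

    /* clear the counters of that queue only */
    rte_event_eth_rx_adapter_queue_stats_reset(adapter_id, eth_dev_id,
            rx_queue_id);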
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Add support for transmit segmentation offload to inline crypto processing
mode. This offload is not supported by other offload modes, as at a
minimum it requires inline crypto for IPsec to be supported on the
network interface.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The cryptodev library now registers commands with telemetry, and
implements the corresponding callback functions. These commands
allow a list of cryptodevs to be queried, as well as info and stats
for the corresponding cryptodev.
An example usage can be seen below:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 1135019, "max_output_len": 16384}
--> /
{"/": ["/", "/cryptodev/info", "/cryptodev/list", "/cryptodev/stats", ...]}
--> /cryptodev/list
{"/cryptodev/list": [0,1,2,3]}
--> /cryptodev/info,0
{"/cryptodev/info": {"device_name": "0000:1c:01.0_qat_sym", \
"max_nb_queue_pairs": 2}}
--> /cryptodev/stats,0
{"/cryptodev/stats": {"enqueued_count": 0, "dequeued_count": 0, \
"enqueue_err_count": 0, "dequeue_err_count": 0}}
Signed-off-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
As previously announced, this patch renames struct
vhost_device_ops to struct rte_vhost_device_ops.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and the application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across reconfigurations is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
Application must then flush the indirect actions before attempting it.
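An application would typically probe the capability before relying on
persistence; the capability bit name below is taken from this series and is
assumed to be reported in rte_eth_dev_info.dev_capa:

    struct rte_eth_dev_info dev_info;

    rte_eth_dev_info_get(port_id, &dev_info);
    if (dev_info.dev_capa & RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP) {
        /* indirect actions may survive stop/start; a particular kind can be
         * tested by creating it while the device is stopped */
    } else {
        /* unspecified behavior: destroy indirect actions before stopping */
    }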
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
Rule persistence across reconfigurations is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.
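The corresponding application-side check, with the capability bit name
assumed from this series:

    struct rte_eth_dev_info dev_info;
    struct rte_flow_error error;

    rte_eth_dev_info_get(port_id, &dev_info);
    if (!(dev_info.dev_capa & RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP)) {
        /* behavior across stop/start is unspecified:
         * flush the rules before stopping the device */
        rte_flow_flush(port_id, &error);
    }
    rte_eth_dev_stop(port_id);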
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Implement PIE-based congestion management based on RFC 8033.
The Proportional Integral Controller Enhanced (PIE) algorithm works
by proactively dropping packets randomly.
PIE is implemented as more advanced queue management is required to
address the bufferbloat problem and provide desirable quality of
service to users.
Tests for PIE code added to test application.
Added PIE related information to documentation.
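The core of the algorithm is the periodic drop-probability update from
RFC 8033 (shown as a simplified sketch, not the library API):

    /* run every T_UPDATE interval (RFC 8033, section 4.2) */
    drop_prob += alpha * (cur_qdelay - QDELAY_REF)
               + beta  * (cur_qdelay - old_qdelay);
    old_qdelay = cur_qdelay;
    /* on enqueue, the packet is dropped with probability drop_prob */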
Signed-off-by: Wojciech Liguzinski <wojciechx.liguzinski@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jasvinder Singh <jasvinder.singh@intel.com>
This patch adds a bulk version of the Toeplitz hash implemented
with Galois Field New Instructions (GFNI).
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch adds a new Toeplitz hash implementation using
Galois Field New Instructions (GFNI).
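Expected usage is to convert the RSS key into its matrix form once and then
hash tuples with it; the helper names below follow this series and should be
checked against rte_thash_gfni.h:

    uint64_t mtrx[40];          /* matrix form of the RSS key; sizing per rte_thash_gfni.h */

    /* pre-compute the matrices from the 40-byte RSS key (GFNI-capable CPU) */
    rte_thash_complete_matrix(mtrx, rss_key, 40);

    uint32_t hash = rte_thash_gfni(mtrx, tuple, tuple_len);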
Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
This patch adds necessary hooks in the memory allocator for ASan.
This feature is currently available in DPDK only on Linux x86_64.
If other OS/architectures want to support it, ASAN_SHADOW_OFFSET must be
defined and RTE_MALLOC_ASAN must be set accordingly in meson.
Signed-off-by: Xueqin Lin <xueqin.lin@intel.com>
Signed-off-by: Zhihong Peng <zhihongx.peng@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>