This patch fixes coverity issue by adding a check for negative event fd
value.
Coverity issue: 373711, 373694
Fixes: c2bd9367e1 ("lib: remove direct access to interrupt handle")
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Acked-by: David Marchand <david.marchand@redhat.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.
An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.
The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.
A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- in a loop:
- receive a number of packets
- provide packets info to the GPU
GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed
Layout of a communication list would be:
-------
| 0 | => pkt_list
| status |
| #pkts |
-------
| 1 | => pkt_list
| status |
| #pkts |
-------
| 2 | => pkt_list
| status |
| #pkts |
-------
| .... | => pkt_list
-------
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.
The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag
GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.
As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.
The next step should focus on GPU processing task control.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.
The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.
The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.
The infrastructure is prepared to welcome drivers in drivers/gpu/.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Currently, when code is running on FreeBSD or Windows, there is no way
to distinguish between a geniune error and a "VFIO is unsupported"
error. Fix the dummy implementations to also set the rte_errno flag.
Fixes: 279b581c89 ("vfio: expose functions")
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Fixes: 964b2f3bfb ("vfio: export some internal functions")
Fixes: ea2dc10668 ("vfio: add multi container support")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
On FreeBSD, `rte_vfio_is_enabled()` and `rte_vfio_noiommu_is_enabled()`
API calls will not return error, and will instead return 0. This is
intentional, because the caller of this API does not care whether VFIO
is supported at all, and will instead be interested in whether VFIO is
enabled or not. However, the doxygen comments for these functions state
that they will return an error on FreeBSD, which is incorrect.
Fix the doxygen comment to call out the fact that these
functions are only relevant on Linux, but remove the reference to
returning errors.
Fixes: 279b581c89 ("vfio: expose functions")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
On FreeBSD, `rte_vfio_clear_group()` was returning 0 even though this
function is not valid for FreeBSD, and is called out to return error in
doxygen comments.
Fix the return value to match documentation.
Fixes: c564a2a200 ("vfio: expose clear group function for internal usages")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Currently, VFIO support for Linux is compiled unconditionally, and
supported kernel versions start with 4.4, so VFIO is assumed to always
be enabled. There is no way of disabling VFIO support at compile time
anyway, so just drop the "VFIO not available" fallback code altogether.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Some drivers may return errcode when switch allmulticast mode,
so it's necessary to check the return code.
Fixes: b34801d1aa ("kni: support allmulticast mode set")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
Compare pkt_len to 0 instead of NULL to avoid the following build
failure with debug mode enabled:
../lib/sched/rte_pie.h: In function 'rte_pie_enqueue_empty':
../lib/sched/rte_pie.h:125:21: error: comparison between pointer
and integer [-Werror]
RTE_ASSERT(pkt_len != NULL);
Bugzilla ID: 878
Fixes: 44c730b0e3 ("sched: add PIE based congestion management")
Signed-off-by: Ali Alnubani <alialnu@nvidia.com>
'eth_dev->data' can be null before ethdev allocated. The API walks
through all eth devices, at least for some data can be null.
Adding 'eth_dev->data' null check before accessing it.
Fixes: 33c73aae32 ("ethdev: allow ownership operations on unused port")
Cc: stable@dpdk.org
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Introduce support for event devices requiring calls to
rte_event_maintain() in the Ethernet RX, Timer and Crypto Eventdev
adapters.
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Tested-by: Richard Eklycke <richard.eklycke@ericsson.com>
Extend Eventdev API to allow for event devices which require various
forms of internal processing to happen, even when events are not
enqueued to or dequeued from a port.
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Tested-by: Richard Eklycke <richard.eklycke@ericsson.com>
Tested-by: Liron Himi <lironh@marvell.com>
Added telemetry support for rxa_queue_stats and
rxa_queue_stats_reset to get and reset rx queue
stats respectively.
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
This patch adds new api ``rte_event_eth_rx_adapter_queue_stats_get`` to
retrieve queue stats. The queue stats are in the format
``struct rte_event_eth_rx_adapter_queue_stats``.
For resetting the queue stats,
``rte_event_eth_rx_adapter_queue_stats_reset`` api is added.
The adapter stats_get and stats_reset apis are also updated to
handle queue level event buffer use case.
Signed-off-by: Naga Harish K S V <s.v.naga.harish.k@intel.com>
Acked-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com>
Add 1MB data-unit length to the capability's bitmap.
Handle 1MB data-unit length in the mlx5 session create operation,
and expose its capability in the mlx5 capabilities.
Signed-off-by: Raja Zidane <rzidane@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Add support for transmit segmentation offload to inline crypto processing
mode. This offload is not supported by other offload modes, as at a
minimum it requires inline crypto for IPsec to be supported on the
network interface.
Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
This promotes the bbdev interface to stable.
Overdue for some time as bbdev interface has been stable.
Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The cryptodev library now registers commands with telemetry, and
implements the corresponding callback functions. These commands
allow a list of cryptodevs to be queried, as well as info and stats
for the corresponding cryptodev.
An example usage can be seen below:
Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
{"version": "DPDK 21.11.0-rc0", "pid": 1135019, "max_output_len": 16384}
--> /
{"/": ["/", "/cryptodev/info", "/cryptodev/list", "/cryptodev/stats", ...]}
--> /cryptodev/list
{"/cryptodev/list": [0,1,2,3]}
--> /cryptodev/info,0
{"/cryptodev/info": {"device_name": "0000:1c:01.0_qat_sym", \
"max_nb_queue_pairs": 2}}
--> /cryptodev/stats,0
{"/cryptodev/stats": {"enqueued_count": 0, "dequeued_count": 0, \
"enqueue_err_count": 0, "dequeue_err_count": 0}}
Signed-off-by: Rebecca Troy <rebecca.troy@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
RTE flow API defines two flow elements types - common and PMD private.
Common RTE flow types are defined in rte_flow.h while PMD private
types exists inside specific PMD only. Application can create a flow
rule with PMD private items or actions. RTE flow API restricts
private PMD types to negative values.
Current implementation tried to use negative PMD private item type
value as index in the rte_flow_desc_item[] array.
The patch allows access to rte_flow_desc_item[] and
rte_flow_desc_action[] arrays to non-private PMD types only.
Fixes: 6cf7204733 ("ethdev: support flow elements with variable length")
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
The function rte_eth_dev_is_removed() was introduced in DPDK 18.02,
and is integrated in error checks of ethdev library.
It is promoted as stable ABI.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
As previously announced, this patch renames struct
vhost_device_ops to struct rte_vhost_device_ops.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This patch marks the vDPA driver APIs as internal and
rename the corresponding header file to vdpa_driver.h.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
rte_flow_action_handle_create() did not mention what happens
with an indirect action when a device is stopped and started again.
It is natural for some indirect actions, like counter, to be persistent.
Keeping others at least saves application time and complexity.
However, not all PMDs can support it, or the support may be limited
by particular action kinds, that is, combinations of action type
and the value of the transfer bit in its configuration.
Add a device capability to indicate if at least some indirect actions
are kept across the above sequence. Without this capability the behavior
is still unspecified, and application is required to destroy
the indirect actions before stopping the device.
In the future, indirect actions may not be the only type of objects
shared between flow rules. The capability bit intends to cover all
possible types of such objects, hence its name.
Declare that the application can test for the persistence
of a particular indirect action kind by attempting to create
an indirect action of that kind when the device is stopped
and checking for the specific error type.
This is logical because if the PMD can to create an indirect action
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow shared object
to the same state when the device is stopped and restore the state
when the device is started.
Indirect action persistence across a reconfigurations is not required.
In case a PMD cannot keep the indirect actions across reconfiguration,
it is allowed just to report an error.
Application must then flush the indirect actions before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Previously, it was not specified what happens to the flow rules
when the device is stopped, possibly reconfigured, then started.
If flow rules were kept, it could be convenient for application
developers, because they wouldn't need to save and restore them.
However, due to the number of flows and possible creation rate it is
impractical to save all flow rules in DPDK layer. This means that flow
rules persistence really depends on whether PMD and HW can implement it
efficiently. It can also be limited by the rule item and action types,
and its attributes transfer bit (a combination of an item/action type
and a value of the transfer bit is called a rule feature).
Add a device capability bit for PMDs that can keep at least some
of the flow rules across restart. Without this capability behavior
is still unspecified and it is declared that the application must
flush the rules before stopping the device.
Allow the application to test for persistence of rules using
a particular feature by attempting to create a flow rule
using that feature when the device is stopped
and checking for the specific error.
This is logical because if the PMD can to create the flow rule
when the device is not started and use it after the start happens,
it is natural that it can move its internal flow rule object
to the same state when the device is stopped and restore the state
when the device is started.
Rule persistence across a reconfigurations is not required,
because tracking all the rules and configuration-dependent resources
they use may be infeasible. In case a PMD cannot keep the rules
across reconfiguration, it is allowed just to report an error.
Application must then flush the rules before attempting it.
Signed-off-by: Dmitry Kozlyuk <dkozlyuk@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
This patch increases the number of IO vectors for the
asynchronous data path from 512 to 2048. It has been
reported during testing the starvation of IO vectors
during iperf benchmark with 64KB packet size.
As there are no direct relationship between
VHOST_MAX_ASYNC_VEC and BUF_VECTOR_MAX, this patch also
assign VHOST_MAX_ASYNC_VEC value directly instead of being
a multiple of BUF_VECTOR_MAX.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patches merges copy_mbuf_to_desc() used by the sync
path with async_mbuf_to_desc() used by the async path.
Most of these complex functions are identical, so merging
them will make the maintenance easier.
In order not to degrade performance, the patch introduces
a boolean function parameter to specify whether it is called
in async context. This boolean is statically passed to this
always-inlined function, so the compiler will optimize this
out.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch extracts the descriptors buffers filling
from copy_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch extracts the IO vectors filling from
async_mbuf_to_desc() into a dedicated function as a
preliminary step of merging copy_mubf_to_desc() and
async_mbuf_to_desc().
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch reworks the function getting the index
for the first packet in-flight.
When this index turns out to be zero, let's use the simple
path. Doing that avoid having to do a modulo with the
virtqueue size.
The patch also rename the function for better clarification,
and only pass the virtqueue metadata pointer, as all the
needed information are stored there.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
vhost_poll_enqueue_completed() assumes some inflight
packets could have been completed in a previous call but
not returned to the application. But this is not the case,
since check_completed_copies callback is never called with
more than the current count as argument.
In other words, async->last_pkts_n is always 0. Removing it
greatly simplifies the function.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Now that IO vectors iterator have been simplified, the
rte_vhost_async_desc struct only contains a pointer on
the iterator array stored in the async metadata.
This patch removes it, and pass directly the iterators
array pointer to the transfer_data callback. Doing that,
we avoid declaring the descriptor array in the stack, and
also avoid the cost of filling it.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
IO vectors and their iterators arrays were part of the
async metadata but not their indexes.
In order to makes this more consistent, the patch adds the
indexes to the async metadata. Doing that, we can avoid
triggering DMA transfer within the loop as it IO vector
index overflow is now prevented in the async_mbuf_to_desc()
function.
Note that previous detection mechanism was broken
since the overflow already happened when detected, so OOB
memory access would already have happened.
With this changes done, virtio_dev_rx_async_submit_split()
and virtio_dev_rx_async_submit_packed() can be further
simplified.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Offset and count fields are unused and so can be removed.
The offset field was actually in the Vhost example, but
in a way that does not make sense.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch introduces rte_vhost_iovec struct that contains
both source and destination addresses since we always have
a 1:1 mapping between source and destination. While using
the standard iovec struct might have seemed better, having
to duplicate IO vectors and its iterators is memory
inefficient and make the implementation more complex.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Reaching the async batch threshold was one of the condition
to trigger the DMA transfer. However, this condition was
never met since the threshold value is 32, same as the
MAX_PKT_BURST value.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch splits the iterator arrays in two, one for
source and one for destination. The goal is make the code
easier to understand.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
IO vectors implementation is unnecessarily complex, mixing
source and destinations vectors in the same array.
This patch declares two arrays, one for the source and one
for the destination. It also gets rid of seg_awaits variable
in both packed and split implementation, which is the same
as iovec_idx.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch moves async_inflight_info struct to internal
header since it should not be part of the API.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
This patch moves async-related metadata from vhost_virtqueue
to a dedicated struct. It makes it clear which fields are
async related, and also saves some memory when async feature
is not in use.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Jiayu Hu <jiayu.hu@intel.com>
Since some vdevs like virtio and vhost do not support rxq_info_get and
queue state inquiry, the error return value -ENOTSUP need to be ignored
when queue_stopped cannot get rx queue information and rx queue state.
This patch changes the return value of queue_stopped when
rte_eth_rx_queue_info_get return -ENOTSUP to support vdevs which cannot
provide rx queue information and rx queue state enable power management.
Fixes: 209fd58545 ("power: make ethdev power management thread unsafe")
Cc: stable@dpdk.org
Signed-off-by: Miao Li <miao.li@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
This commit defines rte_vhost_power_monitor_cond which is used to pass
some information to vhost driver. The information is including the
address to monitor, the expected value, the mask to extract value read
from 'addr', the value size of monitor address, the match flag used to
distinguish the value used to match something or not match something.
Vhost driver can use these information to fill rte_power_monitor_cond.
Signed-off-by: Miao Li <miao.li@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>