rte_gpu_mem_cpu_map() exposes a GPU memory area to the CPU.
In gpudev communication list this is useful to store the
status flag.
A communication list status flag allocated on GPU memory
and mapped for CPU visibility can be updated by CPU and polled
by a GPU workload.
The polling operation is more frequent than the CPU update operation.
Having the status flag in GPU memory reduces the GPU workload polling
latency.
If CPU mapping feature is not enabled, status flag resides in
CPU memory registered so it's visible from the GPU.
To facilitate the interaction with the status flag, this patch
provides also the set/get functions for it.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Update rte_gpu_mem_cpu_unmap() header documentation
and the test application to use GPU pointer when unmapping.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Enable the possibility to expose a GPU memory area and make it
accessible from the CPU.
GPU memory has to be allocated via rte_gpu_mem_alloc().
This patch allows the gpudev library to map (and unmap),
through the GPU driver, a chunk of GPU memory and to return
a memory pointer usable by the CPU to access the GPU memory area.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Similarly to rte_malloc, rte_gpu_mem_alloc accepts as
input the memory alignment size.
GPU driver should return GPU memory address aligned
with the input value.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Remove all memory leaks in case of errors in
test-gpudev application.
Fixes: e818c4e2bf ("gpudev: add memory API")
Fixes: c7ebd65c13 ("gpudev: add communication list")
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.
An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.
The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.
A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- in a loop:
- receive a number of packets
- provide packets info to the GPU
GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed
Layout of a communication list would be:
-------
| 0 | => pkt_list
| status |
| #pkts |
-------
| 1 | => pkt_list
| status |
| #pkts |
-------
| 2 | => pkt_list
| status |
| #pkts |
-------
| .... | => pkt_list
-------
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.
The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.
CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag
GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.
As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.
The next step should focus on GPU processing task control.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.
The infrastructure is prepared to welcome drivers in drivers/gpu/.
Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>