10 Commits

Author SHA1 Message Date
Elena Agostini
579147d7aa gpudev: remove unnecessary memory barrier
Remove unnecessary rte_gpu_wmb from rte_gpu_comm_populate_list_pkts.
It causes a performance degradation in case of NVIDIA GPU V100.

This change doesn't affect any functionality as the status resides
in CPU registered memory.

Fixes: c7ebd65c1372 ("gpudev: add communication list")

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-26 12:26:46 +01:00
Elena Agostini
1674c56dc3 gpudev: manage null parameters in memory functions
The gpudev functions free, register and unregister
return gracefully if input pointer is NULL or size 0,
as API doc was indicating no-op accepted values.

CUDA driver checks are removed because redundant
with the checks added in gpudev library.

Fixes: e818c4e2bf50 ("gpudev: add memory API")

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-24 09:38:43 +01:00
Elena Agostini
c7ebd65c13 gpudev: add communication list
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.

A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- in a loop:
    - receive a number of packets
    - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
f56160a255 gpudev: add communication flag
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
2d61b429cf gpudev: add memory barrier
Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
2021-11-08 17:20:53 +01:00
Elena Agostini
e818c4e2bf gpudev: add memory API
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:53 +01:00
Thomas Monjalon
a9af048aba gpudev: support multi-process
The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.

The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:53 +01:00
Thomas Monjalon
82e5f6b658 gpudev: add child device representing a device context
The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00
Thomas Monjalon
18cb075631 gpudev: add event notification
Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00
Elena Agostini
8b8036a66e gpudev: introduce GPU device class library
In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2021-11-08 17:20:52 +01:00