.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

General-Purpose Graphics Processing Unit Library
================================================

When mixing networking activity with task processing on a GPU device,
there may be the need for the CPU to communicate with the device
in order to manage memory, synchronize operations, exchange information, etc.

By means of the generic GPU interface provided by this library,
it is possible to allocate a chunk of GPU memory and use it
to create a DPDK mempool with external mbufs having the payload
in GPU memory, enabling any network interface card
which supports this feature (e.g. Mellanox NICs)
to directly transmit and receive packets using GPU memory.

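As an illustration, such a mempool could be set up roughly as in the sketch
below. This is only a sketch: it assumes the two-argument ``rte_gpu_mem_alloc()``
form (more recent DPDK releases also take an alignment argument), uses an
illustrative ``GPU_PAGE_SIZE`` value, omits error checks, and the exact
registration and DMA-mapping flow depends on the NIC and on the IOVA mode.

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_mbuf.h>
   #include <rte_memory.h>
   #include <rte_dev.h>
   #include <rte_lcore.h>
   #include <rte_gpudev.h>

   #define GPU_PAGE_SIZE 65536  /* illustrative GPU page granularity */

   static struct rte_mempool *
   gpu_mbuf_pool_create(uint16_t port_id, int16_t gpu_id, uint32_t nb_mbufs)
   {
       struct rte_pktmbuf_extmem ext_mem;
       struct rte_eth_dev_info dev_info;

       /* External buffer area allocated in GPU memory (error checks omitted). */
       ext_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
       ext_mem.buf_len = RTE_ALIGN_CEIL((size_t)nb_mbufs * ext_mem.elt_size,
                                        GPU_PAGE_SIZE);
       ext_mem.buf_iova = RTE_BAD_IOVA;
       ext_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, ext_mem.buf_len);

       /* Register the GPU memory with DPDK and DMA-map it for the NIC. */
       rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL,
                           ext_mem.buf_len / GPU_PAGE_SIZE, GPU_PAGE_SIZE);
       rte_eth_dev_info_get(port_id, &dev_info);
       rte_dev_dma_map(dev_info.device, ext_mem.buf_ptr, ext_mem.buf_iova,
                       ext_mem.buf_len);

       /* Mempool of external mbufs whose payload lives in GPU memory. */
       return rte_pktmbuf_pool_create_extbuf("gpu_mbuf_pool", nb_mbufs, 0, 0,
                                             ext_mem.elt_size, rte_socket_id(),
                                             &ext_mem, 1);
   }
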
Additionally, this library provides a number of functions
to enhance the communication between the CPU and the GPU.

This library does not provide a wrapper for GPU-specific libraries
(e.g. CUDA Toolkit or OpenCL),
thus it is not possible to launch workloads on the device
or create GPU-specific objects
(e.g. CUDA Driver contexts or CUDA streams in case of NVIDIA GPUs).


Features
--------

This library provides a number of features:

- Interoperability with device-specific libraries through generic handlers.
- Allocate and free memory on the device.
- Register CPU memory to make it visible from the device.
- Communication between the CPU and the device.

The whole CPU-GPU communication is implemented
using CPU memory visible from the GPU.


API Overview
------------

Child Device
~~~~~~~~~~~~

By default, the DPDK PCIe module detects and registers physical GPU devices
in the system.
With the gpudev library it is also possible to add non-physical devices
through a ``uint64_t`` generic handler (e.g. a CUDA Driver context)
that will be registered internally by the driver as an additional device (child)
connected to a physical device (parent).
Each device (parent or child) is represented through an ID
required to indicate which device a given operation should be executed on.

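As an illustration only, a child device wrapping a driver context could be
attached to its physical GPU as in the sketch below; the ``rte_gpu_add_child()``
call taking a parent ID, a ``uint64_t`` context handler and flags is assumed
here, see ``rte_gpudev.h`` for the exact prototype.

.. code-block:: c

   #include <rte_gpudev.h>

   /*
    * parent_id: a physical GPU already registered by its PCIe driver.
    * ctx_handler: a device-specific object cast to uint64_t,
    * e.g. a CUDA Driver context created through the CUDA Toolkit.
    */
   static int16_t
   gpu_attach_child(int16_t parent_id, uint64_t ctx_handler)
   {
       /* Register the context as a child device of the physical GPU. */
       int16_t child_id = rte_gpu_add_child(parent_id, ctx_handler, 0);

       /* A negative value means the child could not be registered.
        * Both parent_id and child_id can otherwise be used in gpudev calls.
        */
       return child_id;
   }
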
Memory Allocation
~~~~~~~~~~~~~~~~~

gpudev can allocate a memory area on a given GPU device
and return the pointer to that memory.
Later, it is also possible to free that memory with gpudev.
GPU memory allocated outside of the gpudev library
(e.g. with a GPU-specific library) cannot be freed by the gpudev library.

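A minimal sketch of the allocate/use/free cycle; the two-argument
``rte_gpu_mem_alloc()`` form is assumed here (more recent DPDK releases also
take an alignment argument):

.. code-block:: c

   #include <rte_gpudev.h>

   static int
   gpu_buffer_roundtrip(int16_t dev_id)
   {
       /* Allocate 1 MB of memory on the given GPU device. */
       void *gpu_buf = rte_gpu_mem_alloc(dev_id, 1024 * 1024);

       if (gpu_buf == NULL)
           return -1;

       /* ... use gpu_buf, e.g. as payload area for external mbufs ... */

       /* Memory allocated through gpudev must also be freed through gpudev. */
       return rte_gpu_mem_free(dev_id, gpu_buf);
   }
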
Memory Registration
~~~~~~~~~~~~~~~~~~~

gpudev can register a CPU memory area to make it visible from a GPU device.
Later, it is also possible to unregister that memory with gpudev.
CPU memory registered outside of the gpudev library
(e.g. with a GPU-specific library) cannot be unregistered by the gpudev library.

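A minimal sketch of the register/use/unregister cycle, assuming the
``rte_gpu_mem_register(dev_id, size, ptr)`` and
``rte_gpu_mem_unregister(dev_id, ptr)`` prototypes (see ``rte_gpudev.h``):

.. code-block:: c

   #include <rte_gpudev.h>
   #include <rte_malloc.h>

   static int
   cpu_buffer_share(int16_t dev_id)
   {
       size_t buf_size = 4096;

       /* Plain CPU memory, allocated with any CPU allocator. */
       void *cpu_buf = rte_zmalloc(NULL, buf_size, 0);

       if (cpu_buf == NULL)
           return -1;

       /* Make the CPU buffer visible from the GPU device. */
       if (rte_gpu_mem_register(dev_id, buf_size, cpu_buf) < 0) {
           rte_free(cpu_buf);
           return -1;
       }

       /* ... the GPU task can now read from / write to cpu_buf ... */

       /* Memory registered through gpudev must be unregistered through gpudev. */
       rte_gpu_mem_unregister(dev_id, cpu_buf);
       rte_free(cpu_buf);
       return 0;
   }
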
Memory Barrier
~~~~~~~~~~~~~~

Some GPU drivers may need, under certain conditions,
to enforce the coherency of writes from external devices
(e.g. a NIC receiving packets) into the GPU memory.
gpudev abstracts and exposes this capability.

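For instance, after the NIC has written a burst of packets into GPU memory
and before the GPU task reads them, the application may issue the write
memory barrier. The ``rte_gpu_wmb()`` name is assumed here; check
``rte_gpudev.h`` for the exact API exposed by your DPDK version.

.. code-block:: c

   #include <rte_gpudev.h>

   /* Flush writes from external devices (e.g. a NIC receiving packets)
    * into GPU memory before the GPU task reads the data.
    * A negative return value means the barrier is not available
    * on this driver or the device is invalid.
    */
   static int
   gpu_flush_external_writes(int16_t dev_id)
   {
       return rte_gpu_wmb(dev_id);
   }
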
Communication Flag
~~~~~~~~~~~~~~~~~~

Consider an application with a GPU task
that is waiting to receive a signal from the CPU
to move forward with its execution.
The communication flag allocates a GPU-visible ``uint32_t`` flag in CPU memory
that can be used by the CPU to communicate with the GPU task.

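A CPU-side sketch, assuming the ``rte_gpu_comm_create_flag()``,
``rte_gpu_comm_set_flag()`` and ``rte_gpu_comm_destroy_flag()`` helpers with a
flag allocated in CPU memory (``RTE_GPU_COMM_FLAG_CPU``); how the GPU task
polls the flag is device-specific and out of the scope of this library:

.. code-block:: c

   #include <rte_gpudev.h>

   static void
   signal_gpu_task(int16_t dev_id)
   {
       struct rte_gpu_comm_flag quit_flag;

       /* Allocate a GPU-visible uint32_t flag in CPU memory. */
       if (rte_gpu_comm_create_flag(dev_id, &quit_flag, RTE_GPU_COMM_FLAG_CPU) < 0)
           return;

       /* ... launch the GPU task through a GPU-specific library;
        * the task polls the flag memory until it becomes non-zero ...
        */

       /* Tell the waiting GPU task to move forward. */
       rte_gpu_comm_set_flag(&quit_flag, 1);

       /* Release the flag once the GPU task has completed. */
       rte_gpu_comm_destroy_flag(&quit_flag);
   }
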
Communication List
~~~~~~~~~~~~~~~~~~

By default, DPDK pulls free mbufs from a mempool to receive packets.
Best practice, especially in a multithreaded application,
is not to make any assumption about which mbufs will be used
to receive the next bursts of packets.
Considering an application with a GPU memory mempool
attached to a receive queue, having some task waiting on the GPU
to receive a new burst of packets to be processed,
there is the need to communicate from the CPU
the list of mbuf payload addresses where the received packets have been stored.
The ``rte_gpu_comm_*()`` functions are responsible for creating a list of packets
that can be populated with the payload addresses of the received mbufs
and communicated to the task running on the GPU.

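A CPU-side sketch of the receive loop, assuming the
``rte_gpu_comm_create_list()``, ``rte_gpu_comm_populate_list_pkts()``,
``rte_gpu_comm_cleanup_list()`` and ``rte_gpu_comm_destroy_list()`` helpers
(see ``rte_gpudev.h`` for the exact prototypes); the GPU task is expected to
poll the list items, process each burst and mark the item as done:

.. code-block:: c

   #include <stdbool.h>
   #include <rte_ethdev.h>
   #include <rte_gpudev.h>

   #define NUM_ITEMS 64
   #define BURST_SIZE 32

   static void
   rx_to_gpu_loop(uint16_t port_id, uint16_t queue_id, int16_t gpu_id,
                  volatile bool *force_quit)
   {
       struct rte_mbuf *rx_mbufs[BURST_SIZE];
       struct rte_gpu_comm_list *comm_list;
       uint32_t item = 0;

       /* Circular list of NUM_ITEMS entries in CPU memory, visible from the GPU. */
       comm_list = rte_gpu_comm_create_list(gpu_id, NUM_ITEMS);
       if (comm_list == NULL)
           return;

       while (!*force_quit) {
           uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                                             rx_mbufs, BURST_SIZE);
           if (nb_rx == 0)
               continue;

           /* Write the payload addresses of the received mbufs into the next
            * item and flag it as ready for the GPU task.
            */
           rte_gpu_comm_populate_list_pkts(&comm_list[item], rx_mbufs, nb_rx);

           /* Before reusing an item, the application should check that the
            * GPU task has marked it as done and reset it with
            * rte_gpu_comm_cleanup_list().
            */
           item = (item + 1) % NUM_ITEMS;
       }

       rte_gpu_comm_destroy_list(comm_list, NUM_ITEMS);
   }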