.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You need to indicate to meson where the CUDA header files are through the CFLAGS variable.
There are three ways:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add CFLAGS in the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Add ``-Dc_args`` in the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

   $ git clone https://github.com/NVIDIA/gdrcopy.git
   $ cd gdrcopy
   $ make
   $ # make install to install the GDRCopy library system wide
   $ # Launch the gdrdrv kernel module on the system
   $ sudo ./insmod.sh

You need to indicate to meson where the GDRCopy header files are, as in the case of CUDA headers.
An example would be:

.. code-block:: console

   $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and will return an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.

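The following minimal C sketch (an illustration, not part of the driver) shows how an
application may exercise this capability and detect a build without GDRCopy;
the exact gpudev prototypes should be checked in ``rte_gpudev.h``:

.. code-block:: c

   #include <stdio.h>
   #include <stdint.h>
   #include <rte_gpudev.h>

   /* Sketch: allocate GPU memory and try to expose it to the CPU.
    * If the driver was built without GDRCopy, rte_gpu_mem_cpu_map()
    * is expected to fail and return NULL. */
   static int
   try_cpu_map(int16_t dev_id, size_t size)
   {
       void *gpu_ptr, *cpu_ptr;

       /* Newer DPDK releases add an alignment argument here. */
       gpu_ptr = rte_gpu_mem_alloc(dev_id, size);
       if (gpu_ptr == NULL) {
           fprintf(stderr, "GPU memory allocation failed\n");
           return -1;
       }

       cpu_ptr = rte_gpu_mem_cpu_map(dev_id, size, gpu_ptr);
       if (cpu_ptr == NULL) {
           /* Likely built without GDRCopy support. */
           fprintf(stderr, "CPU mapping of GPU memory not available\n");
           rte_gpu_mem_free(dev_id, gpu_ptr);
           return -1;
       }

       /* The GPU memory is now directly accessible from the CPU. */
       ((volatile uint8_t *)cpu_ptr)[0] = 0xab;

       rte_gpu_mem_cpu_unmap(dev_id, gpu_ptr);
       rte_gpu_mem_free(dev_id, gpu_ptr);
       return 0;
   }
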
CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the `cudart` module even if the `meson.build` file only requires the default `cuda` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, to build the CUDA driver library,
there is no need to install the CUDA library.

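For illustration only (this is not the driver's actual code), the runtime loading
mechanism described above roughly corresponds to the following sketch:

.. code-block:: c

   #include <dlfcn.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Illustrative sketch of dlopen-based loading honoring CUDA_PATH_L. */
   typedef int (*cu_init_t)(unsigned int flags);

   static void *
   open_libcuda(void)
   {
       const char *dir = getenv("CUDA_PATH_L");
       char path[512];
       void *handle;
       cu_init_t init_fn;

       if (dir != NULL)
           snprintf(path, sizeof(path), "%s/libcuda.so", dir);
       else
           snprintf(path, sizeof(path), "%s", "libcuda.so");

       handle = dlopen(path, RTLD_LAZY);
       if (handle == NULL) {
           fprintf(stderr, "dlopen: %s\n", dlerror());
           return NULL;
       }

       /* Every CUDA Driver API symbol is resolved the same way. */
       init_fn = (cu_init_t)dlsym(handle, "cuInit");
       if (init_fn == NULL || init_fn(0) != 0) {
           fprintf(stderr, "cuInit could not be resolved or failed\n");
           dlclose(handle);
           return NULL;
       }

       return handle;
   }
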
CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. ``/usr/local/lib``),
you can use the ``GDRCOPY_PATH_L`` environment variable.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

   $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

   $ lsmod | egrep gdrdrv
   gdrdrv                 20480  0
   nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (there is no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problems.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, it is required to set any CUDA environment configuration before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.

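Below is a minimal sketch of this ordering, assuming the gpudev API from ``rte_gpudev.h``
(the ``struct rte_gpu_info`` field names used here should be checked against your DPDK release):

.. code-block:: c

   #include <stdio.h>
   #include <rte_eal.h>
   #include <rte_gpudev.h>

   int
   main(int argc, char **argv)
   {
       struct rte_gpu_info info;

       /* Any CUDA environment configuration must be done before this
        * call: the CUDA driver probe runs cuInit(0) from rte_eal_init(). */
       if (rte_eal_init(argc, argv) < 0) {
           fprintf(stderr, "EAL initialization failed\n");
           return -1;
       }

       if (rte_gpu_count_avail() == 0) {
           fprintf(stderr, "no GPUs detected\n");
           return -1;
       }

       /* Query the first GPU (device ID 0); field names as in rte_gpudev.h. */
       if (rte_gpu_info_get(0, &info) == 0)
           printf("GPU 0: %s, %llu bytes of memory\n",
                  info.name, (unsigned long long)info.total_memory);

       return rte_eal_cleanup();
   }
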
CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

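A possible sketch of such a registration is shown below; the ``rte_gpu_add_child()``
prototype used here (parent device ID, child name, opaque context handle) is an
assumption and should be verified against ``rte_gpudev.h``:

.. code-block:: c

   #include <stdio.h>
   #include <stdint.h>
   #include <cuda.h>
   #include <rte_gpudev.h>

   /* Sketch: create a CUDA sub-context and register it with gpudev as a
    * child of a physical GPU. Assumes the CUDA driver is already
    * initialized (i.e. rte_eal_init() has been called). */
   static int16_t
   register_cuda_context(int16_t parent_dev_id)
   {
       CUdevice cu_dev;
       CUcontext cu_ctx;
       int16_t child_id;

       if (cuDeviceGet(&cu_dev, 0) != CUDA_SUCCESS ||
               cuCtxCreate(&cu_ctx, 0, cu_dev) != CUDA_SUCCESS) {
           fprintf(stderr, "CUDA context creation failed\n");
           return -1;
       }

       /* Assumed prototype: (parent dev ID, name, context handle). */
       child_id = rte_gpu_add_child(parent_dev_id, "cuda_child_ctx",
                                    (uint64_t)(uintptr_t)cu_ctx);
       if (child_id < 0)
           fprintf(stderr, "child device registration failed\n");

       return child_id;
   }
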
The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address,
if the address is not in the table, the CUDA driver library returns an error.

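A short sketch of this behavior (illustrative only; the alignment argument of
``rte_gpu_mem_alloc()`` differs across DPDK releases):

.. code-block:: c

   #include <stdio.h>
   #include <stdlib.h>
   #include <rte_gpudev.h>

   /* Sketch: allocate GPU memory and register CPU memory, then show that
    * an address unknown to the driver's internal table is rejected. */
   static void
   memory_table_demo(int16_t dev_id)
   {
       char stack_buf[64];
       void *gpu_mem;
       void *cpu_mem;

       gpu_mem = rte_gpu_mem_alloc(dev_id, 4096);
       cpu_mem = malloc(4096);
       if (gpu_mem == NULL || cpu_mem == NULL) {
           free(cpu_mem);
           if (gpu_mem != NULL)
               rte_gpu_mem_free(dev_id, gpu_mem);
           return;
       }

       /* Make the CPU buffer visible from the GPU. */
       if (rte_gpu_mem_register(dev_id, 4096, cpu_mem) < 0)
           fprintf(stderr, "CPU memory registration failed\n");

       /* stack_buf was never allocated through the driver: the table
        * lookup fails and an error is returned. */
       if (rte_gpu_mem_free(dev_id, stack_buf) < 0)
           fprintf(stderr, "free of unknown address rejected, as expected\n");

       rte_gpu_mem_unregister(dev_id, cpu_mem);
       rte_gpu_mem_free(dev_id, gpu_mem);
       free(cpu_mem);
   }
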
Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, the system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the vanilla DPDK l2fwd example
and is enhanced with GPU memory managed through the gpudev library
and CUDA kernels that offload the swap of packet MAC addresses to the GPU.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is the better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool (see the sketch after this list).
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.

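As an illustration of the mempool item above, a condensed sketch of backing a mempool
with GPU memory is shown below; it assumes the ``rte_pktmbuf_extmem`` /
``rte_pktmbuf_pool_create_extbuf`` external-buffer API and omits the external memory
registration and DMA-mapping steps, for which l2fwd-nv should be consulted:

.. code-block:: c

   #include <string.h>
   #include <rte_common.h>
   #include <rte_memory.h>
   #include <rte_lcore.h>
   #include <rte_mbuf.h>
   #include <rte_mempool.h>
   #include <rte_gpudev.h>

   #define NB_MBUFS 1024U

   /* Sketch: back an mbuf pool with GPU memory so that packet payloads
    * are received directly into GPU memory. */
   static struct rte_mempool *
   gpu_mempool_create(int16_t gpu_id)
   {
       struct rte_pktmbuf_extmem ext_mem;

       memset(&ext_mem, 0, sizeof(ext_mem));
       ext_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
       ext_mem.buf_len = RTE_ALIGN_CEIL(NB_MBUFS * ext_mem.elt_size, 4096);
       ext_mem.buf_iova = RTE_BAD_IOVA; /* resolved when registering/DMA-mapping */

       /* Allocate the backing area in GPU memory (newer DPDK releases
        * add an alignment argument to rte_gpu_mem_alloc()). */
       ext_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, ext_mem.buf_len);
       if (ext_mem.buf_ptr == NULL)
           return NULL;

       /* Before Rx/Tx, the area must also be registered with DPDK
        * (rte_extmem_register) and DMA-mapped to the NIC (rte_dev_dma_map). */
       return rte_pktmbuf_pool_create_extbuf("gpu_mbuf_pool", NB_MBUFS,
                                             0, 0, ext_mem.elt_size,
                                             rte_socket_id(), &ext_mem, 1);
   }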