numam-dpdk

Author	SHA1	Message	Date
Elena Agostini	1fd3de64ff	gpudev: fix page alignment in communication list Memory allocated for the CPU mapping of the status flag in the communication list should be aligned to the GPU page size, which can differ from the CPU page alignment. The GPU page size is added to the GPU info, and is used when creating a communication list. Fixes: `9b8cae4d99` ("gpudev: use CPU mapping in communication list") Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2022-03-09 00:14:55 +01:00
Thomas Monjalon	b403498e14	build: hide local symbols in shared libraries The symbols which are not listed in the version script are exported by default. Adding a local section with a wildcard makes non-listed functions and variables hidden, as it should be in all version.map files. These are the changes done in the shared libraries: - DF .text Base auxiliary_add_device - DF .text Base auxiliary_dev_exists - DF .text Base auxiliary_dev_iterate - DF .text Base auxiliary_insert_device - DF .text Base auxiliary_is_ignored_device - DF .text Base auxiliary_match - DF .text Base auxiliary_on_scan - DF .text Base auxiliary_scan - DO .bss Base auxiliary_bus_logtype - DO .data Base auxiliary_bus - DO .bss Base gpu_logtype There is no impact on regexdev library. Because these local symbols were exported as non-internal in DPDK 21.11, any change in these functions would break the ABI. Exception rules are added for these experimental libraries, so the ABI check will skip them until the next ABI version. A check is added to avoid such a miss in the future. Fixes: `1afce3086c` ("bus/auxiliary: introduce auxiliary bus") Fixes: `8b8036a66e` ("gpudev: introduce GPU device class library") Cc: stable@dpdk.org Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2022-03-08 15:22:33 +01:00
Elena Agostini	9b8cae4d99	gpudev: use CPU mapping in communication list rte_gpu_mem_cpu_map() exposes a GPU memory area to the CPU. In the gpudev communication list, this is useful to store the status flag. A communication list status flag allocated on GPU memory and mapped for CPU visibility can be updated by the CPU and polled by a GPU workload. The polling operation is more frequent than the CPU update operation. Having the status flag in GPU memory reduces the GPU workload polling latency. If the CPU mapping feature is not enabled, the status flag resides in registered CPU memory so that it is visible from the GPU. To facilitate the interaction with the status flag, this patch also provides the set/get functions for it. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2022-02-22 20:08:52 +01:00
Elena Agostini	77f40e04d7	gpudev: use device memory pointer for CPU unmap Update rte_gpu_mem_cpu_unmap() header documentation and the test application to use GPU pointer when unmapping. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2022-02-22 20:04:39 +01:00
Sean Morrissey	30a1de105a	lib: remove unneeded header includes These header includes have been flagged by the iwyu_tool and removed. Signed-off-by: Sean Morrissey <sean.morrissey@intel.com>	2022-02-22 13:10:39 +01:00
Elena Agostini	d69bb47d21	gpudev: expose GPU memory to CPU Enable the possibility to expose a GPU memory area and make it accessible from the CPU. GPU memory has to be allocated via rte_gpu_mem_alloc(). This patch allows the gpudev library to map (and unmap), through the GPU driver, a chunk of GPU memory and to return a memory pointer usable by the CPU to access the GPU memory area. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2022-02-10 10:06:56 +01:00
Elena Agostini	c8557ed434	gpudev: add alignment for memory allocation Similarly to rte_malloc, rte_gpu_mem_alloc accepts as input the memory alignment size. GPU driver should return GPU memory address aligned with the input value. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2022-01-21 11:33:25 +01:00
Elena Agostini	579147d7aa	gpudev: remove unnecessary memory barrier Remove unnecessary rte_gpu_wmb from rte_gpu_comm_populate_list_pkts. It causes a performance degradation in case of NVIDIA GPU V100. This change doesn't affect any functionality as the status resides in CPU registered memory. Fixes: `c7ebd65c13` ("gpudev: add communication list") Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-26 12:26:46 +01:00
Elena Agostini	1674c56dc3	gpudev: manage null parameters in memory functions The gpudev functions free, register and unregister return gracefully if the input pointer is NULL or the size is 0, as the API documentation indicated these are accepted no-op values. CUDA driver checks are removed because they are redundant with the checks added in the gpudev library. Fixes: `e818c4e2bf` ("gpudev: add memory API") Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-24 09:38:43 +01:00
Elena Agostini	c7ebd65c13	gpudev: add communication list In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing, the CPU may need to communicate with the device in order to synchronize operations. An example could be a receive-and-process application where the CPU is responsible for receiving packets in multiple mbufs and the GPU is responsible for processing the content of those packets. The purpose of this list is to provide a buffer in CPU memory visible from the GPU that can be treated as a circular buffer to let the CPU provide fundamental info of received packets to the GPU. A possible use-case is described below. CPU: - Trigger some task on the GPU - in a loop: - receive a number of packets - provide packets info to the GPU GPU: - Do some pre-processing - Wait to receive a new set of packets to be processed Layout of a communication list: each entry (0, 1, 2, ...) refers to a pkt_list and holds a status and a packet count (#pkts). Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	f56160a255	gpudev: add communication flag In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. When mixing network activity with task processing, the CPU may need to communicate with the device in order to synchronize operations. The purpose of this flag is to allow the CPU and the GPU to exchange ACKs. A possible use-case is described below. CPU: - Trigger some task on the GPU - Prepare some data - Signal to the GPU that the data is ready by updating the communication flag GPU: - Do some pre-processing - Wait for more data from the CPU by polling on the communication flag - Consume the data prepared by the CPU Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	2d61b429cf	gpudev: add memory barrier Add a function for the application to ensure the coherency of the writes executed by another device into the GPU memory. Signed-off-by: Elena Agostini <eagostini@nvidia.com>	2021-11-08 17:20:53 +01:00
Elena Agostini	e818c4e2bf	gpudev: add memory API In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. Such workload distribution can be achieved by sharing some memory. As a first step, the features are focused on memory management. A function allows allocating memory inside the device, or in the main (CPU) memory while making it visible for the device. This memory may be used to save packets or for synchronization data. The next step should focus on GPU processing task control. Signed-off-by: Elena Agostini <eagostini@nvidia.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:53 +01:00
Thomas Monjalon	a9af048aba	gpudev: support multi-process The device data shared between processes are moved into a struct allocated in shared memory (a new memzone for all GPUs). The main struct rte_gpu references the shared memory via the pointer mpshared. The API function rte_gpu_attach() is added to attach a device from the secondary process. The function rte_gpu_allocate() can be used only by the primary process. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:53 +01:00
Thomas Monjalon	82e5f6b658	gpudev: add child device representing a device context The computing device may operate in some isolated contexts. Memory and processing are isolated in a silo represented by a child device. The context is provided as an opaque by the caller of rte_gpu_add_child(). Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:52 +01:00
Thomas Monjalon	18cb075631	gpudev: add event notification Callback functions may be registered for a device event. Callback management is per-process and not thread-safe. The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL are notified respectively after creation and before removal of a device, as part of the library functions. Some future events may be emitted from drivers. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:52 +01:00
Elena Agostini	8b8036a66e	gpudev: introduce GPU device class library In a heterogeneous computing system, processing is not only in the CPU. Some tasks can be delegated to devices working in parallel. The new library gpudev is for dealing with GPGPU computing devices from a DPDK application running on the CPU. The infrastructure is prepared to welcome drivers in drivers/gpu/. Signed-off-by: Elena Agostini <eagostini@nvidia.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>	2021-11-08 17:20:53 +01:00

17 Commits
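
The communication list described in commit c7ebd65c13 (entries holding a pkt_list, a status and a packet count, handed from a CPU producer to a GPU consumer) can be sketched in plain C as below. This is only an illustration of the hand-off protocol under stated assumptions: every name here (comm_item, comm_pkt, comm_populate, comm_consume, COMM_MAX_PKTS and the status values) is hypothetical and is not the gpudev API, the consumer runs on the CPU rather than on a GPU, and C11 atomics stand in for whatever barriers a real GPU-visible status flag would need.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values: who currently owns an entry. */
enum comm_status {
    COMM_FREE = 0,  /* producer (CPU) may fill the entry */
    COMM_READY      /* consumer (GPU in the real use-case) may read it */
};

#define COMM_MAX_PKTS 8 /* assumed per-entry capacity */

/* Hypothetical packet descriptor: address and size only. */
struct comm_pkt {
    uintptr_t addr;
    size_t size;
};

/* One entry of the list: pkt_list, status and #pkts, as in the commit text. */
struct comm_item {
    struct comm_pkt pkt_list[COMM_MAX_PKTS];
    _Atomic uint32_t status;
    uint32_t num_pkts;
};

/* Producer side: fill a free entry, then publish it by setting the status.
 * Returns 0 on success, -1 on bad input or if the entry is still in use. */
int comm_populate(struct comm_item *item, const struct comm_pkt *pkts,
                  uint32_t n)
{
    if (item == NULL || pkts == NULL || n == 0 || n > COMM_MAX_PKTS)
        return -1;
    if (atomic_load_explicit(&item->status, memory_order_acquire) != COMM_FREE)
        return -1;

    for (uint32_t i = 0; i < n; i++)
        item->pkt_list[i] = pkts[i];
    item->num_pkts = n;

    /* Release store: packet info is visible before the status changes. */
    atomic_store_explicit(&item->status, COMM_READY, memory_order_release);
    return 0;
}

/* Consumer side: if the entry is ready, sum the packet sizes and free it.
 * Returns 0 on success, -1 on bad input or if nothing is ready yet. */
int comm_consume(struct comm_item *item, size_t *total_bytes)
{
    if (item == NULL || total_bytes == NULL)
        return -1;
    if (atomic_load_explicit(&item->status, memory_order_acquire) != COMM_READY)
        return -1;

    size_t sum = 0;
    for (uint32_t i = 0; i < item->num_pkts; i++)
        sum += item->pkt_list[i].size;
    *total_bytes = sum;

    item->num_pkts = 0;
    atomic_store_explicit(&item->status, COMM_FREE, memory_order_release);
    return 0;
}
```

A caller would keep an array of comm_item entries and walk it as a circular buffer, with the producer calling comm_populate() on the next entry and the consumer polling the same entry with comm_consume() until it returns 0. The status field is the only synchronization point, which is why later commits in this log focus on where that flag lives and how it is aligned.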