From 3e2297140ca17ff97bda235b2b5962ebf185953f Mon Sep 17 00:00:00 2001
From: Darek Stojaczyk
Date: Mon, 29 Oct 2018 08:02:30 +0100
Subject: [PATCH] doc: describe dynamic memory management

Change-Id: I0ffa5a8ea0ffb46113c6a000546a445784ce7b2f
Signed-off-by: Darek Stojaczyk
Reviewed-on: https://review.gerrithub.io/431094
Reviewed-by: Tomasz Zawadzki
Reviewed-by: Jim Harris
Reviewed-by: Ben Walker
Tested-by: SPDK CI Jenkins
Chandler-Test-Pool: SPDK Automated Test System
---
 CHANGELOG.md        | 13 +++++++++++++
 doc/applications.md |  5 +++++
 doc/memory.md       |  5 +----
 doc/nvme.md         |  4 ++++
 doc/nvmf.md         | 12 ++++++++++++
 doc/virtio.md       |  4 ----
 include/spdk/env.h  | 29 ++++++++++++++---------------
 7 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9f4a27ff3d..7f2ad4e137 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -51,6 +51,19 @@ A new structure spdk_mem_map_ops has been introduced to hold memory map related
 callbacks. This structure is now passed as the second argument of spdk_mem_map_alloc
 in lieu of the notify callback.

+### DPDK 18.08
+
+The DPDK submodule has been updated to the DPDK 18.08 release. SPDK will now automatically
+utilize DPDK's dynamic memory management with DPDK versions >= 18.05.1.
+
+Hugepages can still be reserved with the `[-s|--mem-size ]` option at application startup,
+but once we use them all up, instead of failing user allocations with -ENOMEM, we'll try
+to dynamically reserve even more. This allows starting SPDK with `--mem-size 0` and using
+only as many hugepages as are really needed.
+
+Due to this change, the memory buffers returned by `spdk_*malloc()` are no longer guaranteed
+to be physically contiguous.
+
 ### iscsi target

 Parameter names of `set_iscsi_options` and `get_iscsi_global_params` RPC
diff --git a/doc/applications.md b/doc/applications.md
index e363c04543..5779bc82b0 100644
--- a/doc/applications.md
+++ b/doc/applications.md
@@ -112,6 +112,11 @@ reserve memory from all available hugetlbfs mounts, starting with the one with
 the highest page size. This option accepts a number of bytes with a possible
 binary prefix, e.g. 1024, 1024M, 1G. The default unit is megabyte.

+Starting with DPDK 18.05.1, it's possible to reserve hugepages at runtime, meaning
+that an SPDK application can be started with 0 pre-reserved memory. Unlike hugepages
+pre-reserved at the application startup, the hugepages reserved at runtime will be
+released to the system as soon as they're no longer used.
+
 ### Disable PCI access {#cmd_arg_disable_pci_access}

 If SPDK is run with PCI access disabled it won't detect any PCI devices. This
diff --git a/doc/memory.md b/doc/memory.md
index abe81ffaf7..c9109b20eb 100644
--- a/doc/memory.md
+++ b/doc/memory.md
@@ -85,10 +85,7 @@ allocating `hugepages` (by default, 2MiB). The Linux kernel treats hugepages
 differently than regular 4KiB pages. Specifically, the operating system will
 never change their physical location. This is not by intent, and so things could
 change in future versions, but it is true today and has been for a number
-of years (see the later section on the IOMMU for a future-proof solution). DPDK
-goes through great pains to allocate hugepages such that it can string together
-the longest runs of physical pages possible, such that it can accommodate
-physically contiguous allocations larger than a single page.
+of years (see the later section on the IOMMU for a future-proof solution).

 With this explanation, hopefully it is now clear why all data buffers passed to
 SPDK must be allocated using spdk_dma_malloc() or its siblings. The buffers
diff --git a/doc/nvme.md b/doc/nvme.md
index 3565156d71..f4033406d1 100644
--- a/doc/nvme.md
+++ b/doc/nvme.md
@@ -195,6 +195,10 @@ single NVM subsystem directly, the NVMe library will call `probe_cb`
 for just that subsystem; this allows the user to skip the discovery step
 and connect directly to a subsystem with a known address.

+## RDMA Limitations
+
+Please refer to NVMe-oF target's @ref nvmf_rdma_limitations
+
 # NVMe Multi Process {#nvme_multi_process}

 This capability enables the SPDK NVMe driver to support multiple processes accessing the
diff --git a/doc/nvmf.md b/doc/nvmf.md
index 0c9c74cc81..f682b5b1c8 100644
--- a/doc/nvmf.md
+++ b/doc/nvmf.md
@@ -224,3 +224,15 @@ nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"

 SPDK has a tracing framework for capturing low-level event information at runtime.
 @ref nvmf_tgt_tracepoints enable analysis of both performance and application crashes.
+
+## RDMA Limitations {#nvmf_rdma_limitations}
+
+As RDMA NICs put a limitation on the number of memory regions registered, the SPDK NVMe-oF
+target application may eventually start failing to allocate more DMA-able memory. This is
+an imperfection of the DPDK dynamic memory management and is most likely to occur with too
+many 2MB hugepages reserved at runtime. Some of our NICs report as many as 2048 for the
+maximum number of memory regions, meaning that exactly that many pages can be allocated.
+With 2MB hugepages, this gives us a 4GB memory limit. It can be overcome by using 1GB
+hugepages or by pre-reserving memory at application startup with the `--mem-size` or `-s`
+option. All pre-reserved memory will be registered as a single region, but won't be
+returned to the system until the SPDK application is terminated.
diff --git a/doc/virtio.md b/doc/virtio.md
index 2836773541..7533565182 100644
--- a/doc/virtio.md
+++ b/doc/virtio.md
@@ -27,7 +27,3 @@ uses one file per hugepage by default. So *by default* this makes SPDK Virtio pr
 with only 1GB hugepages. To run an SPDK app using Virtio initiator with 2MB hugepages
 it is required to pass '-g' command-line option . This forces DPDK to create a single
 non-physically-contiguous hugetlbfs file for all its memory.
-
-This functionality requires latest DPDK changes that are officially landing in DPDK
-18.05, but have been also backported to spdk-18.02 branch of our internal DPDK fork
-which is currently used as a default git submodule for SPDK.
diff --git a/include/spdk/env.h b/include/spdk/env.h
index 307499b72f..72d6c693bb 100644
--- a/include/spdk/env.h
+++ b/include/spdk/env.h
@@ -90,8 +90,8 @@ struct spdk_env_opts {
 };

 /**
- * Allocate dma/sharable memory based on a given dma_flg. It is a physically
- * contiguous memory buffer with the given size, alignment and socket id.
+ * Allocate dma/sharable memory based on a given dma_flg. It is a memory buffer
+ * with the given size, alignment and socket id.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -110,9 +110,8 @@ struct spdk_env_opts {
 void *spdk_malloc(size_t size, size_t align, uint64_t *phys_addr, int socket_id, uint32_t flags);

 /**
- * Allocate dma/sharable memory based on a given dma_flg. It is a physically
- * contiguous memory buffer with the given size, alignment and socket id.
- * Also, the buffer will be zeroed.
+ * Allocate dma/sharable memory based on a given dma_flg. It is a memory buffer
+ * with the given size, alignment and socket id. Also, the buffer will be zeroed.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -153,8 +152,7 @@ void spdk_env_opts_init(struct spdk_env_opts *opts);
 int spdk_env_init(const struct spdk_env_opts *opts);

 /**
- * Allocate a pinned, physically contiguous memory buffer with the given size
- * and alignment.
+ * Allocate a pinned memory buffer with the given size and alignment.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -169,8 +167,7 @@ int spdk_env_init(const struct spdk_env_opts *opts);
 void *spdk_dma_malloc(size_t size, size_t align, uint64_t *phys_addr);

 /**
- * Allocate a pinned, physically contiguous memory buffer with the given size,
- * alignment and socket id.
+ * Allocate a pinned memory buffer with the given size, alignment and socket id.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -187,8 +184,8 @@ void *spdk_dma_malloc(size_t size, size_t align, uint64_t *phys_addr);
 void *spdk_dma_malloc_socket(size_t size, size_t align, uint64_t *phys_addr, int socket_id);

 /**
- * Allocate a pinned, physically contiguous memory buffer with the given size
- * and alignment. The buffer will be zeroed.
+ * Allocate a pinned memory buffer with the given size and alignment. The buffer
+ * will be zeroed.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -203,8 +200,8 @@ void *spdk_dma_malloc_socket(size_t size, size_t align, uint64_t *phys_addr, int
 void *spdk_dma_zmalloc(size_t size, size_t align, uint64_t *phys_addr);

 /**
- * Allocate a pinned, physically contiguous memory buffer with the given size,
- * alignment and socket id. The buffer will be zeroed.
+ * Allocate a pinned memory buffer with the given size, alignment and socket id.
+ * The buffer will be zeroed.
  *
  * \param size Size in bytes.
  * \param align Alignment value for the allocated memory. If '0', the allocated
@@ -247,7 +244,8 @@ void spdk_dma_free(void *buf);

 /**
  * Reserve a named, process shared memory zone with the given size, socket_id
- * and flags.
+ * and flags. Unless `SPDK_MEMZONE_NO_IOVA_CONTIG` flag is provided, the returned
+ * memory will be IOVA contiguous.
  *
  * \param name Name to set for this memory zone.
  * \param len Length in bytes.
@@ -261,7 +259,8 @@ void *spdk_memzone_reserve(const char *name, size_t len, int socket_id, unsigned

 /**
  * Reserve a named, process shared memory zone with the given size, socket_id,
- * flags and alignment.
+ * flags and alignment. Unless `SPDK_MEMZONE_NO_IOVA_CONTIG` flag is provided,
+ * the returned memory will be IOVA contiguous.
  *
  * \param name Name to set for this memory zone.
  * \param len Length in bytes.
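
Note on the `doc/nvmf.md` hunk: the 4GB figure quoted there is simply the NIC's memory region limit multiplied by the hugepage size. A minimal sketch of that arithmetic in C, assuming (as the text describes) that each hugepage reserved at runtime consumes one memory region; `max_registered_bytes` is a hypothetical helper used only for illustration and is not part of SPDK or DPDK.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an SPDK/DPDK API): upper bound on DMA-able
 * memory when every hugepage reserved at runtime takes one NIC memory
 * region. Both arguments are plain counts/sizes supplied by the caller.
 */
static uint64_t
max_registered_bytes(uint64_t max_regions, uint64_t hugepage_size)
{
	return max_regions * hugepage_size;
}
```

With the values from the text, `max_registered_bytes(2048, 2ULL << 20)` gives 4 GiB, while the same region limit with 1GB hugepages (`1ULL << 30`) raises the bound to 2048 GiB.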